Kaggle: Solving the Titanic challenge using logistic regression in Spark

In this post, I will show how to tackle Kaggle’s entry-level challenge, Titanic. In this challenge, you are given a training dataset and a test dataset. Your goal is to use the training dataset to build and train a model and then use it to predict the survival of the passengers listed in the test …

Top 5 metrics for evaluating regression models

In my previous posts, I have covered some regression models (simple linear regression, polynomial regression) and classification models (k-nearest neighbors, support vector machines). However, I haven’t really discussed in depth the different ways to evaluate these models. Without proper metrics, you can neither state the accuracy of your models with confidence nor compare …

Implementing the Support Vector Machine (SVM) algorithm in Python

As you have probably noticed by now, there are several machine learning algorithms at your disposal. In my previous post, I covered a very popular classification algorithm called K-Nearest Neighbors. In today’s post, I will cover another very common and powerful classification algorithm called Support Vector Machine (SVM). What is SVM and how does …

Implementing k-nearest neighbors in Python

Last time, we looked at one of the simplest classification algorithms in machine learning, binomial logistic regression. In this post, I am going to cover another common classification algorithm called K-Nearest Neighbors, otherwise known as KNN. To recap, we have mostly discussed regression models such as simple and multivariate linear regression and polynomial …

Implementing a Binomial Logistic Regression model in Python

So far, we have only discussed regression modelling. However, there is another type of modelling called classification modelling. The primary difference between regression models and classification models is that while regression models are used to predict a quantity, classification models are used to …

Implementing a Polynomial Regression Model in Python

So far, we have looked at two types of linear regression models and how to implement them in Python using scikit-learn. To recap, we began with a simple linear regression (SLR) model where we have one independent variable (feature) and one dependent variable (label). We then expanded it slightly to a more general use case …

Setting up Apache Spark on an AWS EC2 instance

I am currently learning Apache Spark and how to use it for in-memory analytics as well as machine learning (ML). Scikit-learn is a great library for ML, but when you want to deploy an ML model in production to analyze billions of rows (‘big data’), you want to be working with some technology or framework …

Implementing a Multiple Linear Regression model in Python

Earlier, I wrote about how to implement a simple linear regression (SLR) model in Python. SLR is probably the easiest model to implement among the most popular machine learning algorithms. In this post, we are going to take it one step further: instead of working with just one independent variable, we will be working …

Implementing a Simple Linear Regression Model in Python

So far, I have discussed some of the theory behind machine learning algorithms and shown you how to perform vital data preprocessing steps such as feature scaling and feature encoding. We are now ready to start with the simplest machine learning algorithm, simple linear regression (SLR). Remember, back in …

Feature scaling in Python using scikit-learn

In my previous post, I explained the importance of feature encoding and how to do it in Python using scikit-learn. In this post, we are going to talk about another part of preprocessing for machine learning models: feature scaling. Very rarely would you be dealing with features that share the …