Analyzing NYC motor vehicle data in Spark

A while back, I wrote about analyzing NYC’s traffic (motor vehicle) data in q/kdb+. Soon afterwards, I showed how to analyze that data in Python using the pandas library. Now I would like to analyze the same dataset again, this time in Apache Spark. As I mentioned in my last post, I am currently …

Setting up Apache Spark on an AWS EC2 instance

I am currently learning Apache Spark and how to use it for in-memory analytics as well as machine learning (ML). Scikit-learn is a great library for ML, but when you want to deploy an ML model in production to analyze billions of rows (‘big data’), you want to be working with some technology or framework …

Implementing a Multiple Linear Regression model in python

Earlier, I wrote about how to implement a simple linear regression (SLR) model in python. SLR is probably the easiest model to implement among the most popular machine learning algorithms. In this post, we are going to take it one step further and instead of working with just one independent variable, we will be working …

Implementing Simple Linear Regression Model in Python

So far, I have discussed some of the theory behind machine learning algorithms and shown you how to perform vital data preprocessing steps such as feature scaling and feature encoding. We are now ready to start with the simplest machine learning algorithm, simple linear regression (SLR). Remember, back in …

Feature scaling in python using scikit-learn

In my previous post, I explained the importance of feature encoding and how to do it in Python using scikit-learn. In this post, we are going to talk about another part of the preprocessing step in applying machine learning models: feature scaling. Very rarely would you be dealing with features that share the …

Feature encoding in python using scikit-learn

A key step in applying machine learning models to your data is feature encoding. In this post, we are going to discuss what that consists of and how to do it in Python using scikit-learn. Not all the fields in your …

Python api for getting market and financial data from IEX

Most of you have probably heard about IEX: The Investors Exchange. IEX is the exchange started by Brad Katsuyama, the protagonist of Michael Lewis’s famous book Flash Boys (review). Just last year, IEX scored a major win when the SEC approved its application to register as a national securities exchange. As time passes by, IEX …

Understanding list, set and dict comprehensions

Just a few days ago, you were having a good time with your friends and counting down to 2017. A few days have passed, and you are left with a typical cold, snowy day in January. You are busy writing code for a high-profile project at work. Suddenly, a situation arises where you need to create …

10 python idioms to help you improve your code

If you have ever tried to learn a new language (not a programming language), you know that we always think in our native language before we translate it to the new language. This can lead to you forming some sentences that don’t make sense in the new language but are perfectly normal in your native …