Python is a very popular language for many reasons, one of which is that it works equally well for quick scripting and for enterprise applications. Professionally, I have used Python to write many scripts: some quick and temporary, others more complex and long-term. Whatever the purpose of the …
Author Archives: Himanshu
Why documentation is more important than code
Taking a break from the machine-learning-heavy posts, I would like to talk about something slightly different but very important: documentation. Which is more important: code or documentation? If asked, most developers would say that code is more important than documentation. As a developer, you are given some business requirements and are asked to …
Implementing a Binomial Logistic Regression model in Python
So far, we have only discussed regression modelling. However, there is another type of modelling called classification modelling. The primary difference between regression models and classification models is that while regression models are used to predict a quantity, classification models are used to …
2017: Year in Review
And just like that, 2017 is almost gone. Last year, around this time, I wrote a post in which I reflected on what I did or did not accomplish in 2016. It is now time to do the same for 2017. A lot happened in 2017 and I would like to take a step back and …
Implementing a Polynomial Regression Model in Python
So far, we have looked at two types of linear regression models and how to implement them in Python using scikit-learn. To recap, we began with a simple linear regression (SLR) model where we have one independent variable (feature) and one dependent variable (label). We then expanded it slightly to a more general use case …
Analyzing NYC motor vehicle data in Spark
A while back, I wrote about analyzing NYC’s traffic (motor vehicle) data in q/kdb+. Soon afterwards, I showed how to analyze that data in Python using the pandas library. Now, I would like to analyze the same dataset again, but this time in Apache Spark. As I mentioned in my last post, I am currently …
Setting up Apache Spark on an AWS EC2 instance
I am currently learning Apache Spark and how to use it for in-memory analytics as well as machine learning (ML). Scikit-learn is a great library for ML but when you want to deploy an ML model in prod to analyze billions of rows (‘big data’), you want to be working with some technology or framework …
Implementing a Multiple Linear Regression model in python
Earlier, I wrote about how to implement a simple linear regression (SLR) model in Python. SLR is probably the easiest model to implement among the most popular machine learning algorithms. In this post, we are going to take it one step further and instead of working with just one independent variable, we will be working …
q/kdb+ api for getting market and financial data from IEX
A few months ago, I wrote an API in Python for getting market and financial data from IEX. As discussed earlier, IEX makes a lot of its data available to the public through its web service API. In this post, I will show you how to use the API I wrote in q/kdb+. Let’s get started. …
Implementing Simple Linear Regression Model in Python
So far, I have discussed some of the theory behind machine learning algorithms and shown you how to perform vital data preprocessing steps such as feature scaling and feature encoding. We are now ready to start with the simplest machine learning algorithm: simple linear regression (SLR). Remember, back in …