Kaggle: Solving the Titanic challenge using logistic regression in Spark

In this post, I will show how to tackle Kaggle’s entry-level challenge called Titanic. In this challenge, you are given a training dataset and a test dataset. Your goal is to use the training dataset to build and train a model and then use it to predict the survival of each passenger listed in test …

Subscribing to a message broker in q/kdb through embedPy

Update: As of May 2020, there is now an official Solace interface for kdb+ which can be used, instead of embedPy, to interact with Solace’s PubSub+ broker. You have undoubtedly heard of messaging platforms such as Kafka, Solace and RabbitMQ that allow applications to communicate with each other. Messaging platforms have existed for several years …

Book Review: Head First Python by Paul Barry

I have never had any official training in Python and I am sure there are many other developers out there like me who came from a different language and easily picked up Python on their own. A very intuitive way of learning a new language is by using it to solve real-world problems one …

Installing kdb+, jupyterq and embedPy using conda!

Note: You can now subscribe to my blog here to receive the latest updates. Kx announced during their Kx25 event last week that you can now download and install kdb+, embedPy and jupyterq via conda. For those who don’t know, conda is a platform- and language-agnostic tool for installing packages and managing environments. It’s mainly used …

My (awesome) experience at PyCon 2018 in Cleveland!

I have been programming in Python for almost three years. I picked up Python as a useful tool to solve some critical problems at work and quickly started appreciating all that it had to offer. I used different mediums to learn Python. I sought guidance from other senior Python developers. I read blog …

Top 5 metrics for evaluating regression models

In my previous posts, I have covered some regression models (simple linear regression, polynomial regression) and classification models (k-nearest neighbors, support vector machines). However, I haven’t discussed in depth the different ways to evaluate these models. Without proper metrics, not only can you not confidently claim the accuracy of your models, but you also cannot compare …

Implementing Support Vector Machine (SVM) algorithm in Python

As you have probably noticed by now, there are several machine learning algorithms at your disposal. In my previous post, I covered a very popular classification algorithm called K-Nearest Neighbors. In today’s post, I will cover another very common and powerful classification algorithm called Support Vector Machine (SVM). What is SVM and how does …

Parsing command-line arguments in Python

Python is a very popular language for many reasons and one of them is the ability to use it for quick scripting or for an enterprise application. Professionally, I have used Python for writing many scripts; some that are quick and temporary, and others that are more complex and long-term. Whatever the purpose of the …

Implementing a Binomial Logistic Regression model in Python

So far, we have only discussed regression modelling. However, there is another type of modelling called classification modelling. The primary difference between regression models and classification models is that while regression models are used to predict a quantity, classification models are used to …

Implementing a Polynomial Regression model in Python

So far, we have looked at two types of linear regression models and how to implement them in Python using scikit-learn. To recap, we began with a simple linear regression (SLR) model where we have one independent variable (feature) and one dependent variable (label). We then expanded it slightly to a more general use case …