Running kdb+ on Google Cloud (GCP)

Note: You can now subscribe to my blog updates here to receive latest updates.

Update: This post is now available as a video tutorial on our YouTube channel.

A lot has changed in the kdb+ world recently, and as the world moves to the cloud, it was only a matter of time before kdb+ did as well. A few months ago, Kx announced that users can now spin up an instance of kdb+ on Google Cloud (GCP) and pay per usage. This is drastically different from how we have used kdb+ in the past. If I remember correctly, you usually had to get an enterprise license and/or pay per core. With kdb+ on GCP, you have the option to pay for kdb+ only when you use it. You can also scale it easily without burying yourself in piles of paperwork to get approvals for new hardware. We all know how difficult and time-consuming that can be. And especially these days, with so much volatility in the market, it’s necessary for your infrastructure to be able to scale on demand.

However, if you are a hardcore kdb+ developer, you might question whether kdb+ can perform just as well on cloud infrastructure. Well, Google recently published the results of the latest STAC benchmark on their blog, and they confirm that kdb+ can indeed be just as powerful in the cloud. I am not going to go into the details of the results, but I will show you how you can quickly spin up a kdb+ instance on GCP.

Continue reading “Running kdb+ on Google Cloud (GCP)”

Kaggle: Solving Titanic challenge using logistic regression in Spark

In this post, I will show how to tackle Kaggle’s entry-level challenge called Titanic. In this challenge, you are given a training dataset and a test dataset. Your goal is to use the training dataset to build and train a model, and then use that model to predict whether each passenger listed in the test dataset survived. Once you have your predictions, you submit the results to Kaggle, which evaluates your model’s performance.

As part of this challenge, we will:

  1. Load data
  2. Explore and clean data
  3. Train our model
  4. Predict values using our model

I will be using Apache Spark and its machine learning library, MLlib, to build a logistic regression model. Keep in mind, the purpose of this post is not to build a sophisticated machine learning model but to show how you can solve Kaggle challenges using MLlib in Apache Spark.
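
To give a feel for the train-then-predict steps before bringing in Spark, here is a minimal sketch of binary logistic regression in plain Python. It is not MLlib code and not the approach used in the full post: the helper names (`train_logistic_regression`, `predict`), the two features, and the four toy rows are all made up for illustration and are not real Titanic data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y  # predicted probability minus true label
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

def predict(weights, bias, x):
    """Return 1 (survived) if the predicted probability is at least 0.5, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical toy rows: [is_female, travelled_first_class] -> survived (1) or not (0).
rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]

weights, bias = train_logistic_regression(rows, labels)
predictions = [predict(weights, bias, row) for row in rows]
print(predictions)
```

MLlib wraps the same idea behind its own estimator, adds the data loading and feature preparation, and distributes the work across a cluster.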

Continue reading “Kaggle: Solving Titanic challenge using logistic regression in Spark”

Subscribing to a message broker in q/kdb through embedPy

Update: As of May 2020, there is now an official Solace interface for kdb+ which can be used, instead of embedPy, to interact with Solace’s PubSub+ broker.

You have undoubtedly heard of messaging platforms such as Kafka, Solace and RabbitMQ that allow applications to communicate with each other. Messaging platforms have existed for many years, but with the recent rise in streaming data and remote cloud applications, they have become extremely helpful both for capturing this data and for communicating with cloud applications.

When it comes to messaging, I have seen very few applications or libraries available in q/kdb+, probably because it’s so easy to open a handle to a host/port and send a message. Additionally, a lot of existing kdb+ setups are local, meaning that they are hosted in local data centers and don’t require much communication with other applications. With more and more applications being deployed to the cloud (AWS, Google Cloud, Azure, etc.), it has become increasingly important for applications to communicate with each other. This, coupled with multiple applications (especially IoT devices) streaming data in real time to other applications, means that we need to be able to push messages to and receive messages from popular message brokers.

Continue reading “Subscribing to a message broker in q/kdb through embedPy”

Book Review: Head First Python by Paul Barry

I have never had any official training in Python, and I am sure there are many other developers out there like me who came from a different language and easily picked up Python on their own. A very intuitive way of learning a new language is to use it to solve real-world problems one at a time. Whenever I saw a challenge I could tackle at work, I used Python and, in the process, learnt a bit more each time.

While this approach has many pros, the main con is that you miss out on the fundamentals. You might eventually learn them later, but these fundamentals are the foundation of everything, so it’s best to learn them in the beginning. This is why I picked up a beginner’s book called Head First Python by Paul Barry. While I was familiar with the Head First series, this is the first time I actually read one of its books. And I must say, I loved the approach that the book takes to teaching someone a new language. There are a lot of visuals, so you don’t feel like you are just reading plain text. Additionally, there is a lot of repetition of core concepts, so you don’t forget them easily.

Continue reading “Book Review: Head First Python by Paul Barry”

Installing kdb+, jupyterq and embedPy using conda!

Kx announced during their Kx25 event last week that you can now download and install kdb+, embedPy and jupyterq via conda. For those who don’t know, conda is a platform- and language-agnostic tool for installing packages and managing environments. It’s mainly used by Python developers and was created by Anaconda.

To avoid any conflict with my existing installation of kdb+, I am going to start a new AWS EC2 instance for this demo. If you would also like to start an instance of AWS EC2, please follow instructions from an earlier post of mine.

Continue reading “Installing kdb+, jupyterq and embedPy using conda!”

Kx celebrates 25 years of kdb

Last Friday, on May 18th, Kx celebrated 25 years of kdb with a full-day conference at the New York Academy of Sciences in downtown Manhattan, NY. You can find the agenda here.

In the last year or so, kdb+ has become increasingly ‘open’ to other technologies and is seeing a lot more adoption in non-financial industries such as pharmaceuticals, energy, car racing etc. Events like these are helpful because they provide a good insight into what Kx is planning for kdb+’s future and it is also a good opportunity for me to see all my ex-colleagues!

Unfortunately, I wasn’t able to attend all the talks. I went after lunch and was able to attend almost half of the sessions. Before I discuss the different sessions I attended, I want to highlight some of the exciting announcements from the event.

Continue reading “Kx celebrates 25 years of kdb”

My (awesome) experience at PyCon 2018 in Cleveland!

It’s been almost 3 years since I started programming in Python. I picked up Python as a useful tool to solve some critical problems at work and quickly started appreciating all that it had to offer. I used different mediums to learn Python. I sought guidance from senior Python developers. I read blog posts. I took online courses. I did side projects in Python and blogged about them. Finally, I watched as many PyCon videos from previous years as I could find on YouTube.

PyCon always stood out to me as one of the few conferences where you can actually learn a lot from the talks. Many conferences are too high-level and are primarily used as an advertising platform by the sponsors. PyCon, on the other hand, offers a great opportunity for Python developers from different regions and companies to get together and learn from each other as well as from industry experts.

For those who aren’t familiar with PyCon, it’s the most popular Python conference. It started in 2003, when it was first held in Washington, D.C., and it has grown from 200 attendees in its first year to almost 4000 attendees. It moves to a different US city every two years. PyCon 2016 and PyCon 2017 were in Portland, and this year’s PyCon (2018) was held in Cleveland, Ohio.

Continue reading “My (awesome) experience at PyCon 2018 in Cleveland!”

Top 5 metrics for evaluating regression models

In my previous posts, I have covered some regression models (simple linear regression, polynomial regression) and classification models (k-nearest neighbors, support vector machines). However, I haven’t really discussed in-depth different ways to evaluate these models. Without proper metrics, not only can you not claim the accuracy of your models confidently but you also cannot compare different models to pick the most accurate one.

In this post, I want to focus on some of the most popular metrics that are used to evaluate regression models. These metrics are (in no particular order):

  • Explained Variance Score (EVS)
  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • R Squared Score (R2 Score)
  • Adjusted R Squared Score

All of these metrics, except for the adjusted R2 score, were calculated in my earlier post about implementing a polynomial regression model.
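
As a rough illustration of the five metrics listed above, here is a minimal sketch in plain Python using their standard textbook definitions. The function name `regression_metrics`, the `n_features` parameter, and the four toy values are my own assumptions for this example. In practice you would more likely reach for a library such as scikit-learn.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def regression_metrics(y_true, y_pred, n_features):
    """Compute EVS, MAE, MSE, R2 and adjusted R2 for one set of predictions.

    n_features is the number of predictors the model used; it is only
    needed for the adjusted R2 score.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r ** 2 for r in residuals) / n

    # EVS compares the variance of the errors to the variance of the targets.
    evs = 1 - variance(residuals) / variance(y_true)

    # R2 compares squared errors to the squared spread of the targets around their mean.
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((t - mean(y_true)) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    # Adjusted R2 penalises R2 for every extra predictor.
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

    return {"evs": evs, "mae": mae, "mse": mse, "r2": r2, "adjusted_r2": adjusted_r2}

# Hypothetical toy values, not taken from any real model.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

metrics = regression_metrics(y_true, y_pred, n_features=1)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Note that EVS and R2 only differ when the errors have a non-zero mean, and that the adjusted R2 score is never higher than R2 when the model has at least one predictor.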

Continue reading “Top 5 metrics for evaluating regression models”

Implementing Support Vector Machine (SVM) algorithm in python

As you have probably noticed by now, there are several machine learning algorithms available at your disposal. In my previous post, I covered a very popular classification algorithm called K-Nearest Neighbors. In today’s post, I will cover another very common and powerful classification algorithm called Support Vector Machine (SVM).

What is SVM and how does it work?

Just like KNN, SVM is a supervised learning model, which means that it learns from the labelled training set that we feed it. It can be used for both classification and regression problems, but it’s mostly used for classification. In this post, we will focus on using SVM for classification.

SVM consists of picking support vectors and then using them to define a decision boundary for classifying data points into different classes. The decision boundary is more formally known as a hyperplane. Points on different sides of the hyperplane belong to different classes. However, the same sets of points can be separated by numerous hyperplanes, so how do you decide which hyperplane to select? That’s where the support vectors come into the picture.
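
To make the idea of choosing between hyperplanes concrete, here is a small sketch in plain Python. It does not train an SVM. It only takes two hand-picked hyperplanes that both separate the same toy points and compares their margins, that is, the distance from each hyperplane to its closest points (the would-be support vectors). The points, the two hyperplanes, and the helper names are all made up for illustration, and the usual SVM criterion of preferring the largest margin is assumed.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells us which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi ** 2 for wi in w))
    return abs(decision_value(w, b, x)) / norm

def margin(w, b, points):
    """Distance from the hyperplane to its closest point."""
    return min(distance_to_hyperplane(w, b, p) for p in points)

# Hypothetical toy data: two classes in two dimensions.
points = [(2, 2), (3, 3), (0, 0), (1, 0)]
labels = [1, 1, -1, -1]

# Two candidate hyperplanes (lines in 2D) that both separate the classes.
hyperplane_a = ((1, 1), -2.5)  # x + y = 2.5
hyperplane_b = ((0, 1), -0.5)  # y = 0.5

for name, (w, b) in [("A", hyperplane_a), ("B", hyperplane_b)]:
    predicted = [classify(w, b, p) for p in points]
    print(name, predicted == labels, round(margin(w, b, points), 3))
```

Both lines classify every point correctly, but line A leaves more room between itself and the nearest points, so it is the one a maximum-margin classifier would prefer.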

Continue reading “Implementing Support Vector Machine (SVM) algorithm in python”

Implementing k-nearest neighbors in python

Last time, we looked into one of the simplest classification algorithms in machine learning called binomial logistic regression. In this post, I am going to cover another common classification algorithm called K Nearest Neighbors, otherwise known as KNN.

To recap, we have mostly discussed regression models, such as simple linear regression, multivariate linear regression and polynomial regression, which are used for predicting a quantity. Classification models, on the other hand, are used for predicting a category, such as yes/no, will buy a car/scooter/truck, will turn pink/green/red, etc.
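
As a quick preview of how KNN predicts a category, here is a minimal sketch in plain Python: find the k training points closest to the query and take a majority vote over their labels. The function name `knn_predict`, the choice of Euclidean distance, and the toy points are assumptions for this example only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbours."""
    # Pair every training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters with yes/no labels.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["no", "no", "no", "yes", "yes", "yes"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

There is no training step here: the model is simply the stored data, and all the work happens at prediction time. `math.dist` requires Python 3.8 or newer.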

Continue reading “Implementing k-nearest neighbors in python”