enlist[q]

Challenge #1 – Partial sort

One of my colleagues recently introduced me to Project Euler. Project Euler is a great way to solve mathematical problems in your preferred language and then compare your solution with others'. However, I noticed that most of the problems there were not well suited to q. Many were designed to be solved with loops, which are frowned upon in q. That is why I am going to start posting q/kdb+-related challenges here.

Our first challenge is about partial sort.

A developer asked on the q/kdb+ Google Group how to partially sort a list. Let's say I have a list:

53 66 59 30 85 89 23 60 6 52 39

I want to get:

30 53 59 66 85 89 23 60 6 52 39

Note that it finds the middle number (89, at index 5 of the 11 items) and then sorts the list up to and including that point, leaving the rest of the list untouched.

Post your solutions in the comments section.
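One possible solution, as a minimal sketch: it assumes the "middle" item is the one at index (count x) div 2, takes everything up to and including it, sorts that prefix with asc and appends the remainder unchanged. The name psort is hypothetical, and for an even-length list you may prefer to pick the other middle item.

```q
/ ASSUMPTION: the middle item sits at index (count x) div 2 (index 5, i.e. 89, for 11 items)
/ sort the prefix up to and including the middle item, then append the untouched rest
psort:{n:1+(count x) div 2; (asc n#x),n _ x}
psort 53 66 59 30 85 89 23 60 6 52 39
/ 30 53 59 66 85 89 23 60 6 52 39
```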

Generating the Fibonacci sequence

I am going to assume that you all know what a Fibonacci sequence (FS) is. If you don't, you can read up about it here. In my last post, I talked about using the adverb over. In this post, I will show you how over can be used to generate the FS.

The main thing about the FS is that each value is the sum of the two values before it. The trick to tackling this task is to start with a predefined list, 0 1, since those are always the first two values of the FS. Once you have that, each run of your function appends a new value to the list, and over lets you specify how many times the function should be applied, starting from that initial list.
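A minimal sketch of that idea, using a hypothetical function name fib that takes the number of iterations n: the inner lambda {x,sum -2#x} appends the sum of the last two items, and /[n;0 1] applies it n times starting from 0 1.

```q
/ each iteration appends the sum of the last two values to the list
/ over with a count on the left runs the lambda n times, starting from 0 1
fib:{[n]{x,sum -2#x}/[n;0 1]}
fib 10
/ 0 1 1 2 3 5 8 13 21 34 55 89
```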

Scan and Over – iterating through lists

One of the first things my ex-boss told me was to never use for/while loops unless it's a matter of life and death. And he was dead serious. I had just taken two C++ classes in college, so I wasn't sure how I would replace for/while loops. They are easy to use and make your life easier, BUT that's not the case in q, where there are better alternatives. The thing to remember with q is that it loves lists. If an operation can be done on a list, do it that way!

In this post, I am going to cover two adverbs, scan and over, that you can use instead of for/while loops.
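A short sketch of both adverbs on small literal values (the numbers are only illustrative). Over returns only the final result, scan returns every intermediate one, and the last two lines show the bracket forms in which over stands in for a counted loop and for a while loop.

```q
/ over (/) folds a list into one value: ((1+2)+3)+4
+/[1 2 3 4]
/ scan (\) keeps every intermediate result, here the running totals
+\[1 2 3 4]
/ a count as the first argument applies a unary function that many times, like a for loop
{2*x}/[3;1]
/ a predicate as the first argument keeps applying the function while it holds, like a while loop
{2*x}/[{x<100};1]
```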

Functional queries

I used to hate functional queries. Part of me still does. They are hard to write, look like gibberish and are not as easy to understand as qsql queries. I can't believe I am saying this, but as you get more familiar with q, you start to see their advantages and the point of their existence. q is a functional language, and the best way to unlock its full potential is via functions.

For me, functional queries are most useful when I am trying to run them on a remote process/db. Instead of wrapping a qsql query in a string and then sending it via IPC, I prefer to use functional queries. This is especially important when you want to pass arguments to a query… yes, basically treating it as a function. That is where a functional query helps, because it lets you easily pass locally defined parameters to a remote process. Anyway, that's a discussion for another time. Today, we will look at how to actually write them.
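A minimal sketch of the functional select form ?[table;where;by;columns], using a small hypothetical table t. Each call should return the same result as the qsql query in the comment above it, and parse shows the functional form q builds from any qsql string. The remote call in the comment assumes an already open handle h and is not executed here.

```q
/ hypothetical table used only for illustration
t:([]sym:`AAPL`MSFT`AAPL`IBM;price:10 20 30 40f)
/ select from t where sym=`AAPL
/ the symbol constant is enlisted so it is not read as a column name
?[t;enlist(=;`sym;enlist`AAPL);0b;()]
/ select avg price by sym from t
?[t;();(enlist`sym)!enlist`sym;(enlist`price)!enlist(avg;`price)]
/ parse shows the functional form of a qsql string
parse "select avg price by sym from t"
/ a local parameter drops straight into the parse tree; the same list could be sent
/ to a remote process over a hypothetical open handle h: h(?;`t;enlist(=;`sym;enlist s);0b;())
s:`IBM
?[t;enlist(=;`sym;enlist s);0b;()]
```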

OOP concepts in q

Many of my friends (almost 90%) are programmers… object-oriented programmers. Many of them also have at least four more years of programming experience than I do. I always hear them discussing OOP concepts and, to be honest, I sometimes feel left out. I have never programmed in an object-oriented language professionally: I read a book on Java before I started working and took two C++ classes in college, and that's pretty much it. But I am familiar with the basic concepts.

I thought I would cater this post to programmers with an object-oriented background. It is often said that functional languages are easier to learn if you don't have much experience with object-oriented languages. In this post, I am going to take some basic OOP concepts and find their counterparts in q.


qSQL queries for performing analytics

I realize that there are many developers out there who are not looking to dive into q completely and simply use q/kdb+ with qsql to perform analytics (e.g. quants). My job requires both, so I have a good amount of experience running qsql queries. Of course, the type of query you need depends on what kind of data you want to retrieve, so I can't possibly cover them all in this post. But I will mention some common queries you can run.

All the examples will focus on these two tables:

q)q1:([]time:5?.z.t;sym:5?`AAPL`MSFT`IBM;ask:5?100;bid:5?100)
q)t1:([]time:5?.z.t;sym:5?`AAPL`MSFT`IBM;price:5?100;size:5?200)
q)q1
time         sym  ask bid
-------------------------
02:59:16.636 IBM  40  2
14:35:31.860 AAPL 88  39
16:36:29.214 AAPL 77  64
08:31:52.958 IBM  30  49
07:14:12.294 AAPL 17  82
q)t1
time         sym  price size
----------------------------
10:25:30.322 AAPL 8     36
14:17:41.480 AAPL 97    12
08:50:31.645 MSFT 52    45
15:20:08.925 AAPL 66    83
09:01:27.840 MSFT 24    94
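Because q1 and t1 are filled with random values, the sketch below uses a hypothetical table t2 with the same columns as t1 but fixed values, so the results are predictable. It shows a few common analytics queries: a volume-weighted average price (VWAP, computed as size wavg price) per symbol, the last price in 5-minute buckets using xbar, and per-symbol high, low and trade count.

```q
/ hypothetical fixed-value table with the same columns as t1
t2:([]time:09:30:00.000 09:31:00.000 09:32:00.000 09:33:00.000;sym:`AAPL`MSFT`AAPL`MSFT;price:10 20 30 40;size:100 200 300 400)
/ volume-weighted average price and total volume per symbol
select vwap:size wavg price,vol:sum size by sym from t2
/ last price per symbol in 5-minute buckets (time.minute casts the time column to minutes)
select last price by sym,bucket:5 xbar time.minute from t2
/ high, low and number of trades per symbol (i is the virtual row-index column)
select hi:max price,lo:min price,n:count i by sym from t2
```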

Joins in kdb+

If you have spent some time working with kdb+, you have almost certainly had to use some sort of join in a query. Joins are essential to data analysis. The kdb+ query language is called qsql, which, as you probably guessed from the name, is based on the popular query language SQL. I don't have much experience with SQL, so I can't tell you how joins work there, but I can discuss joins in qsql.

qsql offers several types of joins, and some of them are very similar, so make sure to pay attention to the details. I will try my best to provide good examples that make the differences clear. The available joins are: left join, plus join, inner join, equi-join, union join, asof join and window join.
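A minimal sketch of two of these joins on small hypothetical tables. lj (left join) keeps every row of the left table and looks up extra columns in a keyed table; aj (asof join) picks, for each trade, the latest quote for the same sym at or before the trade time. aj expects the quote table to be sorted by time within each sym, which this toy data already is.

```q
/ hypothetical toy data
trades:([]time:10:00:01.000 10:00:03.000;sym:`AAPL`AAPL;price:100 101)
quotes:([]time:10:00:00.000 10:00:02.000;sym:`AAPL`AAPL;bid:99 100;ask:101 102)
refdata:([sym:`AAPL`MSFT]exch:`NASDAQ`NASDAQ)
/ left join: every row of trades is kept, exch is looked up via the key column sym
trades lj refdata
/ asof join: the prevailing quote for the same sym at each trade time
aj[`sym`time;trades;quotes]
```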

Moving data from one hdb to another

I work in a market data team that captures and stores market data, calculates statistics and then provides both the raw data and the stats to our clients according to their needs. We handle both real-time and historical data. In an ideal world, you would never have to worry about your historical data, but you will often find that it is not in the condition you want it to be in. That can be due to several factors, including data corruption and inaccuracy. Hopefully, you capture data in multiple environments (UAT, PROD, DR) rather than relying on just one. If data is corrupt in UAT, you can just go ahead and copy it over from PROD/DR.

Copying data from one environment to another is not very difficult to understand: you take the data from the accurate source, replicate it and then save it down to the destination. Boom, done. If only it were this simple in practice… maybe for real-time data, but definitely not for historical data. A lot happens in the background when kdb+ writes data to disk that you may not be aware of. In this blog post, I will discuss how to move data from one hdb to another.
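One thing that happens in the background is enumeration: symbol columns are stored on disk as integer indices into the hdb's sym file, so copying a partition directory verbatim into an hdb whose sym file differs would make those indices point at the wrong symbols. The sketch below, assuming hypothetical scratch paths under /tmp and the default sym domain, builds two toy hdbs with different sym files, loads one partition from the source, de-enumerates sym with value and lets .Q.dpft re-enumerate it against the destination's sym file. It illustrates the idea only and is not a complete migration script.

```q
/ ASSUMPTION: hypothetical scratch hdb roots; both use the default `sym` enumeration domain
src:`:/tmp/srchdb
dst:`:/tmp/dsthdb
d:2020.01.02
/ build a toy source hdb so the sketch runs on its own
trade:([]sym:`IBM`MSFT`IBM;price:1 2 3f)
.Q.dpft[src;d;`sym;`trade]
/ give the destination its own, different sym file, as a second environment would have
trade:([]sym:`AAPL`GOOG;price:4 5f)
.Q.dpft[dst;d-1;`sym;`trade]
/ load the source hdb: sym comes back enumerated against the source's sym file
system "l ",1_string src
t:select from trade where date=d
/ drop the virtual date column (it becomes the partition directory) and de-enumerate sym
t:delete date from t
t:update sym:value sym from t
/ .Q.dpft needs a global table name; it re-enumerates sym against the destination's sym file
trade:t
.Q.dpft[dst;d;`sym;`trade]
/ reload the destination to check the copied partition
system "l ",1_string dst
select from trade where date=d
```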

How data is saved in a historical database

Real-time databases are easy to understand because everything happens in memory. Historical databases are a different story. Recently, I have been getting much more involved with hdbs at work, and the way they work is fascinating to me. If you are serious about kdb+, you should make sure you know how hdbs work under the hood. It will help you understand why qsql queries against an hdb are written a certain way (specifying the date constraint first), and why moving data from one hdb to another is not as simple as copying it from one rdb to another.

In this blog post, I will cover how data is stored on disk (splayed and partitioned tables) and why that makes data retrieval fast. I will also touch on attributes briefly.
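A minimal sketch of how a date-partitioned, splayed table ends up on disk, assuming /tmp/toyhdb as a hypothetical scratch directory. .Q.dpft writes one date partition, splays the table into one file per column plus a .d file holding the column order, enumerates the symbol column against the hdb's sym file and applies the parted attribute to sym. After loading, date is a virtual column, which is why constraining on date first lets q read only the relevant partition directory.

```q
/ ASSUMPTION: /tmp/toyhdb is a hypothetical scratch directory
db:`:/tmp/toyhdb
/ in-memory table with plain symbol columns; no date column, because
/ the date becomes the partition directory name on disk
trade:([]sym:`MSFT`AAPL`AAPL;price:10 20 30f;size:100 200 300)
/ write one partition: sort by sym, apply p# to sym, enumerate against /tmp/toyhdb/sym, splay
.Q.dpft[db;2020.01.02;`sym;`trade]
/ the splayed table directory: a .d file plus one file per column
key `:/tmp/toyhdb/2020.01.02/trade
/ load the hdb; the date constraint comes first so only that partition is read
system "l ",1_string db
select from trade where date=2020.01.02,sym=`AAPL
```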

Overview of kdb+ architecture

So far, I have covered quite a few intermediate topics without covering the basics. In this post, I would like to take a step back and talk about the general architecture of kdb+. A simple kdb+ setup includes a feed handler (fh), a ticker plant (tp), a real-time database (rdb) and a historical database (hdb). These processes work together to manage the flow of data.

Feed Handler

Before you can start capturing data, you first need a feed handler to get you that data. For stock data, for example, there are ssl feed handlers that can be used. A feed handler's job is to parse incoming data from the source (e.g. Reuters) and push it to the ticker plant.