Joins in kdb+

If you have spent any time working with kdb+, you have almost certainly had to use some sort of join in a query. Joins are essential to data analysis. The kdb+ query language is called qsql which, as you probably guessed from the name, is based on the popular query language SQL. I don’t have much experience with SQL, so I can’t tell you how joins work there, but I can discuss joins in qsql.

There are different types of joins offered in qsql…some of them are very similar, so make sure to pay attention to the details. I will try my best to provide good examples to make the differences clear. The available joins are: left join, inner join, plus join, equi-join, union join, asof join and window join.

For all my examples below, I will be using these two tables:

q)t1:([]sym:`IBM`MSFT`AAPL;price:3?100)
q)t2:([]sym:`AAPL`IBM;price:2?100;size:2?500)
q)t1
sym  price
----------
IBM  12
MSFT 10
AAPL 1
q)t2
sym  price size
---------------
AAPL 90    346
IBM  43    73

Left Join (lj)

This is probably the most popular join out there and very simple to understand too. The syntax is:

q)t1 lj 1!t2

Note that t2 must be keyed (here 1!t2 keys it on its first column, sym) or else you will get a `mismatch error.

Left join uses t2 as a lookup dictionary for each row in t1. So, the final table will have all the rows of t1 but with updated values from t2.

q)t1 lj 1!t2
sym  price size
---------------
IBM  43    73
MSFT 10
AAPL 90    346

As you can see, our resultant table has updated values (including a new column, size) from t2 for each sym in t1. If it can’t find a sym in t2, it leaves that row of t1 as it is and fills the new columns with nulls…like it did for `MSFT, which keeps its price of 10 and gets a null size.
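Here is a minimal sketch of the same left join that you can paste into a q session. Because 3?100 draws random values, I have pinned the tables to the values printed above (that part is my assumption; your random draws will differ). The name r is just a throwaway variable for illustration.

```q
/ literal copies of the tables shown above
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

/ key t2 on sym, then use it as the lookup table
r:t1 lj 1!t2

r`price        / 43 10 90 : IBM and AAPL overwritten, MSFT kept from t1
null r`size    / 010b     : only MSFT has no size in t2
```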

Inner Join (ij)

Inner join is very similar to left join, but it only returns rows for syms that are present in the lookup table (t2). So, in our case, we should get back the same table as we did with lj, minus the `MSFT row since it’s not in t2.
Just like lj, t2 needs to be keyed for ij as well.

q)t1 ij 1!t2
sym  price size
---------------
IBM  43    73
AAPL 90    346
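To see the difference from lj at a glance, this small sketch (again with the tables pinned to the values printed earlier, which is my assumption) compares which syms survive each join.

```q
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

(t1 lj 1!t2)`sym    / `IBM`MSFT`AAPL : every row of t1 is kept
(t1 ij 1!t2)`sym    / `IBM`AAPL      : MSFT is dropped, it has no match in t2
```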

Plus Join (pj)

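A plus join works like an lj…except that instead of overwriting the values in t1 with the values from the lookup table, it ADDS them together. Where a sym has no match in t2, the missing values are treated as zero.
Just like lj, t2 needs to be keyed for pj as well.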

q)t1 pj 1!t2
sym  price size
---------------
IBM  55    73
MSFT 10    0
AAPL 91    346
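The arithmetic is easy to check by hand. A small sketch, with the tables pinned to the values printed earlier (my assumption, since the originals were random):

```q
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

r:t1 pj 1!t2

r`price    / 55 10 91 : 12+43, 10+0, 1+90
r`size     / 73 0 346 : MSFT gets 0 rather than a null
```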

Equi-Join (ej)

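Equi-join is similar to inner join, but it lets you specify the columns you want to join on directly in the syntax, and t2 does not need to be keyed. When each sym appears only once in t2, as it does here,
ej[`sym;t1;t2] gives the same result as t1 ij 1!t2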

q)ej[`sym;t1;t2]
sym  price size
---------------
IBM  43    73
AAPL 90    346

The first argument of ej is where you can specify the columns you want to apply the join on.
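Where ej and ij part ways is when the right-hand table has more than one row per sym. The sketch below uses a made-up table, t3, with two `AAPL rows (t3 and its values are hypothetical, purely for illustration). As far as I understand it, ej returns one row for every match, while ij against a keyed table only picks up the first match.

```q
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t3:([]sym:`AAPL`IBM`AAPL;size:346 73 100)   / hypothetical: AAPL appears twice

e:ej[`sym;t1;t3]
i:t1 ij `sym xkey t3

count e    / 3 : IBM once, AAPL twice
e`size     / 73 346 100
count i    / 2 : only the first AAPL row is used
```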

Union Join (uj)

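Union join is a very generic join. If both tables, t1 and t2, are unkeyed, it will simply append the records from t2 to t1; the result has the union of the columns, with nulls wherever a table lacks a column. If both tables are keyed, it will update the values of t1 from t2 for matching keys (just like left join) and also append any rows of t2 whose keys are not in t1. If only one table is keyed, you will get an error.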

q)t1 uj t2
sym  price size
---------------
IBM  12
MSFT 10
AAPL 1
AAPL 90    346
IBM  43    73

q)(1!t1)uj 1!t2
sym | price size
----| ----------
IBM | 43    73
MSFT| 10
AAPL| 90    346
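In the keyed example above, every sym in t2 already exists in t1, so uj and lj look identical. They differ once the right-hand table brings a new key. A small sketch, using a made-up table t4 that contains `GOOG (t4 and its values are hypothetical):

```q
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t4:([]sym:`AAPL`GOOG;price:90 50;size:346 20)   / hypothetical: GOOG is not in t1

u:(1!t1) uj 1!t4
l:t1 lj 1!t4

(0!u)`sym      / `IBM`MSFT`AAPL`GOOG : the new key is appended
(0!u)`price    / 12 10 90 50         : AAPL updated, GOOG added
l`sym          / `IBM`MSFT`AAPL      : lj never adds rows
```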

For the asof join and window join, we will use these tables as examples:

q)t1:([]time:3?.z.t;sym:`AAPL`IBM`MSFT;price:3?100)
q)t2:([]time:3?.z.t;sym:`AAPL`MSFT`AAPL;price:3?100)
q)t1
time         sym  price
-----------------------
00:45:40.134 AAPL 63
05:12:49.761 IBM  93
04:54:11.685 MSFT 54
q)t2
time         sym  price
-----------------------
05:47:49.777 AAPL 88
02:50:04.026 MSFT 77
00:34:28.887 AAPL 30

Asof Join (aj)

Asof join is primarily meant to join tables along a time column. In a nutshell, for each row in one table it finds the last value in the other table as of that row’s time (e.g. give me the price of `AAPL as of 3pm).

The syntax is similar to ej syntax. The columns specified in aj must be present in both tables with the same types, and the last column specified (`time in this case) is the one the asof lookup is done on. aj also expects t2 to be sorted by time within each sym; random tables like the ones above are not, so in practice you would sort t2 first (for example with `sym`time xasc t2).

Let’s anticipate what will happen if we do an aj on t1 and t2. The join takes each time value from t1 and, as of that time, tries to find the last t2 row for the same sym. So, let’s look at `AAPL. Our join will first take the time value of 00:45:40.134, look for all `AAPL updates at or before that time and get us the last one. In this case, the last update occurs at 00:34:28.887 with a price of 30. For `IBM, we don’t have a matching entry in t2 so there are no updates to consider and the t1 price of 93 is kept. For `MSFT, the last update as of 04:54:11.685 occurs at 02:50:04.026 with a price of 77. Let’s do the join and see the result.

q)aj[`sym`time;t1;t2]
time         sym  price
-----------------------
00:45:40.134 AAPL 30
05:12:49.761 IBM  93
04:54:11.685 MSFT 77

Boom! Nailed it!
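If you want to reproduce this yourself, here is a sketch with the times and prices typed in as literals (my assumption, since 3?.z.t gives different values on every run). I also sort t2 by sym and time before joining, which is what aj expects of its right-hand table.

```q
t1:([]time:00:45:40.134 05:12:49.761 04:54:11.685;sym:`AAPL`IBM`MSFT;price:63 93 54)
t2:([]time:05:47:49.777 02:50:04.026 00:34:28.887;sym:`AAPL`MSFT`AAPL;price:88 77 30)

/ sort the lookup table by sym, then time, before the asof join
r:aj[`sym`time;t1;`sym`time xasc t2]

r`price    / 30 93 77 : AAPL as of 00:45, IBM untouched, MSFT as of 04:54
r`time     / the times are the ones from t1
```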

Window Join (wj)

Window join is a generalization of aj because it can handle aggregations as well. Instead of just getting the last value, it will do an aggregation (you specify which) on the rows that fall into each time window.

Window join can be very helpful in analytics. You can use it for doing transaction cost analysis by joining trade and quote data to see whether the trade you just made was above or below market price.

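Syntax: wj[w;c;t;a] where w is a pair of lists holding the start and end time of each window (one window per row of t), c is the list of columns to join on (the last one being the time column), t is the left-hand table and a is a list made up of the right-hand table followed by (aggregation function;column) pairs.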

Sample call (taken from kx reference page):

wj[w;`sym`time;trade;(quote;(max;`ask);(min;`bid))]
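To make the arguments concrete, here is a small self-contained sketch in the spirit of that call. The trade and quote tables and all their values are made up for illustration, and the window of 2 seconds before to 1 second after each trade is an arbitrary choice of mine. The quote table is already sorted by sym and time, which I am assuming is sufficient here.

```q
/ hypothetical data: three trades and nine one-second quotes for one sym
trade:([]sym:3#`ibm;time:10:01:01 10:01:04 10:01:08;price:100 101 105)
quote:([]sym:9#`ibm;time:10:01:01+til 9;ask:101 103 103 104 104 107 108 107 108;bid:98 99 102 103 103 104 106 106 107)

/ one window per trade: from 2 seconds before to 1 second after
w:-2 1+\:trade`time

r:wj[w;`sym`time;trade;(quote;(max;`ask);(min;`bid))]

r`ask    / 103 104 108 : highest ask seen in each window
r`bid    / 98 99 104   : lowest bid seen in each window
```

Each trade row now carries the highest ask and the lowest bid from its window, which is the kind of thing you would compare the trade price against.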

Make sure you familiarize yourself with all of these joins because most of them are commonly used in data analytics. If you have experience with SQL then it shouldn’t be too difficult, as there is a lot of overlap.
