Using Attributes in kdb+

One of the biggest advantages of kdb+ is fast data retrieval. As a client, you simply run a q-sql query (or call an API) and get back thousands and thousands of rows of data within seconds.

How does kdb+ do that?

There are many factors that contribute to high retrieval speed, and one of them is attributes. Attributes don't perform any operations on your data themselves. They are metadata: a tag on a list (or a table column) that tells kdb+ how the data is arranged, so it can pick a faster method before doing any lookup.

There are 4 types of attributes:

Sort (`s#)

The default lookup method in q is linear, so looking up data for AAPL in a trade table that contains millions of rows can be time-consuming. However, if the data is tagged as sorted (in ascending order), q uses binary search, which speeds up the lookup.

Here is how you apply the attribute to a list.

q)a:`s#1 5 9 10 20

Sorting the data in ascending order with a built-in function such as asc adds the attribute automatically.

q)asc 1 10 9 20 5
`s#1 5 9 10 20
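
As a quick check of the two points above, here is a minimal sketch (the values are arbitrary, and the exact error display varies between kdb+ versions). q verifies the order when you set the attribute yourself, and only an ascending sort sets it for you.

```q
q)`s#3 1 2            / not in ascending order, so q refuses to set the attribute
's-fail
q)attr asc 3 1 2      / asc sets the attribute on its result
`s
q)attr desc 3 1 2     / a descending sort does not set it
`
```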

Unique (`u#)

This attribute tells kdb+ that every item in the list is unique. However, unlike the sort attribute, the unique attribute keeps a hash map in the background, which adds some overhead.

q)`u#1 4 3 2
`u#1 4 3 2
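
The sketch below (arbitrary values, with b as a throwaway name) illustrates what happens when the uniqueness promise is broken. As far as I know, q rejects the attribute on a list with duplicates, keeps it when you append a new value, and quietly drops it when you append a value that already exists.

```q
q)`u#1 4 3 1          / 1 appears twice, so the attribute is rejected
'u-fail
q)b:`u#1 4 3 2
q)b,:5                / 5 is new: the list is still unique
q)attr b
`u
q)b,:5                / 5 is now a duplicate: the attribute is dropped
q)attr b
`
```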

Part (`p#)

This is a very important attribute because it is applied to a column (typically sym) of tables stored in a historical database on disk. If I gave you a trade table with a million rows and asked you for all the AAPL rows where price > 400, you would have to scan the whole table linearly, which is very time-consuming. What if I told you, before you started searching, that the AAPL data is stored in rows 10 to 15? That drastically speeds up the process because now you only have to look at 6 rows. This is the whole point of the part attribute. If a table has the part attribute on the sym column, kdb+ knows that the data is arranged together by sym, i.e. all the data for IBM is in one block, then AAPL, and then some other sym.

In the background, kdb+ keeps track of the first appearance of each sym. This adds a bit of overhead, but for huge tables the benefits easily outweigh the cost.

In terms of a list,

q)`p#1 2 2 2 5 5 5 6 9 9 10 10
`p#1 2 2 2 5 5 5 6 9 9 10 10
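
For a table, a minimal sketch might look like the following. The trade table and its values are made up for illustration. The idea is to bring each sym's rows together first (here with xasc) and then set the attribute on the column; setting it on interleaved data fails (q reports this as u-fail).

```q
q)trade:([] sym:`IBM`AAPL`IBM`AAPL; price:101 402 100 399f)
q)`p#trade`sym                              / syms are interleaved, so this fails
'u-fail
q)trade:update `p#sym from `sym xasc trade  / sort by sym so each sym is one block, then tag
q)attr trade`sym
`p
```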

Group (`g#)

Group is a last-resort attribute. I would not recommend using it often because it has the most overhead. It is similar to the part attribute in that it keeps track of where the data is located to speed up lookup. But unlike part, which only stores the index of the first appearance, group stores the indices of ALL the appearances.

q)a:1 2 2 2 5 5 5 6 9 9 10 10 5
q)group a
1 | ,0
2 | 1 2 3
5 | 4 5 6 12
6 | ,7
9 | 8 9
10| 10 11
q)`g#a
`g#1 2 2 2 5 5 5 6 9 9 10 10 5

The group attribute is mainly used for realtime data to speed up lookup a bit. At the end of the day, when realtime data is persisted to a historical database, the data is rearranged and the part attribute is applied.
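
Here is a rough sketch of the realtime case, using a made-up in-memory trade table. Unlike part, the group attribute does not need the rows to be arranged in any way, and it stays in place as new rows are appended.

```q
q)trade:([] sym:`IBM`AAPL`IBM; price:100 400 101f)
q)update `g#sym from `trade     / set the attribute in place on the sym column
`trade
q)`trade insert (`AAPL;402f)    / append a new row
,3
q)attr trade`sym                / the attribute is still there
`g
```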

How to check attributes on a table/list?

Given a list, you can easily get its attribute by using the attr function.

q)attr `s#1 3 5 10
`s
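
Two related cases, sketched with arbitrary values: a list with no attribute returns the null symbol, and applying the null symbol with # gives you the list back without its attribute.

```q
q)attr 1 3 5 10       / no attribute: the result is the null symbol
`
q)a:`s#1 3 5 10
q)attr `#a            / `# returns the list with the attribute removed
`
```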

For tables, you can see each column's attribute by running meta and looking at the a column.

q)meta trade
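
With a small made-up trade table that has the group attribute on sym, the output would look roughly like this (c is the column name, t the type, f any foreign key and a the attribute).

```q
q)trade:([] sym:`g#`IBM`AAPL`IBM; price:100 400 101f)
q)meta trade
c    | t f a
-----| -----
sym  | s   g
price| f    
```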
