Moving data from one hdb to another

I work in a market data team that captures and stores market data, calculates statistics, and then provides both the raw data and the stats to our clients according to their needs. We handle both real-time and historical data. In an ideal world, you would never have to worry about your historical data, but in practice you will often find that it is not in the condition you want it to be in. That can be due to several factors, including data corruption and inaccuracy. Hopefully, you capture data in multiple environments (UAT, PROD, DR) and don’t just rely on one. If data is corrupt in UAT, you can simply copy it over from PROD/DR.

Copying data from one environment to another is not difficult to understand: you take the data from the accurate source, replicate it, and save it down to the destination. Boom, done. If only it were that simple in practice…maybe for real-time data, but definitely not for historical data. A lot happens in the background when kdb+ writes data to disk that you may not be aware of. In this blog post, I will discuss how to move data from one hdb to another.

Before we begin, please make sure you have familiarized yourself with these concepts: enumeration, attributes, and partitioned databases.


As I mentioned before, there are three steps to moving data:

  1. Start a remote process
  2. Copy over source data there and do any necessary processing (select only data for a particular sym/date etc)
  3. Save down data to disk

The first two steps are fairly simple so I won’t spend much time talking about them. The third step is the interesting one.
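As a minimal sketch, the first two steps might look like this (the host, port, table name, and date here are all hypothetical):

```q
/ 1. open a handle to a remote process that has the good data
h:hopen `:srchost:5012                        / hypothetical source host:port

/ 2. pull the source data across, filtering to only what you need
t:h"select from trade where date=2014.11.13"  / hypothetical table/date

hclose h
/ 3. save t down to disk - covered in the rest of this post
```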

Saving data down to disk

Partitions

If you are dealing with a big enough table, chances are your table is a date partitioned table so you can’t do all the operations in one go. You have to save the data for each date partition. Don’t worry, it’s not too complicated. You simply follow the same steps as you would for saving data to one partition and then just run that for each date. In the examples to follow, I will focus on just one date partition.

You need to do these three things to save data on disk:

  1. Enumerate
  2. Add attribute
  3. Save

Enumerate

I talked about enumeration in one of my previous posts. Basically, kdb+ will not let you splay a table that has any symbol column(s) to disk without first enumerating them.

.Q.en[`:dir] table

You may argue that we already have an enumeration (the sym file) for that table in the hdb. That’s true, but every time you update the underlying data, the enumeration must be updated too to cover any new symbols, or else your data will be corrupted.
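As a toy illustration (the path and data here are hypothetical), .Q.en writes or extends the sym file in the given directory and returns the table with its symbol columns enumerated against it:

```q
t:([] sym:`AAPL`MSFT`AAPL; price:100.1 45.2 100.3)
et:.Q.en[`:/path/to/hdb] t   / creates or extends /path/to/hdb/sym
et`sym                       / now an enumerated list: `sym$`AAPL`MSFT`AAPL
```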

Add attribute

Most hdb tables are parted on sym to speed up data retrieval. Make sure to add the attribute when you copy the data over; otherwise it will be a pain for your clients to query the table, and the unoptimized queries will add unnecessary stress to your system.

update `p#sym from .Q.en[`:dir] table

Save

Once you have your desired table ready, it is time to save it down to disk. The only wrinkle is that you are working with date partitioned tables, so you want to derive the final path of the save location programmatically. You can use .Q.par for that. All you have to do is point it at the directory containing your par.txt file (par.txt simply lists the locations of all the partitions), specify the date you want to save the data for and the name of the table, and it will return the path of the location.

q).Q.par[`:kdb/2/stock/;2014.11.13;`table]
`:kdb/2/stock/2014.11.13/table

To add a trailing slash, we can use the sv function to join the path with the empty symbol.

q)(` sv .Q.par[`:kdb/2/stock/;2014.11.13;`table],`)
`:kdb/2/stock/2014.11.13/table/

Finally, we use the set function to save the data. The trailing slash matters here: it tells set to save the table splayed, with one file per column.

(` sv .Q.par[`:kdb/2/stock/;2014.11.13;`table],`) set update `p#sym from .Q.en[`:dir] table

You will need to run this for multiple dates, so you can wrap it in a function and pass each date as its parameter. For example:

{(` sv .Q.par[`:kdb/2/stock/;x;`table],`) set update `p#sym from .Q.en[`:dir] table} each dates

Things to remember:

Make sure to delete the date column before you save the data down to disk. The date column is virtual (kdb+ derives it from the partition directory name) and should not be saved.
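For example, if the table you pulled across still carries the virtual column:

```q
table:delete date from table  / drop the virtual date column before saving
```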

Before applying the parted attribute, make sure the data is sorted on that column first. Otherwise, you will not be able to apply the attribute.
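A quick sketch of what this looks like: `p# requires equal values to be stored adjacently, which sorting guarantees, and q signals 's-fail if that does not hold.

```q
/ q)update `p#sym from table  / 's-fail if table is not sorted on sym
table:`sym xasc table         / sort on sym first
update `p#sym from table      / now the attribute can be applied
```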

Hopefully, my last few posts about hdbs have helped you obtain a better understanding of how data is saved in hdbs and the different things you need to consider when moving data around. Let me know if you have any questions and/or feedback!

Note: The built-in function .Q.dpft allows you to do all this as well, but I wanted to use the long method to emphasize what happens under the hood. Instead of using the line above, you can use:

{.Q.dpft[`:kdb/2/stock/;x;`sym;`table]} each dates
