Enabling a modern hybrid cloud kdb+ stack with PubSub+


Things used to be much simpler just a few years ago when you had all your applications deployed in your on-prem datacenter. Sure, you had to manage all of that yourself, but it was easy to deploy your applications on your finite number of servers. Things are different now. Cloud computing has taken off, and you no longer have to worry about managing your own datacenter, at least not to the extent you used to. Many companies, especially startups, have decided to embrace the cloud fully. However, if you are a large enterprise, you still have your on-prem datacenter for critical applications managing sensitive data, but everything else has either already migrated to the cloud or is in the process of migrating.

Similarly, your kdb+ stack used to be fully on-prem, running on multiple powerful servers spread across the world to capture market data globally. But slowly, you are realizing that maybe there is an alternate way to manage your kdb+ stack. Maybe not all components of your kdb+ stack need to be on-prem. Maybe other applications in your organization might benefit from having access to the data in your kdb+ database.

However, there is a problem. Not only has your kdb+ stack evolved, but other application stacks have also evolved over time and are now flexibly deployed on-prem, or in a hybrid/multi-cloud setup. How do you manage data transfer between your q applications running locally on-prem and on the public cloud? How do you then make this data available to other applications in hybrid/multi-cloud?

I told you life was much simpler before.

But worry not because, in this post, I am going to pull together different modules I have been working on in the last few weeks/months and show you how you can easily stitch your applications together in a robust, uniform, and secure manner.

We are going to look at a market data flow that consists of three different components. I have already written individual posts about each of these components that go into detail on how they work and how to set them up. Please have a look at them to get a better understanding of each component. These components are:

  • Feed handler (Java, locally deployed) – responsible for connecting to a market data vendor and publishing that data to internal apps
  • Stats/analytics process (q/kdb+, deployed on AWS) – responsible for ingesting raw market data updates and generating minutely stats
  • Data warehouse (BigQuery, deployed on GCP) – responsible for capturing and storing all the stats updates in real time

Note that in this setup, not only are we using different languages (q and Java) and different data stores (kdb+ and BigQuery), but the components are also deployed in different environments: on-prem, AWS, and GCP.

To stitch these different applications and platforms together, we will be using Solace’s PubSub+ event broker.

Overall, this is what our architecture will look like:


Event Mesh

While I had the option to deploy a single PubSub+ broker on any of the major cloud providers via Solace Cloud and have all three processes use that one broker, that is not how you would implement this in production. In a production environment, you would have multiple deployments of the broker in different environments and regions. Hence, I decided to have three deployments of the PubSub+ broker:

  • PubSub+ software broker deployed locally via Docker
  • PubSub+ broker deployed via PubSub+ Cloud in AWS
  • PubSub+ broker deployed via PubSub+ Cloud in GCP

That’s all great, but how do we connect all these brokers together? The answer is an event mesh, built using Dynamic Message Routing (DMR). DMR is a powerful PubSub+ feature that lets you link your brokers together dynamically to form an event mesh.

With our brokers linked together, our applications can continue to publish to their local broker, but they can now also subscribe to messages being published to topics on other brokers. This is what allows our stats process running in AWS to consume raw market data that is being published to the local on-prem broker.
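
To make this concrete, below is a minimal sketch of what a consumer connected to the AWS broker could look like, written in Java against Solace's JCSMP API. The host, VPN, and credentials are placeholders, and the actual stats process in this post is written in q and consumes from a queue; this sketch only illustrates the pattern of subscribing to a topic whose messages may originate on any broker in the mesh.

import com.solacesystems.jcsmp.*;

public class MarketDataSubscriber {
    public static void main(String[] args) throws JCSMPException {
        // Connection details for the AWS broker (placeholder values)
        JCSMPProperties props = new JCSMPProperties();
        props.setProperty(JCSMPProperties.HOST, "tcps://your-aws-broker.messaging.solace.cloud:55443");
        props.setProperty(JCSMPProperties.VPN_NAME, "your-vpn");
        props.setProperty(JCSMPProperties.USERNAME, "stats-user");
        props.setProperty(JCSMPProperties.PASSWORD, "stats-password");

        JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
        session.connect();

        // Asynchronous consumer: each raw market data update arrives as a message,
        // regardless of which broker in the mesh it was originally published to
        XMLMessageConsumer consumer = session.getMessageConsumer(new XMLMessageListener() {
            public void onReceive(BytesXMLMessage msg) {
                if (msg instanceof TextMessage) {
                    System.out.println("Received: " + ((TextMessage) msg).getText());
                }
            }
            public void onException(JCSMPException e) {
                e.printStackTrace();
            }
        });

        // Subscribe to all raw market data topics (the topic hierarchy is shown below)
        session.addSubscription(JCSMPFactory.onlyInstance().createTopic("EQ/marketData/v1/>"));
        consumer.start();
    }
}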


Putting it all together

I have gone ahead and connected the three brokers and started each of the three processes.

Feed Handler (market data simulator)

My feed handler is publishing simulated market data for a handful of securities from different exchanges:

=======================================================
Publishing to topic: EQ/marketData/v1/UK/LSE/BARC
Data: {"date":"2020-06-09","symbol":"BARC","askPrice":90.35925,"bidSize":510,"tradeSize":160,"exchange":"LSE","currency":"GBP","time":"12:17:29.291393-04:00","tradePrice":90.021675,"askSize":340,"bidPrice":89.6841}
=======================================================
Publishing to topic: EQ/marketData/v1/UK/LSE/TED
Data: {"date":"2020-06-09","symbol":"TED","askPrice":136.32913,"bidSize":640,"tradeSize":360,"exchange":"LSE","currency":"GBP","time":"12:17:29.292771-04:00","tradePrice":135.48236,"askSize":320,"bidPrice":134.63559}
=======================================================
Publishing to topic: EQ/marketData/v1/US/NASDAQ/AAPL
Data: {"date":"2020-06-09","symbol":"AAPL","askPrice":273.38898,"bidSize":400,"tradeSize":200,"exchange":"NASDAQ","currency":"USD","time":"12:17:30.295731-04:00","tradePrice":272.02884,"askSize":480,"bidPrice":270.6687}
=======================================================
Publishing to topic: EQ/marketData/v1/US/NASDAQ/FB
Data: {"date":"2020-06-09","symbol":"FB","askPrice":198.85513,"bidSize":500,"tradeSize":30,"exchange":"NASDAQ","currency":"USD","time":"12:17:30.301617-04:00","tradePrice":196.88628,"askSize":650,"bidPrice":194.91742}
=======================================================
Publishing to topic: EQ/marketData/v1/US/NASDAQ/INTC
Data: {"date":"2020-06-09","symbol":"INTC","askPrice":66.58829,"bidSize":0,"tradeSize":490,"exchange":"NASDAQ","currency":"USD","time":"12:17:30.306857-04:00","tradePrice":65.929,"askSize":650,"bidPrice":65.269714}
=======================================================
Publishing to topic: EQ/marketData/v1/US/NYSE/IBM
Data: {"date":"2020-06-09","symbol":"IBM","askPrice":98.59332,"bidSize":60,"tradeSize":10,"exchange":"NYSE","currency":"USD","time":"12:17:30.31108-04:00","tradePrice":97.859375,"askSize":460,"bidPrice":97.12543}
=======================================================
Publishing to topic: EQ/marketData/v1/US/NYSE/BAC
Data: {"date":"2020-06-09","symbol":"BAC","askPrice":22.801601,"bidSize":130,"tradeSize":470,"exchange":"NYSE","currency":"USD","time":"12:17:30.315562-04:00","tradePrice":22.603819,"askSize":400,"bidPrice":22.406036}
=======================================================
Publishing to topic: EQ/marketData/v1/US/NYSE/XOM
Data: {"date":"2020-06-09","symbol":"XOM","askPrice":46.533016,"bidSize":80,"tradeSize":230,"exchange":"NYSE","currency":"USD","time":"12:17:30.31798-04:00","tradePrice":46.072292,"askSize":410,"bidPrice":45.61157}
=======================================================
Publishing to topic: EQ/marketData/v1/UK/LSE/VOD
Data: {"date":"2020-06-09","symbol":"VOD","askPrice":86.92494,"bidSize":40,"tradeSize":410,"exchange":"LSE","currency":"GBP","time":"12:17:30.320502-04:00","tradePrice":85.95792,"askSize":350,"bidPrice":84.99089}
=======================================================
Publishing to topic: EQ/marketData/v1/UK/LSE/BARC
Data: {"date":"2020-06-09","symbol":"BARC","askPrice":91.83111,"bidSize":530,"tradeSize":140,"exchange":"LSE","currency":"GBP","time":"12:17:30.32268-04:00","tradePrice":90.92189,"askSize":280,"bidPrice":90.01267}
=======================================================

As you can see, our feed handler is currently publishing data for securities from US and UK exchanges (since I am running the feed handler during their market hours). However, I am only interested in generating stats for securities traded in the US. So, on my market_data queue, I have used PubSub+’s wildcard filtering capability to subscribe to this topic: EQ/marketData/v1/US/>. This only enqueues US market data into my queue and saves me the trouble of having to filter these records myself in my q stats process.
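
This hierarchical topic structure is what makes the wildcard filtering possible: the feed handler builds each topic from the attributes of the update (country, exchange, symbol). Below is a rough JCSMP sketch of that publishing pattern; the class and helper method here are illustrative, not the actual feed handler code, which is covered in its own post.

import com.solacesystems.jcsmp.*;

public class MarketDataPublisher {
    // Illustrative helper: publishes one JSON update to its per-security topic.
    // In a real feed handler you would create the producer once and reuse it.
    static void publish(JCSMPSession session, String country, String exchange,
                        String symbol, String json) throws JCSMPException {
        XMLMessageProducer producer = session.getMessageProducer(
            new JCSMPStreamingPublishEventHandler() {
                public void responseReceived(String messageID) { }
                public void handleError(String messageID, JCSMPException e, long timestamp) {
                    e.printStackTrace();
                }
            });

        // Dynamic topic: EQ/marketData/v1/<country>/<exchange>/<symbol>
        Topic topic = JCSMPFactory.onlyInstance().createTopic(
            String.format("EQ/marketData/v1/%s/%s/%s", country, exchange, symbol));

        // Send the JSON payload as a direct (non-persistent) message
        TextMessage msg = JCSMPFactory.onlyInstance().createMessage(TextMessage.class);
        msg.setText(json);
        producer.send(msg, topic);
    }
}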

Stats generation

In parallel, I have my q stats process running on AWS and connected to a different broker deployed in AWS. Here is the output of my stats process for each symbol:

AAPL| "[{\"date\":\"2020-06-09\",\"sym\":\"AAPL\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":790,\"lowBidPrice\":316.5098,\"highBidPrice\":330.9588,\"lowBidSize\":0,\"highBidSize\":780,\"lowTradePrice\":318.2236,\"highTradePrice\":333.0169,\"lowTradeSize\":0,\"highTradeSize\":490,\"lowAskPrice\":319.8147,\"highAskPrice\":335.9308,\"vwap\":235.2322}]"
BAC | "[{\"date\":\"2020-06-09\",\"sym\":\"BAC\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":790,\"lowBidPrice\":55.14443,\"highBidPrice\":63.4184,\"lowBidSize\":20,\"highBidSize\":780,\"lowTradePrice\":55.70145,\"highTradePrice\":64.21822,\"lowTradeSize\":0,\"highTradeSize\":500,\"lowAskPrice\":56.25846,\"highAskPrice\":65.02095,\"vwap\":238.9565}]"
FB  | "[{\"date\":\"2020-06-09\",\"sym\":\"FB\",\"time\":\"13:02\",\"lowAskSize\":10,\"highAskSize\":790,\"lowBidPrice\":139.3889,\"highBidPrice\":146.585,\"lowBidSize\":0,\"highBidSize\":720,\"lowTradePrice\":140.2678,\"highTradePrice\":148.2529,\"lowTradeSize\":10,\"highTradeSize\":500,\"lowAskPrice\":140.6184,\"highAskPrice\":149.9207,\"vwap\":225.4108}]"
IBM | "[{\"date\":\"2020-06-09\",\"sym\":\"IBM\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":730,\"lowBidPrice\":72.99904,\"highBidPrice\":79.32771,\"lowBidSize\":10,\"highBidSize\":800,\"lowTradePrice\":73.54964,\"highTradePrice\":79.32771,\"lowTradeSize\":0,\"highTradeSize\":500,\"lowAskPrice\":73.73595,\"highAskPrice\":79.93908,\"vwap\":227.7111}]"
INTC| "[{\"date\":\"2020-06-09\",\"sym\":\"INTC\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":790,\"lowBidPrice\":81.64793,\"highBidPrice\":87.0667,\"lowBidSize\":0,\"highBidSize\":780,\"lowTradePrice\":82.36865,\"highTradePrice\":87.17567,\"lowTradeSize\":10,\"highTradeSize\":500,\"lowAskPrice\":83.08938,\"highAskPrice\":87.61224,\"vwap\":228.8886}]"
XOM | "[{\"date\":\"2020-06-09\",\"sym\":\"XOM\",\"time\":\"13:02\",\"lowAskSize\":20,\"highAskSize\":800,\"lowBidPrice\":18.55276,\"highBidPrice\":19.46045,\"lowBidSize\":0,\"highBidSize\":790,\"lowTradePrice\":18.73785,\"highTradePrice\":19.55809,\"lowTradeSize\":0,\"highTradeSize\":500,\"lowAskPrice\":18.83445,\"highAskPrice\":19.71465,\"vwap\":273.4346}]"

These stats are computed every minute and published to dynamic topics with the following topic hierarchy: EQ/stats/v1/<sym>. For example, INTC’s stats are published on EQ/stats/v1/INTC.

We have a separate queue called stats, which is subscribed to EQ/stats/>, so it captures all the stats messages that our stats process publishes.

Data warehousing in BigQuery

My third process is a Beam/Dataflow pipeline running in GCP that consumes all the stats messages from our stats queue, parses them, and writes them to BigQuery. Here is what the pipeline looks like in Dataflow:

We can see that our stats are being written into BigQuery:

Voila! We now have our stats data in BigQuery. You can also see that we only have stats for US stocks, because of the wildcard filtering we applied earlier to the market_data queue feeding the q stats process.
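
If you want a starting point for a similar pipeline, here is a hedged sketch of what the Beam code could look like, reading from the stats queue over Solace's JMS API and writing the parsed records to BigQuery. The broker details, table name, and JSON handling are assumptions for illustration; the actual pipeline is described in the earlier Beam/Dataflow post.

import com.google.api.services.bigquery.model.TableRow;
import com.google.gson.Gson;
import com.solacesystems.jms.SolConnectionFactory;
import com.solacesystems.jms.SolJmsUtility;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.jms.JmsIO;
import org.apache.beam.sdk.io.jms.JmsRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import java.util.Map;

public class StatsToBigQuery {
    public static void main(String[] args) throws Exception {
        // Solace JMS connection to the GCP broker (placeholder values)
        SolConnectionFactory cf = SolJmsUtility.createConnectionFactory();
        cf.setHost("tcps://your-gcp-broker.messaging.solace.cloud:55443");
        cf.setVPN("your-vpn");
        cf.setUsername("dataflow-user");
        cf.setPassword("dataflow-password");

        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("ReadStatsQueue", JmsIO.read()
                .withConnectionFactory(cf)
                .withQueue("stats"))
         .apply("JsonToTableRow", ParDo.of(new DoFn<JmsRecord, TableRow>() {
             @ProcessElement
             public void processElement(ProcessContext c) {
                 // Each stats message body is a JSON array with one record per update
                 Map[] records = new Gson().fromJson(c.element().getPayload(), Map[].class);
                 for (Map<String, Object> r : records) {
                     TableRow row = new TableRow();
                     r.forEach(row::set);
                     c.output(row);
                 }
             }
         }))
         .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
                .to("your-project:market_data.stats")
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));

        p.run();
    }
}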


Wrap Up

As other application architectures have evolved over the last few years, kdb+ architecture has evolved with them. Applications running in hybrid/multi-cloud environments and with different runtimes can communicate with each other using the PubSub+ event broker. In this post, I showed how a Java feed handler running on-prem can send raw market data to a q stats process in AWS, which in turn sends minutely stats to BigQuery, all through PubSub+.

This architecture can evolve further by adding other components, such as a machine learning algorithm in AWS or a visualization application running in Azure, on top of the data streaming through PubSub+.

I hope you enjoyed this post. Feel free to leave a comment if you have any questions.
