I think of kdb+ as the Swiss Army knife of technologies. Not only is it a database; with its built-in webserver and its own programming language, you can do almost anything you want with it. But with great power comes great responsibility – just because you can do something doesn’t mean that you should.
I always think back to one architectural review I was called into where the client was having reliability issues. They were using kdb+ to analyse performance stats (CPU and memory utilisation, etc.) across their trading infrastructure (30+ machines), and they loved being able to analyse these stats during spikes and trading abnormalities, as well as to profile hardware utilisation and plan for growth. The issue turned out to be a simple TCP/IP configuration problem, but because the developer who had written the data collection agent in kdb+ had moved on, the client was having trouble investigating with in-house resources. We recommended rewriting the agent in Java, which meant they could also install it on the rest of their infrastructure (100+ machines) without having to worry about increased license costs. I love the simplicity of this review:
- It demonstrates how kdb+ can be utilised outside the financial data domain – the structure of the data and the type of analysis the client wanted to do was ideally suited to kdb+.
- Such a simple fix gave the client a lot more comfort supporting the application with their in-house team. The capture, storage and analysis components of a kdb+ implementation are well established at this stage, and there is a lot of collateral around them – but there was no reason why the data collection agent had to be written in kdb+.
- In addition to resolving the client’s reliability issue, we were able to add value by extending the application to monitor the client’s entire infrastructure, at no additional licensing, development or support cost.
Common Characteristics of a kdb+ Implementation
So when should you use kdb+, and what are the common characteristics of an implementation? Most good kdb+ implementations share a few requirements:
Fast Capture – The ability to reliably ingest a large number of events, typically 20K+ per second with sustained peaks of 1M+ per second. The events are usually small (~100-300 bytes) with 5-30 fields, though this varies by use case. Batching is common at higher ingest rates to ensure maximum throughput and optimal data processing.
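In a kdb+ system this batching is normally done in the feedhandler or tickerplant in q; since the exact mechanism varies by deployment, here is a minimal, language-neutral Python sketch of the micro-batching idea. The `MicroBatcher` class, `max_batch` and `flush_interval` names are purely illustrative, not kdb+ API:

```python
import time
from collections import deque

class MicroBatcher:
    """Buffer incoming events and flush them downstream in batches,
    either when the buffer reaches max_batch events or when
    flush_interval seconds have elapsed since the last flush.
    (Illustrative sketch only - not kdb+/q code.)"""

    def __init__(self, sink, max_batch=1000, flush_interval=0.05):
        self.sink = sink                  # callable that receives a list of events
        self.max_batch = max_batch
        self.flush_interval = flush_interval
        self.buf = deque()
        self.last_flush = time.monotonic()

    def add(self, event):
        self.buf.append(event)
        now = time.monotonic()
        if len(self.buf) >= self.max_batch or now - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.buf:
            self.sink(list(self.buf))     # hand a whole batch downstream at once
            self.buf.clear()
        self.last_flush = time.monotonic()

# Usage: collect events into batches of 3 (long interval so only size triggers)
batches = []
b = MicroBatcher(batches.append, max_batch=3, flush_interval=60)
for e in range(7):
    b.add(e)
b.flush()                                 # drain the remainder
print(batches)                            # [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off is latency versus throughput: larger batches amortise the per-message overhead of inserts into the in-memory tables, at the cost of a small delay before each event becomes queryable.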
Real/Near-Time Analysis – The ability to analyse events as they happen and to generate aggregations and other ESP (Event Stream Processing) functionality such as windowing, upserts, correlations and as-of joins. Typically this is performed while the working datasets are in memory, for fast calculations and best performance.
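To make the as-of join concrete: it pairs each event with the most recent record at or before that event's timestamp – the classic example being matching each trade to the prevailing quote. In q this is the built-in `aj` operator; here is a hedged Python sketch of the same idea (the `asof_join` function and its tuple layout are illustrative, not kdb+ API):

```python
import bisect

def asof_join(trades, quotes):
    """For each trade, attach the most recent quote at or before the
    trade's timestamp (None if no quote precedes it). Both inputs are
    lists of (timestamp, value) tuples sorted by timestamp.
    (Illustrative sketch of the as-of join concept - not q's aj.)"""
    qtimes = [t for t, _ in quotes]
    out = []
    for ttime, tval in trades:
        # index of the last quote whose timestamp <= trade timestamp
        i = bisect.bisect_right(qtimes, ttime) - 1
        qval = quotes[i][1] if i >= 0 else None
        out.append((ttime, tval, qval))
    return out

trades = [(10, "T1"), (25, "T2")]
quotes = [(5, 100.0), (20, 100.5), (30, 101.0)]
print(asof_join(trades, quotes))
# [(10, 'T1', 100.0), (25, 'T2', 100.5)]
```

Because both columns are stored sorted by time, the lookup is a binary search rather than a scan – one reason this class of query is so fast on a column-oriented, time-ordered store.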
Historical Data Access – Access to vast amounts of historical data for pattern and trend analysis. Data archived from memory into the persisted historical database should remain queryable alongside the current streaming data, providing a continuous time series where necessary.
Do you need the Swiss Army Knife?
There are a number of questions that should be asked before choosing kdb+. These can be summarised by Volume, Structure, Performance and Cost.
Do you have data volumes that require you to consider kdb+?
While kdb+ is regarded as the premium time-series database of choice, especially in financial services, you still need to ensure the volume of data justifies the implementation overhead compared with a more mainstream choice.
Is the data suited to be stored in a column-oriented database?
Kdb+ gained its reputation storing time-series financial data because that data is ideally suited to columnar/vector-based storage. You should make sure that the data you intend to store in kdb+ will be used in a similar manner. There are many sectors outside finance where it does make sense to store data like this, such as the hardware performance metrics in the example above or sensor data in IoT applications.
Do you have the performance requirements to justify the investment costs of a kdb+ implementation?
While kdb+ will probably be the fastest solution for storing and accessing your data, especially if the volume and structure requirements are satisfied, we still need to ask if the performance requirements justify the overhead. If you need a report overnight, or access to a historical source of data for a short period of time to test a new trading strategy, then you may not need the performance of kdb+ and another solution might be more suitable.
Can you justify the cost of implementation and maintenance?
The final requirement to consider is cost – not just the financial cost of the product, but the implementation and maintenance costs associated with it. In recent years we have seen more and more dedicated time-series databases come to market, many of them open source; their lower implementation costs should factor into any evaluation.
One of the most important requirements for any solutions architect is to ensure that the right technology choice is made for each component within the stack, and it is no different with kdb+.
I’m passionate about kdb+ and have been lucky enough to witness first-hand the enormous benefits that it has afforded to clients, but I’m even more passionate about using the correct technology for the occasion. In my last blog, I outlined some of the questions/factors that should be considered when designing a good kdb+ solution, but the first question any good solutions architect should answer is – is kdb+ the right technology?
Get your free kdb+ health check
Neueda are offering a free, no-obligation health check to any company wanting to understand and ensure that kdb+ is being utilised to its full potential, and in the correct manner. More information is available here, or you can reach out to one of Neueda’s kdb+ specialists at email@example.com.