
What Is Apache Druid And What Can It Do For Your Business?

There are many databases in the world, all offering different features and often serving different use cases.

Relational databases such as Oracle, MySQL and SQL Server still have the lion's share of the market, but there are now hundreds if not thousands of databases, including NoSQL, time series and graph databases. These more niche databases are growing rapidly in adoption as data volumes and processing requirements increase.

Though our professional services are technology agnostic, our go-to datastore for many use cases is Apache Druid, a platform well suited to high-volume, interactive and real-time analytics at large scale.

We also use Druid underneath our fully managed Timeflow platform. We settled on it after an exhaustive search of the market, trying 50+ database products to find the performance and query characteristics that we need.

Druid has the following characteristics:

  • It is open source, making it free to deploy and modify. Commercial support and add-ons are also provided by Imply where required;
  • It performs very well across very large datasets. For instance, Druid is the database which underlies Netflix's IoT monitoring solution, ingesting up to 200 million events per second;
  • This performance extends to slice-and-dice analytics queries, where we want to interactively explore our real-time datasets and get immediate feedback;
  • It is very powerful for time series analytics, with all data being keyed and organised by time;
  • It provides a model which allows you to run analytics over a combination of up to the minute real time data and very long term historical data;
  • Long term historical data can be stored on a cheaper storage tier whereas more recent data can be held on high performance servers or in memory for faster interactive querying;
  • It is very easy to integrate with over HTTP APIs;
  • It is cloud native, allowing you to run it across a large cluster and add or remove capacity as you need to scale. If one of your servers fails, the cluster is designed to keep running, with data reloaded onto the remaining servers.
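
To give a flavour of the HTTP integration and time-based querying mentioned above, here is a minimal Python sketch that posts a SQL query to Druid's SQL endpoint (/druid/v2/sql). It assumes a Druid Router reachable at localhost:8888 (the quickstart default) and a datasource called "clickstream". Both are placeholders for illustration, as are the function names, so adjust them for your own cluster.

```python
import json
import urllib.request

# Assumption: a Druid Router on localhost:8888; change this for your cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_hourly_count_query(datasource: str, hours: int) -> dict:
    """Build a Druid SQL payload counting events per hour over the last `hours` hours."""
    # Basic guards, since the values are interpolated into the SQL text.
    if '"' in datasource:
        raise ValueError("datasource name must not contain double quotes")
    hours = int(hours)
    sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_bucket, COUNT(*) AS events "
        f'FROM "{datasource}" '
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{hours}' HOUR "
        "GROUP BY 1 ORDER BY 1"
    )
    # "object" asks Druid to return a JSON array of row objects.
    return {"query": sql, "resultFormat": "object"}


def run_query(payload: dict, url: str = DRUID_SQL_URL) -> list:
    """POST the payload to the Druid SQL endpoint and return the decoded rows."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "clickstream" is a hypothetical datasource name used for illustration.
    payload = build_hourly_count_query("clickstream", hours=24)
    for row in run_query(payload):
        print(row["hour_bucket"], row["events"])
```

Every row in Druid carries a __time column, so grouping by time bucket like this is the natural starting point for most dashboards and alerts.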

What Can Druid Do For Your Business?

So returning to the question above, what can Druid do for your business? Here are the main opportunities as we see them:

  • Arm your people or your systems with up-to-the-minute information about what is happening in your business, in situations where time is of the essence;
  • Run complex analytics over either real time or historical data, particularly in an interactive exploratory environment;
  • Process huge volumes of real time event data;
  • Work with time oriented data, understanding what happened over time using time series analysis;
  • Move away from heavyweight proprietary databases to something open source, cloud native and easy to integrate with;
  • Perform data science type work over high volume datasets.

To learn more about Apache Druid and for help with deploying, managing and extracting value from it, please get in touch with us for an informal conversation.