Why Real Time Analytics Is More Than Just Faster Business Intelligence

In many situations, the earlier we respond to incoming data the better.  This might be a genuinely real time scenario such as a self-driving car, a trading system or a fraud check, or a more everyday business situation such as a product going out of stock, which we want to tell our users about as soon as possible.  

The value of data is said to decay over time.  The sooner a business can respond, the sooner it can use the data to improve the customer experience, operate more efficiently or capture revenue.  The longer the delay after capture, the faster these opportunities fall away.

For this reason, many companies are looking to process their data much faster, if not in real time, as part of their digital transformation ambitions.  

This can, however, be technically challenging with traditional approaches to data engineering and business intelligence, which are built around the periodic delivery of batch data and relatively simple slice-and-dice analysis once it arrives.  

The first thing companies need to do is refresh and re-engineer their data platforms to deliver data faster.  This could involve something simple, such as more frequent extract, transform and load (ETL) from source systems, or something more complex, such as moving to a streaming architecture.  This data would then commonly be ingested into storage such as a data warehouse or data lake, and surfaced through reports and dashboards sooner than has historically been possible.  
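The simpler end of this spectrum, more frequent incremental extracts, can be sketched with a watermark: each run pulls only the rows changed since the previous run, rather than a full nightly batch.  This is a minimal illustration using an in-memory SQLite database as a stand-in for the source system; the table, columns and timestamps are illustrative assumptions, not a reference implementation.

```python
import sqlite3

# Stand-in for a source system: an in-memory SQLite database with an
# illustrative "orders" table carrying a last-updated timestamp.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T09:00"), (2, 25.5, "2024-01-01T09:05")],
)

def extract_since(conn, watermark):
    """Pull only rows changed since the last run, instead of a full batch."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (watermark,),
    ).fetchall()
    # The new watermark is the latest timestamp seen, carried into the next run.
    new_watermark = max((r[2] for r in rows), default=watermark)
    return rows, new_watermark

rows, wm = extract_since(source, "2024-01-01T09:00")
```

Running this extract on a short schedule narrows the gap between an event occurring and it appearing in the warehouse, without yet committing to a full streaming architecture.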

For many companies and business scenarios, slightly faster delivery of data into the hands of business users might be enough.  If you have a few tens of thousands of rows in a relational database, putting a dashboard over the top and getting a near real time view of the business is feasible.  It is, however, only a minimum maturity level for real time analytics.  

To deliver on many business outcomes, real time analytics becomes harder, and the associated technology becomes more complex.  For instance, we need to deliver:

  • Low Latency Event Sourcing – The first step towards real time analytics is to capture data in real time from source systems, or provide a real time view of data where it lives.  Sources could include websites, enterprise applications, databases, third-party APIs and so on.  Ideally, we need to move to a situation where data is captured in response to a user action and made available immediately to our processing and analytical systems;
  • Low Latency Ingestion – Once events are published or made available, we need to bring them into the processing and storage engine as quickly as possible so they are ready for analysis.  Ingesting large volumes of low-level business events, while accounting for variability in those volumes, is not a straightforward task;
  • Low Latency Queries – Once data is in our storage tier, we need to be able to query it quickly and continuously, including recently ingested hot data.  This data can be structured or unstructured, and will probably not be normalised and ready for reporting as it would have been with a traditional business intelligence stack;
  • Big Data – Real time analytics frequently needs to happen over large quantities of data, such as machine data or streaming updates.  The sheer volume of data makes the task harder, and may mean that we need to move towards distributed solutions for stream and data processing;
  • Multiple Queries – We may need to ask many questions of our current data, frequently and in parallel, for instance to render a dashboard or to derive the numbers we need.  For this reason, we could be firing thousands of queries at our databases, again with variable load and requirements for low latency, lest our data processing back up;
  • Process Data In Flight – Because of the latency requirements, the volume of the data or the nature of the questions we want to ask, we may need to move towards processing the data in flight and in memory before it hits a database storage tier.  Real time stream processing is a field in itself, with a different programming model.  
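The last point, processing data in flight, can be illustrated with a tumbling-window aggregation computed in memory as events arrive, before anything reaches storage.  This is a minimal sketch: the event shape (epoch-seconds timestamp plus a value) and the one-minute window size are illustrative assumptions, and a production system would use a stream processing framework rather than a plain loop.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # illustrative tumbling-window size

def window_key(ts):
    """Assign an event timestamp (epoch seconds) to the start of its window."""
    return ts - (ts % WINDOW_SECONDS)

def aggregate(events):
    """Sum event values per one-minute window as the stream is consumed."""
    totals = defaultdict(float)
    for ts, value in events:
        # Aggregation happens in flight, in memory, before any storage tier.
        totals[window_key(ts)] += value
    return dict(totals)

stream = [(0, 5.0), (30, 7.5), (65, 2.0)]  # (epoch_seconds, value) pairs
print(aggregate(stream))  # {0: 12.5, 60: 2.0}
```

The same windowing idea underlies dedicated stream processors; the difference there is that windows are managed by the framework across distributed, unbounded input rather than a finite in-memory list.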

This adds up to a much more complex picture than simply doing faster business intelligence.  

As with many techniques, cloud makes this problem more tractable.  There are a number of higher-level technologies we can use for streaming data and then processing it in flight and at rest.  We can get access to the compute horsepower we need to continually ask complex questions, and can scale up automatically in response to changing patterns in the data.  Pricing is consumption based rather than requiring us to provision for a worst-case scenario, which improves the cost profile too.  

We believe that intelligent real time analytics is going to be a significant differentiator for businesses going forward.  We are seeing many businesses implement this capability in cloud environments already, and expect it to be a major theme for data teams over the coming decade.