
Build vs. Buy for Real-Time Event Streaming Platforms

Over the next few years, many businesses will be building real-time streaming data platforms.

The aim is for data to stream in from many sources and be available sooner, whether for analysis or to power user experiences and business processes in real time.

Most of these event streaming architectures will follow a similar pattern.

The pillars are:

  • Some mechanism for extracting data from sources and turning it into timestamped events (see the sketch after this list);
  • An event streaming backbone to transport the events, which ~7 times out of 10 will be Kafka and the remaining 3 times something like Kinesis or Google Cloud Pub/Sub;
  • A stream processing component, such as Kafka Streams, Flink or Spark Structured Streaming, to process, analyse and aggregate the streaming data;
  • A data lake or database to store the streaming data and make it available for consumption;
  • Various means of accessing and analysing the data, including notebooks, application APIs and reporting front-ends.
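
To make the first two pillars concrete, below is a minimal sketch of taking a source record, wrapping it as a timestamped event, and publishing it onto the backbone. It assumes a Kafka broker at localhost:9092 and the confluent-kafka Python client; the orders topic, payload shape and field names are illustrative only, not a prescribed schema.

```python
import json
import time

from confluent_kafka import Producer  # pip install confluent-kafka

# Hypothetical broker address; in practice this comes from configuration.
producer = Producer({"bootstrap.servers": "localhost:9092"})


def publish_event(source_record: dict, topic: str = "orders") -> None:
    """Wrap a source record as a timestamped event and publish it to Kafka."""
    # Use the source system's own timestamp if it has one; wall-clock time here.
    event_time_ms = int(time.time() * 1000)
    event = {"event_time_ms": event_time_ms, "payload": source_record}
    producer.produce(
        topic,
        key=str(source_record.get("id", "")),
        value=json.dumps(event).encode("utf-8"),
        timestamp=event_time_ms,  # set the Kafka record timestamp explicitly
    )


publish_event({"id": "42", "status": "shipped"})
producer.flush()  # block until the broker acknowledges delivery
```

From there, the remaining pillars pick the event up: a Kafka Streams or Flink job consumes and aggregates the topic, and a sink writes the results into the lake or database, where notebooks, APIs and reporting front-ends can query them.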

The benefits of these platforms are huge: faster time to insight, automation of business processes, lower storage costs, and a simpler data estate that moves away from batch ETL.

That said, the engineering effort isn’t trivial, and I think much of it will be reinventing the wheel, with architectures such as the above being deployed over and over again. Fortunately, many of these building blocks are ‘cloud native’, which minimises the engineering effort, but there is still a lot of heavy lifting to do.

For this reason, we tend to propose alternatives: using Databricks to provide much of the solution as a service, or services such as our Streaming Data Platform, which exposes more of the underlying open source technologies. This way, effort can be spent on the business processes and analytics that actually move the needle for your business, rather than on building data infrastructure.