Over the next few years, many businesses will be building real-time streaming data platforms.
The aim is for data to stream in from many sources and be available sooner, whether for analysis or to power user experiences and business processes in real time.
Most of these event streaming architectures will follow a pattern similar to this:
The pillars are:
The benefits of these platforms are huge: faster time to insight, automated business processes, lower storage costs, and a simpler data estate that moves away from batch ETL.
That said, the engineering effort isn’t trivial, and I think much of it will amount to reinventing the wheel, with architectures such as the one above being deployed over and over again. Fortunately, many of these building blocks are ‘cloud native’, which reduces the engineering effort, but there is still a lot of heavy lifting to do.
For this reason, we tend to propose alternatives, such as using Databricks to provide much of the solution as a service, or services such as our Streaming Data Platform, which exposes more of the underlying open-source technologies. This way, effort can go into deploying the business processes and analytics that actually move the needle for your business, rather than into building data infrastructure.