Timeflow is a software platform for real time ‘complex event processing’. This involves listening to ‘streams’ of events and identifying combinations of events, patterns and occurrences as they happen.
When a situation of interest arises, action can be taken to intervene, either automatically by the platform or by a human. This is more impactful than traditional business intelligence and analytics because it allows you to act in the moment, whilst the insight is most relevant, to improve customer service or change a business process, rather than just seeing the result on a dashboard or report.
Complex Event Processing has applications in almost every industry, as detailed on our example use case page. Many of the detailed use cases involve looking for patterns in multiple event streams over different windows of time. They are also complex and dynamic situations, with potentially high volumes of data which need to be processed with low latency.
The field of Complex Event Processing has been studied since the 1970s. Our feeling, however, was that nobody offered a user experience that allowed developers and power users to create these processors in a simple and agile manner. We therefore set out to provide a platform that supports event processing for power users in all kinds of industries.
Events are pushed or pulled into Timeflow from a variety of source systems and websites, and then processed as an unbounded stream. The aim is to perform analysis on these streams, such as the following:
Filtering for events which match certain pre-defined conditions, such as a large credit card transaction or a speeding car;
Transforming and enriching the events with additional data for subsequent processing or analysis;
Routing events to different processors and endpoints dependent on their content;
Mapping data for lookups and extract, transform and load scenarios between different systems;
Maintaining history to identify when a certain class of event has occurred within a certain time window, such as three failed payment attempts in an hour;
Joining and analysing events across data streams to understand when a number of correlated events have happened in a time window;
Analysing events and groups of events in real time for their properties and variation over time;
Identifying events and combinations of events which together represent an anomalous situation that requires subsequent investigation by an automated process or a human;
Adding AI/ML capabilities such as sentiment analysis, forecasting, decision trees, clustering, regression and statistical functions to analyse the stream in real time.
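As an illustration of how a stateless filter and a stateful time window combine, the following is a minimal Python sketch of the "three failed payment attempts in an hour" example. It is not Timeflow's implementation or API: the names PaymentEvent and FailedPaymentDetector, the field names and the status values are all hypothetical, and the sketch assumes events arrive in time order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentEvent:
    # Hypothetical event shape, for illustration only.
    card_id: str
    status: str        # "FAILED" or "OK" (assumed values)
    event_time: float  # seconds since epoch, taken from the event itself


class FailedPaymentDetector:
    """Flags a card once it has `threshold` failures within `window_seconds`."""

    def __init__(self, threshold: int = 3, window_seconds: float = 3600.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # card_id -> recent failure times

    def process(self, event: PaymentEvent) -> bool:
        # Stateless filter: ignore anything that is not a failed payment.
        if event.status != "FAILED":
            return False
        times = self._failures[event.card_id]
        times.append(event.event_time)
        # Stateful step: evict failures that have fallen out of the window.
        cutoff = event.event_time - self.window_seconds
        while times and times[0] <= cutoff:
            times.popleft()
        return len(times) >= self.threshold


detector = FailedPaymentDetector()
events = [
    PaymentEvent("card-1", "FAILED", 0.0),
    PaymentEvent("card-1", "FAILED", 600.0),
    PaymentEvent("card-1", "FAILED", 1200.0),
]
alerts = [detector.process(e) for e in events]
```

The per-card history held in memory here is exactly the kind of state that becomes hard to manage once it is spread across many processing nodes.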
There are various integration technologies which can perform basic ‘stateless’ processing and message transfers. However, Timeflow adds much more capability beyond this which allows us to move into the realm of complex event processing and perform sophisticated real time analytics.
Successfully running sophisticated stateful computations with full accuracy, resilience and low latency is not a trivial task. The Timeflow platform aims to wrap this complexity and make it easy for power users across the business to build these applications.
Open Source & Open Architecture
Though Timeflow can be used by semi-technical business people, it is built around leading open source technologies which you can integrate with directly in various ways. For instance, you may wish to integrate with Kafka, the message streaming technology which underlies Timeflow, or Apache Druid, the analytical database we use to store and query event data in real time.
We expect that this integration would be a natural evolution as more use cases and applications are built on top of Timeflow within your organisation. We encourage and support this.
Data Event Store Based On Apache Druid
Many of the stream processing services in the market attempt to store state in memory, and sometimes back it with a non-clustered data store. This means that state is spread across the cluster and is not easy to query, adding complexity for developers who need to get the right data to the right nodes, and adding considerable complexity in failure situations to ensure that exactly-once processing is maintained.
Typically, stream processing solutions also do not attempt to be a store or data warehouse for your events, leaving it to the user to provide resilient long term storage for analysis of event history.
Timeflow takes a different approach. We store data more centrally in a database called Apache Druid. Druid is particularly well suited to event storage and complex event stream processing for the following reasons:
Druid is designed from the ground up for real time analytics against highly changing datasets;
The primary column by which Druid queries is time, making it highly efficient for stream analytics over recent and time ordered events;
Data is stored in column format within Apache Druid, meaning that only columns that need to be queried are loaded into memory. This is highly efficient;
The concept of a continually evolving set of append only events is core to the Druid architecture and philosophy;
The standard version of Druid is freely available and open source. Druid has a thriving and growing community supporting its adoption and knowledge sharing;
Imply are the leading commercial sponsors of Apache Druid providing a range of products, services and support;
Druid is highly suitable to be run in a cloud environment, for instance by being resilient to failure and having the ability to scale up and out in line with demand.
The power and stability of Apache Druid allows us to build a much simplified user experience for building correct stream processors and analytical applications. It also provides a highly performant event store upon which we can analyse and build applications over our streaming event data.
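To make the time-centric query model concrete, here is a small Python sketch that builds a request body for Druid's SQL API, which accepts a JSON object with a "query" field at the /druid/v2/sql endpoint. The datasource name "payments" and the columns card_id and status are assumptions made for illustration; they are not taken from Timeflow's schema. The sketch only constructs the payload and does not send it.

```python
import json


def failed_payments_query(datasource: str = "payments", threshold: int = 3) -> dict:
    """Build a JSON body for Druid's SQL API (POST /druid/v2/sql).

    The datasource and column names are hypothetical. Values are
    interpolated directly for brevity; do not pass untrusted input.
    """
    sql = f"""
        SELECT card_id, COUNT(*) AS failures
        FROM "{datasource}"
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
          AND status = 'FAILED'
        GROUP BY card_id
        HAVING COUNT(*) >= {int(threshold)}
    """
    # Collapse whitespace so the query is a single readable line.
    return {"query": " ".join(sql.split())}


print(json.dumps(failed_payments_query(), indent=2))
```

The filter on __time, Druid's built-in timestamp column, is what restricts the query to recent events, and only the card_id and status columns need to be read to answer it.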
Maintaining Correct Event Processing Semantics
Building stream processing applications sounds simple in theory, but building applications which are highly accurate, resilient, scalable and low latency, even when components fail, is very complex. The following elements need to be accounted for:
If we are processing a stream of events it is important to never drop a message, and never double send a message;
It is relatively simple to develop stateless processors which do things such as filter out, route, or add detail to messages. However, the complexity grows when we want to look for historical patterns such as “3 failed credit card transactions in the last hour”;
We need to handle time with correct semantics, ensuring that processing is based on ‘event time’ rather than ‘wall clock time’. This needs to take into account out of order and late arriving events;
Failure is a reality in software. If a server, a processor or a data node is lost, the system needs to ensure that the correct semantics are still followed;
Event processing can create enormous volumes of data which are difficult to ingest and analyse. In the realm of stateful complex event processing, we also need to store some of this data in memory and at processing nodes;
It is important to maintain complete security around personally identifiable and commercially sensitive data.
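Several of the points above can be sketched together in a few lines of Python. The example below counts events per tumbling window using event time, accepts out of order events up to an allowed lateness, and ignores redelivered events so that nothing is counted twice. It is a simplified illustration under stated assumptions, not how Timeflow works internally: the class name, the watermark rule (highest event time seen minus the allowed lateness) and the in-memory set of seen IDs are all choices made for this sketch.

```python
from collections import defaultdict


class EventTimeWindowCounter:
    """Counts events per tumbling event-time window, tolerating late arrivals."""

    def __init__(self, window_seconds: int = 60, allowed_lateness: int = 30):
        self.window_seconds = window_seconds
        self.allowed_lateness = allowed_lateness
        # Watermark: highest event time seen so far, minus the allowed lateness.
        self.watermark = float("-inf")
        self.open_windows = defaultdict(int)  # window start -> event count
        # Unbounded here for simplicity; a real system would expire old IDs.
        self.seen_ids = set()

    def process(self, event_id: str, event_time: int) -> list:
        """Ingest one event; return (window_start, count) pairs that just closed."""
        if event_id in self.seen_ids:
            return []  # duplicate delivery: do not count it twice
        self.seen_ids.add(event_id)

        window_start = event_time - (event_time % self.window_seconds)
        if window_start + self.window_seconds <= self.watermark:
            return []  # too late: this window has already been emitted
        self.open_windows[window_start] += 1

        # Advance the watermark using the event's own timestamp, not the clock.
        self.watermark = max(self.watermark, event_time - self.allowed_lateness)
        closed = sorted(
            start for start in self.open_windows
            if start + self.window_seconds <= self.watermark
        )
        return [(start, self.open_windows.pop(start)) for start in closed]


counter = EventTimeWindowCounter(window_seconds=60, allowed_lateness=30)
for event_id, event_time in [("a", 10), ("b", 70), ("c", 50), ("c", 50), ("d", 95)]:
    for window_start, count in counter.process(event_id, event_time):
        print(f"window starting at {window_start}s closed with {count} events")
```

Event "c" arrives after "b" despite carrying an earlier timestamp, yet it is still counted in the first window because that window stays open until the watermark passes its end. The trade-off is latency: results for a window are only emitted once the allowed lateness has elapsed.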
The Timeflow platform places very high value on the correctness and accuracy of its event processors, even where this means sacrificing latency in order to perform additional integrity checks. Our aspiration was to design a platform that we would be confident enough to let fly an airplane, with the development team sat on board.
Scalability, Low Latency and Security
Stream Processing is a data intensive task, potentially requiring thousands of events to be processed in parallel and with low latency. We therefore designed for massive scalability from day one. All components of Timeflow auto-scale, so the application responds automatically as load increases. This is achieved through technologies such as cloud autoscaling and container orchestration.
Timeflow has been designed to be low latency, typically responding in single-digit milliseconds. This is suitable for the vast majority of business scenarios.
Finally, security is of course fundamental to everything we do. Where possible we obfuscate personally identifiable information within the system. We also encrypt all data in flight and at rest.
From Batch ETL To Real Time
Moving away from point-to-point integration and batch ETL towards real time, intelligent analytics that actually affect your business processes has huge potential for businesses looking to build a better customer experience.
By making stream processing available to more organisations and to citizen developers within those organisations, Timeflow is truly a game changing platform.