Message in a Bottle: Stream Processing

[Image: 3D illustration of a narrowed artery that restricts blood flow (arteriosclerosis)]

I couldn’t resist the reference to “a message in a bottle”. The name works, but a better metaphor would be data flowing like lifeblood. Unfortunately, that probably would not make a great title.

In 2018, I had the opportunity to be hosted in Sri Lanka by WSO2 for a dive into their technology stack (both present and future). One of the applications that caught my attention at that time was their open source WSO2 Stream Processor. That led me to investigate Stream Processing technology, methods and use cases more deeply.

Stream Processing is a big data method and technology. It also falls into the category of “event-driven design”. A continuous stream of data is queried to identify and react to defined conditions within a short time frame (typically milliseconds). This is WSO2’s definition, and it is a fairly accurate summary.

Here is another useful definition:

Stream processing is the processing of data in motion, or in other words, computing on data directly as it is produced or received.
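To make “data in motion” concrete, here is a minimal Python sketch that is not tied to WSO2 or any other product. It updates a result as each reading arrives instead of waiting for a complete batch; the function name `running_average` and the numeric readings are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple


def running_average(readings: Iterable[float]) -> Iterator[Tuple[float, float]]:
    """Yield (reading, average so far) as each reading arrives.

    Only a count and a total are kept, so a result is available
    immediately instead of after the whole data set is collected.
    """
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield value, total / count


for reading, avg in running_average([10.0, 20.0, 30.0]):
    print(f"reading={reading} running_avg={avg}")
```

Because the function consumes any iterable lazily, the same code would work if the readings came from a socket, a queue consumer or a generator that never ends.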

This post is not meant to be a deep dive but to be an overview of the theme.

Use Case Overview

Recently I have had the opportunity to work in the healthcare space and the logistics space, and stream processing was a key part of both projects.

In healthcare, the focus was predictive patient care; at the logistics company, it was supply chain optimization, traffic monitoring and route optimization. Both of these domain-driven designs had stream processing at their core.

Other use cases that I have been involved with:

  • Oil Company Spot Trading
  • IoT Equipment Monitoring
  • Auction Applications
  • Manufacturing Production Line Monitoring and Optimization

Stream processing also applies to cross-cutting services, from complex system security and intrusion monitoring to predictive performance monitoring.

Deploying Stream Processing

One of our solution architects and I were introduced to a client who had “rolled their own” stream processing application. They had routed streams into RabbitMQ as events, coded event topics and published the results. This is a simplification of what was involved; it worked, but it was labor-intensive to design, deploy and maintain.
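For readers who have not built one of these by hand, the sketch below shows the core routing idea in plain Python. It is an in-memory stand-in, not the client’s code and not the RabbitMQ client API: `InMemoryTopicExchange` and `topic_matches` are hypothetical names. The matching follows RabbitMQ’s documented topic-exchange wildcards, where routing keys are dot-separated words, `*` matches exactly one word and `#` matches zero or more words.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str, dict], None]


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Match a dot-separated routing key against a topic pattern.

    '*' matches exactly one word; '#' matches zero or more words.
    """
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # Let '#' absorb zero, one, or more of the remaining words.
            return any(match(i + 1, nxt) for nxt in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] in ("*", k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


class InMemoryTopicExchange:
    """Hypothetical in-memory stand-in for a topic exchange."""

    def __init__(self) -> None:
        self._bindings: List[Tuple[str, Handler]] = []

    def bind(self, pattern: str, handler: Handler) -> None:
        self._bindings.append((pattern, handler))

    def publish(self, routing_key: str, event: dict) -> int:
        """Deliver the event to every matching binding; return the delivery count."""
        delivered = 0
        for pattern, handler in self._bindings:
            if topic_matches(pattern, routing_key):
                handler(routing_key, event)
                delivered += 1
        return delivered


exchange = InMemoryTopicExchange()
alerts: List[dict] = []
exchange.bind("vehicle.*.delayed", lambda key, event: alerts.append(event))
exchange.publish("vehicle.truck42.delayed", {"minutes": 15})
exchange.publish("vehicle.truck42.arrived", {})
print(alerts)
```

Everything a platform would normally provide (durability, retries, consumer scaling, monitoring) is missing here; in a hand-rolled system, each of those has to be designed and maintained separately.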

Now, in 2021, this is simpler thanks to the growing adoption of stream processing platforms. These products capture the data, route it to its domain-specific logic (called actors), orchestrate the flow, handle performance scaling and provide error handling.
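As a rough illustration of the routing and error-handling part of that description, here is a hypothetical dispatcher in Python. It is not any vendor’s API: `Dispatcher`, `register` and `dead_letters` are invented names, each “actor” is just a function registered for an event type, and orchestration and scaling are left out entirely.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]
Actor = Callable[[Event], None]


class Dispatcher:
    """Hypothetical router: sends each event to the actor registered for its type.

    Events with no matching actor, or whose actor raises, are parked in a
    dead-letter list so the rest of the stream keeps flowing.
    """

    def __init__(self) -> None:
        self._actors: Dict[str, Actor] = {}
        self.dead_letters: List[Tuple[Event, str]] = []

    def register(self, event_type: str, actor: Actor) -> None:
        self._actors[event_type] = actor

    def dispatch(self, events: Iterable[Event]) -> None:
        for event in events:
            actor = self._actors.get(event.get("type", ""))
            if actor is None:
                self.dead_letters.append((event, "no actor registered"))
                continue
            try:
                actor(event)
            except Exception as exc:  # isolate the failure to this one event
                self.dead_letters.append((event, repr(exc)))


heart_rates: List[float] = []


def vitals_actor(event: Event) -> None:
    # Domain-specific logic for patient vitals (illustrative only).
    heart_rates.append(float(event["heart_rate"]))


dispatcher = Dispatcher()
dispatcher.register("vitals", vitals_actor)
dispatcher.dispatch([
    {"type": "vitals", "heart_rate": 72},
    {"type": "vitals"},                 # missing field: the actor raises KeyError
    {"type": "billing", "amount": 10},  # no actor registered for this type
])
print(heart_rates, dispatcher.dead_letters)
```

Parking failures in a dead-letter list rather than raising keeps one bad event from halting the whole stream; a real platform would typically also retry and alert.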

So what is the point of streaming data through a processor as events? There is an ever-expanding catalog of tools and techniques for consuming this data in useful ways. Central to them is the concept of Streaming SQL.

A Streaming SQL language runs continuous queries against the streaming data. Each query is a continuously running process whose output is itself a stream. This output stream is filtered for the “events” configured to be acted upon. Those resulting events can either trigger a service or be published to subscribers. The event streams are processed directly, and only a meaningful subset of the data is subsequently persisted.
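Streaming SQL syntax differs from product to product, so rather than guess at a particular dialect, the sketch below expresses the semantics of one continuous query in Python: roughly “average temperature per sensor over the last N seconds, having average above a threshold”. The reading format, the function name `high_average_alerts` and the threshold are assumptions for illustration, and readings are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# (timestamp in seconds, sensor id, temperature): an assumed event shape.
Reading = Tuple[float, str, float]
Alert = Tuple[float, str, float]


def high_average_alerts(
    readings: Iterable[Reading], window_s: float, threshold: float
) -> Iterator[Alert]:
    """Continuous query over a sliding time window, evaluated per sensor.

    Roughly: per sensor, AVG(temp) over the last `window_s` seconds,
    HAVING AVG(temp) > threshold. The output is itself a stream of alerts.
    Assumes readings arrive in timestamp order.
    """
    windows: Dict[str, Deque[Tuple[float, float]]] = {}
    for ts, sensor, temp in readings:
        window = windows.setdefault(sensor, deque())
        window.append((ts, temp))
        # Evict readings that have fallen out of the time window.
        while window and window[0][0] <= ts - window_s:
            window.popleft()
        if not window:
            continue
        avg = sum(t for _, t in window) / len(window)
        if avg > threshold:
            yield ts, sensor, avg


stream = [(0, "a", 20.0), (1, "a", 40.0), (2, "b", 50.0), (12, "a", 25.0)]
for alert in high_average_alerts(stream, window_s=10, threshold=28.0):
    print(alert)  # each alert could trigger a service or be published
```

Recomputing the sum over the window on every event keeps the example short; a production engine would maintain a running aggregate instead.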

Each stream processor maintains its own database and state, which results in a decoupled and more cohesive design. For example:

  1. A shopper views a product online and then examines a second, related product. Those events are added to a persistent layer so that merchandising analytics can access that correlation and an advertisement can be targeted at the shopper.
  2. A traffic accident slows down a delivery vehicle. The data can be used to track the frequency and timing of that type of event and to handle the resulting series of late deliveries.
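The first example can be sketched as a small processor that owns both its state and its store. This is a hypothetical Python illustration: `ViewCorrelationProcessor` and the `(user_id, product_id)` event shape are assumptions, SQLite stands in for the persistent layer, and any two consecutive, different products viewed by the same user are treated as a correlated pair, since deciding what counts as “related” is out of scope here.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

View = Tuple[str, str]  # (user_id, product_id): an assumed event shape


class ViewCorrelationProcessor:
    """Hypothetical processor that owns its own state and its own store."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._last_view: Dict[str, str] = {}  # local state: user -> last product viewed
        self._db = sqlite3.connect(db_path)   # this processor's private store
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS view_pairs "
            "(user_id TEXT, first_product TEXT, second_product TEXT)"
        )

    def process(self, views: Iterable[View]) -> None:
        for user, product in views:
            previous = self._last_view.get(user)
            if previous is not None and previous != product:
                # Persist only the correlation, not every raw view event.
                self._db.execute(
                    "INSERT INTO view_pairs VALUES (?, ?, ?)", (user, previous, product)
                )
            self._last_view[user] = product
        self._db.commit()

    def pairs(self) -> List[Tuple[str, str, str]]:
        return self._db.execute(
            "SELECT user_id, first_product, second_product FROM view_pairs ORDER BY rowid"
        ).fetchall()


processor = ViewCorrelationProcessor()
processor.process([
    ("u1", "tent"),
    ("u2", "kayak"),
    ("u1", "sleeping-bag"),
    ("u1", "sleeping-bag"),  # repeat view of the same product: nothing stored
])
print(processor.pairs())
```

Only the correlated pairs reach the database; the raw view events are processed and discarded, which mirrors the idea of persisting only a meaningful subset of the stream.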

This eliminates reliance on a heavy, centralized database that couples infrastructure and data handlers, which is an anti-pattern for microservices.

Stateful stream processing joins the database or key/value tables and the event-driven application or analytics logic into an efficient unit.
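Here is a minimal sketch of that idea in Python, using the delivery scenario from earlier. `LateDeliveryDetector` is a hypothetical name: it keeps a local key/value table of promised delivery times next to the logic that checks each delivery event, so enriching an event is a dictionary lookup rather than a call to a shared remote database.

```python
from typing import Dict, Optional


class LateDeliveryDetector:
    """Hypothetical stateful processor: a local key/value table plus event logic."""

    def __init__(self, promised_minutes: Dict[str, int]) -> None:
        # Local key/value table: route id -> promised delivery time (minutes).
        self.promised: Dict[str, int] = dict(promised_minutes)
        # State built up from the stream: route id -> number of late deliveries.
        self.late_counts: Dict[str, int] = {}

    def on_table_update(self, route_id: str, minutes: int) -> None:
        # Changes to the table can arrive as their own stream of updates.
        self.promised[route_id] = minutes

    def on_delivery(self, route_id: str, actual_minutes: int) -> Optional[dict]:
        # Join the delivery event with the table via a local lookup.
        promised = self.promised.get(route_id)
        if promised is None or actual_minutes <= promised:
            return None
        self.late_counts[route_id] = self.late_counts.get(route_id, 0) + 1
        return {
            "route": route_id,
            "late_by": actual_minutes - promised,
            "late_count": self.late_counts[route_id],
        }


detector = LateDeliveryDetector({"R1": 30, "R2": 45})
print(detector.on_delivery("R1", 25))  # on time
print(detector.on_delivery("R1", 40))  # late
detector.on_table_update("R2", 60)
print(detector.on_delivery("R2", 50))  # on time under the updated promise
```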

If you would like to dive deeper into this subject, just comment below and let me know.

David deBoisblanc is the managing partner at Duczer East. David has a 25-year history in the software and systems engineering domain.