Untangling the Streaming Landscape: The Rise of Unified Real-time Platforms
Manish Devgan, Roy Schulte, and Sanjeev Mohan
In our previous article, Unified Real-time Platforms, we drew attention to an important, expanding category of software for event-based, real-time systems. In this research, we go deeper into the essential capabilities of a URP and how it differs from event stream processing (ESP) platforms, streaming DBMSs, and stream-enabled analytical DBMSs. Figure 1 shows a representative list of products that span these categories.
This research document consists of two parts: the first presents our views on URPs, and the second describes evaluation criteria for selecting a URP. Through our research, we hope to raise the awareness and profile of systems that make better real-time decision-making possible. For additional information, contact us at urpfeedback@gmail.com.
Where URPs are Relevant
URPs are motivated by escalating business demands for smarter operational applications that can leverage up-to-the-second data to make faster and better decisions. Streaming data from sensors, vehicles and other machines, cameras, mobile devices, news feeds, other external web sources, and transaction processing applications is proliferating. Real-time use cases create new revenue opportunities, help counter threats, and mitigate risks. For more discussion of where to use URPs and some examples of URP applications, see Four Kinds of Software to Process Streaming Data in Real Time.
In the past, organizations often had to settle for non-real-time batch or micro-batch approaches because achieving real-time latency was cost-prohibitive. The decreasing costs of processors and networks; the widespread adoption of streaming messaging middleware such as Kafka; and improvements in software, such as URPs, have combined to make real-time latency practical for a growing set of applications.
URP applications perform complex transformations and analytics on real-time streaming data, used in conjunction with older data of record and reference data that serve as contextual information, including real-time features. They provide decision support to augment human intelligence or implement full decision automation with no humans in the decision process.
Generative AI has expanded the kinds of decision intelligence that can be applied. Gen AI can be used along with traditional predictive machine learning (ML) and symbolic AI techniques such as rule processing. The combination of decision techniques is “hybrid AI” (sometimes called composite AI). With URPs, we take it a step further into “real-time hybrid AI” because at least some of the data is current (real-time).
URPs address use cases that require either or both of:
- Low latency: the system must respond in milliseconds or a few seconds because the business value of the action decreases with delays.
- High volume: streams with many events per second and/or systems that continuously keep track of many entities because the volume of streaming data is typically much higher than that in traditional systems.
URPs are not relevant for offline data science or business intelligence applications that solely target non-real-time decisions.
URP Characteristics
By definition, all URPs provide the following three capabilities, as shown in Figure 2.
Application enablement
URPs incorporate application platforms that let developers build and execute real-time, analytically-enhanced, operational business applications. These may (1) execute request-driven business transactions (such as payments, orders, or customer inquiries); or (2) monitor and manage large networks of devices (such as trucking fleets, cell phone networks, manufacturing lines, or supply chains).
URPs include build-time development tools and run-time infrastructure for backend (data-facing) business logic with real-time analytics that may include rule processing, predictive and generative ML/AI, including retrieval augmented generation (RAG) capable orchestration.
Application enablement critical capabilities include:
- Pull-based: Synchronous, user-defined APIs, request-driven operations, including queries, updates, and other business logic (i.e., they are not limited to DBMS DML commands) through interfaces, such as RESTful APIs. An example includes looking up the customer profile when an IVR receives a call.
- Push-based: Asynchronous, user-defined, event-driven operations on data received through adapters to Kafka, Kafka-like, or other messaging systems; or webhooks, WebSockets, Server-Sent Events (SSE) or other asynchronous communication mechanisms. An example includes analyzing a new log file that gets added to an Amazon S3 bucket for PII detection to meet regulatory compliance.
- Asynchronous batch operations (optional): Examples include batch payments or processing order shipments from a warehouse.
- Data integrity (optional) mechanisms: These operations include support for atomic inserts, updates, and deletes of data of record with strong or eventual consistency, isolation, durability and other transaction semantics. These tasks may be performed by a combination of the messaging system and the URP’s application and data management capabilities.
- Process orchestration (optional): Engines to manage multistep processes that are triggered by input requests or by detecting threats or opportunities in input event streams. An example includes identification of anomalies in the input event stream that require repairs to be scheduled to handle a failing mining operation.
- End-user interface tools (optional): For the front-end of business applications, including reports, dashboards, or alerts on browsers or mobile or other devices.
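To make the pull-based and push-based patterns above more concrete, here is a minimal Python sketch of one component that serves both interaction styles from the same state. It is only an illustration under our own assumptions: the class `MiniPlatform`, its methods, and the event fields are hypothetical and do not correspond to any URP product's API. A real URP would expose the pull path through something like a RESTful endpoint and feed the push path from a messaging system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MiniPlatform:
    """Toy stand-in for a URP: one shared state store, two interaction styles."""
    profiles: Dict[str, dict] = field(default_factory=dict)
    subscribers: List[Callable[[dict], None]] = field(default_factory=list)

    def get_profile(self, customer_id: str) -> dict:
        """Pull-based: synchronous, request-driven lookup (e.g., behind a REST handler)."""
        return self.profiles.get(customer_id, {})

    def on_event(self, event: dict) -> None:
        """Push-based: asynchronous, event-driven update (e.g., a message consumer callback)."""
        profile = self.profiles.setdefault(event["customer_id"], {"calls": 0})
        profile["calls"] += 1
        for notify in self.subscribers:
            notify(event)  # fan out to downstream logic such as alerts


platform = MiniPlatform()
alerts = []
platform.subscribers.append(lambda e: alerts.append(e["customer_id"]))

# Two IVR call events arrive on the push path...
platform.on_event({"customer_id": "c-1", "type": "ivr_call"})
platform.on_event({"customer_id": "c-1", "type": "ivr_call"})

# ...and a later request on the pull path sees the up-to-date state.
print(platform.get_profile("c-1"))  # {'calls': 2}
```

The point of the sketch is that the request handler and the event handler read and write the same in-process state, so a request sees the effect of an event as soon as it is processed.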
Stream processing
Applications that are built on URPs are situationally aware because they ingest one or more kinds of continuous (unbounded) streaming data and perform real-time analytics on those streams. A URP supports many, and in some cases all, of the streaming capabilities of an ESP platform, such as window-based functions and adjusting for out-of-order and late-arriving messages, depending on the URP.
Data comes to a predefined query (typically a directed acyclic graph (DAG) data flow), in contrast to the traditional data system approach where a query comes to data after it has landed. The URP can implement this stream processing in an engine that is separate from the application enablement platform, but a separate engine is not required as long as the following stream processing critical capabilities are supported:
- Ingest high-volume streams from Kafka, Kafka-like, or other messaging systems, or other event-capable interface mechanisms.
- Process streaming data immediately after it arrives through user-defined operations that may include computing aggregates, joins, pattern detection, and general business logic.
- Sense conditions and trigger responses, using rules and other analytics on incoming streams to detect threats and opportunities, and then triggering responses to be executed within the URP or by an external actor.
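The capabilities above (ingest, process on arrival, sense and respond), together with window-based functions and late-arriving messages, can be sketched in a few lines of Python. This is a simplified illustration, not how any particular engine implements it: the window size, the allowed lateness, the alert threshold, and the rule that the watermark equals the highest event time seen so far are all assumptions chosen for brevity.

```python
from collections import defaultdict

WINDOW_SECONDS = 60     # assumed tumbling-window size
ALLOWED_LATENESS = 30   # assumed grace period for late events
THRESHOLD = 3           # assumed per-window alert threshold


class TumblingCounter:
    """Counts events per tumbling window, tolerating bounded lateness."""

    def __init__(self):
        self.open_counts = defaultdict(int)  # window start -> count so far
        self.closed = {}                     # window start -> final count
        self.alerts = []                     # window starts that met the threshold
        self.dropped = 0                     # events that arrived too late
        self.watermark = 0                   # highest event time seen so far

    def process(self, event_time: int) -> None:
        self.watermark = max(self.watermark, event_time)
        start = event_time - event_time % WINDOW_SECONDS
        if start + WINDOW_SECONDS + ALLOWED_LATENESS <= self.watermark:
            self.dropped += 1  # the window for this event is already final
            return
        self.open_counts[start] += 1
        self._close_expired_windows()

    def _close_expired_windows(self) -> None:
        expired = [s for s in self.open_counts
                   if s + WINDOW_SECONDS + ALLOWED_LATENESS <= self.watermark]
        for s in sorted(expired):
            count = self.open_counts.pop(s)
            self.closed[s] = count
            if count >= THRESHOLD:
                self.alerts.append(s)  # "sense and respond": trigger a response


counter = TumblingCounter()
# Event times in seconds; 50 is out of order but within the grace period,
# 10 arrives after its window has been finalized and is dropped.
for t in [5, 20, 70, 50, 130, 10, 160]:
    counter.process(t)
print(counter.closed, counter.alerts, counter.dropped)  # {0: 3, 60: 1} [0] 1
```

Production engines use more elaborate watermark strategies and usually let the developer choose what happens to events that arrive too late (drop, side output, or update of an already emitted result).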
Data Management
URPs store and provide access to real-time streaming data and other kinds of data using in-memory stores, persisted stores, or both. URP data stores can range from object stores and data grids to key-value stores, document stores, graph DBMSs, row-oriented transactional relational DBMSs, and column-optimized OLAP stores. A variety of indexing strategies to support high performance may be supported.
Data management critical capabilities include:
- Manage new streaming data in memory to support stateful operations. Usually, URPs offer durability by persisting data on a nonvolatile medium. Some RDBMS-based products choose a combination of row and column stores to achieve the tradeoff of performance and durability between in-memory and on-disk stores.
- Store and provide access to historical reference data, state data, and old, previously streamed data (i.e., data from previous hours, days, months or years).
- Integrate closely with the application enablement and stream processing components (this is essential to the end-to-end high scalability and low latency of URP applications).
- Support hybrid storage or data tiering (optional) to improve price-performance.
Positioning Relative to Other Streaming Infrastructure Products
There is considerable confusion in the market regarding the different kinds of products that support real-time analytics on streaming data, including (1) URPs, (2) event stream processing (ESP) platforms, (3) streaming DBMSs, and (4) stream-enabled analytical DBMSs. These products have overlapping capabilities and can be substituted for one another for some applications. All of these products are becoming more widely used because of the proliferation of streaming data and the increasing business requirements for smarter real-time applications. Next, we drill down into the four categories to compare and contrast their characteristics.
URPs
As described above, a URP is an end-to-end platform, combining an application engine and data management capabilities, with provisions for handling real-time streams. Commercial URP products can be categorized as either general-purpose infrastructure software or domain-specific software solutions (which are built on embedded general-purpose, domain-independent, URP infrastructure).
A solution is a set of features and functions, an application template, or a full (tailorable) commercial off-the-shelf (COTS) application or SaaS offering that is focused on a particular vertical or horizontal domain. URP solutions are available for various aspects of customer relationship management (CRM); supply chain management; (IoT) asset management; transportation operations (trucks, planes, airlines, maritime shipping); capital markets trading; AIOps; and other application areas.
Infrastructure products offer general-purpose URP capabilities suitable for use in many industries and applications. They technically provide only a subset of what a solution provides because the user company or a third-party partner must build the application from scratch, whereas solutions should need less customization.
ESP Platforms
The purpose of an ESP platform is specifically to process streaming data as it arrives, so it does not support interactive, request/reply application logic like an application platform would, nor general purpose long-term storage of streaming data, reference data, or business data of record. Nevertheless, it supports almost any kind of logic that a developer might want to apply to streaming data.
ESP platforms perform incremental computation on streams while the data is in motion and before it is stored in a separate database or file. ESP platforms keep some recent data in internal buffers (state stores) temporarily to support multistage real-time data flow pipelines (sometimes called jobs or topologies). They apply calculations on moving time windows of records (typically minutes or hours in duration), and may take checkpoints to enable faster restarts.
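The following Python sketch illustrates the ideas in this paragraph: an operator that keeps only recent records in an internal buffer, updates its result incrementally as each record arrives, and can checkpoint and restore its state. It is a toy under stated assumptions: the class `SlidingAverage` and its methods are hypothetical, records are assumed to arrive in timestamp order, and the checkpoint is a JSON string rather than durable storage.

```python
import json
from collections import deque


class SlidingAverage:
    """Incrementally maintained average over a moving time window."""

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        self.buffer = deque()  # (timestamp, value) pairs still inside the window
        self.total = 0.0       # running sum, updated incrementally

    def update(self, ts: int, value: float) -> float:
        """Add one record, evict expired ones, and return the current average."""
        self.buffer.append((ts, value))
        self.total += value
        while self.buffer[0][0] <= ts - self.window_seconds:
            _, expired_value = self.buffer.popleft()
            self.total -= expired_value
        return self.total / len(self.buffer)

    def checkpoint(self) -> str:
        """Serialize operator state so a restart does not need to replay the stream."""
        return json.dumps({"window_seconds": self.window_seconds,
                           "buffer": list(self.buffer)})

    @classmethod
    def restore(cls, snapshot: str) -> "SlidingAverage":
        data = json.loads(snapshot)
        op = cls(data["window_seconds"])
        for ts, value in data["buffer"]:
            op.buffer.append((ts, value))
            op.total += value
        return op


op = SlidingAverage(window_seconds=10)
averages = [op.update(0, 10.0), op.update(5, 20.0), op.update(12, 30.0)]
print(averages)  # [10.0, 15.0, 25.0]

# Simulate a failure and restart from the last checkpoint.
restored = SlidingAverage.restore(op.checkpoint())
print(restored.update(16, 40.0))  # 35.0
```

Note that each update touches only the new record and the records that expire, rather than recomputing the average over the whole window.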
ESP platforms include Flink (from Aiven, Amazon, Apache, Confluent, Cloudera, and many other vendors), Arroyo (from Apache and Arroyo Systems), Axual KSML, Espertech Esper, Google Cloud Dataflow, Kafka Streams (from Apache and Confluent), Microsoft Azure Stream Analytics, SAS Event Stream Processing, Spark Streaming (from Apache, Databricks, and many others), TIBCO Streaming, and similar products.
Streaming DBMSs
Streaming DBMSs, such as DeltaStream, Materialize, and RisingWave, focus on the data management (storage and retrieval) aspect of applications. However, to accomplish this with streaming data, they also need to implement multistep internal pipelines to transform incoming streams in a manner somewhat similar to ESP platforms.
Some products are based on Differential DataFlow (DDF) concepts that incrementally materialize table views that are always current. As with ESP platforms, streaming DBMSs aren’t designed to support general purpose interactive, request/reply business logic or transaction processing applications like an application platform would.
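As a rough illustration of an incrementally maintained view, the sketch below keeps the result of a grouped sum current by applying each change (an insert or a retraction) as a delta, instead of re-running the query over all stored rows. This is a deliberately simplified, hypothetical example of delta-based view maintenance; it is not the Differential Dataflow algorithm and not the implementation of any product named above.

```python
from collections import defaultdict


class IncrementalSumView:
    """Keeps the equivalent of SELECT key, SUM(amount) ... GROUP BY key always current."""

    def __init__(self):
        self.sums = defaultdict(float)  # key -> current sum
        self.counts = defaultdict(int)  # key -> number of live rows

    def apply(self, key: str, amount: float, diff: int) -> None:
        """Apply one change: diff is +1 for an insert, -1 for a retraction."""
        self.sums[key] += diff * amount
        self.counts[key] += diff
        if self.counts[key] == 0:
            # No live rows remain for this key, so it leaves the view.
            del self.sums[key]
            del self.counts[key]

    def query(self) -> dict:
        """Reading the view is a lookup; no recomputation is needed."""
        return dict(self.sums)


view = IncrementalSumView()
view.apply("eu", 10.0, +1)
view.apply("us", 5.0, +1)
view.apply("eu", 2.5, +1)
view.apply("us", 5.0, -1)  # the only "us" row is deleted upstream
print(view.query())  # {'eu': 12.5}
```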
Stream-enabled analytics DBMSs
Stream-enabled analytics DBMSs, such as ClickHouse, Druid, Imply, FeatureBase, Kinetica, Rockset, StarTree (Apache Pinot), and Tinybird, support OLAP-type analytics on streaming data immediately after it has been stored. They also often manage large sets of historical data. They support canned or ad hoc interactive queries with very low latency by leveraging a variety of data models and indexing techniques.
Stream-enabled analytics DBMSs are used for real-time and near-real-time operational decisions, often using clickstreams, IoT sensor data, or other common high-volume streams. Some of these products scale to very high volumes of queries per second (QPS). As with streaming DBMSs and ESP platforms, they are not a complete platform for interactive, transaction processing applications.
Summary of Streaming Landscape Options
Each of the four real-time streaming options serves a different use case, and hence all four are actively used across organizations. In this section, we will summarize how they differ using two approaches.
We summarize the key differences between these four product categories by mapping rows to the URP capabilities described in the section “URP Characteristics” above. Note that we are generalizing here, and some products in each category may have capabilities or may lack capabilities that are not reflected in our table. If your application needs a specific capability, you should validate this rather than relying on the table.
Do You Need a URP?
URPs are unique as they “unify” across the following three dimensions:
- Architecture consisting of ESP + RT DBMS + app enablement
- Async (ESP) + sync (DBMS)
- RT on data-in-motion + data-at-rest
If an organization has identified a business requirement for real-time streaming analytics in an operational application, the most common alternative to a URP is to assemble a DIY collection of multiple piecemeal technologies that approximates the services of a URP. A DIY infrastructure may combine an ESP platform with a high-performance streaming DBMS and application servers or frameworks such as Spring Boot, Quarkus, or Micronaut that enable self-hosting Java backend services that run in containers without application servers.
Compared to URPs, DIY generally has drawbacks such as:
- Expertise, technical debt, time to value: Few organizations have the expertise to prepare DIY infrastructure for highly demanding workloads. DIY systems are complicated to design, develop, tune, and maintain because there are so many moving parts. There tend to be multiple independently configured and managed clusters so DIY projects generally have longer time-to-solution and incur more technical debt than URPs.
- Risk, latency, cost: Architects struggle to deliver very high volume or predictable very low latency because there are many places where network, memory space, and other technical boundaries are crossed. Each link across boundaries adds code path and latency. By contrast, URPs minimize processing overhead and latency by tightly integrating the components into a single offering. In high-volume cases, URP infrastructure often requires less hardware, and thus lower cost, than DIY architectures.
- Complexity, security: DIY projects that combine software components from multiple vendors tend to be even more difficult than DIY projects that acquire all of the software from one vendor. Furthermore, complexity is the enemy of security. Hyperscalers, including Amazon AWS, Google, Microsoft Azure, and a few other large vendors, do provide all of the pieces for “one-stop shopping” and have partly integrated their relevant software products through the use of common tooling and metadata management facilities. They have also generally tested their multiple software products together, which helps. For example, Google’s Dataflow (based on Flume with Apache Beam SDK) integrates with other services like BigQuery, Vertex AI, Bigtable, Cloud Functions, and Looker to support building real-time applications. Single-vendor suites partly ameliorate DIY development challenges, although they still generally fall short of well-integrated, off-the-shelf URPs.
Nevertheless, more custom-built, streaming operational applications are currently in production on DIY infrastructures than on URPs. However, we expect that URP adoption will continue to expand, particularly for situations where off-the-shelf URP application solutions are available (see below) and for extreme high-volume/low-latency problems.
It is important to note that URP products are quite diverse in their internal architectures and in their intended purposes. Although all URPs have the same essential characteristics described above, URPs are best understood as a general architectural pattern, not as a single class of product. URP products compete in many disparate markets, serving disparate vertical and horizontal applications. Large organizations will eventually have multiple different URPs for the same reason that they already have multiple kinds of DBMSs for different applications.
How to Select a Unified Real-time Platform
In the second part of this research document we describe thirteen essential capabilities of URPs. The purpose of this section is to help organizations find the URP that is most appropriate for their needs after they determine that a URP is relevant to their application. We begin by outlining the tradeoffs between using a URP versus building the application on a comparable “do it yourself” (DIY) assembly of separate software products or (cloud) services. We then explain seven criteria that apply to your URP product selection:
- Availability of off-the-shelf URP applications (“solutions”)
- Programming model and ease of application development
- Performance — throughput and latency
- Interaction patterns — event-driven and request/reply
- Process orchestration for situation responses
- End user interface capabilities
- Commercial acquisition considerations
URP Selection Criteria
As with any product category, there is no “best” URP; there are only URPs that are better or worse for any particular project. Figure 1 summarizes the evaluation criteria used in this document.
Before evaluating URP products, analysts and architects must develop a clear understanding of the project goals, business requirements, and constraints because every situation is different. The criteria below are roughly in order of importance for typical projects, but you need to make adjustments to reflect your unique situation.
1. Solutions or Infrastructure
As mentioned above, URP products can be categorized as either general-purpose infrastructure software or domain-specific software solutions (which are built on embedded general-purpose, domain-independent, URP infrastructure).
A solution is a set of features and functions, an application template, or a full (tailorable) commercial off-the-shelf (COTS) application or SaaS offering that is focused on a particular vertical or horizontal domain. For example, URP solutions related to CRM include Evam Marketing, Joulica Customer Experience Analytics, Scuba Analytics’ Collaborative Decision Intelligence Platform, Snowplow Behavioral Data Platform (BDP), Unscrambl Qbo, and ZineOne Customer Engagement Hub.
Infrastructure products offer general-purpose URP capabilities suitable for use in many industries and applications. URP infrastructure is offered by Gigaspaces, Gridgain, Hazelcast, KX, NStream, Pathway, Radicalbit, Scaleout, Timeplus, Vantiq, Volt Active Data, and other vendors.
Some vendors primarily sell solutions, although their underlying URP platform could technically be used for other applications. Other vendors sell platform infrastructure into multiple vertical or horizontal markets but may also sell URP solutions or partial solutions into one or two particular domains.
Wherever practical, you should buy a solution if you can find one that suits your situation. The age-old advice applies: buy before build. URP solutions generally require less work, have faster time-to-production, and have lower technical risk. Solution vendors have staff expertise in their respective vertical or horizontal domains and often have adapters to external applications, databases, or industry-specific data formats and protocols.
However, there may be no URP solution that fits your requirements. Also, in some cases, it actually takes more work to tailor a solution template or COTS application to your needs than it would take to implement the application from scratch on a URP infrastructure product (and, in turn, a URP infrastructure product will generally involve less time, expertise, and risk than a DIY infrastructure if your application really needs high scalability, low latency, or complicated analytics). Note that some applications have such extreme scale or latency requirements that an otherwise suitable COTS URP solution does not work, so URP infrastructure is the most practical approach.
2. Programming model and development tools
Most organizations give ease-of-application-development and analytics very heavy weight in their selection process because developer productivity is crucial to project cost and time-to-production. It is hard to find good developers for these kinds of applications, so many URPs offer tools at multiple levels of abstraction. For example, Cogility, Evam, KX, and Vantiq, among others, have invested heavily in multiple levels of tools. Some URPs support graphical tools or domain-specific languages (DSLs) so that rules or similar parts of the application can be developed by business analysts or other less technical builders. Many URPs support Python or integration with TensorFlow or other libraries for implementing analytic logic that may include inferencing AI (predictive ML inference) or retrieval-augmented generation (RAG) generative AI-based logic. Most high-value use cases leverage real-time inferencing using ML models to continuously make autonomous decisions using real-time features. A few URP applications support continuous ML model training as the system runs. We see the same paradigm already being applied to fine-tuning language models. Many URPs provide SQL interfaces for implementing transformations and query retrievals.
Certain URPs predominantly support programming languages such as Java, whereas others favor rapidly growing languages like Python. The reason for the focus on richer support of a particular language may vary — it is sometimes driven by the strategic intention to serve the developers within a given market segment, rather than solely being determined by the language used to build the URP itself.
Virtually any URP can be used to implement a digital twin design pattern which is relevant for numerous URP applications. However, some URP vendors provide special tooling to make this better and easier. Examples of explicit digital twin support are offered by Aveva (PI), NStream, XMPro, and Scaleout, among others.
Most URP projects involve integration with previously running applications and databases, such as SaaS, packaged applications, or legacy applications (sometimes even mainframe systems). Most URP solution vendors have multiple off-the-shelf adapters for integration, as do some platform infrastructure vendors, including Hazelcast, Gigaspaces, Gridgain, Radicalbit, Vantiq, and Vitria, among others.
Some products have introduced the ability to create vector embeddings on real-time streaming data to support retrieval augmented generation (RAG) pipelines and AI assistants or chatbots.
3. Performance
All URPs (solutions and infrastructure) provide good scalability and low latency because that is intrinsic to the URP value proposition. If a URP product can meet your foreseeable requirements for message rates (e.g., 50,000 events per second), the number of entities being tracked (e.g., 1,000 trucks, 1,000 cell towers, or 100,000 customers), and latency (e.g., 99% of responses in less than 400 milliseconds), then it doesn’t matter much if there are faster or more scalable URPs on the market.
However, some applications, particularly in telcos (e.g., network monitoring), financial services (e.g., fraud detection), and certain asset management (“IoT”) scenarios, are dealing with extreme volume (millions or tens of millions of events per second) and/or require single-digit or low double-digit millisecond latency. These require URP products that are purpose-built with in-memory data management, business logic processing co-located in the same address space with data, one-thread-per-core architecture, or other high-performance design concepts. A related benefit may be that they require fewer cores and less memory than conventional systems. Hazelcast, Gigaspaces, Gridgain, NStream, and Scaleout are among the URPs known to offer extreme performance.
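A latency requirement such as "99% of responses in less than 400 milliseconds" can be checked against measurements from a proof-of-concept test. The Python sketch below shows one way to do that, assuming the nearest-rank definition of a percentile (other definitions interpolate and can give slightly different values). The function names and the sample data are hypothetical.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_ms, pct: int = 99, limit_ms: float = 400) -> bool:
    """True if the pct-th percentile latency is below the limit."""
    return percentile(latencies_ms, pct) < limit_ms


# Hypothetical measurements: 100 responses, one slow outlier.
good_run = [120] * 98 + [390, 900]
# Same test with three slow responses: the 99th percentile is now an outlier.
bad_run = [120] * 97 + [390, 900, 900]

print(percentile(good_run, 99), meets_latency_target(good_run))  # 390 True
print(percentile(bad_run, 99), meets_latency_target(bad_run))    # 900 False
```

Averages hide tail behavior, which is why requirements of this kind are usually stated as percentiles and should be measured under a realistic event rate and entity count.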
4. Interaction
Some URP applications support externally-triggered, request-driven business transactions (e.g., OLTP), while others focus on event-driven monitoring of big, complex operations to provide situation awareness and sense-and-respond interventions. A detailed explanation of this difference is beyond the scope of this document, but it suffices to note that some request-driven use cases require transaction semantics such as exactly-once processing, concurrency control, isolation, persistence, strong or eventual consistency, or checkpoints for rapid restarts after failure. Some URP vendors, notably Hazelcast, Gigaspaces, Gridgain, and Volt Active Data, among others, support multiple integrity features because transaction processing plays a major role in most of their applications.
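One common way to obtain an exactly-once effect on top of a messaging system that may redeliver messages is to make the processing idempotent by remembering which event IDs have already been applied. The Python sketch below illustrates that single technique; it is an assumption on our part that it is representative, since products differ in how they provide these guarantees. The class `IdempotentLedger` and the event fields are hypothetical, and a real system would need to update the balance and the set of applied IDs atomically and durably.

```python
class IdempotentLedger:
    """Applies each payment event at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.balances = {}        # account -> balance
        self.applied_ids = set()  # IDs of events already reflected in balances

    def apply_payment(self, event_id: str, account: str, amount: int) -> bool:
        """Return True if the event changed state, False if it was a duplicate."""
        if event_id in self.applied_ids:
            return False  # redelivery: ignore, the effect is already recorded
        self.balances[account] = self.balances.get(account, 0) + amount
        self.applied_ids.add(event_id)
        return True


ledger = IdempotentLedger()
results = [
    ledger.apply_payment("evt-1", "acct-9", 100),
    ledger.apply_payment("evt-1", "acct-9", 100),  # duplicate delivery after a retry
    ledger.apply_payment("evt-2", "acct-9", 50),
]
print(results, ledger.balances)  # [True, False, True] {'acct-9': 150}
```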
5. Process Orchestration
Projects that focus on monitoring operations detect current or predicted threats and opportunities and then use URP features to emit alerts, update end-user dashboards, or trigger automated responses through messages or RESTful calls. A few URP vendors, including XMPro, Vantiq, and Vitria, among others, go even further by also providing internal process managers that orchestrate longer-running multistep workflows, i.e., response sequences consisting of automated activities and/or human steps. URP projects that service transactional requests may also require multi-step sequences. If a URP does not supply process orchestration natively but multiple steps are required, developers can use an external business process management (BPM) or choreography tool to manage the actions.
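A longer-running, multistep response that mixes automated activities and human steps can be pictured as a small state machine that remembers its position and resumes when a pending step becomes possible. The Python sketch below is a hypothetical illustration of that idea only; the names `Step` and `WorkflowInstance` and the repair scenario are our own assumptions and do not reflect any vendor's process manager.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[dict], bool]  # returns True when the step is complete


@dataclass
class WorkflowInstance:
    steps: List[Step]
    context: dict
    position: int = 0
    log: List[str] = field(default_factory=list)

    def advance(self) -> str:
        """Run steps in order until one has to wait or the workflow finishes."""
        while self.position < len(self.steps):
            step = self.steps[self.position]
            if not step.action(self.context):
                return f"waiting:{step.name}"  # e.g., a human has not acted yet
            self.log.append(step.name)
            self.position += 1
        return "completed"


def create_ticket(ctx: dict) -> bool:
    ctx["ticket"] = "T-1"  # automated activity
    return True


def await_approval(ctx: dict) -> bool:
    return ctx.get("approved", False)  # human step: completes once approved


def schedule_repair(ctx: dict) -> bool:
    ctx["scheduled"] = True  # automated activity
    return True


# Triggered when an anomaly is detected in the input event stream.
wf = WorkflowInstance(
    steps=[Step("create_ticket", create_ticket),
           Step("await_approval", await_approval),
           Step("schedule_repair", schedule_repair)],
    context={"asset": "pump-7"},
)
first = wf.advance()           # stops at the human step
wf.context["approved"] = True  # a supervisor approves later
second = wf.advance()          # resumes where it left off
print(first, second, wf.log)
```

Because the instance stores its position, a completed step is never re-executed when the workflow resumes, which is the property that distinguishes orchestration from simply firing a sequence of alerts.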
6. End User Interfaces
URPs are primarily server-side, back-end platforms that receive event streams (typically through Kafka, Kafka-like, or other messaging subsystems) and service requests (typically through RESTful API calls) from other applications. However, many URP vendors, including Cogility, Deephaven Data Labs, Evam, Joulica, NStream, Pathway, Scuba Analytics, Snowplow, Unscrambl, Vitria, XMPro, and ZineOne, among others, also supply front-end, end-user-facing analytical applications or dashboards, or tools to build such front ends. Alternatively, developers can build a front-end application using their preferred programming tools.
7. Commercial Acquisition Considerations
Of course, acquiring URP software or subscribing to a URP Platform-as-a-Service (PaaS) involves the same considerations as any other software project. Your URP selection process must consider the viability of the vendor and its support practices, price, and terms and conditions. You may prefer self-managed software (on-premises or in a private cloud) or a vendor-supported PaaS, including the Bring-Your-Own-Cloud (BYOC) model where the vendor provisions the solution in the customer’s infrastructure. You should look at each vendor’s capabilities to ensure that your choice is available because many pure-play URP vendors are relatively small and don’t offer all the options.