Accelerate Workloads on Diverse Hardware and Analytical Engines: Embracing the Universal Data Processing Engine
Inspired by a recent episode of the “It Depends” podcast featuring Rajan Goyal, founder of DataPelago, this blog explores his vision for accelerated computing that seamlessly integrates with existing data pipelines. It explains how DataPelago transparently leverages modern hardware, including GPUs and FPGAs, to enhance performance without disrupting current workflows.
Open-source compute engines like Apache Spark and Trino have become indispensable in data processing. However, they were originally designed for simpler data landscapes than today’s AI-driven ecosystems. The emergence of real-time data processing, large-scale model training, and advanced analytics has revealed scaling and performance limitations in these traditional tools, which often struggle to meet the demands of modern, data-intensive workloads.
Replacing these foundational technologies entails significant costs and risks. Wholesale refactoring is neither practical nor desirable for most organizations. Instead, the focus should be on accelerating and optimizing these tools to keep pace with evolving requirements.
When performance becomes a bottleneck, vendors frequently turn to proprietary solutions while retaining compatibility with open-source APIs. A notable example is Databricks SQL, powered by the Photon engine. Acknowledging the performance constraints of Spark, Databricks reengineered critical components in C++, achieving significant performance gains. However, this approach introduces a degree of vendor lock-in, limiting community-driven innovation and flexibility.
To tackle these challenges, a Universal Data Processing Engine (UDPE) is proposed. This engine aims to:
- Harness the capabilities of modern, faster processors, including GPUs and FPGAs, as well as future processor types.
- Enable transparent performance acceleration without requiring changes to application logic.
- Offer flexibility to leverage diverse hardware platforms for optimized processing.
The UDPE acts as a virtualization layer above the physical compute infrastructure, decoupling analytical engines from specific hardware implementations. This approach aligns with the broader industry trend of separating compute from storage, delivering a more efficient and adaptable solution for organizations navigating complex data-processing demands.
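The decoupling idea above can be sketched in a few lines: the analytical engine programs against an abstract compute interface, and hardware-specific backends plug in underneath. This is a minimal illustrative sketch, not DataPelago's actual API; all class and method names here are hypothetical.

```python
from abc import ABC, abstractmethod

class ComputeBackend(ABC):
    """Hypothetical virtualization layer between engine and hardware."""
    @abstractmethod
    def run_filter(self, rows, predicate):
        """Execute a filter operation on this hardware."""

class CpuBackend(ComputeBackend):
    def run_filter(self, rows, predicate):
        # Plain interpreted execution on the host CPU.
        return [r for r in rows if predicate(r)]

class GpuBackend(ComputeBackend):
    def run_filter(self, rows, predicate):
        # A real backend would launch a GPU kernel; here we only
        # simulate producing the same result.
        return [r for r in rows if predicate(r)]

def execute_query(backend: ComputeBackend, rows, predicate):
    # Engine-side logic is identical regardless of the hardware underneath.
    return backend.run_filter(rows, predicate)

rows = [{"amount": 5}, {"amount": 50}, {"amount": 500}]
pred = lambda r: r["amount"] > 10
# Swapping backends changes the hardware, not the query logic.
assert execute_query(CpuBackend(), rows, pred) == execute_query(GpuBackend(), rows, pred)
```

Because the engine only sees the abstract interface, new processor types can be added as new backends without touching application logic, mirroring how compute was decoupled from storage.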
Introducing Universal Data Processing Engine
In the near future, every server will likely be equipped with chips capable of executing AI workloads cost-effectively, whether at the edge, in the cloud, or within private data centers. While NVIDIA has been the dominant player driving the adoption of GPUs, it is far from the only one innovating in this space. Traditional CPU manufacturers like Intel and AMD are advancing rapidly, while hyperscalers such as AWS and Google have developed custom AI chips like Trainium, Inferentia, and Tensor Processing Unit (TPU). Meanwhile, new startups are pushing the boundaries with specialized hardware, such as Groq’s LLM-native Language Processor Units (LPUs), offering ASIC-based solutions optimized for large language models.
The UDPE is designed to deliver:
- Lower latency through interactive, high-performance processing of all data types (structured, semistructured, and unstructured) for both analytics and GenAI workloads.
- Support for new workloads, such as model training and inference, in a cost-effective manner.
- Lower total cost of ownership (TCO) by rightsizing clusters and improving utilization of the underlying infrastructure.
Figure 1 shows the role of UDPE in an intelligent data platform.
Figure 1. The universal data processing engine is a new approach for accelerated computing.
Benefits of the Universal Data Processing Engine
The concept of the UDPE is gaining prominence because of seismic shifts in the data space: open table formats decouple storage and compute, driving the rapid adoption of lakehouses. A lakehouse is a data architecture that combines the best of data warehouses and cloud data lakes, providing a centralized repository for both structured and unstructured data while offering scalable analytics and fine-grained governance.
The rise of lakehouses is largely attributed to the adoption of open table formats such as Apache Iceberg, Delta Lake, and Apache Hudi, thanks to their support for ACID transactions, schema evolution, and time travel. These capabilities enable organizations to manage and analyze large-scale datasets efficiently while maintaining data integrity and reliability.
Open table formats provide a vendor-neutral metadata layer that manages information about tables, schemas, and partitions, opening up the data to be processed and analyzed by anyone in the organization with any engine. With this lock-in removed, the center of gravity shifts away from proprietary engines and toward diverse analytical engines. This paradigm shift enables insertion of a UDPE without refactoring the whole data stack.
How does the Universal Data Processing Engine (UDPE) work?
To understand how the UDPE works, we turn to DataPelago, a company that recently came out of stealth. It recognized that organizations are already adopting a multi-engine strategy that serves the needs of various personas. These personas may use SQL, PySpark, Ray, Presto/Trino, or a natural-language chatbot like ChatGPT to access their data.
While this is good news because it makes data accessible to anyone with the right authorization, it also raises the bar for delivering an exceptional user experience with interactive performance. Users expect their queries and tasks to return accurate results fast, and business leaders expect improved price performance from their data infrastructure.
DataPelago designed its UDPE offering around these design principles:
- Minimize disruption to the existing investment. In a multi-engine analytics approach, new accelerated-computing tooling should obviate the need to write custom code tailored to the underlying processor architecture. DataPelago's UDPE abstracts the processor internals.
- Interoperable standards. The new engine should not require rewriting reports or business logic. Supporting standards allows the new compute tools to integrate with the existing environment.
- Avoid data movement and migration. Accelerated compute should operate on the existing data and not require any migration or movement.
- Maximize processor utilization. Processors are becoming ever more powerful, but engines like Spark are unable to fully exploit their potential. DataPelago's goal is to utilize 80% to 90% of processor capacity.
- Reduce TCO. Reduce the size of clusters needed to run the jobs and improve throughput.
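As a back-of-the-envelope illustration of the rightsizing principle, consider how a per-node throughput gain translates into a smaller cluster. The numbers below are hypothetical, not DataPelago benchmarks, and the sketch deliberately ignores that accelerated nodes may carry a different hourly price, which a real TCO analysis must account for.

```python
import math

# Hypothetical rightsizing arithmetic: if accelerated execution raises
# per-node throughput, the same workload needs fewer nodes.
baseline_nodes = 30      # current cluster size (illustrative)
speedup_per_node = 3.0   # assumed per-node throughput gain
node_hour_cost = 2.0     # assumed $/node-hour (same node type, for simplicity)

accelerated_nodes = math.ceil(baseline_nodes / speedup_per_node)
baseline_cost = baseline_nodes * node_hour_cost
accelerated_cost = accelerated_nodes * node_hour_cost
savings = 1 - accelerated_cost / baseline_cost
print(f"{accelerated_nodes} nodes, {savings:.0%} lower hourly cost")
# prints "10 nodes, 67% lower hourly cost"
```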
Figure 2 shows DataPelago’s implementation of the UDPE.
Figure 2. Internals of DataPelago's Universal Data Processing Engine (UDPE)
The UDPE is composed of three components that integrate within an existing data processing stack to provide maximum acceleration and cost savings with no disruption or changes. These components are implemented as pluggable extensions to the data processing stack, simplifying the insertion and operation of acceleration.
- DataApp extends the planning layer of common data processing engines such as Spark and Trino. DataApp transforms the physical plans generated by the parsing and optimization layers of these engines to a uniform intermediate representation (IR). This common IR is forwarded to the second component, DataOS.
- DataOS is responsible for efficiently executing concurrent tasks, each defined by its IR plan, by optimizing resource allocation and scheduling to best exploit the cost-performance characteristic of available heterogeneous computing elements. DataOS dynamically assigns individual data operations to specific computing elements by analyzing the efficiency of processing each individual operation on each element.
- DataVM is a virtual machine with a programmable instruction set architecture (ISA) designed for data processing on heterogeneous acceleration hardware, including CPUs, GPUs, and FPGAs. DataOS translates data operations expressed in the IR into a program of instructions in this industry-first, domain-specific ISA.
DataVM executes these instructions by generating and compiling code and invoking the software and/or hardware for the applicable computing element (CPU, GPU, or FPGA). Through these pluggable components, the UDPE fundamentally enables data processing stacks to transparently execute on diverse accelerated hardware.
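The three-stage flow described above (plan to IR, cost-aware scheduling, lowering to instructions) can be sketched end to end in miniature. Everything here is illustrative: the IR shape, the cost table, and the instruction format are hypothetical stand-ins, since DataPelago's actual IR and ISA are not public.

```python
# 1. DataApp role: turn an engine's physical plan into a uniform IR.
physical_plan = ("project", ["amount"], ("filter", "amount > 10", ("scan", "sales")))

def to_ir(node):
    op, *args = node
    children = [to_ir(a) for a in args if isinstance(a, tuple)]
    params = [a for a in args if not isinstance(a, tuple)]
    return {"op": op, "params": params, "inputs": children}

ir = to_ir(physical_plan)

# 2. DataOS role: assign each IR operation to the cheapest compute
#    element according to a per-(operation, device) cost model.
COST = {("scan", "cpu"): 1.0, ("scan", "gpu"): 2.0,
        ("filter", "cpu"): 5.0, ("filter", "gpu"): 1.0,
        ("project", "cpu"): 1.0, ("project", "gpu"): 1.5}

def schedule(node):
    node["device"] = min(("cpu", "gpu"), key=lambda d: COST[(node["op"], d)])
    for child in node["inputs"]:
        schedule(child)
    return node

schedule(ir)

# 3. DataVM role: lower the scheduled tree, in post-order, into a flat
#    program of (instruction, device, params) tuples for dispatch.
def lower(node, program):
    for child in node["inputs"]:
        lower(child, program)
    program.append((node["op"].upper(), node["device"], node["params"]))
    return program

program = lower(ir, [])
# e.g. scan runs on CPU, the filter is routed to the GPU where it is
# cheapest, and the projection stays on CPU.
```

In this toy model, the scan lands on the CPU and the filter on the GPU purely because of the cost table; a real scheduler would also weigh data transfer costs, concurrency, and device occupancy.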
Adopting UDPE does not mandate any changes to the user experience. Users can continue to launch ETL jobs from a workflow engine, issue interactive queries from notebook clients, refresh dashboards with programmatic API calls, etc. as they normally do. Likewise, data platform operators can continue to use their existing tools to manage and govern the usage of data platform clusters.
DataPelago’s UDPE can be adopted wherever customers run their existing data processing platform and in their preferred deployment model, allowing them to comply with existing information security policies. The UDPE can operate inside the customer’s account with their hyperscale cloud provider, running as an integrated sidecar alongside managed Spark, Trino, and similar clusters. It supports both virtual machine and container-based deployments and can equally be adopted on premises.
Conclusion
The evolution of inference computing has demonstrated that smaller, domain-specific models can deliver more accurate results than their larger counterparts, which traditionally demanded substantial infrastructure. This shift is democratizing AI, leading to a significant increase in the adoption of generative AI workloads. Consequently, there’s a growing need to efficiently utilize a diverse array of hardware, including CPUs, GPUs, and FPGAs.
While AI cloud factories equipped with numerous advanced GPUs are rapidly emerging to meet the demands of new AI workloads, the UDPE's guiding principle is to handle these novel AI tasks efficiently while also enhancing the performance of essential organizational analytics, including business intelligence (BI) dashboards, reports, data products, and data transformation processes for both BI and AI applications.
In response, the Universal Data Processing Engine (UDPE) offers data teams robust options to accelerate compute-intensive data and AI workloads, while providing vendor independence. Recognizing that existing data infrastructures were primarily designed for batch processing with relatively modest data volumes and scalability requirements, it’s imperative to rethink these architectures. However, this transformation must be executed seamlessly to avoid disrupting business users.
By integrating UDPE into their operations, organizations can achieve a balanced and efficient approach to managing both cutting-edge AI workloads and traditional data processing tasks, ensuring optimal performance across all facets of their data strategy.