DataOps’ Rising Importance: What Does Gartner’s Market Guide Tell Us?
Gartner™ first published its Market Guide for DataOps in December 2022 and recently refreshed it in August 2024. This blog explores key learnings from both reports. It provides my perspective to help businesses understand the significance of this category and how they can incorporate DataOps best practices in their organizations.
Note: The Gartner Market Guide is only available to subscribers, but you can download it from this location, courtesy of DataOps.live.
DataOps Defined (& Why It Matters)
Before we jump in, let’s start by defining DataOps by going straight to the source. According to Gartner:
“The collaborative data management practice focuses on improving communication, continuous integration, automation, observability and operations of data flows between data managers, data consumers, and their teams across the organization.”
This space is witnessing a meteoric rise because data pipelines are becoming increasingly complicated as demand for data, driven by new use cases and new data consumers, grows rapidly. Most data consumers crave fresh data for up-to-date decision-making. In addition, the latest use cases often rely not just on structured data but on a wide variety of data sources, with data in varying formats.
All this data must be integrated, transformed, enriched, tested, secured, and prepared for consumption. Complexity increases further when the various data teams operate in silos and use disjointed tools. This creates the need for automation, transparency, collaboration, and repeatability in delivering data outcomes, and hence the rising importance of incorporating DataOps practices into data management.
The rise of the DataOps approach tacitly acknowledges that the focus on technology is not resolving data teams’ struggle to deliver trusted insights. Hence, the focus must shift to the processes, tools, and people acting upon data.
This may be why Gartner’s Strategic Planning Assumption stated, “By 2026, a data engineering team guided by DataOps practices and tools will be ten times more productive than teams that do not use DataOps”.
DataOps Capabilities
Software engineering has achieved an efficient software development lifecycle by closely integrating software development and IT operations teams through DevOps. Now, the same DevOps principles are being applied to data and AI projects in a space known as DataOps.
DataOps helps deliver projects faster and more cost-efficiently. Like DevOps’ “shift left” concept, DataOps moves activities like testing, security, and quality assurance earlier in the development process. This approach finds and remediates problems earlier in the data production process, so errors never surface in front of customers.
Maturing DataOps practices help deliver “data products” better, faster, and with higher quality, which, in turn, makes data more accessible, understandable, trusted, and reusable. They also bring a product management philosophy to data teams and provide a single point of accountability. Akin to microservices, data products solve specific business problems and provide an unprecedented opportunity to measure data teams’ productivity. This has been an elusive goal thus far for many CDOs.
Gartner defines five essential DataOps capabilities in Figure 1 and, in the report, tags two of the five capabilities — orchestration and observability — as “must-haves.”
Figure 1: Key DataOps capabilities based on Gartner’s definition in the Market Guide for DataOps Tools
According to Gartner, the primary consumers of DataOps are technical roles like data engineers and data architects.
Let’s examine each capability.
- Data Pipeline Orchestration
Data pipeline orchestration safeguards the complex journey of data from raw form to actionable insights. It is a systematic approach to automating, coordinating, and monitoring the execution of data pipelines within a unified platform. By centralizing control over diverse data sources and transformations, orchestration streamlines the management of intricate data workflows, enhancing efficiency and reliability. Key functionalities include connector management, workflow impact analysis, and comprehensive audit trails, which together facilitate the delivery of accurate and timely data products.
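To make the core idea concrete, here is a minimal sketch in plain Python (deliberately not modeled on any vendor’s API) of dependency-aware task execution with an audit trail: the kernel around which orchestration platforms build scheduling, retries, connector management, and impact analysis. The task names and logic are illustrative only.

```python
# Minimal, illustrative orchestrator sketch: runs tasks in dependency order
# and records an audit trail. Real DataOps platforms layer scheduling,
# retries, and connector management on top of this core idea.
from datetime import datetime, timezone
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def extract():
    return [{"id": 1, "amount": "42.50"}]

def transform(rows):
    return [{**r, "amount": float(r["amount"])} for r in rows]

def load(rows):
    print(f"loaded {len(rows)} rows")

# Pipeline definition: task name -> (callable, upstream dependencies)
pipeline = {
    "extract": (extract, set()),
    "transform": (transform, {"extract"}),
    "load": (load, {"transform"}),
}

audit_trail = []
results = {}
order = TopologicalSorter({name: deps for name, (_, deps) in pipeline.items()})
for name in order.static_order():
    func, deps = pipeline[name]
    started = datetime.now(timezone.utc)
    args = [results[d] for d in deps]  # pass upstream results through (simplified)
    results[name] = func(*args)
    audit_trail.append({"task": name, "started": started.isoformat(), "status": "ok"})

for entry in audit_trail:
    print(entry)
```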
- Data Pipeline Observability
Data pipeline observability is a critical component of modern data management that involves the continuous monitoring and analysis of data pipelines to ensure optimal performance and data quality. By leveraging real-time and historical metadata derived from orchestrated pipelines, organizations can effectively identify and address data anomalies, trace data lineage, and assess data usage patterns. This comprehensive approach, which incorporates monitoring, logging, and business rule detection, empowers data teams to optimize pipeline efficiency, prevent data quality issues, and ultimately deliver higher-quality data products.
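As a simplified illustration, one common observability building block is comparing a run’s metrics against that pipeline’s own history. The sketch below, in plain Python with made-up numbers, flags a volume anomaly when today’s row count deviates sharply from recent loads:

```python
# Simplified observability check: flag a pipeline run whose row count
# deviates sharply from the run history (a basic volume anomaly test).
from statistics import mean, stdev

def check_volume_anomaly(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Return True if `current` is more than `threshold` standard
    deviations away from the historical mean row count."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold

row_counts = [10_120, 9_980, 10_340, 10_050, 10_210]  # previous daily loads
todays_count = 4_700                                   # today's suspicious load

if check_volume_anomaly(row_counts, todays_count):
    print("ALERT: row count anomaly detected; pausing downstream tasks")
```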
- Environment Management
Environment management within the context of data pipelines involves the systematic creation, maintenance, and optimization of distinct environments for pipeline development, testing, staging, and production. By leveraging infrastructure-as-code principles, this process automates the deployment of resources, standardizes environment configurations, and ensures consistent execution across all pipeline lifecycle stages. Equally important, this approach minimizes manual intervention, reduces errors, and accelerates time-to-market for data products.
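A minimal sketch of this configuration-as-code idea is shown below; the environment names, parameters, and values are hypothetical, standing in for what a real tool would render into Terraform, SQL, or API calls:

```python
# Environment configuration as code: one declarative definition per
# environment, so dev/staging/prod differ only by parameters,
# never by hand-edited setup. All names and values are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvConfig:
    name: str
    warehouse_size: str
    schema: str
    run_tests: bool

ENVIRONMENTS = {
    "dev":     EnvConfig("dev",     warehouse_size="XS", schema="analytics_dev",   run_tests=True),
    "staging": EnvConfig("staging", warehouse_size="S",  schema="analytics_stage", run_tests=True),
    "prod":    EnvConfig("prod",    warehouse_size="L",  schema="analytics",       run_tests=False),
}

def deploy(env_name: str) -> None:
    cfg = ENVIRONMENTS[env_name]
    # A real tool would turn this into infrastructure provisioning calls.
    print(f"[{cfg.name}] provisioning {cfg.warehouse_size} warehouse, "
          f"deploying to schema {cfg.schema}, tests={'on' if cfg.run_tests else 'off'}")

deploy("staging")
```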
- Data Pipeline Test Automation
Data pipeline test automation encompasses a suite of practices designed to rigorously evaluate data pipeline functionality. This includes executing simulated pipeline runs (dry runs), verifying business rules, managing test scripts, creating regression test suites, and curating appropriate test data. These activities collectively ensure data pipeline reliability, accuracy, and performance.
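For illustration, a pipeline test in this spirit might look like the following pytest-style sketch, where the transformation, the business rule, and the test data are all hypothetical:

```python
# Illustrative pipeline tests (pytest style): business rule checks run
# against curated test data before a pipeline change is promoted.
def apply_discount(order: dict) -> dict:
    """Transformation under test: orders over 100 get a 10% discount."""
    discounted = order["amount"] * 0.9 if order["amount"] > 100 else order["amount"]
    return {**order, "final_amount": round(discounted, 2)}

def test_discount_applied_above_threshold():
    assert apply_discount({"amount": 200.0})["final_amount"] == 180.0

def test_no_discount_at_or_below_threshold():
    assert apply_discount({"amount": 100.0})["final_amount"] == 100.0

def test_final_amount_never_negative():
    # Regression-style rule: bad upstream data must not break the invariant.
    assert apply_discount({"amount": 0.0})["final_amount"] >= 0.0
```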
- Data Pipeline Deployment Automation
Data pipeline deployment automation refers to the systematic application of CI/CD principles to the management of data-related components throughout their lifecycle. Pipelines degrade repeatedly over their lifetime, often because of upstream data changes such as new fields being added or logic being adjusted. Tracking changes over time provides insight into the data’s journey at both the individual pipeline and dataset levels.
This capability encompasses the automated control of versioning, integration with development and operations processes, and the orchestrated release of data pipelines, subject to appropriate change management approvals. By streamlining these activities, organizations can accelerate time-to-market for data products while maintaining rigorous quality standards.
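The sketch below illustrates the shape of such an automated release gate in plain Python. In practice this logic lives inside a CI/CD system; the function names and version checks here are purely illustrative:

```python
# Sketch of an automated release gate: a pipeline version is promoted to
# production only if tests pass and the change is approved. The approval
# lookup stands in for a real change-management integration.
def run_tests() -> bool:
    return True  # stand-in for executing the real test suite

def change_approved(version: str) -> bool:
    approved_versions = {"1.4.0", "1.5.0"}  # stand-in for a change-management check
    return version in approved_versions

def promote(version: str) -> None:
    if not run_tests():
        raise RuntimeError(f"release {version} blocked: tests failed")
    if not change_approved(version):
        raise RuntimeError(f"release {version} blocked: awaiting change approval")
    print(f"release {version} promoted to production")

promote("1.5.0")
```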
Vendors Mentioned
Gartner has introduced a taxonomy, shown in Figure 2, to classify the twenty vendors mentioned in the Market Guide. Gartner repeatedly notes that the list is representative and “does not imply an exhaustive list.”
As with the better-known Gartner Magic Quadrants, readers should treat the Market Guide as a baseline resource and perform their own research to select vendors that meet their unique requirements.
It should also be noted that many platforms have built-in DataOps capabilities. Gartner has presumably excluded them from this list because those capabilities are not general-purpose; dbt Labs is one such exception.
Figure 2: DataOps capabilities vendors and a representative list of vendors (in alphabetical order)
Market Guide Analysis
Interestingly, Gartner’s definitions of the DataOps capabilities have stayed the same between 2022 and 2024, and in both reports Gartner calls this space “emerging”. Contrast that with the adjoining space of AI, where the market reinvents itself every few months; the first report was released just a few weeks after ChatGPT launched. Does this mean the pace of data management isn’t keeping up with the rapid transformation in AI, or that the data space, being so foundational, should be less swayed by changes?
In our view, data management disciplines like DataOps are about the “non-functional management and control” of data processes and pipelines, so their capabilities are fairly constant and not subject to radical, continual change. This contrasts with “functional capabilities”, like data transformation, machine learning, and AI, which are changing at a rapid pace. That said, AI capabilities are enriching DataOps products. For example, AI co-pilots are making DataOps capabilities available to a much larger user community, since less technical knowledge and programming skill is needed to use them.
Twelve vendors have maintained their placements in both reports. The vendors mentioned in the 2022 report that dropped out of the 2024 report are Atlan, CluedIn, Cognite, Databricks, HighByte, Hitachi Vantara, Software AG, and Tengu. New entrants Acceldata, AWS, Dagster, Datagaps, Google, Microsoft, Prefect, and RightData have replaced these eight vendors in the 2024 report.
These changes underscore the critical role that comprehensive platforms play, at the expense of niche players. Almost all (seven of eight) vendors in the generalist category are repeat entries, because they offer many of the five capabilities highlighted by Gartner and are the most comprehensive platforms.
The Orchestrator category consists almost entirely (seven of eight) of either cloud hyperscalers (AWS, Google, and Microsoft) or focused orchestration vendors (Astronomer, BMC, Dagster, and Prefect).
This tells us that the trend is moving away from accumulating point solutions to investing in tools that provide more comprehensive capabilities. Not all that surprising either, given these integrated tools help drive productivity, innovation, and operational efficiency.
The Future of DataOps
Social media has been rife with trite posts suggesting that companies that have failed to meet their BI goals should not dabble in AI. However, this is not the right way of looking at the problem. Failure to meet BI goals does not predict failure to meet AI objectives; the real problem lies elsewhere, in data management and the foundational disciplines and processes that become embedded when an organization adopts a DataOps way of working. This is an example of correlation being mistaken for causation.
DataOps as a discipline is a crucial cog in improving the chances of success with both BI and AI. A longer-term, razor-sharp focus on building data and AI discipline through DataOps may be the real differentiator between success and failure. Just as the original big data focus was on the 3Vs of data (volume, velocity, and variety), we now have to deal with the same 3Vs for data pipelines.
Like every other category, AI is making its mark on DataOps capabilities. Large language models (LLMs) are excellent at inferring relationships from context, and their context-based reasoning keeps improving, which can make DataOps orchestration, observability, and automation tasks far more context-aware. DataOps can also play a significant role in bringing agentic architectures to life.
Generative AI can immensely help data engineers and DataOps engineers, from pipeline creation to detecting changes in upstream source systems. It can help them understand the impact of those changes and create a plan to modify or update the affected pipelines. They can use a DataOps agent to generate the code diff for review, test the changes, and propagate them to production. It can also detect anomalies and apply corrections at run time based on user-defined rules and context. Finally, DataOps, in conjunction with LLMs, can help with planning, reasoning, and executing sub-tasks with guardrails.
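As a simplified example of the “detect upstream changes” step, the sketch below diffs two versions of a source table’s schema. The resulting structured diff is the kind of input an LLM-backed DataOps agent could turn into an impact analysis and remediation plan; the schemas and field names are hypothetical:

```python
# Illustrative upstream change detection: diff the current source schema
# against the last known version. A DataOps agent could take this diff,
# find the pipelines reading this table, draft the code changes, run the
# tests, and open a change request for human review.
previous_schema = {"order_id": "int", "amount": "decimal", "created_at": "timestamp"}
current_schema  = {"order_id": "int", "amount": "decimal",
                   "created_at": "timestamp", "currency": "varchar"}

added   = {k: v for k, v in current_schema.items() if k not in previous_schema}
removed = {k: v for k, v in previous_schema.items() if k not in current_schema}
changed = {k: (previous_schema[k], v) for k, v in current_schema.items()
           if k in previous_schema and previous_schema[k] != v}

if added or removed or changed:
    print("Upstream schema drift detected:")
    print(f"  added:   {added}")
    print(f"  removed: {removed}")
    print(f"  changed: {changed}")
```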
Summary
The key findings from Gartner’s 2024 Market Guide for DataOps highlight the growing importance of DataOps in managing the complexity of modern data pipelines and ensuring the timely delivery of high-quality data products. As data becomes increasingly vital for decision-making across organizations, DataOps emerges as a critical practice for improving communication, automation, observability, reusability, and speed of delivery within and across data teams.
Key takeaways from the reports include:
- Rising Complexity of Data Pipelines: As demand for fresh, diverse data grows, data pipelines become more intricate. The need for DataOps practices is driven by the complexity of integrating, transforming, and securing data from various sources, often in different formats.
- Automation and Collaboration: The reports underscore the importance of automation, transparency, and collaboration in delivering reliable data outcomes. DataOps helps bridge the gaps between siloed data teams and disjointed tools, ensuring a smoother, more efficient data management process.
- Productivity Gains: Gartner’s Strategic Planning Assumption emphasizes the productivity boost DataOps can offer. By 2026, data engineering teams using DataOps practices and tools are expected to be ten times more productive than those that do not.
- DataOps Capabilities: The reports identify five essential DataOps capabilities: data pipeline orchestration, data pipeline observability, environment management, data pipeline test automation, and data pipeline deployment automation. Orchestration and observability are considered “must-haves” for effective DataOps implementation.
- Shift Towards Comprehensive Platforms: The analysis of vendors in the 2022 and 2024 reports indicates a shift away from point solutions towards platforms that offer more comprehensive DataOps capabilities. This trend reflects the need for integrated tools that drive developer productivity, innovation, and operational efficiency.
- AI’s Role in DataOps: Generative AI and large language models (LLMs) are playing an increasingly significant role in enhancing DataOps capabilities. AI can assist with tasks such as pipeline creation, detecting changes in upstream data systems, and automating corrections, making DataOps more relevant and efficient.
To conclude, the evolution of DataOps is not just a trend but a necessary response to modern data management’s increasing complexity and demands. DataOps platforms are the missing component to deliver the non-functional excellence needed around the increasingly complex data pipelines organizations are developing. As organizations strive to harness the full potential of their data assets, embracing comprehensive DataOps platforms alongside leveraging AI-driven enhancements will be crucial. These developments will ensure the delivery of high-quality data and position organizations to stay competitive in an increasingly data-driven world. As DataOps continues to mature, it will redefine how data teams operate, driving productivity, innovation, and — ultimately — business success.
To learn more, download the full copy, compliments of DataOps.live, here.