The sign of maturity for any new technology is when it goes from being a nice-to-have to a must-have. Data observability falls into that category: it serves the critical use cases of detecting data quality anomalies and ensuring data pipeline reliability.
Data observability is a key component of DataOps which aims to improve the speed and quality of data delivery, enhance collaboration between IT and business teams, and reduce the time and costs associated with data management. DataOps helps organizations make data-driven decisions faster and with more confidence, as it provides them with a unified view of data across the entire organization and ensures that the data is accurate, up-to-date, and secure.
As we enter a new era defined by two major shifts, a slowing economy and the rise of data products, new data observability use cases have emerged. These pertain to measuring and improving business productivity (DataBizOps) and controlling the costs of data pipelines (DataFinOps).
As data observability matures, we expect it will include more use cases, like data security monitoring and protection which will lead to a comprehensive metadata platform. Figure 1 shows the current scope of this concept.
This paper focuses on DataFinOps and DataBizOps.
Why do we need DataFinOps?
Data is a victim of its own success. It has gone from being derided as the “exhaust” of applications to being a kingmaker, and from being used by a few specialized analysts to a wide range of consumers who throw ever-newer use cases at it. In addition, the average number of sources that generate data has increased significantly. Consider: we have gone from a few dozen data sources, like ERP and CRM, all within our firewalls, to an explosion of SaaS products. Recent studies show that enterprises use, on average, 110 SaaS products, and large companies now have close to 500.
On one hand, we are dealing with a larger number of data sources, more data consumers, and more use cases, but, on the other hand, we’ve also had a stampede of new tools and applications processing our data. These tools are no longer confined to our firewall-protected environments, but are found on the edge, in private clouds and in public clouds. In short, the data pipelines have grown complex and unwieldy. Needless to say, organizations are now keen to understand the financial metrics of building and operating data and analytics workloads.
This scenario is the very genesis of the data observability space. Data engineers monitor the data and the pipeline to quickly detect and analyze quality and reliability issues. However, the cost to run the pipelines has never been in their scope. Until now.
The economic downturn of 2023 has hit the tech sector the hardest. As infrastructure costs rise, management is keen to prioritize their spend. But, how does one prioritize without a clear understanding of the costs incurred at various stages of a data pipeline? In the past, organizations estimated their budget and allocated capital expenditure (CAPEX) and operating expenditure (OPEX) towards their new IT initiatives. Cloud computing has upended this model by shifting most costs to OPEX. With this shift, many teams find out that the money they had allocated for the entire year has been consumed in just the first quarter. In summary, we are now dealing with two problems — rising costs and unpredictable costs.
Surprisingly, studies have shown that almost 30% of the cloud cost is simply wasted. This is low-hanging fruit for a team looking to pare down redundant expenses. It is relatively easy to use native cloud provider cost management tools to identify unused instances and shut them down. This itself can save significant amounts of expenses. But it gets harder to pinpoint inefficiencies in complex SQL queries, ML training and data transformation workloads. Hence, DataFinOps needs a more concerted effort.
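The idle-instance check described above can be sketched in a few lines. This is a minimal, hypothetical example: the record format (`instance_id`, `daily_avg_cpu_pct`) and the 5% CPU threshold are illustrative assumptions, not the schema of any real cloud provider's cost management export.

```python
# Hypothetical sketch: flag idle compute instances from a utilization export.
# The record fields and thresholds below are assumptions for illustration.

def find_idle_instances(records, cpu_threshold=5.0, min_idle_days=14):
    """Return IDs of instances whose average CPU stayed under the
    threshold for at least the last `min_idle_days` days."""
    idle = []
    for rec in records:
        daily_cpu = rec["daily_avg_cpu_pct"]  # one reading per day
        if len(daily_cpu) >= min_idle_days and all(
            cpu < cpu_threshold for cpu in daily_cpu[-min_idle_days:]
        ):
            idle.append(rec["instance_id"])
    return idle

report = [
    {"instance_id": "i-app-01", "daily_avg_cpu_pct": [42.0] * 14},
    {"instance_id": "i-tmp-07", "daily_avg_cpu_pct": [1.2] * 30},
]
print(find_idle_instances(report))  # ['i-tmp-07']
```

As the surrounding text notes, this kind of check is the easy part; attributing inefficiency inside SQL queries or ML training jobs requires workload-aware context that simple utilization thresholds cannot provide.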
Data engineers who embrace the “shift-left” approach can take a much more proactive stance on cost containment. Sometimes rising costs are a harbinger of incorrect data or pipeline defects. So, the DataFinOps use case of data observability complements its other, more well-known use cases of data quality and pipeline reliability.
Some questions that the DataFinOps capabilities of your data observability product should uncover include:
- What parts of the pipeline are experiencing higher costs than the historical trends?
- What will be the impact of disrupting a job that is causing runaway costs?
- What are the cost trends, and when will we exceed the budget?
- Will the cost savings offset the cost of procuring and running the data observability tool?
- Will it provide fine-grained cost breakdown into jobs and their consumption?
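The first and third questions above can be made concrete with two small checks: flagging days whose cost breaks from the historical trend, and projecting when cumulative spend will exhaust the budget. The sketch below is illustrative; the daily-cost input format, the 7-day window, and the 3-sigma threshold are all assumptions, and a real tool would use a more robust model than a naive linear run rate.

```python
# Minimal sketch of two DataFinOps checks: trend-based anomaly detection
# and budget-exhaustion forecasting. Inputs and thresholds are illustrative.
from statistics import mean, stdev

def cost_anomalies(daily_costs, window=7, k=3.0):
    """Flag day indices whose cost exceeds the trailing mean by k std devs."""
    flagged = []
    for i in range(window, len(daily_costs)):
        hist = daily_costs[i - window:i]
        mu, sigma = mean(hist), stdev(hist)
        if daily_costs[i] > mu + k * sigma:
            flagged.append(i)
    return flagged

def days_until_budget_exceeded(spend_so_far, daily_run_rate, budget):
    """Naive linear projection of when cumulative spend crosses the budget."""
    if daily_run_rate <= 0:
        return float("inf")
    return max(0.0, (budget - spend_so_far) / daily_run_rate)

costs = [100, 102, 98, 101, 99, 103, 100, 250]  # day 7 is a cost spike
print(cost_anomalies(costs))                       # [7]
print(days_until_budget_exceeded(953, 100, 3000))  # 20.47
```

In practice, such alerts are most useful when they fire before the budget review, which is the proactive posture the DataFinOps use case is meant to enable.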
Data observability tools often struggle to get traction from prospective buyers due to the difficulty of demonstrating a clear return on investment (ROI). The DataFinOps use case is a unique aspect of this category as it is easier to calculate its ROI nearly instantly.
What is DataFinOps?
The FinOps Foundation has defined best practices for cloud financial management (CFM). When these principles are extended to data management, we call it DataFinOps. Financial metrics are tracked and monitored like data quality and performance metrics, and the data observability system provides recommendations on improving cost efficiency.
DataFinOps is a set of processes used to track costs incurred across the data stack and by different users and teams, with the goal of optimizing the overall spend. It provides a granular view into how costs are being incurred, ideally before they impact budgets and forecasts. Using the data observability capabilities of monitoring, alerts, and notifications, proactive actions can be taken to ensure that data pipelines are running efficiently.
Figure 2 shows DataFinOps capabilities.
As the figure shows, cost consumption can be granular or aggregated for:
- Resource consumption, like Snowflake, Databricks, Google BigQuery, Hadoop, Kafka, etc.
- Usage (users and departments), like business analysts, data engineers and data scientists
- Workloads, like data preparation pipeline, data quality, data transformation or training models
A DataFinOps platform should provide a consolidated set of capabilities:
- Observability (or discovery) of the costs and spend with an ability to tag and categorize costs based on the organization’s desired taxonomy. Users should be able to track, visualize and allocate costs precisely with workload-aware context.
- Optimization capabilities include budgeting, forecasting and chargeback of costs. It helps to eliminate inefficiencies and optimize cost. Most tools today use AI to predict capacity rightsizing and automation to trigger preemptive corrections.
- Governance capabilities include recommendations on cost reduction strategies and opportunities. It helps to identify guardrails needed to control and prevent cost overruns. This could be across infrastructure resources, configuration, code or data. For example, datasets with low usage could be candidates for migrating to a cheaper storage tier.
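The tagging and chargeback capabilities described above amount to rolling raw billing line items up to an organization-defined taxonomy. The sketch below is a simplified illustration; the line-item fields (`service`, `cost_usd`, `tags`) are assumptions and do not match any real billing-export schema.

```python
# Illustrative sketch of tag-based chargeback: aggregate cost line items
# by team tag. The line-item fields are assumptions, not a real schema.
from collections import defaultdict

def chargeback_by_tag(line_items, tag_key="team", untagged="unallocated"):
    """Aggregate cost per value of `tag_key`. Untagged spend is surfaced
    under its own bucket so it can be chased down, not silently spread."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, untagged)
        totals[owner] += item["cost_usd"]
    return dict(totals)

bill = [
    {"service": "warehouse", "cost_usd": 1200.0, "tags": {"team": "analytics"}},
    {"service": "pipeline",  "cost_usd": 300.0,  "tags": {"team": "data-eng"}},
    {"service": "storage",   "cost_usd": 75.0,   "tags": {}},
]
print(chargeback_by_tag(bill))
# {'analytics': 1200.0, 'data-eng': 300.0, 'unallocated': 75.0}
```

Surfacing the `unallocated` bucket explicitly is a deliberate choice: a large untagged remainder is itself a governance signal that the tagging taxonomy is not being enforced.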
When data observability is used for quality or pipeline reliability use cases, the personas are business analysts, data engineers, or data scientists. For the DataFinOps use case, the primary personas are typically executives, such as the head of the cloud center of excellence or the CIO. These executives work with the chief data officer (CDO) or the head of analytics, who is responsible for managing data assets.
Interestingly, fiscal discipline is now being pushed down to data engineers and business partners. Using the concept of ‘shift left,’ cost becomes a consideration for new applications as they are designed. DataFinOps practitioners inculcate a culture of financial discipline and facilitate collaboration between business, engineering, operations, and finance teams. They help improve cloud efficiency across various business teams through cost and efficiency metrics, such as resource utilization and trend analysis. FinOps, hence, becomes a part of data literacy and culture.
Challenges of DataFinOps
Cloud computing is celebrated for enabling teams to develop reports and dashboards dramatically faster. Within a minute, we can spin up a virtual warehouse in Snowflake and use its Zero Copy Clone feature to analyze the data. Similarly, we can use Terraform to spin up cloud resources on demand. But this ease results in a complex web of cloud costs.
Take the example of Accenture, which has over 1,000 teams using AWS. Its cloud bill runs into tens of millions of lines of billing data. The reason is that there are so many cost elements in the cloud, such as compute instances, compute transactions, storage, storage I/O, data transfer, support, networking, monitoring, disaster recovery, and various other costs.
The second reason for cost complexity is the vast expansion of choices in the cloud. For example, as of February 2023, AWS offers 536 instance types running Linux and 427 running Windows. This makes calculating the cloud’s total cost of ownership (TCO) and ROI a challenge. Hence, an automated tool, such as a data observability platform, is needed.
Another challenge concerns the lack of data ownership. As data journeys from the producer through various transformation steps, it becomes unclear who owns it. This lack of ownership leads to a lack of accountability for cost overruns.
Finally, cloud providers allow alerts to be set up, but sometimes, by the time the administrator can take action, the damage is done and the high cost of the erroneous query is already on our books. Hence, a proactive DataFinOps product is needed. This tool should also intelligently manage alerts and avoid the problem of ‘alert fatigue.’
Why do we need DataBizOps?
As seen in the first figure in this document, the purpose of DataBizOps is to demonstrate value from our data assets. It is a set of metrics that help in calculating productivity and reducing cost. For example, an organization may have built hundreds of data artifacts (reports, dashboards, views, etc.). However, by analyzing their usage, businesses can retire the whole set of processes leading up to unused artifacts and save cost.
The newest use case of data observability, DataBizOps, is related to the rise of data products. “Data as a product” was introduced as one of the four principles of data mesh; data products turn that principle into business-outcome-driven consumable entities, like a report, a dashboard, a table, a view, an ML model, or a metric. DataBizOps gathers “data telemetry,” like the frequency of data product releases, usage of data products, and anomalies in usage, which can point to other issues, such as poor data quality.
Data products take a business-first approach, as opposed to the prevailing technology-centric bent of producing data artifacts. In fact, this approach can help mitigate some challenges and frustrations of the modern data stack. For example, delivering outcomes today involves a tedious and complex collection of pipelines, which increases cost, effort, and time. These hard-to-debug processes cause reliability and downtime issues, thereby requiring data observability. Another example is the poor adoption of data catalogs, which often begins with an attempt to collect and tag all available data. This ‘boil the ocean’ approach often fails and reduces trust in the data governance initiative. If we instead catalog the data for the data products being built, we will have a better chance of a successful data governance outcome.
So, what is the role of DataBizOps?
DataBizOps can help establish data’s ROI through metrics, like the number of data-related goals and objectives met, and the level of data consumer and management buy-in. It can help support the data strategy within the organization. The best part is that the data strategy is now being driven by the business strategy. They can be in perfect alignment, thereby maximizing the potential of corporate data assets.
An organization can benefit from DataBizOps in several ways:
- Launch a data marketplace and collect metrics that will ensure that the data is being used for the intended purpose, by the intended consumers, and within the established guardrails. DataBizOps can augment the data producers’ data governance efforts by extending them to data shares and exchanges.
- Strengthen the DataOps processes by providing the necessary telemetry so that intelligent decisions can be made for automation, testing and orchestration. Today, many of the orchestrators are rule-based and act like a complex collection of CASE statements. However, DataBizOps can feed contextual information, so the orchestrators can make a fine-grained assessment of the next steps.
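The contrast above, between rule-based orchestration and orchestration informed by DataBizOps telemetry, can be sketched as follows. This is a hypothetical illustration: the telemetry fields (`monthly_queries`, `quality_score`) and the thresholds are assumptions, not features of any real orchestrator.

```python
# Hypothetical sketch: an orchestrator step that consults DataBizOps
# telemetry (usage and quality) instead of fixed rules alone.
# Field names and thresholds are illustrative assumptions.

def next_step(job_name, telemetry):
    """Decide what to do with a pipeline job based on product telemetry."""
    usage = telemetry.get("monthly_queries", 0)
    quality = telemetry.get("quality_score", 1.0)
    if usage == 0:
        return "skip"        # nobody consumes the output; candidate to retire
    if quality < 0.9:
        return "quarantine"  # fix data quality before publishing downstream
    return "run"

print(next_step("daily_sales_report", {"monthly_queries": 0}))  # skip
print(next_step("churn_features",
                {"monthly_queries": 800, "quality_score": 0.6}))  # quarantine
print(next_step("churn_features",
                {"monthly_queries": 800, "quality_score": 0.98}))  # run
```

A rule-only orchestrator would run all three jobs unconditionally; feeding in usage telemetry is what lets it skip the unused report, which is exactly the artifact-retirement saving described earlier in this section.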
This is a brief overview of the potential of the DataBizOps capabilities of data observability. Although some vendors are developing standalone products to provide the above-mentioned features, they should be part of a comprehensive data observability product.
DataFinOps refers to the set of practices, processes, and technologies used to assess and manage costs of the organizations’ data assets. It aims to provide a streamlined and efficient approach to managing the data lifecycle, from collection and storage to analysis and decision-making. As the case studies reflect, it helps in reducing costs and improving efficiency.
DataBizOps is the newest member of the data observability space. It has the potential to finally help establish data’s ROI.
As we end the chapter, we should note that various use cases of metadata should be unified, as mentioned before, into a common metadata plane. These use cases are depicted in Figure 3.
DataFinOps and DataBizOps metrics can help integrate the overall metadata management initiatives so that our assets have high quality, reliability, and security, and developers are more productive. A comprehensive, metrics-driven metadata management plane can reform a data engineer’s role from reactive to proactive. In addition, data teams integrate into the rest of the business and deliver on strategic imperatives. This will help with mainstream data observability adoption in an environment where data and analytics workloads are constantly growing.
The unified metadata plane will avoid metadata silos. As common metadata standards are developed, this vision becomes a reality.