Data Transformation Reboot: Use of Copilots in an Agentic Architecture

Sanjeev Mohan
Jun 10, 2024 · 6 min read


Generative AI shockwaves are disrupting the data and analytics landscape from top to bottom, from senior leadership roles to jobs and business processes. This raises the question: could Gen AI also transform traditional data transformation processes?

Data is the lifeblood of modern businesses. In today’s data-driven world, organizations increasingly rely on data transformation to derive actionable insights from raw data. However, the ever-growing scale and complexity of modern data infrastructure make it hard to deliver value from data quickly. Traditional methods of data transformation, characterized by manual coding and rigid pipelines, are struggling to keep pace.

This report argues that the current approach to data transformation is outdated and broken, and proposes a new paradigm, built on data transformation copilots and agents, as the path to intelligent data applications beyond traditional analytics. This report is inspired by Prophecy’s blog by Raj Bains and Maciej Szpakowski on the same topic.

The challenges of the legacy approach

The traditional data transformation approach relies heavily on custom code written by data engineers, which suffers from several limitations:

  • Slow and Error-Prone: Writing and debugging complex data transformation code can be a time-consuming and error-prone process. This leads to slow development cycles and delays in delivering timely insights to stakeholders.
  • Limited Reusability: Legacy code often becomes siloed and difficult to reuse for different transformations. This creates maintenance nightmares and hinders the development of standardized data pipelines.
  • Lack of Flexibility: Rigid, code-based pipelines struggle to adapt to new data sources and evolving business requirements. This inflexibility hinders efficient data management and exploration.
  • Skill Gap: The reliance on specialized data engineering skills creates a bottleneck, as the demand outpaces the available talent pool. This lack of expertise limits the scalability of data transformation efforts.

These limitations are amplified in today’s data-driven landscape, where businesses expect to leverage data for an ever-growing set of use cases. The sheer demand for data makes traditional approaches unsustainable; they are simply not equipped to handle the modern data deluge.

The future of data utilization is agentic

The realm of artificial intelligence (AI) is maturing rapidly, with new architectures emerging to tackle complex problems. Agentic architecture stands out as a promising approach, employing a network of intelligent agents to achieve specific goals. Each agent is an autonomous entity capable of:

  • Perception: Gathering information from its environment through sensors or data feeds.
  • Reasoning: Processing information, drawing conclusions, and making decisions based on pre-programmed rules or learned models.
  • Action: Taking steps to influence its environment, potentially through actuators or by manipulating data.

These agents collaborate and communicate with each other to achieve a shared objective. Consider an example: an agentic system for e-commerce might consist of individual agents that handle personalization, product data, customer behavior data, and purchase history. Disparate data must be integrated, aggregated, and filtered before these agents can use it. With the burden of data wrangling lifted, agents are free to concentrate on their core logic: reasoning, making decisions, and taking actions to achieve their goals.
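To make the perceive-reason-act loop concrete, here is a minimal Python sketch of the agent contract described above. The class and field names (PersonalizationAgent, customer_profile, purchase_history) are illustrative rather than part of any specific framework, and the recommendation logic is a deliberate placeholder.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Agent(ABC):
    """Minimal agent contract: perceive, reason, act."""

    @abstractmethod
    def perceive(self, environment: Dict[str, Any]) -> Dict[str, Any]:
        """Gather relevant signals from the environment (events, data feeds)."""

    @abstractmethod
    def reason(self, observations: Dict[str, Any]) -> Dict[str, Any]:
        """Draw conclusions using pre-programmed rules or learned models."""

    @abstractmethod
    def act(self, decision: Dict[str, Any]) -> None:
        """Influence the environment, e.g. by writing data or triggering a workflow."""


class PersonalizationAgent(Agent):
    """Illustrative agent that recommends products from already-prepared data."""

    def perceive(self, environment):
        # Assumes upstream tooling has already integrated and cleaned these feeds.
        return {
            "customer": environment["customer_profile"],
            "history": environment["purchase_history"],
        }

    def reason(self, observations):
        # Placeholder logic: recommend the customer's most purchased category.
        categories = [item["category"] for item in observations["history"]]
        top = max(set(categories), key=categories.count) if categories else None
        return {"recommended_category": top}

    def act(self, decision):
        print(f"Recommend products from: {decision['recommended_category']}")


agent = PersonalizationAgent()
observations = agent.perceive({
    "customer_profile": {"id": 42},
    "purchase_history": [{"category": "books"}, {"category": "books"}, {"category": "music"}],
})
agent.act(agent.reason(observations))  # Recommend products from: books
```

In a full agentic system, many such agents would exchange messages and share data, with the integration work between them handled by the copilots discussed next.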

In the 2024 Trends in Data and AI, the authors proposed an Intelligent Data Platform as a unified data and AI stack fronted by copilots and agents. The figure below shows the role of AI Copilots and agents.

Figure: Components of an intelligent data platform

In the next section, we explore the role of copilots that could automatically prepare, enrich, and transform data, providing agents with high-quality data to recommend personalized products to each customer.

What exactly is a copilot?

Copilots offer a novel approach to data transformation. Rather than replacing data engineers, they act as intelligent assistants, automating repetitive tasks and augmenting human expertise. Here’s how copilots are transforming the data landscape:

  • Automation & Efficiency: Copilots automate the generation of data pipelines by intelligently inferring transformations from data sources and business requirements. This frees up data engineers to focus on more strategic tasks, significantly boosting productivity.
  • Self-Learning & Improvement: Copilots learn from user interactions and historical data transformation efforts. This continuous learning allows them to generate increasingly accurate and efficient pipelines over time.
  • Declarative and Reusable: Copilots operate on a declarative paradigm, where users specify the desired outcome rather than the specific steps involved. This simplified approach leads to more maintainable and reusable code bases.
  • Increased Accessibility: The intuitive nature of copilots lowers the barrier to entry for data transformation. Less technical users can leverage their capabilities, democratizing access to data insights.
  • Reduced Talent Gap: Copilots empower existing data engineers to do more with less. They can handle routine tasks and guide users, alleviating the strain on a limited talent pool.

Copilots can learn from the interactions between agents and the environment, continuously improving their data transformation recommendations and refining the overall system’s performance. They can act as a communication layer between agents, facilitating data exchange and collaboration, ultimately leading to more sophisticated and effective agent behavior.
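To illustrate the declarative paradigm mentioned above, here is a minimal sketch of how a copilot-style tool might turn a declarative specification into executable SQL. TransformSpec and generate_sql are hypothetical names used only for illustration; a real copilot would infer far more from the data sources and business requirements themselves.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransformSpec:
    """Declarative request: describe the desired outcome, not the steps."""
    source: str
    group_by: List[str]
    metrics: List[str]
    filters: List[str] = field(default_factory=list)


def generate_sql(spec: TransformSpec) -> str:
    """Stand-in for a copilot's code generation: specification in, executable SQL out."""
    where = f" WHERE {' AND '.join(spec.filters)}" if spec.filters else ""
    return (
        f"SELECT {', '.join(spec.group_by + spec.metrics)} "
        f"FROM {spec.source}{where} "
        f"GROUP BY {', '.join(spec.group_by)}"
    )


spec = TransformSpec(
    source="orders",
    group_by=["customer_id"],
    metrics=["SUM(amount) AS lifetime_value"],
    filters=["order_date >= '2024-01-01'"],
)
# Produces a single SELECT ... GROUP BY statement over the orders table.
print(generate_sql(spec))
```

Because the specification captures intent rather than implementation, the same spec can be regenerated for a different engine or reused across pipelines, which is what makes the declarative approach more maintainable and reusable.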

Benefits of using copilots

The value proposition of copilots extends beyond simply automating tasks. They offer several key advantages:

  • Faster Time to Insights: Automation and simplified workflows enable faster development cycles, leading to quicker delivery of data-driven solutions.
  • Improved Data Quality: Automated code generation and reduced human error contribute to higher data quality and accuracy.
  • Enhanced Agility: The ability to adapt pipelines automatically makes data transformation more flexible and responsive to changing needs.
  • Lower Costs: Improved efficiency and reduced development time translate to lower overall costs associated with data transformation.

The integration of data transformation copilots into agentic architecture presents a transformative opportunity. With the burden of data wrangling lifted, agents can reach their full potential, driving innovation and tackling complex problems in various fields.

Applying a traditional data transformation approach to this new paradigm is not tenable. Tracing the lineage of transformed data through multiple agents can become complex, hindering data governance and auditability. Here’s where data transformation copilots step in, offering a compelling solution.

Data transformation copilots can significantly enhance agentic architectures in several ways:

  • Automated Workflows: Copilots automate repetitive tasks such as data cleansing, schema mapping, and code generation. This frees developers to focus on agent design, logic, and strategic decision-making.
  • Democratization of Data Transformation: Copilots require less specialized expertise to operate, allowing domain experts and less experienced developers to contribute to data pipelines within the agentic system.
  • Improved Data Quality and Consistency: Copilots can identify and address data quality issues proactively, ensuring cleaner data for agents to work with and reducing the risk of errors.
  • Enhanced Scalability and Agility: By automating routine tasks, copilots enable agentic systems to handle growing data volumes more efficiently and adapt to changing data sources as needed.
  • Transparent Lineage and Governance: Copilots can automatically track the transformation steps applied to data, facilitating data lineage tracing and ensuring accountability within the agentic architecture.

By automating repetitive tasks, improving data quality, and fostering collaboration, copilots are ushering in a new era of efficiency, agility, and democratized data insights.
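As a concrete illustration of the automated data cleansing described above, here is a small, hypothetical sketch in Python using pandas. The function name and the heuristics are assumptions made for illustration; a production copilot would draw on profiling, learned models, and user feedback rather than a handful of hard-coded checks.

```python
import pandas as pd


def propose_cleansing_steps(df: pd.DataFrame) -> list:
    """Hypothetical copilot routine: profile a frame and suggest repeatable fixes."""
    steps = []
    for col in df.columns:
        null_ratio = df[col].isna().mean()
        if null_ratio > 0:
            steps.append(f"fill or drop nulls in '{col}' ({null_ratio:.0%} missing)")
        if df[col].dtype == object:
            values = df[col].dropna().astype(str)
            if (values != values.str.strip()).any():
                steps.append(f"trim whitespace in '{col}'")
    if df.duplicated().any():
        steps.append("remove duplicate rows")
    return steps


raw = pd.DataFrame({
    "customer_id": [1, 1, 2, None],
    "email": [" a@x.com", "a@x.com", "b@x.com ", None],
})
for step in propose_cleansing_steps(raw):
    print("-", step)
```

A real copilot would go further and generate the transformation code that applies these fixes, keeping every step traceable for lineage and governance.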

The future of copilots

The future of data transformation lies in collaboration: between humans and machines, between different data tools, and across diverse data ecosystems. Copilots should support open file and table formats like Parquet and Iceberg. In addition, they should support common languages and frameworks such as Apache Spark, Python, and SQL. Copilots should also integrate with DevOps systems, including Git and CI/CD pipelines.
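As a simple illustration of why open formats matter, the sketch below writes a generated aggregate to Parquet using pandas (which relies on pyarrow or fastparquet); any other engine in the stack, such as Spark or a cloud warehouse, can then read the result without bespoke glue code. The dataset and file names are illustrative.

```python
import pandas as pd

# A copilot-generated step should land in open, portable formats so that any
# engine in the stack can read the result.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1],
    "amount": [120.0, 35.5, 60.0],
})

# Aggregate and write the output as Parquet (requires pyarrow or fastparquet).
lifetime_value = orders.groupby("customer_id", as_index=False)["amount"].sum()
lifetime_value.to_parquet("lifetime_value.parquet", index=False)

# Another tool, or another copilot, can pick the file up directly.
print(pd.read_parquet("lifetime_value.parquet"))
```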

Copilots should foster interoperability across various cloud data platforms, such as Snowflake, Databricks and hyperscalers’ data ecosystems. By adhering to open standards and prioritizing interoperability, copilots can break down data silos, enabling a collaborative and efficient data transformation ecosystem.

Copilots should automatically capture and enrich metadata during data transformation processes, delivering transparency through lineage. This metadata should be available to external data catalogs and governance products through well-defined APIs. Effective metadata governance ensures clarity and transparency in data pipelines, empowering users to trust the data generated by copilots, and make informed decisions.
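Below is a minimal sketch of what such automatically captured metadata might look like: a lineage record for a single generated transformation, assembled as plain JSON so it could be handed to a catalog or governance tool over an API. The field names and the endpoint shown in the comment are assumptions for illustration, not a reference to any specific product or standard.

```python
import json
from datetime import datetime, timezone


def lineage_event(job: str, inputs: list, outputs: list, code: str) -> dict:
    """Assemble a minimal lineage record for one copilot-generated transformation."""
    return {
        "event_time": datetime.now(timezone.utc).isoformat(),
        "job": job,
        "inputs": inputs,        # upstream datasets the step read from
        "outputs": outputs,      # datasets the step produced
        "transformation": code,  # the generated code, kept for auditability
    }


event = lineage_event(
    job="copilot.build_lifetime_value",
    inputs=["warehouse.orders"],
    outputs=["warehouse.lifetime_value"],
    code="SELECT customer_id, SUM(amount) AS lifetime_value FROM orders GROUP BY customer_id",
)

print(json.dumps(event, indent=2))
# In practice the payload would be pushed to a catalog or governance tool over
# its API, for example:
# requests.post("https://catalog.example.com/api/lineage", json=event)  # illustrative URL
```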

Data observability focuses on monitoring the health and performance of data pipelines. Copilots can enhance data observability by tracking performance metrics and displaying them visually. They should also let users set up alerts that flag potential issues within the data pipelines or deviations in copilot performance. In an agentic architecture, which is capable of multi-step reasoning, copilots should go further and initiate the tasks needed to recommend fixes and remediate anomalies.
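A minimal sketch of such alerting follows, assuming a pipeline run is summarized by a few metrics and compared against fixed thresholds. The class, metric names, and threshold values are hypothetical; a real copilot would learn or recommend these bounds rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    """Summary metrics for one execution of a generated pipeline."""
    name: str
    rows_out: int
    duration_s: float
    error_rate: float  # fraction of rows rejected by quality checks


# Illustrative thresholds; a real copilot would learn or recommend these bounds.
THRESHOLDS = {"min_rows": 1000, "max_duration_s": 600, "max_error_rate": 0.01}


def check_run(run: PipelineRun) -> list:
    """Return alert messages when a run drifts outside the expected bounds."""
    alerts = []
    if run.rows_out < THRESHOLDS["min_rows"]:
        alerts.append(f"{run.name}: low output volume ({run.rows_out} rows)")
    if run.duration_s > THRESHOLDS["max_duration_s"]:
        alerts.append(f"{run.name}: slow run ({run.duration_s:.0f}s)")
    if run.error_rate > THRESHOLDS["max_error_rate"]:
        alerts.append(f"{run.name}: error rate {run.error_rate:.1%} above budget")
    return alerts


for alert in check_run(PipelineRun("lifetime_value", rows_out=420, duration_s=95, error_rate=0.03)):
    print("ALERT:", alert)
```

In an agentic setup, these alerts could themselves be handed to an agent that proposes or applies a remediation, closing the loop described above.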

Conclusion

The current approach to data transformation is increasingly proving to be outdated, unable to keep pace with the demands of an integrated and intelligent data ecosystem. Copilots represent the future of data transformation, offering enhanced reliability, efficiency, integration, error reduction, and agility. By leveraging AI-driven copilots, organizations can reinvent their data transformation processes, unlocking new levels of productivity, accuracy, and insight. As the data landscape continues to evolve, embracing these advanced solutions will be critical for organizations seeking to stay competitive and drive innovation in the digital age.

Copilots are not a silver bullet, but they represent a significant leap forward in data transformation. By automating mundane tasks and empowering users, they bridge the gap between data and insights. As data continues to grow exponentially, copilots are poised to become indispensable tools for businesses navigating the modern data landscape.


Sanjeev Mohan

Sanjeev researches the space of data and analytics. Most recently he was a research vice president at Gartner. He is now a principal with SanjMo.