What is Shift Left Data Security?

Sanjeev Mohan
Sep 29, 2023

When the Dutch philosopher Desiderius Erasmus said “prevention is better than cure” around 1500 CE, he might as well have been talking about data security. The prevention he had in mind was proactively detecting the causes of ill health so their impact could be minimized. “Shift Left” data security has similar goals, albeit for your mission-critical data.

Many companies implement data access governance and security measures in their analytical destination, such as in a cloud data warehouse or a cloud lakehouse. However, stakeholders responsible for risk, compliance, governance, and security now understand the importance of subjecting sensitive workloads to complete data governance and protection even before they land in the analytical data store.

Data owners need to safeguard data from the moment it begins its journey in source systems, through data transformation subsystems, and finally into analytical stores. It is no longer sufficient to secure data only once it has landed in a cloud data warehouse. Consistent data security policies should be applied whether the data is in ETL and ELT pipelines or in temporary storage locations, such as Google Cloud Storage (GCS) or Amazon S3 buckets.

Safeguarding data from various sources is of utmost importance. It is therefore imperative for enterprises to enforce data governance and security protocols early in the data lifecycle, protecting data from the moment it is transferred out of a source system.

Why is the current data security approach insufficient?

Although most organizations have made the bold leap to the cloud, they have unfortunately carried their traditional data security approach into the new world as well. The problem is that these older approaches were built for on-premises environments, when the number of data sources was small and manageable.

In the so-called “modern data stack”, we have seen a Cambrian explosion of data sources, data consumers, and an ever-increasing number of use cases. No longer are we limited to securing data that originated within our firewall and was consumed by a handful of well-known consumers. Recent studies show that enterprises, on average, used 110 SaaS products, while large companies now use close to 500.

In modern data architectures, data is “handed off” at various points. Each system in the architecture has its own security posture, which knows nothing about the other systems. Figure 1 shows all the touch points where security has the potential to “leak.”

Figure 1: Modern data architectures don’t implement data security early enough, which increases risks and costs. Hence, a new approach is needed.

Force-fitting traditional data security approaches to modern data architectures is possible, but only with the following limitations:

  • Lack of common security standards. Traditional data security products built for either operational or analytical data stores are unable to interoperate with one another.
  • Data security silos. Typically, data security capabilities for databases, files, events and APIs are offered by different vendors. However, organizations are demanding simplification and unification of architectures instead of multiple disconnected solutions.

Modern architectures are defined by a hybrid of centralized data infrastructure, disaggregated compute engines, and distributed applications, such as data products. Therefore, a modern data security approach must be flexible and address hybrid data ecosystems. This allows consumers to use multiple data consumption approaches, including conversational user interfaces made possible by large language models.

Hence, it behooves us to reimagine data security from first principles, examining the various layers of the data stack without pre-existing assumptions, and to start thinking about data security early in data’s journey from producers to consumers. This is what we call “shift left” data security.

Building blocks of Shift Left data security

The shift left approach is used in software engineering to continuously monitor and test early in the software development life cycle, so that potential issues can be identified, prevented, and/or addressed sooner. Applying shift left principles to data management is in keeping with how DevOps capabilities such as automation, CI/CD, agile practices, and data observability are being adopted for data and analytics workloads. By shifting data management and governance left, teams can identify and address data-related issues early, which can save time and money in the long run.
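To make the software-engineering analogy concrete, here is a minimal Python sketch of a shift left check for data: a test, run in CI before a pipeline change ships, that classifies columns from sample values and flags any column that looks sensitive but has no protection policy declared. The regular expressions, the `declared_policies` mapping, and the function names are hypothetical; production classifiers are considerably more thorough.

```python
from __future__ import annotations

import re

# Hypothetical detection rules for this sketch; real classifiers use many more signals.
PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(samples: list[str]) -> str | None:
    """Return the first label whose pattern matches every sample value, else None."""
    for label, pattern in PATTERNS.items():
        if samples and all(pattern.match(value) for value in samples):
            return label
    return None


def unprotected_columns(
    sample_rows: dict[str, list[str]], declared_policies: dict[str, str]
) -> list[str]:
    """List columns that look sensitive but have no declared protection policy."""
    return [
        column
        for column, samples in sample_rows.items()
        if classify_column(samples) is not None and column not in declared_policies
    ]


if __name__ == "__main__":
    samples = {
        "customer_email": ["a@example.com", "b@example.org"],
        "ssn": ["123-45-6789"],
        "order_total": ["19.99", "5.00"],
    }
    print(unprotected_columns(samples, {"ssn": "tokenize"}))
```

In a CI job, a non-empty result from `unprotected_columns` would typically fail the build (for example via `sys.exit(1)`), so the gap is caught before any data moves.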

Shifting data governance left means taking the strong data access governance and data security capabilities available in cloud data warehouses and lakehouses and extending them back to data as it leaves source systems.

By moving data access governance and data security practices further left in the data stream, data users can ensure that policies are attached and applied throughout the data’s journey to the cloud and on to the users themselves.
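One way to picture policies that are “attached” to data is to make them part of the object that moves through the pipeline. The Python sketch below assumes a hypothetical `GovernedBatch` type and stage functions; it runs each stage in order and rejects any stage that returns the data without the policies it received. It illustrates the idea only and is not how any particular product implements it.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass(frozen=True)
class GovernedBatch:
    """Hypothetical container: rows plus the column-level policies that travel with them."""

    rows: tuple[dict, ...]
    policies: dict[str, str] = field(default_factory=dict)


Stage = Callable[[GovernedBatch], GovernedBatch]


def run_pipeline(batch: GovernedBatch, stages: list[Stage]) -> GovernedBatch:
    """Run stages in order, rejecting any stage that drops a policy it received."""
    for stage in stages:
        result = stage(batch)
        dropped = set(batch.policies) - set(result.policies)
        if dropped:
            raise ValueError(f"{stage.__name__} dropped policies for {sorted(dropped)}")
        batch = result
    return batch


def uppercase_country(batch: GovernedBatch) -> GovernedBatch:
    """Example transformation: changes values but keeps the policies attached."""
    rows = tuple({**row, "country": row["country"].upper()} for row in batch.rows)
    return replace(batch, rows=rows)


if __name__ == "__main__":
    extracted = GovernedBatch(
        rows=({"email": "a@example.com", "country": "nl"},),
        policies={"email": "mask"},
    )
    loaded = run_pipeline(extracted, [uppercase_country])
    print(loaded.rows, loaded.policies)
```

Making the batch immutable means a stage cannot quietly edit policies in place; it has to return a new batch, which the runner then checks.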

Figure 2 shows the building blocks of shift left data security.

Figure 2: Data security gets closely integrated with the rest of data observability and data governance platforms for unified and comprehensive metadata management.

Two important aspects of this new thinking for data security are:

  • Expansion of data observability. So far, data observability has been confined to aspects like data quality and reliability. However, data security must be a key gate before any application is put into production. Data security needs to become another use case of the same underlying data and be unified with the rest of the data observability subsystem. In doing so, data security benefits from the alert and notification capabilities of data observability offerings.
  • Comprehensive data governance. Today, data governance platform capabilities include a business glossary, catalog, and lineage. These platforms leverage metadata to accelerate and govern analytics. However, the same metadata should be augmented with data security policies and user access rights. This expansion of metadata management will further increase trust and allow more users to access data.
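As a rough illustration of metadata augmented with security policies and access rights, the sketch below keeps the glossary term, lineage, classification, and allowed roles in one hypothetical `ColumnMetadata` record. Access decisions read that record, and denials are appended to a list that stands in for an observability platform’s alerting channel. The schema and role names are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical catalog entry combining governance and security metadata."""

    name: str
    glossary_term: str
    upstream: list[str]  # lineage: source columns this column derives from
    classification: str | None  # e.g. "PII"; None means not sensitive
    allowed_roles: set[str] = field(default_factory=set)


def check_access(col: ColumnMetadata, role: str, alerts: list[str]) -> bool:
    """Allow non-sensitive columns to every role; gate sensitive ones by allowed_roles.

    Each denial is appended to `alerts`, a stand-in for an observability
    platform's notification channel.
    """
    if col.classification is None or role in col.allowed_roles:
        return True
    alerts.append(
        f"denied {role} on {col.name} ({col.classification}); lineage: {col.upstream}"
    )
    return False
```

Because the lineage lives in the same record, an alert can point straight back to the source column that introduced the sensitive data.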

Comprehensive data observability and governance will truly lead to data democratization. Proactive and transparent views into the security of critical data elements will speed up application development and make organizations more productive.

How does one achieve Shift Left data security with ALTR?

Shift Left data governance and data security require a specific architectural focus to be successful. ALTR accomplishes this by offering a SaaS-based platform that can be implemented as soon as data leaves the source system.

When data leaves source systems and enters an ETL/ELT pipeline, the pipeline partner can call ALTR directly, through open-source connectors or REST APIs, to perform classification, tagging, or tokenization inside the ETL/ELT solution itself.
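The following Python sketch shows where protection could happen inside an ETL step: sensitive fields are replaced before the batch is loaded anywhere. It does not call ALTR. The `tokenize` function is a hypothetical local stand-in that uses a keyed HMAC-SHA256 hash from Python’s standard `hmac` module. That is deterministic pseudonymization rather than true vault-based tokenization, so tokens cannot be reversed, but equal inputs still produce equal tokens and remain joinable downstream.

```python
import hashlib
import hmac

# Hypothetical key for the sketch; a real key would come from a secrets manager.
TOKEN_KEY = b"demo-key-do-not-use"


def tokenize(value: str) -> str:
    """Keyed-hash stand-in for a tokenization service call.

    Deterministic (same input, same token) but, unlike vault-based
    tokenization, not reversible.
    """
    digest = hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def protect_rows(rows: list[dict], sensitive_columns: set[str]) -> list[dict]:
    """Replace sensitive, non-null fields in each row before the batch is loaded."""
    return [
        {
            key: tokenize(str(value)) if key in sensitive_columns and value is not None else value
            for key, value in row.items()
        }
        for row in rows
    ]


if __name__ == "__main__":
    extracted = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "a@example.com"}]
    for row in protect_rows(extracted, {"email"}):
        print(row)
```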

Similarly, with data catalogs, sensitive data is governed the instant it leaves the source system. When the data traverses the pipeline and lands in Snowflake, it arrives with all appropriate tags and governance policies still attached. This is possible because ALTR integrates, through connectors, with tools to the left of the cloud data warehouse in the data pipeline. ALTR’s SaaS capabilities provide the connectors needed to shift governance and security left, offering implementation in ETL pipelines, data catalogs, streaming buses, and anywhere else in the architecture to the left of the cloud data warehouse.
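When data lands in Snowflake, its tags can be re-applied with Snowflake’s object-tagging DDL (`CREATE TAG` and `ALTER TABLE ... MODIFY COLUMN ... SET TAG`). The Python sketch below generates those statements from tag metadata that travelled with the data. It is one possible approach under stated assumptions (simple, pre-validated identifiers and a hypothetical `column_tags` structure), not a description of ALTR’s connectors.

```python
def _sql_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def snowflake_tag_ddl(table: str, column_tags: dict[str, dict[str, str]]) -> list[str]:
    """Build Snowflake DDL that re-applies tags carried with the data.

    `column_tags` maps column -> {tag_name: tag_value}. Table, column, and tag
    names are assumed to be simple identifiers validated upstream.
    """
    tag_names = sorted({tag for tags in column_tags.values() for tag in tags})
    statements = [f"CREATE TAG IF NOT EXISTS {tag};" for tag in tag_names]
    for column, tags in column_tags.items():
        assignments = ", ".join(f"{tag} = {_sql_literal(value)}" for tag, value in tags.items())
        statements.append(f"ALTER TABLE {table} MODIFY COLUMN {column} SET TAG {assignments};")
    return statements


if __name__ == "__main__":
    for stmt in snowflake_tag_ddl("analytics.public.customers", {"email": {"pii_type": "email"}}):
        print(stmt)
```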

Applying policy as data leaves source systems ensures that the data is secure in motion and at rest, allowing data users to trust the full security of their data while confidently moving it to where it needs to go.

Data governance solutions that rely solely on direct integrations with cloud data warehouses to implement governance and security may leave highly sensitive data unsecured while in motion.

Conclusion

Sensitive and regulated data left unprotected before it reaches the cloud data warehouse is at high risk of exposure. Applying shift left principles through highly available, cloud-native, SaaS-based offerings provides many benefits, including:

  • Complete regulatory compliance

For companies in various verticals, from healthcare to financial services to education, the compliance and privacy risks for data in transit to the cloud have yet to be quantified. As data traverses the pipeline from source systems to the cloud and on to data users, gaps open up that represent serious security threats and compliance risks. ALTR’s Shift Left approach closes these gaps by securing and governing data throughout the entire data journey, not only once it reaches the cloud data warehouse.

  • Faster access to data

Many data governance and data security solutions require months-long implementation cycles because of legacy architectures and proxy-based approaches, and the resulting policies apply only to data that already exists in a cloud data warehouse. ALTR’s architecture and SaaS-based approach allow data users to implement data governance and data security policies in minutes, not months.

  • Self service

Because of ALTR’s SaaS-based offering and point-and-click user interface, non-technical users can classify data, create masking policies, and set locks and thresholds without writing SQL. Simply by connecting a database through Snowflake Partner Connect, data users can begin governing their data immediately, with no wait time, no lengthy implementation period, and at no cost for one database.
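To show what a masking policy and a threshold do, independent of any user interface, here is a minimal Python sketch. `mask_email` hides the local part of an email address from roles outside a hypothetical allow-list, and `Threshold` locks a user who reads more than a configured number of sensitive values within a sliding time window. Both are illustrative assumptions, not ALTR’s implementation.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


def mask_email(value: str, role: str, unmasked_roles: set[str]) -> str:
    """Role-based masking: allow-listed roles see the value, others see only the domain."""
    if role in unmasked_roles:
        return value
    _, _, domain = value.partition("@")
    return "*****@" + domain


class Threshold:
    """Hypothetical lock: block a user who reads more than `limit` values within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.reads: dict[str, deque] = defaultdict(deque)  # user -> deque of (timestamp, count)
        self.locked: set[str] = set()

    def allow(self, user: str, count: int, now: float | None = None) -> bool:
        """Record a read of `count` values; return False and lock the user if it breaches the limit."""
        now = time.monotonic() if now is None else now
        if user in self.locked:
            return False
        history = self.reads[user]
        # Forget reads that have aged out of the sliding window.
        while history and now - history[0][0] > self.window:
            history.popleft()
        if sum(n for _, n in history) + count > self.limit:
            self.locked.add(user)
            return False
        history.append((now, count))
        return True
```

Once locked, a user stays locked until an administrator intervenes, which mirrors the intent of a lock rather than a simple rate limit.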

The shift left approach is the new north star for data management, whether for data quality, observability, or now data security. Even data mesh, with its notion of data as a product, pushes accountability for data to the left: to the business domain teams.


Sanjeev Mohan

Sanjeev researches the space of data and analytics. Most recently he was a research vice president at Gartner. He is now a principal with SanjMo.