Drowning in Data? A Data Security Platform (DSP) is Your Life Raft

Sanjeev Mohan
Jun 5, 2024


In January 2024, Gartner published its first Market Guide on Data Security Platforms, a recognition of the growing importance of combining data security controls, business logic, and fine-grained authorization. These are the fundamental traits that allow businesses to unlock the potential of all their data assets and use them to drive decisions.

Data security is not a new revelation but a well-known imperative. In the past, these controls were implemented as an afterthought and in a siloed manner, which hampered timely and secure access to corporate data. Organizations are now prioritizing data security and implementing it in a structured manner. It is no surprise that Gartner reported the number of calls related to data security grew 70% between 2021 and 2022.

While ChatGPT unleashed the potential of AI to transform our organizations, an even bigger opportunity lies ahead in the shape of a personalized AI stack. This stack combines generic large language models with corporate data, allowing the models to generate results grounded in real-world business information.

With this power, businesses can truly leverage the hidden potential of vast amounts of structured and unstructured data. And by grounding the results in carefully curated corporate data, they can reduce hallucinations and increase trust in the outcomes of generative AI workloads.

However, to achieve this state of zen enlightenment, one must first ensure that corporate security guidelines and all relevant regulatory compliance requirements are met. This requires a sophisticated Data Security Platform (DSP).

Components of a DSP

Like a car’s brakes, data security is not meant to slow you down but to let you accelerate with trust and confidence. It helps create guardrails against intentional or unintentional misuse of the data infrastructure. The goal is to ensure that the right people have access to the right data when they need it to drive business decisions and gain competitive advantage.

A solid data security platform consists of three building blocks as outlined below.

Data Security Platform (DSP) building blocks

Discovery and Observability

The first iteration of big data, started by Hadoop, turned data lakes into data swamps due to a lack of understanding of the data. In the rush to make data available for analysis, the critical step of understanding the data was skipped. And that included sensitive data.

A modern DSP should be able to connect to source systems and determine the nature of the data. Whether data is sensitive is often hidden in its context. Once sensitive data is discovered, it must be tagged according to corporate security guidelines and applicable regulations. This data may be personally identifiable information (PII), personal health information (PHI), financial data, intellectual property, or trade secrets.

Your DSP should be able to connect to all the relevant data sources and detect sensitive data using multiple approaches:

  • Profile source data: data scanning and profiling often use a sample but must be able to scan the complete data set. However, the latter can put load on operational systems. This option requires permissions to access source data.
  • Profile responses: to overcome some of the barriers mentioned above, query responses can be profiled and classified. For example, this approach can detect email addresses, social security numbers, and other PII (a minimal sketch follows this list).
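To make the profiling idea concrete, here is a minimal Python sketch that tags a sampled column by matching its values against illustrative regex detectors. The patterns, the `profile_values` helper, and the 50% threshold are assumptions for illustration only; a production DSP would rely on far richer detectors, including ML classifiers.

```python
import re

# Illustrative regex detectors for two common PII types (assumed for this
# sketch); a real DSP would use checksums, dictionaries, and ML models.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def profile_values(values, threshold=0.5):
    """Tag a sampled column when enough of its values match a PII pattern."""
    tags = set()
    if not values:
        return tags
    for tag, pattern in PII_PATTERNS.items():
        matches = sum(1 for v in values if pattern.search(v))
        if matches / len(values) >= threshold:
            tags.add(tag)
    return tags


# Profiling a small sample pulled from a source table or a query response.
sample = ["jane@example.com", "bob@example.org", "not-an-email"]
print(profile_values(sample))  # {'EMAIL'}
```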

Tagging can be manual, but most often it relies on sophisticated ML inference algorithms. This process should also be continuous, because data changes in real time. Hence the need for observability capabilities in your DSP.

Policy Definition

The next step is the ability to define fine-grained policies and rules for data security, such as authorization and encryption. Data stewards should be able to author governance policies intuitively and in a self-service manner, rather than through the older approaches prevalent in identity and access management systems. The most common approach is a user interface with drop-down options, for example, options to encrypt or mask specific data elements or tags.
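As an illustration of what such self-service policy authoring might produce behind the scenes, here is a hypothetical, simplified Python representation that binds sensitivity tags to protective actions for a given role. The `PolicyRule` structure and the specific tags and actions are assumptions for this sketch, not any vendor's actual format.

```python
from dataclasses import dataclass


# Hypothetical shape of what a drop-down policy builder might emit: each rule
# binds a sensitivity tag to a protective action for a role or group.
@dataclass(frozen=True)
class PolicyRule:
    tag: str         # sensitivity tag assigned during discovery, e.g. "US_SSN"
    action: str      # "mask", "encrypt", or "deny"
    applies_to: str  # role or group the rule targets


policies = [
    PolicyRule(tag="US_SSN", action="mask", applies_to="analyst"),
    PolicyRule(tag="EMAIL", action="mask", applies_to="analyst"),
    PolicyRule(tag="PHI", action="deny", applies_to="contractor"),
]

# A steward reviewing these rules sees plain, declarative statements rather
# than low-level grants scattered across individual systems.
print(policies[0])
```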

Newer systems allow policies to be inferred and auto-discovered. Automation of policy creation also helps when a user leaves the company and all relevant policies must be deleted. To achieve this, the DSP must integrate with the rest of the data governance infrastructure, such as data catalogs. For example, an integration with a data catalog allows users to shop for data and see what is available, request access, capture intent and consent, and finally grant access. These products must maintain access history and audit logs.

Policy Enforcement

The final capability is executing data security policies with minimal overhead and latency. Role-based access control (RBAC) assigns access rights based on the user’s role in the project. Attribute-based access control (ABAC) grants access based on a combination of user attributes, data attributes, and environmental attributes, providing more granular control.
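The following toy Python function sketches how RBAC and ABAC checks might be combined in a single access decision. The roles, attributes, and rules are invented for illustration; real enforcement engines evaluate far richer policy sets.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    role: str                                        # RBAC: coarse project role
    attributes: dict = field(default_factory=dict)   # ABAC: e.g. department


def can_read(user, column_tags, env):
    """Toy access decision combining RBAC and ABAC checks (illustrative rules)."""
    # RBAC: only certain roles may read tagged (sensitive) columns at all.
    if column_tags and user.role not in {"analyst", "steward"}:
        return False
    # ABAC: refine by user, data, and environmental attributes.
    if "PHI" in column_tags and user.attributes.get("department") != "clinical":
        return False
    if env.get("network") != "corporate":  # environmental attribute
        return False
    return True


print(can_read(User("analyst", {"department": "clinical"}),
               {"PHI"}, {"network": "corporate"}))   # True
print(can_read(User("engineer"), {"PHI"}, {"network": "corporate"}))  # False
```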

Some of the important considerations for policy enforcement include:

  • Consistency: security policies should be enforced on data irrespective of where it resides, whether in cloud data warehouses, operational systems, object stores, or lakehouses.
  • Low latency and scalable: security enforcement should add minimal overhead to queries and should scale with the growth of workloads.
  • Dynamic: ABAC is preferred as it is dynamic and adjusts as users’ environments change.
  • Agile: your data security product should be transparent to end users. Ideally, the user should go to an endpoint or use an API that automatically executes security policies and does not require changes to schemas or queries (see the masking sketch after this list).
  • Deployment: Modern security products are deployed as either SaaS tools or within a private cloud orchestrated via Kubernetes.
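To illustrate the agility point above, transparent enforcement that leaves queries and schemas untouched, here is a minimal sketch of dynamic masking applied to query results at a hypothetical enforcement endpoint. The column tags and the set of tags to mask would come from the discovery and policy steps described earlier; all names here are illustrative.

```python
# Minimal sketch of dynamic masking at an enforcement endpoint: results are
# filtered on the way out, so neither the query nor the schema changes.
# Column tags and the masked-tag set are assumed inputs from the discovery
# and policy layers sketched earlier.

def mask(value):
    """Keep only the last four characters visible."""
    return "****" + value[-4:] if len(value) > 4 else "****"


def enforce(rows, column_tags, masked_tags):
    """Mask any column whose tags intersect the caller's masked-tag set."""
    masked_rows = []
    for row in rows:
        masked_rows.append({
            col: mask(str(val)) if column_tags.get(col, set()) & masked_tags else val
            for col, val in row.items()
        })
    return masked_rows


rows = [{"name": "Jane", "ssn": "123-45-6789"}]
print(enforce(rows, {"ssn": {"US_SSN"}}, masked_tags={"US_SSN"}))
# [{'name': 'Jane', 'ssn': '****6789'}]
```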

Seamless policy enforcement helps build trust in data and increases its utility. It allows organizations to expand their options for sharing data with consumers, such as deploying data marketplaces.


Sanjeev Mohan

Sanjeev researches the space of data and analytics. Most recently he was a research vice president at Gartner. He is now a principal with SanjMo.