Building Teams Where Policy Meets Engineering and Distinguishing Normal from Almost Normal Behavior

Written by:
Vishwa Pandagle
Cybersecurity Staff Editor
Key Takeaways
  • Once data enters AI pipelines, it is transformed and reused in ways not anticipated when original consent was obtained. 
  • Teams often struggle to determine whether new data uses still match the original purpose and consent terms.
  • De argues that purpose limitation must be enforced in technical architecture, since policy documents alone are insufficient.
  • Privacy must be treated as a runtime property, verified continuously as data moves through different stages.
  • De recommends creating shared technical primitives that teams can integrate into existing workflows, reducing fragmentation.

Nabanita De, CEO and Founder of Privacy License, discusses the realities of privacy inside AI pipelines as part of TechNadu’s International Women’s Day series. De brings experience across privacy, cloud security, and data engineering roles, including at Microsoft and Uber.

She discusses the earliest access-pattern signals that can indicate misuse before an incident becomes obvious, including query behavior, unusual access times, unexpected exports, new API keys, and atypical inference activity in AI systems. 

De emphasizes that detection depends on understanding baseline behavior closely enough to distinguish legitimate variation from genuinely concerning deviations. She stresses that compliance must be treated as a runtime property, verified through ongoing validation and consistent telemetry across systems.

Read on to understand how De approaches privacy enforcement in a world of overlapping and shifting regulations, from GDPR and CCPA to Brazil’s LGPD and emerging rules in Asia.

Vishwa: What privacy or data-governance challenges emerge around data reuse and secondary processing as AI systems move into everyday use, and how can they be addressed? 

Nabanita: The fundamental challenge is that once data enters AI pipelines, it gets transformed, combined, and reused in ways that were never anticipated when consent was originally obtained. When someone agrees to share their information for one purpose, they have no idea that the same data might be used to train a model, fine-tune another system, or get embedded in ways that make it nearly impossible to delete later.

We see this constantly with AI training datasets. Data collected for customer service interactions suddenly becomes training material for language models. Information gathered for fraud detection gets repurposed for behavioral prediction. 

The original context and consent framework simply breaks down. Teams often struggle to track whether new uses still match the original purpose, whether the consent and notice still apply, and whether sensitive attributes are being inferred rather than directly collected. 

Secondary processing also creates governance gaps because data flows across products, vendors, and internal teams faster than policies can keep up. Addressing this requires building purpose limitation enforcement into the technical architecture itself; policy documents alone are insufficient. 

The system must understand and enforce boundaries on how data flows and transforms by implementing technical controls that track data lineage, enforce usage restrictions programmatically, and ensure secondary processing occurs only within the bounds of original consent. 
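As a rough sketch of the purpose-limitation enforcement De describes, a system can tag every dataset with the purposes covered by the original consent and require every read to declare its purpose. All names here (`DatasetRecord`, `PurposeViolation`, `access`) are illustrative, not from any specific library:

```python
# Minimal sketch of programmatic purpose limitation: each dataset carries
# the purposes covered by the original consent, and the check runs at
# access time rather than living only in a policy document.
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    name: str
    allowed_purposes: frozenset  # purposes covered by original consent

class PurposeViolation(Exception):
    """Raised when data is requested for a purpose outside the consent scope."""

def access(dataset: DatasetRecord, declared_purpose: str) -> str:
    # Every read must declare its purpose explicitly.
    if declared_purpose not in dataset.allowed_purposes:
        raise PurposeViolation(
            f"{dataset.name!r} is not consented for {declared_purpose!r}"
        )
    return f"granted: {dataset.name} for {declared_purpose}"
```

With this in place, the secondary-use case from the interview fails loudly: data collected for customer service raises `PurposeViolation` when requested for model training, instead of silently flowing into a training set.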

Privacy must become a runtime property, verified continuously as data moves through different stages. Treat data use as a lifecycle problem: strong controls include purpose binding, data minimization, retention enforcement, dataset-level access controls, and auditability designed specifically for ML workflows. 
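One of those lifecycle controls, retention enforcement, can be expressed as a recurring runtime check rather than a policy statement. The per-purpose windows and record shape below are assumptions for illustration:

```python
# Sketch of retention enforcement as a runtime check: records whose
# purpose-specific retention window has elapsed are flagged for deletion.
# The retention windows and record fields are illustrative assumptions.
from datetime import datetime, timedelta, timezone

RETENTION = {
    "customer_service": timedelta(days=90),
    "fraud_detection": timedelta(days=365),
}

def expired(records, now=None):
    """Return the ids of records whose retention window has elapsed."""
    now = now or datetime.now(timezone.utc)
    return [
        r["id"]
        for r in records
        if now - r["collected_at"] > RETENTION[r["purpose"]]
    ]
```

Run on a schedule, a check like this turns "retained only as long as necessary" from a sentence in a policy into a deletion queue the system actually acts on.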

Vishwa: How does data move through AI pipelines in ways that reduce visibility for security and privacy teams? 

Nabanita: AI pipelines create what I think of as visibility black holes. Data enters through various ingestion points, gets preprocessed, transformed, embedded into vector representations, fine-tuned into model weights, and surfaces again during inference. 

Each transition creates opportunities for data to move in ways that traditional security and privacy monitoring tools cannot follow. The challenge is particularly acute with embeddings and model weights. 

When personal information gets encoded into a neural network, it no longer exists as discrete records trackable by conventional data loss prevention tools; the information is diffused across millions of parameters. 

You cannot point to a specific weight and say "this contains John's health record," yet the model might still reveal that information under the right prompting conditions. Visibility drops further because AI development often happens in experimentation environments with looser controls. 

Data scientists pull datasets, create copies, run experiments, and data proliferates across notebooks, training runs, and model artifacts. By the time something moves to production, reconstructing the full provenance of what data influenced the system becomes extremely difficult. 

Data gets copied into feature stores, vector databases, staging buckets, prompt logs, and third-party labeling tools, each copy creating a new exposure surface. Security and privacy teams lose signal because these components are often owned by different teams, run in different environments, and produce logs that are inconsistent or not retained. 

Improving visibility means standardizing lineage, centralizing telemetry, and building inventory and access insights across the full pipeline, including prompts, embeddings, model inputs, and outputs. 
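The lineage standardization De mentions can be sketched as a small provenance graph: every derived artifact records its parents, so "what data influenced this model?" becomes a graph traversal. The artifact names and structure are hypothetical:

```python
# Minimal lineage sketch: each derived artifact records the artifacts it
# came from, so provenance questions reduce to a graph traversal.
from collections import defaultdict

parents = defaultdict(set)  # artifact -> artifacts it was derived from

def derive(child: str, *sources: str) -> None:
    parents[child].update(sources)

def provenance(artifact: str) -> set:
    """All upstream artifacts that influenced `artifact`, transitively."""
    seen, stack = set(), [artifact]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# The pipeline stages described above: ingestion -> features -> embeddings -> model
derive("features_v2", "raw_support_logs")
derive("embeddings_v2", "features_v2")
derive("model_v3", "embeddings_v2", "fraud_labels")
```

A real system would record this at every copy and transformation (feature stores, vector databases, staging buckets), but even this shape makes "reconstructing the full provenance" a query instead of an archaeology project.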

Vishwa: What makes operationalizing privacy beyond policy documents difficult for teams? 

Nabanita: There is a massive translation gap between what policies say in natural language and what engineers can actually implement in code. A policy might state "personal data should only be retained as long as necessary for the original purpose." 

That sounds clear until you try to implement it. 

Policies describe intent, but teams execute through systems, tickets, and engineering decisions. The hardest part is translating broad legal requirements into technical controls that are measurable, testable, and repeatable. 

Ownership is often fragmented across legal, security, product, data, and engineering, and no single team feels accountable for end-to-end implementation. The pace of product iteration also creates drift: a policy can be correct while systems quietly change underneath it. 

The other major challenge is that privacy requirements often cut across organizational boundaries and technical systems in ways that do not align with how teams are structured. Implementing proper consent management might require changes to the mobile app, the backend APIs, the data warehouse, and the analytics platform. 

Each of those is owned by a different team with different priorities and release cycles. Getting everyone aligned on implementing privacy controls consistently becomes a coordination nightmare. What I have found effective is creating shared technical primitives that teams can integrate into their existing workflows. 

Instead of asking every team to implement privacy from scratch, you provide them with tools that make the right thing the easy thing.

Privacy controls need to be embedded into the development process itself, with automated validation that catches issues before they reach production. Operationalizing privacy becomes much easier when requirements are expressed as controls, controls are mapped to systems, and teams can verify compliance through continuous checks rather than relying on static documentation. 
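The "automated validation that catches issues before they reach production" can be as simple as a pre-deployment check that fails the build when a service config logs fields classified as sensitive. The config shape and classification set below are assumptions for illustration:

```python
# Sketch of a CI-style privacy check: fail the build if any service's
# logging config includes fields classified as sensitive. The field
# names and config shape are illustrative assumptions.
SENSITIVE = {"email", "ssn", "health_record"}

def validate_logging_config(config: dict) -> list:
    """Return a list of violations; an empty list means the check passes."""
    violations = []
    for service, logged_fields in config.get("logged_fields", {}).items():
        leaked = SENSITIVE.intersection(logged_fields)
        if leaked:
            violations.append(
                f"{service} logs sensitive fields: {sorted(leaked)}"
            )
    return violations
```

Wired into the deployment pipeline, a check like this is one of the "shared technical primitives" De describes: no team implements it from scratch, and the right thing becomes the default.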

Vishwa: What role can continuous validation play, compared to one-time approval, in managing privacy controls as AI systems change over time? 

Nabanita: One-time approvals assume systems stay stable, and AI systems do not: they evolve constantly through retraining, fine-tuning, prompt adjustments, model swaps, dataset refreshes, and configuration changes, and their outputs shift as the environment changes. 

A model that was privacy-compliant when it launched might drift into non-compliance as it learns from new data or as its operating context shifts. Continuous validation recognizes that privacy compliance is a runtime property that must be monitored, tested, enforced, and verified on an ongoing basis. 

This means implementing automated checks that run regularly to verify that access patterns remain within expected bounds, that data retention policies are enforced, that model outputs do not leak sensitive information, and that consent preferences are respected across all processing activities. The same checks should watch for drift in dataset composition, policy violations in logs, and access anomalies in model-serving and data layers. 
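One such check, flagging drift in dataset composition, can be sketched as a comparison between the current label distribution and the one approved at launch. The 0.1 threshold is an illustrative assumption, not a recommended value:

```python
# Sketch of a continuous-validation check: flag drift in dataset
# composition by comparing distributions with total variation distance.
# The threshold value is an illustrative assumption.
def composition_drift(baseline: dict, current: dict,
                      threshold: float = 0.1) -> bool:
    """True if the total variation distance between the two
    category-proportion distributions exceeds the threshold."""
    keys = set(baseline) | set(current)
    tvd = 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0))
                    for k in keys)
    return tvd > threshold
```

Scheduled after every dataset refresh, this is the kind of check that catches a problem "within hours rather than discovering it during an annual audit."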

Continuous validation reduces risk by catching regression early, and it creates a feedback loop where teams learn which controls are effective in practice. The practical benefit is that you catch problems while they are still small. 

If a configuration change inadvertently expands data access beyond what was intended, continuous validation detects this within hours rather than discovering it during an annual audit. You also build an evidence trail that demonstrates ongoing compliance rather than relying on point-in-time assessments that may not reflect the current reality. 

Vishwa: When making privacy or governance requirements machine-verifiable, what contextual factors need to be considered? 

Nabanita: Machine-verifiable requirements fail when they ignore context. The same data can be low-risk in one setting and high-risk in another based on purpose, user expectations, jurisdiction, sector rules, contractual terms, and whether data is sensitive, inferred, or aggregated. 

Actor context matters too, and system context is equally important.

The temporal dimension adds further complexity. Privacy expectations evolve; data considered innocuous a decade ago might be highly sensitive today. Machine-verifiable rules need mechanisms to accommodate shifting norms without requiring complete system redesigns. 

Jurisdictional variation compounds this: what is permitted under GDPR differs from CCPA, which differs from Brazil's LGPD, which differs from emerging regulations in Asia. A global system must verify compliance against multiple overlapping and sometimes contradictory requirements simultaneously. 


The approach I advocate is building systems that reason about context explicitly. Rather than encoding rigid rules, you create frameworks that evaluate whether a given processing activity is appropriate, given the relationship between the data subject and the processor, the stated purpose, the applicable regulations, and the current operational context. 


Machine-verifiable governance works best when it captures purpose, scope, sensitivity, jurisdiction, and permitted actions in a structured way that systems can enforce and logs can prove. 
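That structured capture can be sketched as a policy record naming purpose, sensitivity, jurisdiction, and permitted actions, plus a function systems call before processing. The field names are illustrative, not a standard schema:

```python
# Sketch of governance "in a structured way that systems can enforce":
# a policy record plus an evaluation function. Field names are
# illustrative assumptions, not a standard schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    purpose: str
    sensitivity: str             # e.g. "low", "high"
    jurisdictions: frozenset     # where this processing is permitted
    permitted_actions: frozenset

def permits(policy: Policy, action: str, jurisdiction: str) -> bool:
    """Check a requested action against the policy's structured terms."""
    return (action in policy.permitted_actions
            and jurisdiction in policy.jurisdictions)

ads_policy = Policy(
    purpose="ad_personalization",
    sensitivity="high",
    jurisdictions=frozenset({"US-CA"}),
    permitted_actions=frozenset({"read", "aggregate"}),
)
```

Because each decision is a function call over explicit fields, every grant or denial can be logged with the policy terms that produced it, which is what lets "logs prove" compliance.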

Vishwa: From a security practitioner's perspective, how do unusual access-pattern signals indicate data misuse before an incident occurs, and where do they appear in the attack chain? 

Nabanita: Anomalous access patterns are often the earliest warning signs in the kill chain, appearing during the reconnaissance and initial access phases before any data exfiltration occurs. When someone is preparing to misuse data, their access behavior changes in characteristic ways. 

In AI systems specifically, these signals manifest in interesting ways. 

Someone attempting to steal model weights might first conduct extensive testing to understand model behavior before attempting the actual extraction. Early signals can appear in feature store reads, object storage access, prompt logging systems, vector database queries, or model-serving endpoints that suddenly receive atypical inputs.

In the attack chain, these signals often appear during reconnaissance and collection phases, before exfiltration becomes obvious. The challenge is distinguishing malicious anomalies from legitimate ones. 

Effective detection requires understanding baseline behavior at a granular level and building models that account for legitimate variation while still flagging genuinely concerning deviations. That is why anomaly detection and tight telemetry around data access, not just perimeter alerts, are so valuable. 
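A minimal version of that baseline idea is to model each principal's typical daily access count and flag days that deviate by more than a few standard deviations. A real detector would use far richer features; this sketch only shows the shape, and the threshold is an assumption:

```python
# Sketch of baseline-based anomaly detection on access counts: flag a
# day whose count sits more than z_threshold standard deviations above
# the historical mean. The threshold is an illustrative assumption.
from statistics import mean, stdev

def is_anomalous(history, today, z_threshold=3.0):
    """Flag `today` against the baseline in `history` (daily counts)."""
    if len(history) < 2:
        return False  # not enough baseline to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # any deviation from a flat baseline stands out
    return (today - mu) / sigma > z_threshold
```

The point of the baseline is exactly the distinction De draws: a count of 11 against a history hovering around 10 is legitimate variation, while a sudden 100 is a deviation worth investigating.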

Vishwa: How can security and privacy professionals align technical controls with legal, ethical, and business expectations? 

Nabanita: The alignment challenge stems from these domains using different conceptual frameworks and vocabularies to describe related concerns. Legal thinks in terms of liability, obligations, and regulatory interpretation. Ethics considers principles, values, and stakeholder impacts. Business focuses on revenue, risk tolerance, and competitive dynamics. 

Security speaks in threats, vulnerabilities, and controls. Getting these perspectives to communicate requires active translation work. Alignment improves when teams start with shared definitions of risk and acceptable use. Legal teams define obligations and boundaries, security teams define threat realities, and product teams define user experience and business goals. 

Translating that into controls requires mapping each expectation to specific system behaviors, measuring whether those behaviors are happening, and documenting evidence that is understandable across disciplines. 

Building shared artifacts helps enormously. Frameworks that map business risks to technical controls to legal requirements to ethical principles let everyone see how their concerns connect. When a lawyer asks about data retention, they can trace that to specific technical implementations. When engineers propose a new feature, they can understand the compliance implications before building. 

Cross-functional working groups that include representatives from legal, security, privacy, product, and ethics create ongoing translation capacity. These groups develop shared understanding over time and build relationships that make ad-hoc collaboration easier. Regular reviews, clear ownership, and a control framework connecting requirements to implementation and verification keep alignment strong as systems evolve. 

The practical advice: invest in people who can bridge these worlds. Professionals who understand both the technical and policy dimensions are rare and incredibly valuable. When you find them, give them the organizational support and authority to drive alignment. 

Vishwa: How can founders think about prioritizing governance based on customer risk exposure? 

Nabanita: A useful approach is to prioritize based on where harm and liability concentrate. Founders can start by mapping what data they handle, who is affected, what the worst credible misuse looks like, and which customers face strict obligations. 

Then they can focus on controls that reduce risk quickly: data minimization, access control, logging, retention, breach readiness, and clear purpose limits. High-risk customers often care about provable controls and auditability, while lower-risk customers may care about baseline hygiene and trust signals. 

Prioritization becomes clearer when founders tie governance work to specific customer buying triggers such as enterprise security reviews, regulated sector demands, or contractual requirements. Understand where your customers face their most acute regulatory and operational risks. 

If you are selling to healthcare organizations, HIPAA compliance is not optional and the penalties for violations are severe. If your customers are processing children's data, COPPA becomes critical. The governance capabilities you prioritize should directly address the risks that keep your buyers awake at night. 

This requires genuine conversations with customers and prospects about their risk landscape. 

The answers tell you where governance investment creates the most value. You also need to consider asymmetric downside risks. Some governance failures create bounded problems that can be remediated. Others create existential threats to customer businesses through regulatory action, loss of customer trust, or litigation exposure. Prioritize preventing catastrophic outcomes even if they seem less probable than minor incidents. 

The practical guidance is to get very specific about which certifications and compliance capabilities actually influence purchasing decisions versus which ones are nice-to-have. Then sequence your governance investments to unlock revenue gates as efficiently as possible.

