Personally Identifiable Information (PII)

Strategies for Secure Data Handling and Compliance

Handling Personally Identifiable Information (PII) correctly is crucial for data and AI companies, as it is for retail businesses, because mishandling it can lead to privacy breaches, legal liability, and loss of trust. Protecting PII is essential for regulatory compliance and for maintaining customer confidence.

1, Protect stored PII with non-reversible methods such as salted hashing, rather than with reversible encryption alone.

2, Restrict direct client access to PII. Any specific use case that needs it requires approval from the Privacy and Security team, which helps ensure GDPR compliance.

3, When sharing data with clients (for data science or BI reporting), either remove PII or protect it with a non-reversible method such as SHA-256 hashing with a salt.
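The non-reversible masking mentioned in the points above could look roughly like the following minimal sketch. It uses HMAC-SHA-256 with a secret key, a keyed variant of salted hashing chosen here for illustration; the function name `pseudonymize`, the normalization step, and the example key are assumptions, not part of any prescribed method.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a non-reversible hex token for a PII value (HMAC-SHA-256)."""
    # Normalize first so that "Alice@Example.com " and "alice@example.com"
    # map to the same token and can still be used as a join key.
    normalized = value.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key for illustration only; in practice it would come from a
# secret manager and never be shared alongside the hashed data.
key = b"example-key-load-from-a-secret-manager"
token = pseudonymize("Alice@Example.com ", key)
```

Because the same input and key always yield the same token, hashed columns remain usable for joins and counts, while anyone without the key cannot precompute tokens for guessed values.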

  • Personally Identifiable Information (PII)
  • HTML5/CSS3
  • EU GDPR
  • Security
  • Data Governance

Details

PII Control in Data Warehousing Ecosystem
PII Optimization in Data Warehousing Layers

1, Raw Layer: Reserved for initial raw data. It may include PII, but only as long as the layer is not shared or used for business purposes. Implement region-specific PII protection strategies (e.g., GDPR compliance in Europe, the applicable regulations in mainland China).

2, Silver Layer: Contains cleaned, high-quality data derived from the raw layer. Ensure it holds no PII, through either complete removal or non-reversible masking. This layer is accessible to business consumers.

3, Gold Layer: Customized data layer built on top of the Silver Layer to meet specific data consumer needs. Exclude PII from this layer while maintaining the underlying data quality.
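As an illustration of the Raw-to-Silver step described above, here is a minimal sketch that drops some PII columns entirely and replaces others with a non-reversible hash. The column names, the `to_silver` function, and the choice of HMAC-SHA-256 are hypothetical; a real pipeline would drive this from its own data catalog.

```python
import hashlib
import hmac

# Hypothetical column classification for illustration.
DROP_COLUMNS = {"full_name", "phone"}  # removed entirely
HASH_COLUMNS = {"email"}               # kept only as a masked join key


def to_silver(raw_row: dict, secret_key: bytes) -> dict:
    """Build a Silver-layer row that contains no readable PII."""
    silver = {}
    for column, value in raw_row.items():
        if column in DROP_COLUMNS:
            continue  # complete removal
        if column in HASH_COLUMNS and value is not None:
            # Non-reversible masking with a keyed hash.
            value = hmac.new(
                secret_key, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
        silver[column] = value
    return silver


raw_row = {
    "order_id": 1001,
    "full_name": "Alice Doe",
    "email": "alice@example.com",
    "phone": "555-0100",
    "amount": 42.5,
}
silver_row = to_silver(raw_row, b"example-key")
```

The Gold Layer would then be built only from rows shaped like `silver_row`, so it never sees the raw values.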

PII Control in Feature Store Ecosystem
PII Optimization in Feature Store Layers

1, Curated Data Layer (CDL): Exclude PII when onboarding feature tables from the Landing Zone, except in cases without a Centralized Data Lake. PII should stay in the Landing Zone, where data science consumers cannot access it.

2, Entities Layer (EL): Built on top of the CDL, this layer must never contain PII. The EL is fully open to data scientists for model training and serving.
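One way to enforce the "no PII in the Entities Layer" rule is a guard that runs before a table is published. The sketch below is an assumption-laden example: the denylist `PII_COLUMN_NAMES` and both function names are hypothetical, and matching on column names alone will not catch PII hidden in free-text fields.

```python
# Hypothetical denylist; a real one would come from a data catalog.
PII_COLUMN_NAMES = {"email", "phone", "full_name", "address"}


def find_pii_columns(columns):
    """Return the columns whose names match the denylist (case-insensitive)."""
    return sorted(c for c in columns if c.lower() in PII_COLUMN_NAMES)


def assert_no_pii(table_name, columns):
    """Raise ValueError if a table about to be published exposes PII columns."""
    offending = find_pii_columns(columns)
    if offending:
        raise ValueError(f"{table_name} exposes PII columns: {offending}")


# Passes silently: no denylisted column names are present.
assert_no_pii("customer_features", ["customer_key", "avg_basket", "visits_30d"])
```

Calling `assert_no_pii` during feature table onboarding makes a violation fail loudly instead of reaching data science consumers.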