
GCP Analytics Platform Foundation
N Brown (NBG), 2025
Background
1. Initial Setback:
On Day 1, half the team resigned, requiring urgent backfills and strategic communication to stabilize morale and ensure project continuity.
2. Budget Constraints:
By Month 2, a budget freeze reduced headcount by 5, necessitating internal workload reallocation and efforts to maintain stakeholder trust.
3. Operational Turbulence:
By Month 4, layoffs affected 50% of data engineers, and the line manager departed during the company’s transition from public to private. The new IT manager lacked data engineering expertise, pausing all projects except the critical 12-week GCP analytics initiative.

WHAT
Delivered a modernized GCP data ecosystem within three months for NBG, a 160-year-old UK retailer:
- Analytics Platform Architecture:
Improved 14 areas, including ingestion, transformation, orchestration, and logical architecture.
- Secure Foundations:
Enhanced 25 areas, such as user management, service accounts, networking, and data governance (protection, quality, cataloguing).

HOW
- Robustness:
Leveraged Google Cloud Functions, BigQuery, Cloud Composer, and Terraform for efficient data management and compliance.
- Scalability:
Replaced Kafka with Storage Transfer Service, improving data ingestion and decision-making via Analytics Hub.
- Security & Governance:
Strengthened access controls and data integrity with IAM, Dataplex, and data masking.

Currently: NBG GCP Analytics Platform Foundation

Post-Remediation NBG GCP Analytics Platform Foundation
A summary of the critical operational enhancements: robust management, scalable architectures, secure environments, and streamlined processes.
1. Robustness
- Data Management:
Utilizes Google Cloud Functions and Cloud Run for processing and BigQuery for warehousing; aligns the DBT codebase to architectural standards.
- Governance:
Managed with Cloud Monitoring, Cloud Composer, and Terraform for stringent compliance and operational efficiency.
2. Scalability
- Ingestion:
Streamlined using the Storage Transfer Service, removing Kafka to boost scalability, integrating API sources, and simplifying data flow with ODX projects.
- Data Sharing:
Enabled through Analytics Hub for scalable decision-making across the organization.
3. Realizability
- Database Management:
Uses Google Cloud SQL and AlloyDB for operational and legacy system interactions, incorporating system integration and data quality processes.
- Reporting:
Employs tools like Vertex AI, BigQuery Analytics, Looker, and Power BI for diverse reporting needs, streamlining data usage.
4. Security
- Access Control:
Tightened through IAM updates and role management, ensuring secure data handling and restricted access.
- Data Integrity:
Enhanced with Dataplex for quality management and policy updates in DBT and BigQuery, including data masking for added security.

Previously: NBG GCP Analytics Platform Foundation

Previous Data Flow Overview
(E) Extract:
- FinTech: Financial, PII, and audit data from S3 to Kafka.
- NBG AWS: Customer communications data.
- On-Premise (Teradata): Canonical data in Teradata for reporting.
(T) Transform:
- BigQuery: Primary data warehouse for structured data.
- Data Warehouse & ODX Projects: Enhanced data processing.
- Dataplex: Ensures data quality and catalog management.
(L) Load:
- Reporting: Operational and analytical reporting from the warehouse and ODX.
- Oracle Systems: Financial operations in Oracle's Financial Accounting Hub and Fusion Finance.

Key Concerns
1. Kafka Over-Engineering:
Kafka is used for file triggers with no real-time requirement; Airflow could suffice (see the sketch after this list).
2. Data Layers & Zones:
The abstract design lacks clear fact/dimension tables, aggregation layers, and a data mesh for scalability and access control.
3. Real-Time Analytics:
Only 2 of 300 reports require real-time data, and both are sourced from batch processes; real-time ETL is unnecessary.
4. Data Governance:
The lack of cataloguing, lineage, and quality checks raises reliability and security concerns.
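To illustrate concern 1, here is a minimal, hypothetical Cloud Composer (Airflow) sketch of a batch file trigger that could replace the Kafka pipeline. The bucket, object path, and BigQuery table names are illustrative assumptions, not values from the project.

```python
# Hypothetical Composer/Airflow DAG: poll for a new file in GCS and load it,
# replacing a Kafka-based file trigger. All resource names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="fintech_file_trigger",
    start_date=datetime(2025, 1, 1),
    schedule="@hourly",  # batch cadence; no real-time requirement
    catchup=False,
) as dag:
    # Wait for the day's extract to land in the bucket (placeholder names).
    wait_for_file = GCSObjectExistenceSensor(
        task_id="wait_for_fintech_extract",
        bucket="nbg-fintech-landing",                     # hypothetical bucket
        object="extracts/{{ ds }}/transactions.csv",      # hypothetical object
        poke_interval=300,
        timeout=60 * 60,
    )

    # Load the file into the warehouse once it exists.
    load_to_bq = GCSToBigQueryOperator(
        task_id="load_fintech_extract",
        bucket="nbg-fintech-landing",
        source_objects=["extracts/{{ ds }}/transactions.csv"],
        destination_project_dataset_table="nbg-analytics.staging.fintech_transactions",  # hypothetical
        source_format="CSV",
        write_disposition="WRITE_APPEND",
    )

    wait_for_file >> load_to_bq
```

Because the reports are batch-fed, a polling sensor on the landing bucket meets the same need without operating a streaming cluster.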
FS GCP Architecture

Ingestion
1. Simplification:
Simplified the FinTech ingestion architecture using the Storage Transfer Service, removing Kafka to enhance scalability.
2. Frameworks:
Created frameworks for API-based data source ingestion and processes for data flows between the BigQuery data warehouse and the Cloud SQL ODX.
3. Transfer Setup:
Configured the AWS-to-GCP data transfer service.
4. Function Optimization:
Simplified Cloud Functions to listen directly to GCS events (see the sketch after this list).
5. Orchestration:
Limited Cloud Composer to orchestration, using BigQuery's streaming API for data loading.
6. IaC Implementation:
Established Infrastructure as Code for ingestion resources.
7. Audit Logging:
Enabled audit logs for transfer service jobs.
8. Ingestion Code:
Developed a code template for API-to-ODX ingestion.
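As a concrete illustration of items 4 and 5, below is a minimal, hypothetical sketch of a GCS-event-driven Cloud Function that loads the arriving object into BigQuery. It uses a standard load job rather than the streaming API for brevity, and the project, dataset, and table names are assumptions for the example.

```python
# Hypothetical Cloud Function: listens directly to GCS object-finalize events
# and loads the new file into BigQuery, leaving Cloud Composer for orchestration.
import functions_framework
from google.cloud import bigquery

BQ_TABLE = "nbg-analytics.staging.fintech_transactions"  # hypothetical target


@functions_framework.cloud_event
def load_new_object(cloud_event):
    data = cloud_event.data
    uri = f"gs://{data['bucket']}/{data['name']}"

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,
    )
    # Kick off a load job for the object that triggered the event.
    load_job = client.load_table_from_uri(uri, BQ_TABLE, job_config=job_config)
    load_job.result()  # wait so failures surface in Cloud Logging
    print(f"Loaded {uri} into {BQ_TABLE}")
```

Routing loads through an event-driven function keeps Composer focused on orchestration rather than file watching.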
Transformation

Alignment:
Aligned the DBT codebase with the proposed logical architecture and data modeling principles, including naming conventions and structure.
Data Quality:
Implemented data quality tests, tables, and hygiene processes.
Data Masking:
Executed data masking to protect sensitive information.
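As one hedged illustration of the data-masking step, the sketch below attaches a Data Catalog policy tag to a sensitive BigQuery column so that column-level security and masking rules can apply to it. The table, column, and policy-tag resource names are placeholders, not the project's actual values.

```python
# Hypothetical sketch: tag a PII column with a policy tag so BigQuery
# column-level security / data masking can be enforced on it.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("nbg-analytics.warehouse.customers")  # hypothetical table

POLICY_TAG = (
    "projects/nbg-governance/locations/europe-west2/"
    "taxonomies/1234567890/policyTags/0987654321"  # placeholder resource name
)

new_schema = []
for field in table.schema:
    if field.name == "email":  # example PII column
        field = bigquery.SchemaField(
            name=field.name,
            field_type=field.field_type,
            mode=field.mode,
            description=field.description,
            policy_tags=bigquery.PolicyTagList(names=[POLICY_TAG]),
        )
    new_schema.append(field)

table.schema = new_schema
client.update_table(table, ["schema"])  # only the schema is patched
```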
Analytics
Analytics Hub:
Implemented the Analytics Hub with a POC to demonstrate data sharing with analytics projects, reducing direct queries to the Data Warehouse. Developed example datasets, tables, and labels for each layer of the medallion architecture (a dataset-creation sketch follows).
Proof of Concepts (PoC):
Created proofs of concept for data quality tests, the modeling layer, the presentation layer, and the Analytics Hub.
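Below is a minimal, hypothetical sketch of creating a labelled BigQuery dataset per medallion layer, of the kind that could back Analytics Hub listings. The project ID, dataset names, and labels are assumptions for illustration.

```python
# Hypothetical sketch: one BigQuery dataset per medallion layer, labelled so
# Analytics Hub listings and downstream projects can discover each layer.
from google.cloud import bigquery

client = bigquery.Client(project="nbg-analytics")  # hypothetical project

for layer in ("bronze", "silver", "gold"):
    dataset = bigquery.Dataset(f"nbg-analytics.{layer}_layer")
    dataset.location = "europe-west2"
    dataset.labels = {"medallion_layer": layer, "owner": "data-platform"}
    created = client.create_dataset(dataset, exists_ok=True)
    print(f"Dataset ready: {created.full_dataset_id}")
```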
Operational Data Exchange (ODX)
Deployed the ODX projects on Cloud SQL databases, together with the network connectivity required to enable operational data flows.
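For context, here is a hypothetical sketch of an operational read against a Cloud SQL (PostgreSQL) ODX instance using the Cloud SQL Python Connector. The instance connection name, credentials, and query are illustrative only; in practice the credentials would come from Secret Manager and the query from the real ODX schema.

```python
# Hypothetical sketch of an operational query against a Cloud SQL ODX instance.
import sqlalchemy
from google.cloud.sql.connector import Connector

connector = Connector()


def getconn():
    # Instance connection name format: "<project>:<region>:<instance>" (placeholder values).
    return connector.connect(
        "nbg-odx-prod:europe-west2:odx-postgres",
        "pg8000",
        user="odx_app",
        password="change-me",  # in practice, pull from Secret Manager
        db="odx",
    )


engine = sqlalchemy.create_engine("postgresql+pg8000://", creator=getconn)

with engine.connect() as conn:
    rows = conn.execute(sqlalchemy.text("SELECT order_id, status FROM orders LIMIT 10"))
    for row in rows:
        print(row)

connector.close()
```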
Secure Foundations
IAM, User Management, and Service Accounts
Roles and Permissions:
Restructured roles and permissions for enhanced access control and alignment with best practices.
Custom Roles:
Transitioned from admin roles to specific custom roles such as Storage, BigQuery, and Secret Manager.
Group Accesses:
Replaced individual accesses with group accesses and established new groups (see the sketch after this list).
Role Elimination:
Eliminated superfluous Owner and Organization Admin roles.
Service Accounts Cleanup:
Removed unused service accounts and redundant keys.
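As a hedged illustration of group-based access, the sketch below grants a role to a Google group at project level through the Resource Manager API. In the actual project this was managed through IAM configuration rather than this exact script; the project ID, group address, and role are placeholders.

```python
# Hypothetical sketch: bind a group (not individual users) to a role on a project.
from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2, policy_pb2

client = resourcemanager_v3.ProjectsClient()
resource = "projects/nbg-analytics-prod"  # hypothetical project

# Read the current policy, append a group binding, and write it back.
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(resource=resource)
)
policy.bindings.append(
    policy_pb2.Binding(
        role="roles/bigquery.dataViewer",                 # placeholder role
        members=["group:analytics-readers@example.com"],  # placeholder group
    )
)
client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(resource=resource, policy=policy)
)
print(f"Granted group access on {resource}")
```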
Data Governance
1. Dataplex Configuration:
a. Established a central data catalogue project.
b. Created tag templates for improved data management (see the sketch after this list).
c. Enabled automatic discovery and tagging of sensitive data through API integration.
d. Enhanced data quality management protocols.
2. DBT Updates:
a. Implemented data masking for sensitive data protection.
b. Developed a POC for data quality tests and tables.
3. Data Retention Policies:
a. Updated data retention policies in BigQuery to ensure data consistency, accuracy, and security.
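As a hedged example of item 1b, the sketch below creates a Data Catalog tag template (the cataloguing API behind Dataplex) with a couple of illustrative fields. The project, location, template ID, and field names are assumptions, not the project's actual taxonomy.

```python
# Hypothetical sketch: a Data Catalog tag template for the central catalogue project.
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

tag_template = datacatalog_v1.TagTemplate()
tag_template.display_name = "NBG Data Asset"

owner_field = datacatalog_v1.TagTemplateField()
owner_field.display_name = "Data owner"
owner_field.type_.primitive_type = datacatalog_v1.FieldType.PrimitiveType.STRING
tag_template.fields["data_owner"] = owner_field

pii_field = datacatalog_v1.TagTemplateField()
pii_field.display_name = "Contains PII"
pii_field.type_.primitive_type = datacatalog_v1.FieldType.PrimitiveType.BOOL
tag_template.fields["contains_pii"] = pii_field

created = client.create_tag_template(
    parent="projects/nbg-data-catalogue/locations/europe-west2",  # hypothetical
    tag_template_id="nbg_data_asset",
    tag_template=tag_template,
)
print(f"Created tag template: {created.name}")
```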
Logging, Monitoring, and Alerting

Terraform Resources:
Implemented Terraform resources to enable organization-wide log-based metrics and alerts for N Brown (an illustrative sketch follows this list).
Central Logging and Alerting:
Established central logging and alerting systems.
Castle Project Logging:
Configured logging and alerting for Castle project resources.
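These resources were managed in Terraform; purely as an illustration of the same concept, here is a hypothetical Python sketch that creates one log-based metric with the google-cloud-logging client (a Cloud Monitoring alerting policy would then reference it). The project ID, metric name, and filter are placeholders.

```python
# Hypothetical sketch: a log-based metric, equivalent in intent to the
# Terraform-managed metrics described above.
from google.cloud import logging

client = logging.Client(project="nbg-logging")  # hypothetical central project

metric = client.metric(
    "storage_transfer_failures",
    filter_='resource.type="storage_transfer_job" AND severity>=ERROR',
    description="Count of failed Storage Transfer Service runs",
)
if not metric.exists():
    metric.create()
print(f"Log-based metric ready: {metric.name}")
```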
Networking, CI/CD, and IaC
1. Centralized CI/CD:
Implemented a centralized CI/CD solution for automated management of project creation and network infrastructure, including firewall rules and shared VPCs.
2. Refactor Terraform:
Refactored the existing network Terraform codebase to enhance dynamism and scalability.
3. Synchronize Terraform/GCP:
Synchronized Terraform and GCP project states to eliminate the need for manual intervention.
4. Deploy ODX Instances:
Deployed ODX instances across new projects (dev, test, pre-prod, and prod) via IaC.
5. Expanded IaC usage for:
a. Managing IAM, user management, and ingestion resources.
b. Setting filter-based alerts and managing BigQuery policy tags.