ETL Data Flow
Extract, Transform, and Load (ETL) for Softbank Data Lake
We use 30 services on the AWS Cloud Platform to load 100 GB of IoT data daily from diverse databases (MySQL, Cassandra, Salesforce, etc.) into the Softbank Data Lake.
Details
The SBRE Data Lake has stored about 85 TB of data since 2016 so that interested parties (the data science team, other SBR teams, or external partners) can access and analyze it.
The Data Sources
Data are fetched from four sources: Cassandra via API, MySQL via SQL queries, Salesforce via API, and DynamoDB via API. All Data Lake data are stored in AWS S3 buckets.
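The source-to-S3 mapping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Source` class, the `raw/<source>/dt=<date>/` key layout, and the function names are hypothetical and are not taken from the actual pipeline. The sketch only plans object keys; it does not connect to any database or to AWS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    """One upstream system and how it is read (hypothetical model)."""
    name: str
    access: str  # "api" or "sql", as listed in the text above


# The four sources named in this section, with their access methods.
SOURCES = [
    Source("cassandra", "api"),
    Source("mysql", "sql"),
    Source("salesforce", "api"),
    Source("dynamodb", "api"),
]


def s3_key(source: Source, day: date, part: int = 0) -> str:
    """Build an S3 object key partitioned by source and load date.

    The key layout is an assumption made for illustration only.
    """
    return f"raw/{source.name}/dt={day.isoformat()}/part-{part:05d}.json"


def plan_daily_load(day: date) -> dict:
    """Map each source name to the key its first daily extract would use."""
    return {s.name: s3_key(s, day) for s in SOURCES}


if __name__ == "__main__":
    for name, key in plan_daily_load(date.today()).items():
        print(f"{name:<12} -> {key}")
```

Partitioning keys by source and date is one common way to keep daily loads separate and easy to query, but the real bucket layout may differ.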