Centralized Enterprise Data Warehouse Build for Modern Data Platform

One of the largest vertically integrated cannabis companies in the U.S. needed to build a modern data platform in the cloud, with a centralized Enterprise Data Warehouse that provides accurate, timely data to users and keeps data available for reporting and BI functions.

Client Challenges and Requirements

  • Build an end-to-end, cloud-agnostic data platform that is secure, flexible and high-performing.
  • Solve current data quality and accuracy challenges.
  • Identify and implement open-source or native tool stacks for a cost-effective solution.
  • Onboard, cleanse and enrich data for companywide consumption.
  • Build a scalable architecture that can be extended to future use cases.

Bitwise Solution

  • Use Talend to replicate data from various SQL Server and Domo sources into a target data warehouse raw layer.
  • Use dbt to transform raw data in the data warehouse into the format required by the data models. Run automated tests with dbt-expectations to ensure data quality.
  • Use Airflow to schedule the Talend data ingestion pipelines, dbt transformations and dbt models. Trigger automated emails when a DAG fails.
  • Implement a branching strategy for each environment to facilitate a controlled release process and maintain separate codebases for different stages.
  • Configure a Kubernetes container to host Airflow and an internal load balancer to expose the Airflow UI for scheduling. Create a separate log workspace for Kubernetes logs.
  • Support migrated data during all testing phases and fix reported issues with minimal turnaround time.
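
The ordering described above (ingest with Talend, transform with dbt, test with dbt, alert on failure) can be illustrated with a minimal plain-Python sketch. This is not the Airflow DAG used in the project; it only shows the control flow that such a DAG would encode. The Talend launcher path and the names STEPS, shell_runner and run_pipeline are hypothetical; `dbt run` and `dbt test` are the standard dbt CLI commands.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

Step = Tuple[str, List[str]]

# Ordered steps as (name, command). The Talend launcher path is a
# hypothetical placeholder; `dbt run` and `dbt test` are real dbt commands.
STEPS: List[Step] = [
    ("talend_ingest", ["./jobs/ingest_raw_run.sh"]),
    ("dbt_run", ["dbt", "run"]),
    ("dbt_test", ["dbt", "test"]),
]


def shell_runner(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    try:
        return subprocess.run(list(command)).returncode == 0
    except OSError:
        # Command missing or not executable: treat as a failed step.
        return False


def run_pipeline(
    steps: Sequence[Step],
    runner: Callable[[Sequence[str]], bool] = shell_runner,
    notify: Callable[[str], None] = print,
) -> Optional[str]:
    """Run steps in order; stop at the first failure and send an alert.

    Returns the name of the failed step, or None if every step succeeded.
    """
    for name, command in steps:
        if not runner(command):
            # Stand-in for the automated failure email.
            notify(f"Pipeline failed at step '{name}'")
            return name
    return None
```

Calling run_pipeline(STEPS) runs the three commands in sequence and stops at the first one that fails. In an Airflow deployment, each step would typically become a task, the ordering would become task dependencies, and the notify callback would be replaced by Airflow's own failure alerting.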

Tools & Technologies We Used

Talend Open Studio
dbt Core
Airflow
Azure DevOps
Kubernetes
BCP
Postgres

Key Results

Data refreshes every 15 minutes in production to stay in sync with source systems

100% data integrity achieved through automated data quality checks on data loads

Improved efficiency through automated code deployments to higher environments

High scalability, portability and stability of the Airflow scheduler with Kubernetes
