Data Analytics and AI
Data Reliability Engineering for ETL System
A $100B NYSE-listed retail company needed a data reliability engineering team to stabilize its ETL system. The goals were to reduce issues and gain complete control of the end-to-end data flow, improving time-to-repair and ensuring the timely arrival of quality data.
Client Challenges and Requirements
- The system had multiple issues: incorrect SLAs, bad data, long issue-detection times, and repeated major incidents
- The Stage environment was difficult to monitor because out-of-sync code made it unstable and unsuitable for integration testing
- Operations and development collaborated inefficiently on improving production stability
Bitwise Solution
The data reliability team followed the practices below to bring about the change.
Tools & Technologies We Used
Informatica PowerCenter & IDMC
Unix
TWS
Key Results
Incidents reduced by more than 60%
Stage kept in sync with Prod, enabling integration testing
Faster response to major incidents and a reduced blast radius
Incident Time-to-Detect and Time-to-Repair reduced by 40%