BABA MALIK HUSSAIN

SENIOR DATA ENGINEER

Azure, Databricks, MLOps, Lakehouse

Building scalable Lakehouse and MLOps foundations on Azure. Transforming raw data into actionable intelligence through cutting-edge data engineering solutions.

WHO I AM

Data Engineer focused on building scalable Lakehouse and MLOps foundations on Azure. With hands-on experience in Databricks, Spark Structured Streaming, Delta Lake, and Synapse, I deliver reliable batch and real-time pipelines that power enterprise analytics.

I've implemented automated deployments with Terraform + Azure DevOps and built strong data quality, governance, and monitoring systems using Unity Catalog, Great Expectations, and Azure Monitor.

I also build end-to-end MLOps workflows with MLflow, feature stores, drift detection, and retraining pipelines, and I have built a multimodal RAG agent for document intelligence.

3.5 Years Experience
3 Projects
3 Certifications

Lakehouse Architecture

Expert in Medallion (Bronze/Silver/Gold) architecture for data lineage and governance

Azure Ecosystem

Deep expertise in Azure Data Factory, Synapse, Databricks, and ADLS Gen2

MLOps & AI

Building end-to-end ML workflows with MLflow, feature stores, and drift detection

Real-time Pipelines

Spark Structured Streaming for reliable batch and real-time data processing

TECHNICAL SKILLS

Languages

Python, SQL, PySpark, Shell Scripting

Cloud Platforms

Azure Data Factory, Synapse Analytics, Databricks, Azure Functions, Logic Apps, Azure Monitor, ADLS Gen2, Azure ML

ETL & Orchestration

Azure Data Factory, Databricks Workflows, Spark Structured Streaming, Airflow, dbt

DataOps & CI/CD

Git, Jenkins, Docker, Terraform, Azure DevOps, CI/CD Pipelines

Data Quality & Governance

Unity Catalog, Great Expectations, Data Lineage, SLA/SLO Monitoring

Databases

Delta Lake, Synapse SQL, Snowflake, BigQuery, Redshift, SQL Server, PostgreSQL

ML/AI Tools

MLflow, LangChain, ChromaDB, Hugging Face, OpenAI API, RAGAS, FastAPI
40+ TECHNOLOGIES

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Target
Mar 2024 – Jan 2026
  • Built the Lakehouse platform using Databricks, Spark, and Delta Lake with Medallion (Bronze/Silver/Gold) architecture for data lineage and governance
  • Developed and optimized ETL pipelines for batch and streaming data using Python, PySpark, and Spark Structured Streaming
  • Automated CI/CD pipeline deployments and infrastructure provisioning with Terraform and Azure DevOps
  • Implemented data quality monitoring and observability using Great Expectations, Azure Monitor, and custom SLA/SLO alerting frameworks
  • Optimized Spark performance through Z-ordering, file clustering, and partitioning strategies on Parquet/Delta formats
  • Used PySpark MLlib and MLflow in Databricks to train and deploy classification and clustering models for customer segmentation and churn prediction
  • Built and maintained a feature store in Unity Catalog/Delta tables to serve consistent features to training and production ML pipelines

Data Engineer

Cognizant
Jan 2021 – Jul 2022
  • Designed Azure Data Factory ETL pipelines to ingest data from multiple transactional systems into Azure Synapse dedicated SQL pools
  • Built Azure Functions to validate incoming files, update metadata tables, and trigger downstream processing
  • Orchestrated end-to-end data ingestion and transformation workflows using Azure Data Factory pipelines and Azure Logic Apps
  • Leveraged Azure Synapse serverless SQL and Azure Data Lake Storage for on-demand querying over large datasets
  • Implemented monitoring and alerting with Azure Monitor, Log Analytics, and action groups to track ETL pipeline health
  • Collaborated with BI teams to deliver Power BI dashboards on certified Azure Synapse and Data Lake datasets

KEY PROJECTS

Azure MLOps & Data Lakehouse

Built an automated MLOps framework with a Medallion architecture on ADLS Gen2 to process NYC taxi data. Used Delta Lake ACID transactions to ensure data integrity and optimized feature serving through Synapse serverless SQL views.

Key Features
  • Medallion architecture (Bronze/Silver/Gold)
  • Delta Lake ACID transactions
  • Self-healing monitoring with Azure Monitor
  • Automated model drift detection and retraining
Azure, Databricks, Delta Lake, MLflow, PySpark

Multimodal RAG Agent for Document Intelligence

Architected a multimodal RAG system that can query unstructured PDFs containing text, hierarchical tables, and visual charts. Engineered a multi-vector indexing strategy with ChromaDB that links text embeddings to visual assets for context-aware retrieval.

Key Features
  • Multi-Vector Indexing strategy
  • Context-aware retrieval system
  • RAGAS evaluation framework
  • Parent Document Retrieval for accuracy
LangChain, ChromaDB, OpenAI, FastAPI, RAGAS
ENTERPRISE SOLUTIONS

EDUCATION & CERTIFICATIONS

EDUCATION

Master of Science in Data Science

New Jersey Institute of Technology

2023, New Jersey, USA

Bachelor of Technology in Computer Science

Narasaraopeta Engineering College

2020, India

CERTIFICATIONS

Microsoft

Microsoft Certified: Azure Fundamentals

Issued 2024
Microsoft

Microsoft Certified: Fabric Data Engineer Associate

Issued 2025
Databricks

Databricks Certified Data Engineer Associate

Issued 2025
VERIFIED CREDENTIALS
3 Industry Certifications

LET'S CONNECT

Ready to build scalable data solutions? Let's discuss how I can help transform your data infrastructure and drive business value.

READY TO START A PROJECT?

I'm currently available for freelance work and full-time opportunities. Let's build something amazing together.