Key Responsibilities
- Develop ETL/ELT pipelines in Databricks using PySpark, Spark SQL, and Delta Lake (illustrative sketches follow this list).
- Use Delta Live Tables for simplified pipeline orchestration.
- Implement Databricks Auto Loader for real-time/batch data ingestion.
- Build Databricks SQL dashboards and queries for reporting and analytics.
- Manage Databricks clusters, jobs, and workflows while ensuring cost efficiency.
- Work with cloud-native services (ADF, Synapse, and ADLS on Azure, or Glue, S3, and Redshift on AWS) for data integration.
- Apply Unity Catalog for role-based access control and lineage tracking.
- Collaborate with data scientists to support ML workloads using MLflow.
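The ingestion work above can be pictured with a minimal PySpark sketch: Auto Loader reads newly arrived files incrementally and writes them to a Delta table. The landing and checkpoint paths, file format, and the bronze.raw_events table name are illustrative assumptions, not part of the role description.

```python
# Minimal sketch: incremental ingestion with Auto Loader into a Delta table.
# Landing/checkpoint paths and the target table name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided in Databricks notebooks

raw_stream = (
    spark.readStream.format("cloudFiles")              # Auto Loader source
    .option("cloudFiles.format", "json")                # format of incoming files
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/raw_events/_schema")
    .load("/mnt/landing/raw_events/")                   # landing zone on ADLS/S3
)

(
    raw_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/raw_events/")
    .trigger(availableNow=True)                          # run as an incremental batch job
    .toTable("bronze.raw_events")                        # Delta target in the bronze layer
)
```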
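For the Delta Live Tables item, a pipeline is typically written declaratively as decorated functions, with DLT handling orchestration and dependencies. This is a hedged sketch only: the table names, landing path, and data-quality rule are hypothetical.

```python
# Minimal Delta Live Tables sketch: bronze ingestion plus a validated silver table.
# Table names, the landing path, and the quality rule are placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested incrementally with Auto Loader")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/raw_events/")
    )

@dlt.table(comment="Validated events for downstream analytics")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")   # drop rows failing the rule
def silver_events():
    return dlt.read_stream("bronze_events").withColumn("ingested_at", F.current_timestamp())
```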
Mandatory Skills
- Strong Databricks expertise: PySpark, Spark SQL, and Delta Lake (ACID transactions, schema evolution, time travel; illustrated after this list).
- Exposure to Delta Live Tables, Auto Loader, Unity Catalog, and MLflow.
- Hands-on experience with Azure or AWS data services.
- Strong SQL and Python programming for data pipelines.
- Knowledge of data modeling (star/snowflake schemas, lakehouse architecture).
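To make the Delta Lake points above concrete, here is a brief sketch of schema evolution, time travel, and transaction history. The silver.customers table and the source path are assumptions for illustration only.

```python
# Brief sketch of Delta Lake features: schema evolution, time travel, and history.
# Table and path names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Schema evolution: mergeSchema allows new columns to be added on append.
new_batch = spark.read.json("/mnt/landing/customers_batch/")
(
    new_batch.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("silver.customers")
)

# Time travel: query an earlier version of the table.
previous = spark.sql("SELECT * FROM silver.customers VERSION AS OF 0")

# Transaction log: every write is an ACID-compliant commit recorded in the history.
spark.sql("DESCRIBE HISTORY silver.customers").show(truncate=False)
```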
Good to Have
- Streaming data experience (Kafka, Event Hub, Kinesis).
- Familiarity with the Databricks REST APIs (see the example below).
- Certifications: Databricks Data Engineer Associate, Azure DP-203, or AWS Analytics Specialty.
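For the REST API point, calls typically authenticate with a personal access token against the workspace URL. This sketch lists jobs via the Jobs API 2.1; the host and token values are placeholders.

```python
# Minimal sketch: list jobs via the Databricks Jobs API 2.1 using a personal access token.
# Workspace URL and token are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "dapi..."  # personal access token (placeholder)

resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.1/jobs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for job in resp.json().get("jobs", []):
    print(job["job_id"], job["settings"]["name"])
```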