Your key responsibilities
- Develop and deploy data lakehouse pipelines in a cloud environment using AWS and Databricks services.
- Design and implement end-to-end ETL/ELT workflows using Databricks (PySpark, Delta Lake, Delta Live Tables) and AWS-native services such as S3, Glue, Lambda, and Step Functions.
- Migrate existing on-premises ETL workloads to the AWS + Databricks platform with optimized performance and cost efficiency.
- Interact with business and technical stakeholders, understand their data goals, and translate them into scalable, governed solutions.
- Design, optimize, and monitor Spark jobs and SQL models for fast execution and high throughput.
- Implement and manage Unity Catalog for metadata, lineage, and fine-grained access control across workspaces.
- Participate in data architecture reviews, DevOps automation, and CI/CD deployments for Databricks workflows.
- Collaborate closely with data analysts, BI developers, and data scientists to ensure reliable, high-quality data delivery.
Skills and attributes for success
- 4+ years of overall IT experience, including 2+ years of relevant experience in Databricks and AWS cloud-based data engineering.
- Strong hands-on experience in Databricks (PySpark, Delta Lake, Delta Live Tables, Unity Catalog).
- Practical knowledge of AWS services such as S3, Glue, Lambda, Step Functions, CloudWatch, and Redshift.
- Experience working with structured and semi-structured data formats (CSV, JSON, Parquet, XML).
- Good understanding of ETL orchestration using Databricks Workflows, Airflow, or Step Functions.
- Familiarity with metadata-driven ingestion frameworks and medallion architecture (Bronze, Silver, Gold) design principles.
- Solid experience in Python, PySpark, and SQL for data transformation and validation.
- Knowledge of CI/CD practices and version control (GitHub, Azure DevOps, Jenkins).
- Excellent analytical, troubleshooting, and problem-solving skills.
- Ability to work independently, interact with stakeholders, and deliver high-quality solutions under minimal supervision.
To qualify for the role, you must have
- A Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- 4–7 years of industry experience in data engineering or data architecture roles.
- Proven working experience in an AWS Databricks environment with production-grade data pipelines.
- Working knowledge of Agile/Scrum methodologies and the data delivery lifecycle.
- Strong communication skills (both written and verbal) and the ability to present complex solutions clearly.
- Flexible, proactive, and self-motivated working style with strong ownership of problem resolution.
- Experience across all phases of the data solution delivery lifecycle: analysis, design, development, testing, deployment, and support.
What we look for
- People with technical experience and enthusiasm to learn new things in this fast-moving environment
You will work on inspiring and meaningful projects. Our focus is education and coaching alongside practical experience to ensure your personal development. We value our employees, and you will be able to control your own development with an individual progression plan. You will quickly grow into a responsible role with challenging and stimulating assignments. Moreover, you will be part of an interdisciplinary environment that emphasizes high quality and knowledge exchange. Plus, we offer:
- Support, coaching and feedback from some of the most engaging colleagues around
- Opportunities to develop new skills and progress your career
- The freedom and flexibility to handle your role in a way that’s right for you
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.