Your key responsibilities
- Designing and implementing data pipelines that reliably process large volumes of data
- Managing and optimizing data infrastructure and systems
- Ensuring data quality, security, and compliance
- Collaborating with stakeholders to understand data requirements
- Troubleshooting and maintaining production systems
Skills and attributes for success
- Technical Foundation: Strong programming skills, particularly in Python, with hands-on experience building real systems in production environments. You should be comfortable working with modern data platforms such as Databricks and proficient with tools that orchestrate data workflows.
- Data Infrastructure Knowledge: Understanding how data flows through systems is essential. This includes knowledge of databases, data warehouses, and data lake architectures; essentially, the different ways to store and organize data at scale.
- Cloud Proficiency: You need practical experience with major cloud providers (AWS, Azure, or GCP) and their data services, as modern data engineering happens almost exclusively in the cloud.
- Problem-Solving Mindset: The role requires analytical thinking to design efficient solutions and troubleshoot issues when systems fail or perform poorly.
To qualify for the role, you must have
- Programming experience: 5+ years of hands-on work in production environments. You should be able to write clean, efficient code at scale.
- Python data libraries: Proficiency with Pandas, NumPy, SQLAlchemy, and PySpark for data manipulation and processing.
- Workflow orchestration: Experience with tools like Apache Airflow, Prefect, or Luigi that automate and schedule data jobs.
- SQL and Databases: Solid SQL skills and familiarity with both relational databases (PostgreSQL, MySQL) and NoSQL databases (MongoDB, Cassandra).
- Cloud platforms: Working knowledge of AWS, Azure, or GCP data services.
- Data Lake and Lakehouse concepts: Understanding how to structure and manage large-scale data storage systems.
- Version control: Git
Ideally, you’ll also have
- Education: A bachelor's or master's degree in computer science, engineering, or a related field
- Advanced Databricks Skills: Knowledge of Databricks MLflow for machine learning pipelines, Databricks SQL, and BI tool integrations. This shows you can work across the analytics and machine learning spectrum.
- Open-source Contributions: Active contributions to open-source Python projects. This demonstrates engagement with the tech community and software engineering best practices.
- Data Science Knowledge
What we look for
EY People Consulting is seeking someone who:
- Has a proven track record of shipping data systems to production
- Balances technical depth with practical business awareness
- Can communicate across technical and non-technical audiences
- Is proactive about learning and staying current with evolving technologies
- Works well in teams and takes pride in enabling others' success