Objectives and Purpose
- The Lead Data Engineer leads large-scale solution architecture design and optimisation to provide streamlined insights to partners throughout the business. This individual leads the team of Mid- and Senior data engineers to partner with visualization developers on data quality and troubleshooting needs.
- The Lead Data Engineer will:
- Implement data processes for the data warehouse and internal systems
- Lead a team of Junior and Senior Data Engineers in executing data processes and providing quality, timely data management
- Manage data architecture and design ETL processes.
- Clean, aggregate and organize data from disparate sources and transfer it to data warehouses.
- Lead development, testing, and maintenance of data pipelines and platforms so that quality data can be used within business dashboards and tools.
- Support team members and direct reports in refining and validating data sets.
- Create, maintain, and support the data platform and infrastructure that enables the analytics front-end; this includes the testing, maintenance, construction, and development of architectures such as high-volume, large-scale data processing and databases with proper verification and validation processes.
Data Engineering
- Lead the design, development, optimization, and maintenance of data architecture and pipelines adhering to ETL principles and business goals.
- Develop and maintain scalable data pipelines and build out new integrations using AWS-native technologies and Databricks to support increases in data sources, volume, and complexity.
- Define data requirements, gather and mine large-scale structured and unstructured data, and validate data by running various data tools in the Big Data environment.
- Lead ad hoc data analysis, support standardization and customization, and develop mechanisms to ingest, analyze, validate, normalize, and clean data.
- Write unit, integration, and performance test scripts and perform the data analysis required to troubleshoot data-related issues.
- Implement processes and systems to drive data reconciliation and monitor data quality, ensuring production data is always accurate and available for key stakeholders, downstream systems, and business processes.
- Lead the evaluation, implementation and deployment of emerging tools and processes for analytic data engineering to improve productivity.
- Develop and deliver communication and education plans on analytic data engineering capabilities, standards, and processes.
- Solve complex data problems to deliver insights that help achieve business objectives.
Relationship Building and Collaboration
- Partner with Business Analysts and Enterprise Architects to develop technical architectures for strategic enterprise projects and initiatives.
- Coordinate with Data Scientists, visualization developers and other data consumers to understand data requirements, and design solutions that enable advanced analytics, machine learning, and predictive modelling.
- Support Data Scientists in data sourcing and preparation to visualize data and synthesize insights of commercial value.
- Collaborate with AI/ML engineers to create data products for analytics and data scientist team members to improve productivity.
- Advise, consult, mentor and coach other data and analytic professionals on data standards and practices, promoting the values of learning and growth.
- Foster a culture of sharing, re-use, design for scale and stability, and operational efficiency of data and analytical solutions.
Technical/Functional Expertise
- Advanced experience and understanding of data/Big Data, data integration, data modelling, AWS, and cloud technologies.
- Strong business acumen with knowledge of the Pharmaceutical, Healthcare, or Life Sciences sector is preferred, but not required.
- Expertise in building processes that support data transformation, workload management, data structures, dependency, and metadata.
- Expertise to build and optimize queries (SQL), data sets, 'Big Data' pipelines, and architectures for structured and unstructured data.
- Experience with or knowledge of Agile Software Development methodologies.
Leadership
- Mentor Senior and Junior data engineers on the team.
- Strategic mindset of thinking above the minor, tactical details and focusing on the long-term, strategic goals of the organization.
- Advocate of a culture of collaboration and psychological safety.
Decision-making and Autonomy
- Shift from manual decision-making to data-driven, strategic decision-making.
- Proven track record of applying critical thinking to resolve issues and overcome obstacles.
Interaction
- Proven track record of collaboration and developing strong working relationships with key stakeholders by building trust and being a true business partner.
- Demonstrated success in collaborating with different IT functions, contractors, and constituents to deliver data solutions that meet standards and security measures.
Innovation
- Passion for re-imagining new solutions, processes, and end-user experience by leveraging digital and disruptive technologies and developing advanced data and analytics solutions.
- Leading research and development (R&D) efforts in data engineering.
- Advocate of a culture of growth mindset, agility, and continuous improvement.
Complexity
- Demonstrates high multicultural sensitivity to lead teams effectively.
- Ability to coordinate and problem-solve amongst larger teams.
Essential skillsets
- Bachelor’s degree in Engineering, Computer Science, Data Science, or related field
- 9+ years of experience in software development, data engineering, ETL, and analytics reporting development.
- Expert in building and maintaining data and system integrations using dimensional data modelling and optimized ETL pipelines.
- Advanced experience utilizing modern data architectures and frameworks such as data mesh, data fabric, and data product design
- Experience with designing data integration frameworks capable of supporting multiple data sources, consisting of both structured and unstructured data
- Proven track record of designing and implementing complex data solutions
- Demonstrated understanding and experience using:
- Data Engineering Programming Languages (e.g., Python)
- Distributed Data Technologies (e.g., PySpark)
- Cloud platform deployment and tools (e.g., Kubernetes)
- Relational SQL databases
- DevOps and continuous integration
- AWS cloud services and technologies (e.g., Lambda, S3, DMS, Step Functions, EventBridge, CloudWatch, RDS)
- Knowledge of data lakes, data warehouses, AI pipelines or similar
- Databricks/ETL
- IICS/DMS
- GitHub
- EventBridge, Tidal
- Deep understanding of database architecture and administration
- Possesses high proficiency in programming languages and tools (e.g., SQL, Python, PySpark, AWS services) to design, maintain, and optimize data architecture and pipelines that fit business goals.
- Extracts, transforms, and loads data from multiple external/internal sources using Databricks Lakehouse/Data Lake concepts into a single, consistent source to serve business users and data visualization needs.
- Utilizes the principles of continuous integration and delivery to automate the deployment of code changes to elevated environments, for example by using GitHub Actions.
- Excellent written and verbal communication skills, including storytelling and interacting effectively with multifunctional teams and other strategic partners.
- Strong organizational skills with the ability to manage multiple projects simultaneously, operating as a leading member across globally distributed teams.
- Strong problem-solving and troubleshooting skills.
- Ability to work in a fast-paced environment and adapt to changing business priorities.
- Lead and oversee the code review process within the data engineering team to ensure high-quality, efficient, and maintainable code that is optimized for performance and scalability.
- Responsible for optimizing the performance of Python and Spark jobs/scripts to ensure efficient data processing.
- Identify and implement strategies to optimize AWS and Databricks cloud costs, ensuring efficient and cost-effective use of cloud resources.
Desired skillsets
- Master’s degree in Engineering, Computer Science, Data Science, or related field
- Experience in a global working environment
Travel requirements
- Access to transportation to attend meetings.
- Ability to fly to meetings regionally and globally.
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.