The Senior Data Engineer will:
- Clean, aggregate, and organize data from disparate sources and transfer it to data warehouses.
- Support the development, testing, and maintenance of data pipelines and platforms, enabling high-quality data to be used within business dashboards and tools.
- Create, maintain, and support the data platform and infrastructure that enable the analytics front end, including the construction, development, testing, and maintenance of architectures such as high-volume, large-scale data processing systems and databases, with proper verification and validation processes.
Data Engineering
- Design, develop, optimize, and maintain data architecture and pipelines that adhere to ETL principles and business goals.
- Develop and maintain scalable data pipelines, and build out new integrations using AWS-native technologies to support continuing increases in data sources, volume, and complexity.
- Define data requirements, gather and mine large-scale structured and unstructured data, and validate data using various tools in the Big Data environment.
- Support standardization, customization, and ad hoc data analysis, and develop the mechanisms to ingest, analyze, validate, normalize, and clean data.
- Write unit, integration, and performance test scripts, and perform the data analysis required to troubleshoot data-related issues and assist in their resolution.
- Implement processes and systems to drive data reconciliation and monitor data quality, ensuring production data is always accurate and available for key stakeholders, downstream systems, and business processes.
- Lead the evaluation, implementation and deployment of emerging tools and processes for analytic data engineering to improve productivity.
- Develop and deliver communication and education plans on analytic data engineering capabilities, standards, and processes.
- Learn about machine learning, data science, computer vision, artificial intelligence, statistics, and/or applied mathematics.
- Solve complex data problems to deliver insights that help achieve business objectives.
- Implement statistical data quality procedures on new data sources by applying rigorous iterative data analytics.
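As an illustration, the kind of statistical data-quality procedure described above might look like the following minimal sketch in plain Python; the column name, null-rate threshold, and z-score cutoff are hypothetical choices, not part of this role's specification:

```python
import statistics

def quality_report(rows, column, null_threshold=0.05, z_cutoff=3.0):
    """Report the null rate and z-score outliers for one numeric column.

    `rows` is a list of dicts (e.g., parsed source records); `column`,
    `null_threshold`, and `z_cutoff` are illustrative parameters.
    """
    values = [row.get(column) for row in rows]
    null_rate = sum(v is None for v in values) / len(values)
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    stdev = statistics.pstdev(present)
    # Flag values whose distance from the mean exceeds z_cutoff stdevs.
    outliers = [v for v in present
                if stdev and abs(v - mean) / stdev > z_cutoff]
    return {
        "null_rate": null_rate,
        "null_ok": null_rate <= null_threshold,
        "outliers": outliers,
    }
```

Run iteratively against each new source, a report like this makes "rigorous iterative data analytics" concrete: the thresholds are tightened or relaxed as the team learns the source's real distribution.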
Relationship Building and Collaboration
- Partner with Business Analytics and Solution Architects to develop technical architectures for strategic enterprise projects and initiatives.
- Coordinate with Data Scientists to understand data requirements, and design solutions that enable advanced analytics, machine learning, and predictive modelling.
- Support Data Scientists in data sourcing and preparation to visualize data and synthesize insights of commercial value.
- Collaborate with AI/ML engineers to create data products for analytics and data scientist team members to improve productivity.
- Advise, consult, mentor and coach other data and analytic professionals on data standards and practices, promoting the values of learning and growth.
- Foster a culture of sharing, re-use, design for scale and stability, and operational efficiency of data and analytical solutions.
Technical/Functional Expertise
- Advanced experience with, and understanding of, data and Big Data, data integration, data modelling, AWS, and cloud technologies.
- Strong business acumen with knowledge of the Pharmaceutical, Healthcare, or Life Sciences sector is preferred, but not required.
- Ability to build processes that support data transformation, workload management, data structures, dependency, and metadata.
- Ability to build and optimize queries (SQL), data sets, 'Big Data' pipelines, and architectures for structured and unstructured data.
- Experience with or knowledge of Agile Software Development methodologies.
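The query-building expertise listed above can be sketched with Python's built-in sqlite3 module; the table, columns, and sample values are hypothetical, chosen only to show a grouped aggregation over an indexed column:

```python
import sqlite3

# In-memory database standing in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 100.0), ("EU", 50.0), ("US", 75.0)],
)
# Index the grouping column so the aggregation can scan it in order
# rather than sorting the whole table.
conn.execute("CREATE INDEX idx_sales_region ON sales(region)")
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
))
```

The same GROUP BY shape scales from sqlite3 up to warehouse engines; the optimization habit (indexing or partitioning on the grouping key) carries over directly.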
Leadership
- Strategic mindset: looks beyond minor tactical details to focus on the organization's long-term strategic goals.
- Advocate of a culture of collaboration and psychological safety.
Decision-making and Autonomy
- Drives the shift from manual decision-making to data-driven, strategic decision-making.
- Proven track record of applying critical thinking to resolve issues and overcome obstacles.
Interaction
- Proven track record of collaboration and developing strong working relationships with key stakeholders by building trust and being a true business partner.
- Demonstrated success in collaborating with different IT functions, contractors, and constituents to deliver data solutions that meet standards and security measures.
Innovation
- Passion for re-imagining new solutions, processes, and end-user experience by leveraging digital and disruptive technologies and developing advanced data and analytics solutions.
- Advocate of a culture of growth mindset, agility, and continuous improvement.
Complexity
- Demonstrated multicultural sensitivity to lead teams effectively.
- Ability to coordinate and problem-solve amongst larger teams.
Essential skillsets
- Bachelor’s degree in Engineering, Computer Science, Data Science, or related field
- 5+ years of experience in software development, data science, data engineering, ETL, and analytics reporting development
- Experience designing, building, implementing, and maintaining data and system integrations using dimensional data modelling, and developing and optimizing ETL pipelines
- Proven track record of designing and implementing complex data solutions
- Demonstrated understanding and experience using:
- Data Engineering Programming Languages (e.g., Python)
- Distributed Data Technologies (e.g., PySpark)
- Cloud platform deployment and tools (e.g., Kubernetes)
- Relational SQL databases
- DevOps and continuous integration
- AWS cloud services and technologies (e.g., Lambda, S3, DMS, Step Functions, EventBridge, CloudWatch, RDS)
- Databricks/ETL
- IICS/DMS
- GitHub
- EventBridge, Tidal
- Strong organizational skills with the ability to manage multiple projects simultaneously and operate as a leading member across globally distributed teams to deliver high-quality services and solutions
- Understanding of database architecture and administration
- Possesses high proficiency in programming languages and tools (e.g., SQL, Python, PySpark, AWS services) to design, maintain, and optimize data architecture/pipelines that fit business goals
- Extracts, transforms, and loads data from multiple external/internal sources using Databricks Lakehouse/Data Lake concepts into a single, consistent source to serve business users and data visualization needs
- Utilizes the principles of continuous integration and delivery to automate the deployment of code changes to higher environments, fostering enhanced code quality, test coverage, and automation of resilient test cases
- Excellent written and verbal communication skills, including storytelling and interacting effectively with multifunctional teams and other strategic partners
- Strong problem solving and troubleshooting skills
- Ability to work in a fast-paced environment and adapt to changing business priorities
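The testing and ETL expectations above (unit test scripts for pipelines, normalization of source data into a consistent shape) can be sketched as a small pure transform that a CI job could assert against; the field names and cleaning rules are hypothetical:

```python
def normalize_record(raw):
    """Clean one source record: trim strings, coerce the amount, reject empties.

    Field names (customer_name, amount) are illustrative, not a real schema.
    """
    name = (raw.get("customer_name") or "").strip()
    if not name:
        return None  # reject records with no usable key
    return {
        "customer_name": name.title(),
        "amount": round(float(raw.get("amount") or 0), 2),
    }

def transform(records):
    """Normalize every record and drop rejects, as one ETL step would."""
    cleaned = (normalize_record(r) for r in records)
    return [r for r in cleaned if r is not None]
```

Because the transform is a pure function of its input, unit tests can pin its behavior record by record, and a continuous-integration run catches regressions before the pipeline promotes code to higher environments.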
Desired skillsets
- Degree in Engineering, Computer Science, Data Science, or related field
- Experience in a global working environment
Travel requirements
- Access to transportation to attend meetings
- Ability to fly to meetings regionally and globally
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.