Help validate client requirements and architecture proposals for AWS components
Design, build, and operationalize large-scale enterprise data solutions and applications using one or more AWS data and analytics services (Spark, EMR, DynamoDB, Redshift, Kinesis, Lambda, Glue, EventBridge) in combination with third-party data platforms/warehouses such as Databricks or Snowflake
Lead a team of data engineers in producing low-level designs for data engineering components (ETL, data quality, and lineage) from client requirements, using AWS tools such as Glue, EMR, and Redshift as well as third-party tools such as Snowflake
Lead the development of reusable frameworks and patterns for ingestion, data quality, audit, and other data management tasks
Troubleshoot performance and cost challenges in solutions and implement appropriate fixes
Support upkeep of DevOps pipelines as required
Help automate operations through jobs for alerting, monitoring, and remediation
Experience building Spark-based data processing frameworks and handling optimizations (PySpark/Python)
Experience troubleshooting new and existing ETL packages, SCD methodologies, and historical/incremental data load handling
Design and build production data pipelines from ingestion to consumption within a big data architecture, using Spark
Experience in SQL and stored procedure development
Strong analytical and problem-solving skills to overcome technical challenges and provide appropriate technical guidance to the team
Skills
8-12 years of work experience with ETL, data warehouses, data platforms, data modelling, and data architecture
Experience on a minimum of three projects delivering AWS data solutions
Hands-on experience with the following AWS and related services:
Proficiency in five or more analytical services: Glue, EMR, Athena, Redshift, Kinesis, Lambda, DynamoDB, SageMaker, EventBridge
Strong experience with the foundational services: S3/Glacier, CloudTrail/CloudWatch
Exposure to third-party services such as Snowflake, Matillion, Kafka, Privacera, etc.
Expert-level skills in writing and optimizing SQL.
Very strong Python and Spark experience, including optimization techniques.
Expertise in streaming pipelines using Kinesis/Kafka/Beam/Flink is a big plus
Experience with Big Data technologies such as Hadoop/Hive/Spark.
Solid Linux skills.
Experience working in very large data warehouses or data lakes.
Ability to performance-tune data pipelines (Glue/EMR) and Redshift; knows how to optimize the distribution, partitioning, and MPP behaviour of high-level data structures
Efficiency in handling data: tracking data lineage, ensuring data quality, and improving data discoverability
Sound understanding of DevOps on AWS, with prior experience using AWS or open-source stacks
EY exists to build a better working world, helping to create long-term value for clients, people, and society, and to build trust in the capital markets.