We are seeking a highly skilled and motivated AWS Data Engineer with 3-7 years of experience in AWS Glue, AWS Redshift, S3, and Python to join our dynamic team. As a Data Engineer, you will be responsible for designing, developing, and optimizing data pipelines and solutions that support business intelligence, analytics, and large-scale data processing. You will work closely with data scientists, analysts, and other engineering teams to ensure seamless data flow across our systems.
Key Responsibilities:
- Design and Develop ETL Pipelines: Leverage AWS Glue to design and implement scalable ETL (Extract, Transform, Load) processes that move and transform data from various sources into AWS Redshift or other storage systems.
- Engineer governed batch and near-real-time data pipelines using AWS-native technologies such as Direct Connect, S3, Lambda, Glue, Kinesis, and CloudTrail (or equivalents).
- Design and implement serverless data engineering workloads in the AWS ecosystem: take inputs from S3, RDS, and other cloud-based sources (e.g., SaaS data), apply business transformations using distributed compute (e.g., EMR, Glue, Spark), and persist insights in the target store (e.g., S3, Redshift, DynamoDB).
- Maintain, optimize, and scale AWS Redshift clusters to ensure efficient data storage, retrieval, and query performance.
- Utilize Amazon S3 to store raw data, manage large datasets, and integrate with other AWS services to ensure secure, scalable, and cost-effective data solutions.
- Create and manage AWS Glue crawlers and jobs to automate data cataloging and ingestion processes across various structured and unstructured data sources.
- Use Python (and PySpark within Glue) to write scripts for data transformation, integration, and automation tasks, ensuring clean, efficient, and reusable code (a minimal Glue job sketch follows this list).
- Ensure data accuracy and integrity by implementing data validation, cleansing, and error-handling processes in ETL pipelines.
- Optimize AWS Glue jobs, Redshift queries, and data flows to improve performance and reduce processing times and costs.
- Enable data consumption by reporting and analytics business applications using AWS services (e.g., QuickSight, SageMaker, JDBC/ODBC connectivity).
- Identify, define, and design logical data models, including required entities, relationships, data constraints, and dependencies, focused on enabling reporting and analytics business use cases.
- Work closely with data scientists, analysts, and stakeholders to understand data requirements and provide solutions that enable data-driven decision-making.
- Monitoring and Troubleshooting: Develop and implement monitoring strategies to ensure data pipelines are running smoothly. Quickly troubleshoot and resolve any data-related issues or failures.
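To illustrate the kind of pipeline work described above, here is a minimal sketch of an AWS Glue PySpark job that reads raw CSV data from S3, applies basic cleansing and validation, and writes the result to Redshift. The bucket names, table names, and connection names are illustrative placeholders, not details of this role.

```python
# Minimal AWS Glue (PySpark) job sketch: S3 -> transform -> Redshift.
# Bucket, schema, table, and connection names below are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw CSV files from S3 (placeholder path).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-raw-bucket/orders/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Basic cleansing/validation: drop rows missing the key, cast and filter amounts.
df = (
    raw.toDF()
    .dropna(subset=["order_id"])
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount") >= 0)
)

# Write curated data to Redshift via a pre-configured Glue connection (placeholder names).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=DynamicFrame.fromDF(df, glue_context, "curated_orders"),
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "analytics.orders", "database": "dev"},
    redshift_tmp_dir="s3://example-temp-bucket/redshift-staging/",
)

job.commit()
```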
Required Skills and Qualifications:
- 3-7 years of experience in data engineering or a similar role, with a focus on AWS technologies.
- Academic background in Computer Science.
- Strong programming background: PySpark, SQL, stored procedures, and Python.
- Strong experience with AWS Glue: building ETL pipelines, managing crawlers, and working with the Glue Data Catalog.
- Proficiency in AWS Redshift: designing and managing Redshift clusters, writing complex SQL queries, and optimizing query performance.
- Hands-on experience with Amazon S3: data storage, data lifecycle policies, and integration with other AWS services.
- Solid programming skills in Python, especially for data manipulation (using libraries such as pandas) and automation of ETL jobs.
- Experience with PySpark within AWS Glue for large-scale data transformations.
- Proficiency in writing and optimizing SQL queries for data manipulation and reporting.
- Familiarity with data warehouse concepts: star schemas, partitioning, indexing, and data normalization.
- Strong problem-solving skills and attention to detail.
- Experience with version control systems such as Git or SVN.
- Experience with data streaming technologies such as Amazon Kinesis and Kafka implementations on AWS (see the sketch below).
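As a small illustration of the streaming experience listed above, below is a minimal sketch of publishing JSON events to an Amazon Kinesis data stream with boto3. The stream name, region, and event fields are illustrative placeholders rather than details from this posting.

```python
# Minimal sketch: publish one JSON event to an Amazon Kinesis data stream.
# Stream name, region, and event payload are placeholders.
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_event(event: dict, stream_name: str = "example-clickstream") -> str:
    """Send one JSON event to Kinesis and return its sequence number."""
    response = kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("user_id", "unknown")),
    )
    return response["SequenceNumber"]

if __name__ == "__main__":
    seq = publish_event({"user_id": 42, "action": "page_view", "page": "/home"})
    print(f"Published record with sequence number {seq}")
```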
Good to Have:
- Knowledge of AWS IAM for managing secure access to data resources.
- Familiarity with DevOps practices and automation tools like Terraform or CloudFormation.
- Experience with data visualization tools like QuickSight or integrating Redshift data with BI tools (Tableau, Power BI, etc.).
- AWS certifications such as AWS Certified Data Analytics – Specialty or AWS Certified Solutions Architect are a plus.
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.