Key responsibilities
- Design and implement robust, scalable, and efficient data pipelines and architectures on AWS.
- Develop data models and schemas to support business intelligence and analytics requirements.
- Utilize AWS services such as S3, Redshift, EMR, Glue, Lambda, and Kinesis to build and optimize data solutions.
- Implement data security and compliance measures using AWS IAM, KMS, and other security services (see the encrypted-upload sketch after this list).
- Design and develop ETL processes to ingest, transform, and load data from various sources into data warehouses and data lakes (see the PySpark ETL sketch after this list).
- Ensure data quality and integrity through validation, cleansing, and transformation processes.
- Optimize data storage and retrieval performance through indexing, partitioning, and other techniques.
- Monitor and troubleshoot data pipelines to ensure high availability and reliability.
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver solutions.
- Provide technical leadership and mentorship to junior data engineers and team members.
- Identify opportunities to automate and streamline data processes for increased efficiency.
- Participate in on-call rotations to provide support for critical systems and services.
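
As a flavor of the security work above, here is a minimal boto3 sketch of encrypting data at rest during an S3 upload with a customer-managed KMS key. The bucket name, object key, and key alias are hypothetical placeholders, not part of any actual environment.

```python
# Minimal boto3 sketch: upload a file to S3 with server-side
# encryption under a customer-managed KMS key.
# Bucket, object key, and KMS key alias are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="orders.parquet",
    Bucket="example-bucket",
    Key="curated/orders.parquet",
    ExtraArgs={
        "ServerSideEncryption": "aws:kms",        # encrypt at rest with KMS
        "SSEKMSKeyId": "alias/example-data-key",  # customer-managed key alias
    },
)
```
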
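The following is a minimal PySpark sketch of the kind of ETL flow described above: ingest raw CSV, validate and cleanse, then write Parquet partitioned for efficient retrieval. The S3 paths, column names, and partition key are illustrative assumptions, not a prescribed implementation.

```python
# Minimal PySpark ETL sketch: ingest raw CSV, validate and cleanse,
# then write partitioned Parquet. Paths, column names, and the
# partition key are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest: read raw CSV from a (hypothetical) landing zone.
raw = spark.read.csv(
    "s3://example-bucket/landing/orders/", header=True, inferSchema=True
)

# Validate and cleanse: drop rows missing required fields and
# filter out obviously invalid amounts.
clean = (
    raw.dropna(subset=["order_id", "order_date"])
       .filter(F.col("amount") >= 0)
       .withColumn("order_date", F.to_date("order_date"))
)

# Transform: derive a partition column for efficient downstream reads.
curated = clean.withColumn("order_month", F.date_format("order_date", "yyyy-MM"))

# Load: write Parquet partitioned by month into the curated zone.
(curated.write
        .mode("overwrite")
        .partitionBy("order_month")
        .parquet("s3://example-bucket/curated/orders/"))
```

Partitioning by `order_month` lets downstream readers prune to just the months they need, which is the storage-and-retrieval-performance point in the list above.
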
Required qualifications, capabilities, and skills
- Experience in software development and data engineering, with demonstrable hands-on Python and PySpark skills.
- Proven experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Good understanding of data modeling, data architecture, ETL processes, and data warehousing concepts.
- Experience with, or good knowledge of, cloud-native ETL platforms such as Snowflake and/or Databricks.
- Experience with big data technologies and services such as AWS EMR, Redshift, Lambda, and S3 (see the Lambda-to-Glue sketch after this list).
- Proven experience with efficient cloud DevOps practices and CI/CD tools such as Jenkins or GitLab for data engineering platforms.
- Good knowledge of SQL and NoSQL databases, including performance tuning and optimization (see the indexing sketch after this list).
- Experience with declarative infrastructure-provisioning tools such as Terraform, Ansible, or CloudFormation.
- Strong analytical skills to troubleshoot issues and optimize data processes, working both independently and collaboratively.
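
As an illustration of wiring these AWS services together, here is a hedged sketch of a Lambda handler that starts a Glue job whenever a new object lands in S3. The Glue job name (`curate-orders`) and the `--input_path` argument are hypothetical placeholders.

```python
# Minimal AWS Lambda handler sketch: when a new object lands in S3,
# start a Glue ETL job, passing the object location as a job argument.
# The Glue job name and argument key are hypothetical.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # S3 put-event records carry the bucket and object key
    # (real event keys may be URL-encoded; decode them in production).
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Kick off the (hypothetical) Glue job for this object.
        response = glue.start_job_run(
            JobName="curate-orders",
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )
        print(f"Started Glue run {response['JobRunId']} for s3://{bucket}/{key}")
```

Decoupling ingestion from transformation this way keeps the pipeline event-driven rather than schedule-bound.
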
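To make the tuning point concrete, the sketch below uses Python's standard-library sqlite3 module to show how adding an index changes a query plan from a full table scan to an index seek. The table and column names are made up; the same principle carries over to warehouse engines.

```python
# Minimal indexing sketch with the standard-library sqlite3 module,
# so it runs anywhere. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 1000, i * 0.5) for i in range(100_000)],
)

query = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

# Without an index, the planner scans the whole table.
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", (42,)).fetchall())

# With an index, the planner seeks directly to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", (42,)).fetchall())
```
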
Preferred qualifications, capabilities, and skills
- Knowledge of the machine learning model lifecycle, language models, and cloud-native MLOps pipelines and frameworks is a plus.
- Familiarity with data visualization tools and data integration patterns.