Job Description
Job responsibilities
- Design, develop, and maintain robust data pipelines and ETL processes to ingest, process, and store large volumes of data from various sources.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
- Optimize and improve existing data systems for performance, scalability, and reliability.
- Implement data quality checks and validation processes to ensure data accuracy and integrity.
- Monitor and troubleshoot data pipeline issues, ensuring timely resolution and minimal disruption.
- Stay up-to-date with industry trends and best practices in data engineering and incorporate them into our processes.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 5+ years of applied experience.
- Proven experience as a Data Engineer or in a similar role.
- Strong proficiency in SQL and experience with relational databases (e.g., MySQL, PostgreSQL).
- Experience with big data technologies (e.g., Hadoop, Spark) and cloud platforms, particularly AWS.
- Proficiency in programming languages such as Python, Java, or Scala.
- Familiarity with data warehousing solutions, especially Snowflake, and ETL tools.
- Experience with infrastructure as code tools, particularly Terraform.
- Experience with Apache Airflow or AWS MWAA (Managed Workflows for Apache Airflow).
- Experience with containerization and orchestration tools, especially Kubernetes.
- Proficiency with AWS services such as EKS (Elastic Kubernetes Service), EMR (Elastic MapReduce), Lambda, DynamoDB, and ECS (Elastic Container Service).
- Excellent problem-solving skills and attention to detail.
Preferred qualifications, capabilities, and skills
- Experience with Python, Java, Scala, and AWS MWAA.
- Knowledge of Hadoop, AWS, and Terraform concepts and frameworks.
- Bachelor's degree in Computer Science, Engineering, or a related field.