Job Summary:
We are seeking a skilled Data Engineer with a strong background in data ingestion, processing, and storage. The ideal candidate will have experience working with a variety of data sources and technologies, particularly on AWS. You will be responsible for designing and implementing data pipelines, ensuring data quality, and optimizing data storage solutions.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines for data ingestion and processing using Python, Spark, and AWS services.
- Work with on-premises Oracle databases, batch files, and Confluent Kafka as data sources.
- Implement and manage ETL processes using AWS Glue and EMR for batch and streaming data.
- Develop and maintain data storage solutions that follow the Medallion Architecture (bronze, silver, and gold layers) across S3, Redshift, and Oracle.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions that meet business needs.
- Monitor and optimize data workflows using Airflow and other orchestration tools.
- Ensure data quality and integrity throughout the data lifecycle.
- Implement CI/CD practices for data pipeline deployment, including infrastructure as code with Terraform and other tools.
- Utilize monitoring and logging tools such as CloudWatch, Datadog, and Splunk to ensure system reliability and performance.
- Communicate effectively with stakeholders to gather requirements and provide updates on project status.
Technical Skills Required:
- Proficient in Python for data processing and automation.
- Strong experience with Apache Spark for large-scale data processing.
- Familiarity with AWS S3 for data storage and management.
- Experience with Kafka for real-time data streaming.
- Knowledge of Redshift for data warehousing solutions.
- Proficient in Oracle databases for data management.
- Experience with AWS Glue for ETL processes.
- Familiarity with Apache Airflow for workflow orchestration.
- Experience with EMR for big data processing.
- Mandatory: Strong AWS data engineering skills.
Additional Skills:
- Familiarity with Terraform for infrastructure as code.
- Experience with messaging services such as SNS and SQS.
- Knowledge of monitoring and logging tools like CloudWatch, Datadog, and Splunk.
- Experience with AWS DataSync, DMS, Athena, and Lake Formation.
Communication Skills:
- Excellent verbal and written communication skills are mandatory for effective collaboration with team members and stakeholders.