Job responsibilities
- Design, develop, and maintain data ingestion pipelines that collect, process, and transport large volumes of data from various sources, sinks, and stores into our data ecosystem.
- Collaborate with data engineers, data scientists, and other stakeholders to normalize, correlate and integrate data from diverse sources, ensuring data quality and consistency.
- Identify and resolve bottlenecks in data ingestion processes, optimizing performance and throughput.
- Implement robust monitoring and alerting systems to proactively identify and address data ingestion issues. Troubleshoot and resolve data ingestion problems promptly.
- Ensure data security and compliance by implementing appropriate access controls, encryption, and data protection measures within the ingestion pipelines.
- Maintain comprehensive documentation of data ingestion processes, configurations, and best practices.
- Utilize your expertise in a development language (e.g., Java, Python) to build and manage data ingestion solutions, ensuring code quality and performance optimization.
Required qualifications, capabilities, and skills
- Formal training or certification in data engineering concepts and 5+ years of applied experience.
- Experience deploying, using, and monitoring AWS data services, including Glue, Athena, or Neptune.
- Experience with Graph Databases, including Cypher and Gremlin, or Relational Databases including DML, DDL, and PL/SQL.
- Experience managing the deployment, upgrade, redeployment, and teardown of cloud services using Terraform.
- Experience integrating data platforms through Continuous Integration and Continuous Deployment (CI/CD), ideally using Git/Bitbucket and Jenkins or Spinnaker.
- Experience building data pipelines using Spark, Glue, or similar tools. Strong understanding of data integration concepts, ETL processes, and data modeling.
- Proficient in working with various data formats, including JSON, XML, and CSV.
- Demonstrated execution of full delivery (designing, developing, coding, testing, debugging and documenting) of secure data systems to satisfy business requirements.
Preferred qualifications, capabilities, and skills
- Experience working in an Agile development environment and ability to contribute to and facilitate Agile ceremonies.
- Proficient in full stack development, with experience creating REST-based services using Java/Spring/Spring Boot.
- Experience with test-driven development using modern source control and continuous integration.
- Experience with API and GraphQL interfaces. Experience with data ingestion, transports, and message buses (e.g., EKS, Apache Kafka, Apache NiFi, Pentaho, Apache Hop, or similar).
- Good understanding of data warehousing and data modeling concepts on AWS Redshift. Knowledge of distributed computing and big data technologies (e.g., Hadoop, Spark).
- Strong interpersonal and communication skills to work collaboratively in and across teams to achieve common goals.