What you'll do:
Design, build, and maintain scalable data pipelines and infrastructure to support the collection, processing, and analysis of large volumes of data.
Develop robust ETL processes to extract, transform, and load data from various sources into our data warehouse.
Collaborate with cross-functional teams to understand data requirements and implement solutions to address business needs.
Optimize data processing workflows for performance, reliability, and scalability.
Implement data quality monitoring and validation processes to ensure accuracy and consistency of data.
Work closely with software engineers to integrate data-driven features and functionalities into our products and services.
Stay abreast of emerging technologies and best practices in data engineering, and propose innovative solutions to enhance our data infrastructure.
What we expect:
Proven experience (5+ years) as a Software Engineer or in a similar role, with a focus on building data pipelines and infrastructure.
Proficiency in Python programming and experience with relevant libraries and frameworks (e.g., Pandas, NumPy, and Spark for data processing; Pydantic and FastAPI for services; MongoDB and Redis as data stores).
Strong understanding of database systems, with experience in designing and optimizing queries.
Hands-on experience with cloud platforms, particularly AWS (Amazon Web Services), and familiarity with services such as S3, ECS, and SQS.
Experience working with large-scale distributed systems and parallel processing frameworks.
Solid understanding of data modeling concepts and techniques.
Excellent problem-solving skills and attention to detail.
Strong communication and collaboration skills, with the ability to work effectively in a team environment.