Design, develop, and maintain robust data pipelines in Hadoop and related ecosystems, ensuring data reliability, scalability, and performance.
Implement ETL processes to support batch and streaming analytics requirements.
Optimize and troubleshoot distributed systems for ingestion, storage, and processing.
Collaborate with data engineers, analysts, and platform engineers to align solutions with business needs.
Ensure data security, integrity, and compliance throughout the infrastructure.
Maintain documentation and contribute to architecture reviews.
Participate in incident response and operational excellence initiatives for the data warehouse.
Continuously learn and apply new Hadoop ecosystem tools and data technologies.
Extensive experience with Apache Kafka, Apache Flink, and other relevant streaming technologies.
Proficiency in Hadoop ecosystem technologies such as Hive, Iceberg, and Spark SQL.
Good understanding of Apache Airflow for orchestrating complex computational workflows and data processing pipelines.
Proven ability to design and implement automated data pipelines and materialized views.
Proficiency in Python, Unix shell scripting, or similar languages.
Good understanding of SQL on Oracle, SQL Server, or similar databases.
Java + Spring Boot: Build and maintain microservices.
Ops & CI/CD: Monitoring (Prometheus/Grafana), logging, pipelines (Jenkins/GitHub Actions).
Core Engineering: Data structures/algorithms, testing (JUnit/pytest), Git, clean code.
Cloud Native: Docker, Kubernetes (deploy, network, scale, troubleshoot).
6+ years of directly applicable experience.
BS in Computer Science, Engineering, or equivalent experience.