

In this role, you’ll design, build, and maintain scalable, high-performance data pipelines that handle massive volumes of real-time telemetry data from hardware, communication modules, firmware, and large-scale AI and HPC clusters.
What you’ll be doing:
Define and execute the group’s technical data roadmap, aligning with R&D, hardware, and DevOps teams
Design and maintain flexible ETL/ELT frameworks for ingesting, transforming, and classifying telemetry and performance data
Build and optimize streaming pipelines using Apache Spark, Kafka, and Databricks, ensuring high throughput, reliability, and adaptability to evolving data schemas
Implement and maintain observability and data quality standards, including schema validation, lineage tracking, and metadata management
Develop monitoring and alerting for pipeline health using Prometheus, Grafana, or Datadog
Support self-service analytics for engineers and researchers via Databricks notebooks, APIs, and curated datasets
Promote best practices in data modeling, code quality, security, and operational excellence across the organization
Deliver reliable insights for cluster performance analysis, telemetry visibility, and end-to-end test coverage
What we need to see:
B.Sc. or M.Sc. in Computer Science, Computer Engineering, or a related field
5+ years of hands-on experience in data engineering or backend development
Strong practical experience with Apache Spark (PySpark or Scala) and Databricks
Expertise with Apache Kafka, including stream ingestion, schema registry, and event processing
Proficiency in Python and SQL for data transformation, automation, and pipeline logic
Familiarity with ETL orchestration tools (Airflow, Prefect, or Dagster)
Experience with schema evolution, data versioning, and validation frameworks (Delta Lake, Iceberg, or Great Expectations)
Solid understanding of cloud environments (AWS preferred; GCP or Azure also relevant)
Knowledge of streaming and telemetry data architectures in large-scale, distributed systems
Ways to stand out from the crowd:
Exposure to hardware, firmware, or embedded telemetry environments
Experience with real-time analytics frameworks (Spark Structured Streaming, Flink, Kafka Streams)
Experience with data cataloging or governance tools (DataHub, Collibra, or Alation)
Familiarity with CI/CD for data pipelines and infrastructure-as-code (Terraform, GitHub Actions)
Experience designing data systems for performance metrics (latency, throughput, resource utilization) that support high-volume, high-frequency telemetry at scale