Role Overview
As a Senior Data Engineer, you will build robust, scalable, and compliant data pipelines that take raw scientific source data through to integrated, consumption-ready data products across the R&D landscape. You will partner with architects, domain SMEs, and analysts to develop reusable components that support early discovery, development, and regulatory needs.
Key Responsibilities
- Design and implement pipelines across L1–L3 layers using Databricks and Spark.
- Develop modular, parameterized ETL processes for large-scale scientific data integration.
- Implement and maintain source-to-target mappings (STTM) and transformation logic in partnership with domain business analysts.
- Build metadata-driven orchestration, logging, and observability frameworks.
- Support performance optimization, validation, and deployment readiness across squads.
- Collaborate with architects on cloud-native best practices and standards.
Must-Have Skills
- 8+ years of experience in data engineering, including 3+ years in life sciences or R&D settings.
- Hands-on expertise with Spark, Databricks, Delta Lake, and Python/SQL.
- Experience building pipelines aligned to STTM/SDTM specifications and scientific data quality requirements.
- Good understanding of L1–L3 pipeline layering and governance controls.
- Familiarity with GxP or other compliance-driven implementations.
Nice-to-Have Skills
- Exposure to ELNs, LIMS, compound registration systems, or bioassay platforms.
- Knowledge of Unity Catalog, Matillion, or metadata-driven architecture approaches.
- Experience with ingestion frameworks, schema evolution, and CI/CD for data.