
MSD – Associate Specialist, Scientific Data Engineering
India, Telangana, Hyderabad
Job ID: 390194765


Job Description

The Opportunity

  • Based in Hyderabad, join a global healthcare biopharma company and be part of a 130-year legacy of success backed by ethical integrity, forward momentum, and an inspiring mission to achieve new milestones in global healthcare.
  • Be part of an organisation driven by digital technology and data-backed approaches that support a diversified portfolio of prescription medicines, vaccines, and animal health products.
  • Drive innovation and execution excellence. Join a team that is passionate about using data, analytics, and insights to drive decision-making and create custom software, allowing us to tackle some of the world's greatest health threats.

Role Overview

  • Design, develop, and maintain data pipelines to extract data from various sources and populate a data lake and data warehouse (see the sketch after this list).
  • Work closely with data scientists, analysts, and business teams to understand data requirements and deliver solutions aligned with business goals.
  • Build and maintain platforms that support data ingestion, transformation, and orchestration across various data sources, both internal and external.
  • Use data orchestration, logging, and monitoring tools to build resilient pipelines.
  • Automate data flows and pipeline monitoring to ensure scalability, performance, and resilience of the platform.
  • Monitor, troubleshoot, and resolve issues related to the data integration platform, ensuring uptime and reliability.
  • Maintain thorough documentation for integration processes, configurations, and code to ensure easy onboarding for new team members and future scalability.
  • Develop pipelines to ingest data into cloud data warehouses.
  • Establish, modify, and maintain data structures and associated components.
  • Create and deliver standard reports in accordance with stakeholder needs and conforming to agreed standards.
  • Work within a matrix organizational structure, reporting to both the functional manager and the project manager.
  • Participate in project planning, execution, and delivery, ensuring alignment with both functional and project goals.
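
By way of illustration, here is a minimal sketch of the kind of batch pipeline the first bullet describes: extract raw files, apply light transformations, and load the result into a data lake. PySpark is assumed as the processing framework, and the S3 paths and column handling are hypothetical placeholders, not a prescribed implementation.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical locations -- substitute real source and lake paths.
    SOURCE_PATH = "s3://example-raw-bucket/events/"
    LAKE_PATH = "s3://example-lake-bucket/curated/events/"

    spark = SparkSession.builder.appName("events-ingest").getOrCreate()

    # Extract: read raw CSV files from the landing zone.
    raw = spark.read.option("header", True).csv(SOURCE_PATH)

    # Transform: normalise column names and stamp each row with its load date.
    curated = (
        raw.select([F.col(c).alias(c.strip().lower()) for c in raw.columns])
           .withColumn("load_date", F.current_date())
    )

    # Load: append partitioned Parquet into the data lake.
    curated.write.mode("append").partitionBy("load_date").parquet(LAKE_PATH)

Partitioning by load date (rather than a full timestamp) keeps partition counts manageable while still supporting incremental reprocessing.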

What should you have

  • Bachelor's degree in Information Technology, Computer Science, or any technology stream.
  • 1+ years of experience developing data pipelines and data infrastructure, ideally within a drug development or life sciences context.
  • Demonstrated expertise in delivering large-scale information management technology solutions encompassing data integration and self-service analytics enablement.
  • Experience with software/data engineering practices, including versioning, release management, deployment of datasets, agile methods, and related software tools.
  • Ability to design, build, and unit test applications on the Spark framework using Python (see the test sketch after this list).
  • Build PySpark-based applications for both batch and streaming requirements, which calls for in-depth knowledge of Databricks/Hadoop.
  • Experience working with storage frameworks like Delta Lake/Iceberg.
  • Experience working with MPP data warehouses like Redshift.
  • Cloud-native, ideally AWS certified.
  • Strong working knowledge of at least one reporting/insight-generation technology.
  • Good interpersonal and communication skills (verbal and written).
  • Proven record of delivering high-quality results.
  • Product and customer-centric approach.
  • Innovative thinking, experimental mindset.
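
As a concrete reading of the "design, build, and unit test" expectation above, here is a minimal pytest sketch for a PySpark transformation. The dedupe_latest helper and its column names are hypothetical examples, assuming a local Spark session is sufficient for unit testing.

    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window


    def dedupe_latest(df, key_col, ts_col):
        # Keep only the most recent record per key -- a typical pipeline step.
        w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
        return (df.withColumn("_rn", F.row_number().over(w))
                  .filter(F.col("_rn") == 1)
                  .drop("_rn"))


    @pytest.fixture(scope="module")
    def spark():
        # A small local session is enough to exercise the logic.
        return (SparkSession.builder
                .master("local[1]")
                .appName("unit-tests")
                .getOrCreate())


    def test_dedupe_latest_keeps_newest_row(spark):
        df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 5)], ["id", "ts"])
        out = dedupe_latest(df, "id", "ts").collect()
        assert {(r.id, r.ts) for r in out} == {("a", 2), ("b", 5)}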

Foundational Data Concepts

  • SQL (Intermediate/Advanced)
  • Python (Intermediate)

Cloud Fundamentals (AWS Focus)

  • AWS Console, IAM roles, regions, and core cloud computing concepts
  • AWS S3

Data Processing & Transformation

  • Apache Spark (concepts and usage)
  • Databricks (platform usage), Unity Catalog, Delta Lake

ETL & Orchestration

  • AWS Glue (ETL, Catalog), Lambda
  • Apache Airflow (DAGs and orchestration), or another orchestration tool (see the sketch below)
  • dbt (Data Build Tool)
  • Matillion (or a similar ETL tool)
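
To make the orchestration item concrete, here is a minimal Airflow DAG sketch, assuming Airflow 2.4 or later (where the schedule parameter is used). The DAG name and the empty task bodies are hypothetical placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract():
        pass  # pull data from the source system


    def transform():
        pass  # clean and conform the extracted data


    def load():
        pass  # write the result to the warehouse


    # A simple daily extract -> transform -> load chain.
    with DAG(
        dag_id="example_daily_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task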

Data Storage & Querying

  • Amazon Redshift / Azure Synapse
  • Trino or equivalent
  • AWS Athena / query federation

Data Quality & Governance

  • Data quality concepts and implementation (see the sketch below)
  • Data observability concepts
  • Collibra or an equivalent tool
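
As an illustration of implementing data quality checks, here is a minimal PySpark sketch. The table path, the event_id key column, and the three rules are hypothetical examples of the kind of validations a pipeline might run after each load.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()

    # Hypothetical curated table produced by an upstream pipeline.
    df = spark.read.parquet("s3://example-lake-bucket/curated/events/")

    failures = []

    # Rule 1: the primary key must never be null.
    if df.filter(F.col("event_id").isNull()).count() > 0:
        failures.append("null event_id values found")

    # Rule 2: the table must not be empty after a load.
    if df.count() == 0:
        failures.append("table is empty")

    # Rule 3: the primary key must be unique.
    if df.groupBy("event_id").count().filter(F.col("count") > 1).count() > 0:
        failures.append("duplicate event_id values found")

    # Fail the run loudly so the orchestrator can alert on it.
    if failures:
        raise ValueError("Data quality checks failed: " + "; ".join(failures))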

Real-time / Streaming

  • Apache Kafka (concepts and usage; see the sketch below)
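
For the streaming side, here is a minimal Spark Structured Streaming sketch that reads JSON events from Kafka and lands them as Parquet in the lake. The broker address, topic, payload schema, and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # Subscribe to a hypothetical topic of JSON-encoded events.
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .load())

    # Expected payload shape (hypothetical).
    schema = StructType([
        StructField("id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Kafka delivers the payload as bytes; parse it into columns.
    parsed = (stream
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Continuously append micro-batches to the lake, with checkpointing
    # so the stream can recover exactly where it left off.
    query = (parsed.writeStream.format("parquet")
             .option("path", "s3://example-lake-bucket/streams/events/")
             .option("checkpointLocation", "s3://example-lake-bucket/checkpoints/events/")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()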

DevOps & Automation

  • CI/CD concepts and pipelines (GitHub Actions / Jenkins / Azure DevOps)


*A job posting is effective until 11:59:59 PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.