Expoint - all jobs in one place

Microsoft Member Technical Staff AI Data 
Taiwan, Taoyuan City 
121965266

24.04.2025

Help build the

This dataset, spanning all modalities from across the web and beyond, pushing the boundaries of scale, and product deployment.

to support our model pre-training operations, including collecting data from the source, extracting and transforming the consumer

to the next generation of systems that will transform the field. In particular, we

  • Are passionate about the role of data in large-scale AI model training
  • Will thrive in a highly collaborative, fast-paced environment
  • Have a high degree of craftsmanship and pay close attention to details
  • Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies
  • Effectively manage multiple responsibilities and can adjust to shifting priorities

Required/Minimum Qualifications

  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work
  • OR equivalent experience.
  • Experience using data processing technologies for multimodal dataset scalability, parallel processing, data handling, streaming/batch processing, etc.
  • Experience working with distributed computing tools such as Spark, Kubernetes, TensorFlow, Flink, and PySpark.
  • Experience conducting research in Machine Learning or working as an ML Engineer, MLOps Engineer, or SWE.
  • Experience designing and developing data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video), AND the skills to build the infrastructure that supports this work from the ground up.

Preferred:

  • Experience working with large-scale data, ideally at petabyte scale or above.

Responsibilities

  • Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video).
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models.
  • Partner with the pretraining and post-training teams to improve our data recipe through rigorous and careful experimentation.
  • Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models.
  • Embody our and