Required/Minimum Qualifications
- Doctorate in a relevant field OR equivalent experience.
- 4+ years of combined experience, including 2+ years of industry experience in low-precision model optimization and quantization for LLM workloads.
Other Qualifications
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications
- Experience publishing academic papers as a lead author or essential contributor.
- Experience participating in top conferences in a relevant research domain.
- Proven track record of developing production-scale software for model compression and performance optimization.
- Proficiency with deep learning frameworks and inference runtimes such as PyTorch, TensorFlow, TensorRT, and ONNX Runtime.
- In-depth understanding of Transformer and LLM architectures, including model optimization techniques such as quantization, pruning, neural architecture search (NAS), knowledge distillation, sharding/parallelism, KV cache optimization, and FlashAttention.
- Hands-on experience setting up large-scale evaluation frameworks for SOTA LLMs and fine-tuning large models.
- Programming skills in Python, C, and C++.
- Excellent communication skills and a team-oriented mindset.
- Hands-on experience implementing and optimizing low-level linear algebra routines and custom BLAS kernels would be a plus.
- Deep knowledge of mixed-precision arithmetic unit microarchitecture would be a plus.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft will accept applications for the role until July 18, 2025.