In this highly visible role, your primary responsibilities will include:
Experience in Python
Experience with at least one of the following model deployment frameworks: vLLM, Triton, or TensorRT-LLM
Experience scaling and optimizing machine learning models in production environments
BS degree and at least 10 years of relevant industry experience
Understanding of model optimization techniques (e.g., quantization, pruning, or format conversions)
Familiarity with containerization and orchestration tools such as Docker or Kubernetes
Proven track record of leading complex ML infrastructure projects or cross-functional initiatives from concept to production
Ability to evaluate model choices based on hardware efficiency and constraints
Exposure to performance monitoring and observability systems for ML workloads
Experience designing and optimizing RESTful services
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.