In this highly visible role, your primary responsibilities will include:
Experience in Python
Experience with at least one of the following model deployment frameworks: vLLM, Triton, or TensorRT-LLM
Experience scaling or optimizing machine learning models in production environments
Minimum requirement of BS and 3+ years of relevant industry experience
Understanding of model optimization techniques (e.g., quantization, pruning, or format conversions)
Familiarity with containerization and orchestration tools such as Docker or Kubernetes
Ability to evaluate model choices based on hardware efficiency and constraints
Exposure to performance monitoring and observability systems for ML workloads
Experience designing and optimizing RESTful services
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.