Expoint – all jobs in one place

Apple Ray Inference Engineer
United States, California, Cupertino 
159272779

Yesterday
  • Design, implement, and maintain distributed systems to build world-class ML platforms and products at scale
  • Experiment with, deploy, and manage LLMs in a production context
  • Benchmark and optimize inference deployments for different workloads, e.g. online vs. batch vs. streaming
  • Diagnose and fix complex issues across the entire stack, and automate their remediation, to ensure maximum uptime and performance
  • Design and extend services to improve the functionality and reliability of the platform
  • Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise
  • Build relationships with stakeholders across the organization to better understand internal customer needs and improve the product for end users
  • 5+ years of experience in distributed systems with deep knowledge in computer science fundamentals
  • Experience managing deployments of LLMs at scale
  • Experience with inference runtimes/engines, e.g. ONNX Runtime, TensorRT, vLLM, SGLang
  • Experience with ML Training/Inference profiling and optimization for different workloads and tasks, e.g. online inference, batch inference, streaming inference
  • Experience with profiling ML models for different end use cases, e.g. RAG vs. code completion, etc.
  • Experience with containerization and orchestration technologies, such as Docker and Kubernetes
  • Experience in delivering data and machine learning infrastructure in production environments
  • Experience configuring, deploying, and troubleshooting large-scale production environments
  • Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use
  • Experience with alerting, monitoring, and remediation automation in a large-scale distributed environment
  • Extensive programming experience in Java, Python or Go
  • Strong collaboration and communication (verbal and written) skills
  • B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience
  • Understanding of the ML lifecycle and state-of-the-art ML infrastructure technologies
  • Familiarity with CUDA and kernel implementation
  • Experience with inference optimization and fine-tuning techniques (e.g. pruning, distillation, quantization)
  • Experience with deploying and optimizing ML models on heterogeneous hardware, e.g. GPUs, TPUs, Inferentia, etc.
  • Experience with GPUs and other types of HPC infrastructure
  • Experience with training frameworks such as PyTorch, TensorFlow, and JAX
  • Deep understanding of Ray and KubeRay
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.