Apple Apple Ray Inference Engineer
United States, California, Cupertino
159272779

03.08.2025

* Design, implement, and maintain distributed systems to build world-class ML platforms/products at scale
* Experiment with, deploy, and manage LLMs in a production context
* Benchmark and optimize inference deployments for different workloads, e.g. online vs. batch vs. streaming workloads
* Diagnose, fix, improve, and automate complex issues across the entire stack to ensure maximum uptime and performance
* Design and extend services to improve functionality and reliability of the platform
* Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise
* Build relationships with stakeholders across the organization to better understand internal customer needs and improve our product for end users

5+ years of experience in distributed systems with deep knowledge in computer science fundamentals
Experience managing deployments of LLMs at scale
Experience with inference runtimes/engines, e.g. ONNXRT, TensorRT, vLLM, SGLang
Experience with ML training/inference profiling and optimization for different workloads and tasks, e.g. online inference, batch inference, streaming inference
Experience with profiling ML models for different end use cases, e.g. RAG vs. code completion
Experience with containerization and orchestration technologies, such as Docker and Kubernetes
Experience in delivering data and machine learning infrastructure in production environments
Experience configuring, deploying, and troubleshooting large-scale production environments
Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use
Experience with alerting, monitoring, and remediation automation in a large-scale distributed environment
Extensive programming experience in Java, Python or Go
Strong collaboration and communication (verbal and written) skills
B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience

Understanding of the ML lifecycle and state-of-the-art ML infrastructure technologies
Familiarity with CUDA + kernel implementation
Experience with inference optimization and fine-tuning techniques (e.g. pruning, distilling, quantization)
Experience with deploying and optimizing ML models on heterogeneous hardware, e.g. GPUs, TPUs, Inferentia
Experience with GPU and other types of HPC infrastructure
Experience with training frameworks such as PyTorch, TensorFlow, and JAX
Deep understanding of Ray and KubeRay

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
