* Optimize LLMs for Inference: Implement and enhance large language models for real-time and batch inference, balancing performance and resource efficiency.
* Advanced Inference Optimization: Apply techniques such as quantization and speculative decoding to reduce model size and accelerate inference with minimal accuracy loss. Leverage quantization-aware training (QAT) and post-training quantization (PTQ) to deploy models on resource-constrained hardware (see the sketches after this list).
* Cross-Functional Collaboration: Partner with data scientists, ML researchers, and infrastructure engineering teams to understand model requirements, provide feedback, and ensure smooth deployment of models into production.
* Monitoring & Resource Management: Implement monitoring tools to profile and track the performance of models running on GPUs, including real-time tracking of GPU utilization, memory usage, and inference throughput (a monitoring sketch follows the list). Manage and optimize resource allocation to ensure high availability and minimal downtime.
* Continuous Improvement & R&D: Stay current with the latest research in LLM inference techniques, GPU optimizations, and distributed systems, and bring innovative improvements to the overall system.
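As a concrete illustration of the PTQ path mentioned above, here is a minimal sketch of post-training dynamic quantization in PyTorch. The stand-in model, layer sizes, and qint8 target are illustrative assumptions, not a prescribed configuration.

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be a loaded LLM checkpoint.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Linear(3072, 768),
).eval()

# Dynamic PTQ: Linear weights are stored as int8 and activations are
# quantized on the fly at inference time; no calibration data is needed.
quantized = torch.quantization.quantize_dynamic(
    model,              # model to quantize
    {nn.Linear},        # module types to replace with quantized versions
    dtype=torch.qint8,  # target weight dtype
)

# int8 weights cut Linear storage roughly 4x versus fp32.
with torch.no_grad():
    out = quantized(torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 768])
```

Dynamic quantization is the lowest-friction PTQ variant; static PTQ (with calibration data) or QAT would be the next steps when activation quantization error matters.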
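Speculative decoding can likewise be sketched in a simplified greedy form: a cheap draft model proposes k tokens, the target model verifies them, and the longest matching prefix is accepted. The `Model` callables below are hypothetical stand-ins; a real system would verify all k draft positions in a single batched target forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a token sequence to its greedy
# next token.
Model = Callable[[List[int]], int]

def speculative_decode_greedy(
    draft: Model, target: Model, prompt: List[int], k: int, max_new: int
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposed, ctx = [], list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2. Target model checks each proposed position.
        accepted = 0
        for i in range(k):
            expected = target(tokens + proposed[:i])
            if expected == proposed[i]:
                accepted += 1
            else:
                # 3. On first mismatch, keep the target's own token instead.
                tokens.extend(proposed[:accepted])
                tokens.append(expected)
                break
        else:
            tokens.extend(proposed)  # all k draft tokens accepted
    return tokens[: len(prompt) + max_new]

# Toy demo: both "models" emit (last token + 1) mod 100, so every draft
# token is accepted and the output matches plain greedy decoding.
toy = lambda seq: (seq[-1] + 1) % 100
print(speculative_decode_greedy(toy, toy, [0], k=4, max_new=8))
# [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The greedy variant shown here reproduces the target model's output exactly; the sampling variant uses probabilistic accept/reject tests instead of exact token matches.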
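For the monitoring bullet, a minimal sketch using NVIDIA's NVML Python bindings (pynvml), assuming an NVIDIA GPU and the pynvml package are available; the device index and 1-second polling interval are illustrative. Note that inference throughput (e.g., tokens/s) is typically measured at the serving layer rather than via NVML.

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; adjust as needed

try:
    for _ in range(5):  # illustrative: poll a few times at 1 s intervals
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percentages
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
        print(
            f"gpu={util.gpu}% mem_bw={util.memory}% "
            f"mem_used={mem.used / 2**20:.0f}MiB / {mem.total / 2**20:.0f}MiB"
        )
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()
```

In production these samples would usually be exported to a metrics system (e.g., Prometheus) rather than printed, so that utilization and memory trends can drive alerting and autoscaling.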