Demonstrated experience in leading and driving complex, ambiguous projects.
Experience with high-throughput services, particularly at supercomputing scale.
Proficient in running applications on cloud platforms (AWS, Azure, or equivalent) using Kubernetes and Docker.
Familiar with GPU programming concepts (e.g. CUDA) and with popular machine learning frameworks such as PyTorch or TensorFlow.
Proficient in building and maintaining systems written in modern languages (e.g. Go, Python).
Familiar with fundamental deep learning architectures such as Transformers and encoder/decoder models.
Familiar with inference frameworks such as NVIDIA TensorRT-LLM, vLLM, DeepSpeed, and NVIDIA Triton Inference Server.
Experience writing custom GPU kernels using CUDA or OpenAI Triton.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.