Strong background in computer science: algorithms, data structures and system design
10+ years of experience in large-scale distributed system design, operation and optimization
Familiar with at least one popular ML framework, such as PyTorch or TensorFlow
Excellent interpersonal skills; able to work independently as well as cross-functionally
Proficient in building and maintaining systems written in modern languages (e.g. Golang, Python)
Familiar with fundamental Deep Learning architectures such as Transformers and Encoder/Decoder models.
Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
Experience writing custom GPU kernels using CUDA or OpenAI Triton.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.