* Work alongside the Foundation Model Research team to prototype and develop inference for cutting-edge model architectures.
* Build tools to understand inference bottlenecks across different hardware and use cases.
Bachelor’s degree or higher in Computer Science or related technical field.
2+ years of industry experience in ML technologies (LLMs, Machine Learning, NLP, Information Retrieval, Statistics).
Experience with high-throughput services, particularly at supercomputing scale.
Proficient in running applications in the cloud (AWS, Azure, or equivalent) using Kubernetes, Docker, etc.
Proficient in building and maintaining systems written in modern languages (e.g., Golang, Python).
Familiar with at least one popular ML framework, such as PyTorch or TensorFlow.
Familiar with fundamental Deep Learning architectures such as Transformers, Encoder/Decoder models.
Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
Note: Apple benefit, compensation, and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.