Expoint - all jobs in one place

Amazon Sr Software Engineer - AI/ML AWS Neuron Distributed 
United States, California, Cupertino 
288637143

16.09.2024
DESCRIPTION

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. This role is for a senior software engineer responsible for driving and enabling the AWS Neuron software stack to support next-generation capabilities such as newer model architectures (like Mamba and Mixture of Experts) and lower-precision training techniques.

This is a cross-functional role where you will be responsible for:
- Influencing Neuron roadmap to support newer model architectures and training techniques based on your technical assessment of state-of-the-art literature.
- Working side by side with chip architects, applied scientists, compiler and runtime engineers to build performant support for next-generation models and training techniques (e.g. low-precision training).

This role requires experience on two dimensions:
- Experience training large models using PyTorch/JAX is a must. FSDP, DeepSpeed and other distributed training libraries are central to this, and extending them to Neuron-based systems is key.
- Experience profiling systems, understanding their bottlenecks, and developing solutions (e.g. custom kernels) to improve performance is a must.
Work/Life Balance
Mentorship & Career Growth

BASIC QUALIFICATIONS

- 5+ years of non-internship professional software development experience
- 5+ years of programming experience with at least one software programming language
- 5+ years of experience leading design or architecture (design patterns, reliability and scaling) of new and existing systems
- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience as a mentor, tech lead or leading an engineering team


PREFERRED QUALIFICATIONS

- Bachelor's degree in computer science or equivalent
- Knowledge of machine learning frameworks and end-to-end model training.