Expoint - all jobs in one place


Amazon Sr SDM ML Acceleration Neuron Inference Apps 
United States, Washington, Seattle 
111552918

05.08.2024
DESCRIPTION

AWS Neuron is the complete software stack for the Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1/Inf2 servers that use them.

As the Sr. SDM for the Neuron Inference Technology Team, you will lead a strong team of managers and engineers to design and develop distributed ML inference features and use cases on frameworks such as PyTorch, JAX, and TensorFlow. You will be responsible for the full development life cycle of features and extensions for inference support in our Neuronx_Distributed and Transformers_Neuronx inference libraries. You will develop reliability and scalability features and performance updates in these distributed inference libraries, and contribute to other popular open inference libraries so that they treat Trainium and Inferentia devices as first-class citizens for ML acceleration. You will lead the way in ensuring support for key ML functionality in a combined chip/software platform, and ensure the right thing is being built and delivered to customers.

A successful candidate will have an established background in developing machine learning inference libraries using PyTorch/JAX on XLA devices, using Torch-XLA or OpenXLA project integrations to develop distributed inference libraries and frameworks. The ideal candidate should have the strong technical ability to deliver on a vertically integrated system stack that consists of a combinatorial matrix of hardware, frameworks, and workflows. Deep expertise in framework integrations and development using C++ is a must, along with direct customer-facing experience and a strong motivation to achieve results.

A day in the life
You will work with executive leadership and other senior management and technical leaders to define product directions and deliver them to customers. We build massive-scale distributed training and inference solutions. This organization builds the full stack of software, servers, and chips to accelerate machine learning at the highest scale.
Work/Life Balance
Mentorship & Career Growth


BASIC QUALIFICATIONS

- 10+ years of engineering experience
- 5+ years of engineering team management experience
- 10+ years of planning, designing, developing and delivering consumer software experience
- Experience partnering with product or program management teams
- Experience managing multiple concurrent programs, projects and development teams in an Agile environment