Expoint – all jobs in one place

Amazon Software Engineer - AI/ML, AWS Neuron Inference
United States, Washington, Seattle 
802842620

21.09.2025
Description

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators.
The team works side by side with chip architects, compiler engineers, and runtime engineers to deliver performance and accuracy on Neuron devices across a range of models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral.

Key job responsibilities

Responsibilities of this role include adapting the latest research in LLM optimization to Neuron chips to extract the best performance from both open-source and internally developed models. Working across teams and organizations is key.

Basic Qualifications

- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Programming proficiency in Python or C++ (at least one required)
- Experience with PyTorch
- Working knowledge of Machine Learning and LLM fundamentals including transformer architecture, training/inference lifecycles, and optimization techniques
- Strong understanding of system performance, memory management, and parallel computing principles


Preferred Qualifications

- Experience with JAX
- Experience with debugging, profiling, and implementing software engineering best practices in large-scale systems
- Expertise with PyTorch, JIT compilation, and AOT tracing
- Experience with CUDA kernels or equivalent ML/low-level kernels
- Experience with performant kernel development (e.g., CUTLASS, FlashInfer)
- Experience with inference serving platforms (vLLM, SGLang, TensorRT) in production environments
- Deep understanding of computer architecture, operating systems, and parallel computing