
Amazon Senior Inference Engineer AGI 
United Kingdom, England, Cambridge 
513466627

Yesterday
DESCRIPTION

Key job responsibilities
• Drive the technical strategy and roadmap for inference optimizations across AGI
• Develop high-performance inference software for a diverse set of neural models, typically in C/C++
• Optimize inference performance across various platforms (on-device, cloud-based CPU, GPU, proprietary ASICs)
• Collaborate closely with research scientists to bring next-generation neural models to life
• Partner with internal and external hardware teams to maximize platform utilization
• Work in an Agile environment to deliver high-quality software against tight schedules
• Mentor and grow technical talent

BASIC QUALIFICATIONS

- 5+ years of non-internship professional software development experience
- 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- 5+ years of programming experience in at least one software programming language
- Experience as a mentor, tech lead or leading an engineering team
- Bachelor's degree in Computer Science, Computer Engineering, or related field
- 2+ years of experience optimizing neural models
- Deep expertise in C/C++ and low-level system optimization
- Proven track record of leading large-scale technical initiatives
- Solid understanding of deep learning architectures (CNNs, RNNs, Transformers, etc.)
- Experience with inference frameworks and runtimes (PyTorch, TensorFlow, ONNX Runtime, TensorRT, llama.cpp, etc.)
- Strong communication skills and ability to work in a collaborative environment


PREFERRED QUALIFICATIONS

- Proficiency in kernel programming for accelerated hardware
- Experience with latency-sensitive optimizations and real-time inference
- Understanding of resource constraints on mobile/edge hardware
- Knowledge of model compression techniques (quantization, pruning, distillation, etc.)
- Experience with LLM efficiency techniques such as speculative decoding and long-context optimization
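
For context on the model compression bullet above, a minimal sketch of one such technique, symmetric per-tensor int8 post-training quantization, in NumPy (function names are illustrative assumptions, not part of any Amazon codebase):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float weights to int8 in [-127, 127].

    Illustrative helper, not a production API.
    """
    scale = np.max(np.abs(w)) / 127.0  # one scale factor for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Round-trip a small weight tensor; error is bounded by ~scale/2 per element.
w = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
```

Storing `q` instead of `w` cuts weight memory by 4x (int8 vs. float32), which is the basic trade-off behind the quantization work the role describes.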