Expoint - all jobs in one place

Amazon Software Development Engineer, AGI Sensory ASR Inference
United States, Massachusetts, Boston 
683780418

01.12.2024
DESCRIPTION

Key job responsibilities
• Develop high-performance inference software for a diverse set of neural models, typically in C/C++
• Design, prototype, and evaluate new inference engines and optimization techniques
• Participate in deep-dive analysis and profiling of production code
• Optimize inference performance across various platforms (on-device, cloud-based CPU, GPU, proprietary ASICs)
• Collaborate closely with research scientists to bring next-generation neural models to life
• Partner with internal and external hardware teams to maximize platform utilization
• Work in an Agile environment to deliver high-quality software against aggressive schedules
• Hold a high bar for technical excellence within the team and across the organization

BASIC QUALIFICATIONS

- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Programming experience in at least one programming language
- Bachelor's degree in Computer Science, Computer Engineering, or related field
- Strong C/C++ programming skills
- Solid understanding of deep learning architectures (CNNs, RNNs, Transformers, etc.)


PREFERRED QUALIFICATIONS

- 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience with inference frameworks such as PyTorch, TensorFlow, ONNX Runtime, TensorRT, llama.cpp, etc.
- Proficiency in performance optimization for CPU, GPU, or AI hardware
- Proficiency in kernel programming for accelerated hardware using programming models such as (but not limited to) CUDA, OpenMP, OpenCL, Vulkan, and Metal
- Experience with latency-sensitive optimizations and real-time inference
- Understanding of resource constraints on mobile/edge hardware
- Knowledge of model compression techniques (quantization, pruning, distillation, etc.)
- Experience with LLM efficiency techniques such as speculative decoding and long-context inference
- Strong communication skills and ability to work in a collaborative environment
- Passion for solving complex problems and driving innovation in AI technology