Key job responsibilities
- As an AIML Specialist Solutions Architect (SA) in AI Infrastructure, you will serve as the Subject Matter Expert (SME) for providing optimal solutions for model training and inference workloads that leverage Amazon Web Services accelerated computing services. As part of the Specialist Solutions Architecture team, you will work closely with other Specialist SAs to enable large-scale customer model workloads and drive the adoption of AWS EC2, EKS, ECS, SageMaker, and other computing platforms for GenAI practices.
- You will interact with other SAs in the field, providing guidance on their customer engagements, and you will develop white papers, blogs, reference implementations, and presentations to enable customers and partners to fully leverage AI Infrastructure on Amazon Web Services. You will also create field enablement materials for the broader SA population, to help them understand how to integrate Amazon Web Services GenAI solutions into customer architectures.
- You must have deep technical experience with Large Language Models (LLMs), Stable Diffusion, and other state-of-the-art (SOTA) model architectures, spanning model design, fine-tuning, distributed training, and inference acceleration. A strong machine learning development background is preferred, in addition to experience building applications and designing architectures. You should be familiar with the NVIDIA ecosystem and related technical options, and will leverage this knowledge to help Amazon Web Services customers in their selection process.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage you to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let that stop you from applying.
Mentorship & Career Growth
We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge-sharing, mentorship, and other career-advancing resources here to help you develop into a better-rounded professional.
Work/Life Balance
- 5+ years of hands-on experience optimizing AI infrastructure, with deep expertise in inference acceleration frameworks (e.g., vLLM, SGLang, TensorRT) and in model training and serving systems across the PyTorch and TensorFlow ecosystems;
- Advanced proficiency in NVIDIA GPU performance optimization techniques, including memory management, kernel fusion, and quantization strategies for large-scale deep learning workloads;
- Strong foundation in parallel computing principles with practical CUDA programming experience, emphasizing efficient resource utilization and throughput maximization;
- Demonstrated success implementing and tuning distributed AI systems leveraging modern frameworks such as Megatron-LM and Ray, with a particular focus on LLM deployment and horizontal scaling across GPU clusters.