Expoint – all jobs in one place

Amazon: Sr. Applied Scientist, AI Agent Evaluation, AWS
United States, Washington, Seattle 
164405035

03.08.2025
DESCRIPTION

The Core Services team within AWS Applied AI Solutions is creating the foundations that will power the next generation of AI agents from small business to enterprise scale. As a scientific leader, you'll drive our new agentic AI building blocks initiative, pioneering the development of reusable AI components that accelerate and standardize AI product delivery across AWS business applications such as contact center, supply chain, healthcare and life sciences. You'll own one critical capability area such as agent identity and governance, agent collaboration & orchestration, agent evaluation, or knowledge management, and have the privilege to define what we build and how we build it.
This role combines the excitement of a startup environment with the scale of AWS. You'll research state-of-the-art open source and internal tools, tackle highly ambiguous problems, and lead scientific innovation building a V1 product from the ground up. You won't just be implementing someone else's vision: you'll chart the course, define the roadmap, and create novel scientific solutions that eliminate non-differentiating work while ensuring enterprise-grade quality and consistency. If you thrive on ownership, are passionate about AI research and development, and want to fundamentally influence how AWS builds AI products, this role offers an extraordinary opportunity to make your mark.
Key job responsibilities
- Design and implement evaluation frameworks for AI agents, including benchmarking tools, annotation systems for RLHF, and standardized patterns for memory orchestration and retrieval that ensure consistent performance across diverse use cases
- Develop deep expertise in a strategic research area within AI agent systems, becoming the organization's scientific expert in areas such as agent design and evaluation
- Identify and devise new research solutions for ill-defined customer or business problems that require novel methodologies and paradigms to be invented at the product level
- Lead the design, implementation, and successful delivery of solutions for scientifically-complex problems, writing "critical-path" code
- Write clear, useful narratives and documentation describing inventions, solutions, and design choices that enable others to understand and reproduce your work
- Independently assess alternative AI technologies and choose the right approaches to be integrated into your systems
- Actively mentor and develop other scientists and engineers, elevating the technical capabilities of the organization
A day in the life
Your morning starts with a team standup, followed by collaborative sessions with software engineers and product managers to understand requirements and internal customer needs. You'll dedicate focused time to researching the latest scientific advances in AI agents and designing novel approaches for your capability area. Throughout the day, you'll work with engineers and scientists within your team to implement solutions, applying scientific rigor to validate approaches. You might lead a scientific design review gathering diverse perspectives to strengthen your methodologies. Your day could include conducting experiments, analyzing results, designing new evaluation methodologies, meeting with internal customers to understand their AI agent needs, or presenting your scientific vision to leadership stakeholders.
Why AWS
Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
Inclusive Team Culture
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

BASIC QUALIFICATIONS

- 3+ years of experience building machine learning models for business applications
- PhD, or Master's degree and 6+ years of applied research experience
- Experience programming in Java, C++, Python, or a related language
- Experience with neural deep learning methods and machine learning
- Experience using managed ML/AI solutions
- Deep expertise in at least one AI/ML discipline such as NLP, reinforcement learning, etc.
- Experience with neural deep learning methods and popular frameworks such as PyTorch, TensorFlow, or MXNet
- Experience designing and implementing AI/ML systems, including working with LLMs, prompt engineering, retrieval augmented generation (RAG), fine-tuning, or AI agent development
- Experience using managed ML/AI solutions such as Amazon SageMaker AI or Amazon Bedrock
- Demonstrated track record of scientific innovation with measurable business impact
- Experience building production systems that operate at scale
- Proven ability to drive a scientific roadmap and secure management buy-in for new initiatives
- Strong collaboration skills, with the ability to influence across organizational boundaries


PREFERRED QUALIFICATIONS

- Experience with modeling tools such as R, scikit-learn, Spark MLlib, MXNet, TensorFlow, NumPy, SciPy, etc.
- Experience with large-scale distributed systems such as Hadoop, Spark, etc.
- 7+ years of experience applying scientific methods to solve complex AI problems at scale
- Experience mentoring junior scientists and engineers
- Advanced knowledge of large language models, including fine-tuning, evaluation, and responsible AI practices
- Experience developing bias detection, fairness metrics, and ethical evaluation frameworks for AI systems
- Proven track record building reinforcement learning systems with human feedback, including annotation frameworks and reward modeling
- Experience creating evaluation frameworks measuring factuality, robustness, and safety across diverse scenarios, comparable to HELM or HealthBench
- Expertise in building reusable AI components with well-defined interfaces