What You'll Do:

- Partner with cross-functional teams to translate evaluation needs into robust technical solutions for conversational AI, language models, and AI agent capabilities
- Own end-to-end requirements gathering and proof-of-concept development, and co-drive the development roadmap for ML system evaluation platforms
- Design and implement scalable solutions that enable statistical analysis of product experiences, model performance, and AI agent behavior at scale
- Drive system integration efforts and influence how evaluation software is incorporated into CI/CD pipelines for ML models and AI agents
- Develop monitoring and observability solutions that provide deep insight into platform performance, evaluation quality, and AI agent reliability
- Build specialized evaluation frameworks for AI agents, including multi-step reasoning assessment, tool usage validation, and agent interaction quality measurement
- Iterate rapidly on stakeholder feedback while maintaining platform reliability and performance across diverse AI workloads