Responsibilities include:

- Design evaluation tasks and guidelines; identify a suitable data annotation platform to run evaluations at scale.
- Implement metrics to measure model effectiveness and accuracy, ensuring models meet performance standards.
- Monitor LLM performance in production environments through human evaluations, identifying trends and raising alerts when quality degrades.
- Perform detailed failure analysis to understand model weaknesses and identify areas for improvement, providing actionable insights to engineers.
- Maintain high standards for data quality and continuously improve processes based on both quantitative and qualitative feedback.