Capital One Principal Associate - Full Stack 
India, Karnataka, Bengaluru 
313616811

18.09.2024
Voyager (94001), India, Bangalore, Karnataka Principal Associate - Full Stack


Generative AI Observability & Governance for ML Platform

As a Capital One Principal Associate, Full Stack, you'll be part of a team focusing on observability and model governance automation for cutting-edge generative AI use cases. You will build solutions that collect metadata, metrics, and insights from the large-scale GenAI platform, and intelligent solutions that derive deep insights into the performance of the platform's use cases and their compliance with industry standards.

You will contribute to building a system to do this for Capital One models, accelerating the move from fully trained models to deployable model artifacts ready to fuel business decisioning, and to building an observability platform that monitors the models and platform components.

What You’ll Do:

● Lead the design and implementation of observability tools and dashboards that provide actionable insights into platform performance and health.

● Leverage and fine-tune Generative AI models to enhance observability capabilities, such as anomaly detection, predictive analytics, and a troubleshooting copilot.

● Build and deploy well-managed core APIs and SDKs for observability of LLMs and proprietary Gen-AI Foundation Models including training, pre-training, fine-tuning and prompting.

● Work with model and platform teams to build systems that ingest large amounts of model and feature metadata and runtime metrics to build an observability platform and to make governance decisions to ensure ethical use, data integrity, and compliance with industry standards for Gen-AI.

● Partner with product and design teams to develop and integrate advanced observability tools tailored to Gen-AI.

● Collaborate as part of a cross-functional Agile team with data scientists, ML engineers, and other stakeholders to understand requirements and translate them into scalable and maintainable solutions.

● Bring a research mindset and lead proofs of concept that showcase the capabilities of large language models in observability and governance, enabling practical production solutions that improve platform users' productivity.

Basic Qualifications:

● Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.

● At least 4 years of experience designing and building data-intensive solutions using distributed computing, with a deep understanding of microservices architecture.

● At least 4 years of experience programming with Python, Go, or Java

● Proficiency in observability tools such as Prometheus, Grafana, ELK Stack, or similar, with a focus on adapting them for Gen AI systems.

● Excellent knowledge of OpenTelemetry and prior experience building SDKs and APIs.

● Hands-on experience with Generative AI models and their application in observability or related areas.

● At least 2 years of experience with cloud platforms like AWS, Azure, or GCP.

Preferred Qualifications:

● At least 4 years of experience building, scaling, and optimizing ML systems

● At least 3 years of experience in MLOps, using either open-source tools such as MLflow or commercial tools

● At least 2 years of experience developing applications using Generative AI, i.e., open-source or commercial LLMs, and some experience with recent open-source libraries such as LangChain and Haystack and vector databases such as OpenSearch, Chroma, and FAISS.

● Prior experience with open-source observability libraries such as Langfuse, Phoenix, OpenInference, and Helicone.

● Contributions to open-source libraries, specifically Gen-AI and ML solutions

● Authored or co-authored a paper on an ML technique, model, or proof of concept

● Experience with an industry-recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow.

● Prior experience with NVIDIA GPU telemetry and CUDA

● Knowledge of data governance and compliance, particularly in the context of machine learning and AI systems.

If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation, please contact Capital One Recruiting at 1-800-304-9102 or via email at . All information you provide will be kept confidential and will be used only to the extent required to provide needed reasonable accommodations.