We’re looking for a
Lead the design and implementation of observability tools and dashboards that provide actionable insights into platform performance and health.
Leverage and fine-tune Generative AI models to enhance observability capabilities such as anomaly detection, predictive analytics, and a troubleshooting copilot.
Build and deploy well-managed core APIs and SDKs for observability of LLMs and proprietary Gen-AI Foundation Models including training, pre-training, fine-tuning and prompting.
Stay abreast of the latest trends in Generative AI and platform observability, and drive the adoption of emerging technologies and methodologies.
Bring a research mindset and lead proofs of concept that showcase the capabilities of large language models in observability and governance, enabling practical production solutions that improve platform users' productivity.
Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
At least 4 years of experience in machine learning engineering with a strong focus on platform observability, and hands-on experience building RAG patterns, semantic kernels, etc.
Hands-on experience with Generative AI models and their application in observability or related areas.
At least 4 years of experience programming with Python, Go, or Java.
At least 2 years of experience with observability tools such as Prometheus, Grafana, ELK Stack, or similar, with a focus on adapting them for Gen AI systems.
At least 3 years of experience with an industry-recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow.
At least 2 years of experience developing applications using Generative AI (i.e., open source or commercial LLMs), and some experience with recent open source libraries such as LangChain and Haystack and vector databases such as OpenSearch, Chroma, and FAISS.
Prior experience leveraging open source libraries for observability such as Langfuse, Phoenix, OpenInference, Helicone, etc.
Excellent knowledge of OpenTelemetry and prior experience building SDKs and APIs.
Proficiency in programming languages such as Python, Java, or Go, with a strong understanding of microservices architecture.
Experience with cloud platforms like AWS, Azure, or GCP.
Experience in machine learning, particularly in deploying and operationalizing ML models.
Familiarity with container orchestration tools like Kubernetes and Docker.
Knowledge of data governance and compliance, particularly in the context of machine learning and AI systems.
Prior experience with NVIDIA GPU telemetry and CUDA.
Master's or doctoral degree in computer science, electrical engineering, mathematics, or a similar field.
Contributed to open source ML software.
Authored or co-authored papers or patents on ML techniques, models, or proofs of concept.
If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation, please contact Capital One Recruiting at 1-800-304-9102 or via email at . All information you provide will be kept confidential and will be used only to the extent required to provide needed reasonable accommodations.