You will design, enhance, and implement cluster monitoring infrastructure for NVIDIA's deep learning enterprise server platforms, collaborating with engineering teams across the company to build Python scripts, databases, and dynamic reports.
In this role you'll learn about out-of-band management for server systems, time-series databases, and dashboarding tools (an illustrative sketch of this kind of telemetry tooling follows below).
You will develop and deploy tools that will be used by broader teams at NVIDIA as we design next-generation deep learning platforms.
You will have the opportunity to interact with diverse technical groups spanning all organizational levels.
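To give a concrete flavor of the work described above, the sketch below polls out-of-band thermal telemetry from a server's BMC over Redfish and emits time-series records. It is a minimal illustration only, not NVIDIA's actual tooling: the endpoint path, host names, credentials, and field names are assumptions, and a real deployment would write the records to a time-series database and surface them in dashboards.

```python
"""Illustrative sketch: poll chassis thermal telemetry out-of-band via a BMC's
Redfish interface and emit InfluxDB-style line-protocol records.
The URL, credentials, path, and field names are assumptions; real BMC schemas
vary by platform and firmware."""
import time
import requests

BMC_URL = "https://bmc.example.com"              # hypothetical BMC address
AUTH = ("monitor", "password")                   # hypothetical credentials
THERMAL_PATH = "/redfish/v1/Chassis/1/Thermal"   # common Redfish path; may differ per platform

def poll_thermal():
    """Fetch temperature sensor readings from the BMC (out-of-band)."""
    resp = requests.get(BMC_URL + THERMAL_PATH, auth=AUTH, verify=False, timeout=10)
    resp.raise_for_status()
    return resp.json().get("Temperatures", [])

def to_line_protocol(sensors, host="node01"):
    """Convert sensor readings into time-series records (InfluxDB line protocol)."""
    ts = time.time_ns()
    lines = []
    for s in sensors:
        name = s.get("Name", "unknown").replace(" ", "_")
        reading = s.get("ReadingCelsius")
        if reading is not None:
            lines.append(f"thermal,host={host},sensor={name} celsius={reading} {ts}")
    return lines

if __name__ == "__main__":
    # In a real deployment these records would be written to a time-series
    # database and visualized with a dashboarding tool.
    for line in to_line_protocol(poll_thermal()):
        print(line)
```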
What we need to see:
Pursuing a BS, MS, or PhD in Electrical Engineering, Computer Science, or a related field
Proficiency in Python programming
Experience building and maintaining scalable API solutions
Previous experience with Natural Language Processing (NLP) and an understanding of Large Language Model (LLM)/GenAI technologies such as the OpenAI API, ChatGPT, GPT-4, Bard, Synthesia, LangChain, HuggingFace Transformers, PyTorch, or similar
Familiarity with prompt engineering and vector databases (see the brief sketch after this list)
Prior experience with MLOps and/or CI/CD pipeline development, containerization, and model deployment in test and production environments
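As a minimal sketch of the prompt-engineering and vector-database workflow named in the list above (not a prescribed implementation), the example below embeds short documents with a toy hashing-based encoder, retrieves the closest matches by cosine similarity from an in-memory store, and assembles a retrieval-augmented prompt. In practice the toy encoder would be replaced by a real embedding model (e.g. a HuggingFace sentence encoder) and the in-memory store by a production vector database; all names here are illustrative.

```python
"""Minimal retrieval-plus-prompt sketch. The embed() function is a toy
bag-of-words hashing encoder standing in for a real embedding model, and
TinyVectorStore stands in for a vector database; names are illustrative."""
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy 'hashing trick' embedding: bucket each token into a fixed-size vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class TinyVectorStore:
    """In-memory stand-in for a vector database."""
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Cosine similarity reduces to a dot product because vectors are unit-normalized.
        scores = np.array(self.vectors) @ embed(query)
        return [self.texts[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a retrieval-augmented prompt for an LLM."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{ctx}\nQuestion: {question}"

if __name__ == "__main__":
    store = TinyVectorStore()
    store.add("GPU temperatures are sampled every 30 seconds via the BMC.")
    store.add("Dashboards refresh from the time-series database each minute.")
    print(build_prompt("How often are GPU temperatures sampled?",
                       store.search("GPU temperature sampling interval")))
```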
Ways to stand out from the crowd:
Understanding of system architecture concepts
Deep knowledge of a specific domain or industry, with a focus on NLP/LLM
Applied research background leveraging frameworks to build LLM prototypes, and knowledge of best practices for production LLM development
Team player with the ability to clearly communicate complex LLM capabilities and limitations to non-technical stakeholders