

What you’ll be doing:
Build and maintain data pipelines and ETL flows for logs, telemetry, and hardware test data supporting AI/ML workflows.
Prepare, clean, and structure large, complex datasets (structured and unstructured) to train and fine-tune LLMs (a minimal log-cleaning sketch follows this list).
Assist in developing and deploying LLM-based applications for root cause analysis and hardware debugging.
Experiment with prompt engineering, retrieval-augmented generation (RAG), and vector search to integrate knowledge into models.
Collaborate with hardware, reliability, and AI platform teams to embed intelligent debugging tools into NVIDIA’s engineering ecosystem.
Monitor and evaluate model performance, ensuring accuracy, scalability, and reliability in production environments.
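As a rough illustration of the log-pipeline and data-cleaning items above, the sketch below loads newline-delimited JSON test logs with pandas and tidies them into a DataFrame. It is a minimal sketch only: the file name and the "timestamp" and "message" fields are hypothetical placeholders, not a description of any actual log schema used at NVIDIA.

import pandas as pd

def load_and_clean_logs(path: str) -> pd.DataFrame:
    # One JSON record per line (hypothetical format for hardware test logs).
    df = pd.read_json(path, lines=True)
    # Parse timestamps; rows that fail to parse become NaT and are dropped.
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    df = df.dropna(subset=["timestamp", "message"])
    # Normalize free-text messages for downstream tokenization or labeling.
    df["message"] = df["message"].str.strip().str.lower()
    return df.sort_values("timestamp").reset_index(drop=True)

# Example usage with a hypothetical file:
# logs = load_and_clean_logs("test_run_0423.jsonl")

In practice a step like this would sit inside a larger ETL flow feeding fine-tuning or analysis datasets.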
What we need to see:
B.Sc. or M.Sc. in Computer Science, Electrical/Computer Engineering, Data Science or related field (or equivalent practical experience).
2+ years of industry experience in machine learning or data engineering.
Strong programming skills in Python (pandas, NumPy, PyTorch or TensorFlow).
Proficiency with SQL and modern data pipeline tools.
Understanding of deep learning fundamentals and strong interest in LLMs/NLP.
Hands-on experience with Linux environments, version control (Git), and container tools (e.g., Docker).
Strong analytical and problem-solving skills.
Eagerness to learn complex hardware/software systems.
Ways to stand out from the crowd:
Internship or project experience with LLM fine-tuning, prompt engineering, or retrieval-augmented generation.
Exposure to hardware debugging, observability/logging systems, or chip/system reliability analysis.
Experience with vector databases (FAISS, Pinecone, Milvus) or MLOps tools (MLflow, Kubeflow); a minimal retrieval sketch follows this list.
Master’s degree in a related field (e.g., Computer Science, Electrical Engineering, Data Science), demonstrating an advanced theoretical foundation and research exposure.
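For a concrete sense of the vector-database and retrieval-augmented-generation items above, here is a minimal retrieval sketch using FAISS. It is illustrative only: the embed() helper is stubbed with random vectors, and the example debug notes and 384-dimension setting are hypothetical stand-ins for a real embedding model and corpus.

import numpy as np
import faiss

DIM = 384  # embedding dimension; depends on the embedding model assumed

def embed(texts):
    # Placeholder embedding: random vectors, for illustration only.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), DIM)).astype("float32")

notes = [
    "PCIe link retrained after thermal excursion on board A12",
    "ECC error burst correlated with voltage droop during stress test",
    "fan controller firmware rollback resolved intermittent resets",
]

index = faiss.IndexFlatL2(DIM)   # exact L2 search; adequate for small corpora
index.add(embed(notes))          # index the corpus once

distances, ids = index.search(embed(["intermittent reset during stress"]), 2)
for rank, i in enumerate(ids[0]):
    print(f"{rank + 1}. {notes[i]} (dist={distances[0][rank]:.2f})")

In a RAG setting, the retrieved notes would be inserted into the LLM prompt as context for root cause analysis.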