What you’ll be doing
Drive LLM efficiency: Design and leverage advanced low-precision quantization techniques (INT8, FP8, FP4) to optimize inference performance for customer deployments.
Innovate with frameworks: Simulate, optimize, and extend cutting-edge training & inference frameworks (e.g., vLLM, SGLang, TensorRT-LLM, NeMo, Megatron) within NVIDIA's ecosystem.
Enable new AI capabilities: Integrate and validate next-generation LLM architectures and features within core frameworks to expand NVIDIA's solution offerings.
Tune for peak performance: Conduct rigorous performance analysis and tuning of LLM workloads for optimal execution on cloud and on-premises NVIDIA platforms.
Collaborate on customer solutions: Partner with engineering teams and solution architects to translate customer requirements into high-impact LLM engineering implementations.
What we need to see
Pursuing an MS or PhD in Computer Science, Artificial Intelligence, Electrical Engineering, or a related field.
Hands-on experience with large language model (LLM) training and/or inference frameworks from project work, research, or prior internships.
Strong proficiency in PyTorch and Python programming.
Solid foundational understanding of:
Transformer architectures & core LLM algorithms.
Principles and trade-offs of model quantization techniques.
Distributed training paradigms (e.g., FSDP, ZeRO, 3D/5D parallelism, RLHF infrastructure).
A link to your GitHub profile or code samples demonstrating relevant projects is required with your application.
Ways to stand out from the crowd
Demonstrable experience with quantization tools and workflows (e.g., GPTQ, AWQ, SmoothQuant).
Contributions to relevant Open Source Software projects (e.g., vLLM, SGLang, Hugging Face Transformers, PyTorch, DeepSpeed).
Understanding of GPU architecture and CUDA, high-performance computing concepts, and cluster communication libraries (e.g., NCCL, MPI).
Record of published research in machine learning, NLP, or systems at major conferences/journals.
Experience deploying or optimizing workloads on NVIDIA GPUs and familiarity with NVIDIA AI software stacks.