NVIDIA Solutions Architect Intern - CSP & CRISP

Locations: United States, Oregon; Hong Kong, STP; Hong Kong, Remote
Time type: Full time
Posted: 19 days ago
Job requisition ID: 970858909

What you’ll be doing

  • Drive LLM efficiency: Design and apply advanced low-precision quantization techniques (INT8, FP8, FP4) to optimize inference performance for customer deployments (see the short sketch after this list).

  • Innovate with frameworks: Simulate, optimize, and extend cutting-edge training & inference frameworks (e.g., vLLM, SGLang, TensorRT-LLM, NeMo, Megatron) within NVIDIA's ecosystem.

  • Enable new AI capabilities: Integrate and validate next-generation LLM architectures and features within core frameworks to expand NVIDIA's solution offerings.

  • Tune for peak performance: Conduct rigorous performance analysis and tuning of LLM workloads for optimal execution on cloud and on-premises NVIDIA platforms.

  • Collaborate on customer solutions: Partner with engineering teams and solution architects to translate customer requirements into high-impact LLM engineering implementations.
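
The low-precision techniques named in the first bullet can be illustrated with a short, self-contained sketch of symmetric per-tensor INT8 weight quantization in plain PyTorch. This is an illustrative assumption of the general pattern, not NVIDIA's production workflow; the function names and the simple max-based scale are placeholders.

    # Illustrative sketch only: symmetric per-tensor INT8 quantization.
    # Function names and scale choice are assumptions, not an NVIDIA API.
    import torch

    def quantize_int8(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        """Map an FP32/FP16 weight tensor to INT8 with one symmetric scale."""
        scale = weight.abs().max() / 127.0                 # per-tensor scale
        q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
        return q, scale

    def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        """Recover an approximate float tensor for accuracy/error analysis."""
        return q.to(torch.float32) * scale

    if __name__ == "__main__":
        w = torch.randn(4096, 4096)
        q, scale = quantize_int8(w)
        print("max abs error:", (w - dequantize_int8(q, scale)).abs().max().item())

In practice, per-channel or per-block scales and FP8/FP4 formats (with hardware and framework support, such as TensorRT-LLM) refine this basic pattern.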

What we need to see

  • Pursuing an MS or PhD in Computer Science, Artificial Intelligence, Electrical Engineering, or a related field.

  • Hands-on experience with large language model (LLM) training and/or inference frameworks from project work, research, or prior internships.

  • Strong proficiency in PyTorch and Python programming.

  • Solid foundational understanding of:

    • Transformer architectures & core LLM algorithms.

    • Principles and trade-offs of model quantization techniques.

    • Distributed training paradigms (e.g., FSDP, ZeRO, 3D/5D parallelism, RLHF infrastructure); a minimal FSDP sketch follows this list.

  • A link to your GitHub profile or code samples is required with your application (demonstrating relevant projects).
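
As a point of reference for the distributed training bullet above, below is a minimal sketch of sharded data-parallel training with PyTorch FSDP. The toy model, optimizer settings, and launch command are assumptions for illustration only.

    # Illustrative FSDP sketch; launch with, e.g.:
    #   torchrun --nproc_per_node=8 fsdp_sketch.py
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def main() -> None:
        dist.init_process_group(backend="nccl")            # NCCL for GPU collectives
        local_rank = dist.get_rank() % torch.cuda.device_count()
        torch.cuda.set_device(local_rank)

        # Toy stand-in for a transformer block; FSDP shards its parameters
        # across ranks and gathers them on demand during forward/backward.
        model = torch.nn.Sequential(
            torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
        ).cuda()
        model = FSDP(model)

        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).pow(2).mean()                      # dummy objective
        loss.backward()
        opt.step()
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()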


Ways to stand out from the crowd

  • Demonstrable experience with quantization tools and workflows (e.g., GPTQ, AWQ, SmoothQuant).

  • Contributions to relevant open-source software projects (e.g., vLLM, SGLang, Hugging Face Transformers, PyTorch, DeepSpeed).

  • Understanding of GPU architecture (CUDA), high-performance computing concepts, and cluster communication libraries (e.g., NCCL, MPI); a short NCCL all-reduce sketch follows this list.

  • Record of published research in machine learning, NLP, or systems at major conferences/journals.

  • Experience deploying or optimizing workloads on NVIDIA GPUs and familiarity with NVIDIA AI software stacks.
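
To make the cluster communication bullet concrete, here is a minimal sketch of a GPU all-reduce over NCCL via torch.distributed. Ranks, tensor contents, and the launch command are illustrative assumptions.

    # Illustrative NCCL all-reduce sketch; launch with, e.g.:
    #   torchrun --nproc_per_node=4 allreduce_sketch.py
    import torch
    import torch.distributed as dist

    def main() -> None:
        dist.init_process_group(backend="nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank % torch.cuda.device_count())

        # Each rank contributes its rank id; after all_reduce every rank holds the sum.
        t = torch.full((1,), float(rank), device="cuda")
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        print(f"rank {rank}: sum of ranks = {t.item()}")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()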