Expoint – all jobs in one place

Nvidia Deep Learning Performance Architect Intern - China, Shanghai
356484183

Yesterday
China, Shanghai
China, Beijing
Time type: Full time
Posted on: 25 days ago

We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, spanning pre-silicon exploration to post-silicon

What you’ll be doing:

  • Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.

  • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance perf debug and profiling capabilities.

  • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate perf optimization guidance, and improve user experience of profiling infrastructure.

  • Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to co-design performance-centric solutions.

  • End-to-End Optimization: Create benchmarks to validate performance improvements across AI/HPC workloads and present actionable insights.


What we need to see:

  • BS/MS+ in a relevant discipline (CS, EE, Math)

  • Proficiency in C/C++ (performance-critical coding) and Python (automation/scripting, and AI/ML frameworks)

  • Strong grasp of computer architecture (pipelines, memory hierarchies) and Operating System fundamentals

  • Understanding of machine learning and data analysis basics, and of LLM techniques such as prompt engineering, fine-tuning, and vector databases

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:

  • Experience with developing HW performance debugging and analysis tools

  • Familiarity with the system software stack (such as the CUDA driver) and CUDA kernel optimization, and an understanding of GPU architecture

  • Familiarity with GPU performance profiling tools like Nsight Systems and Nsight Compute

  • Practical experience or projects demonstrating LLM-based code generation, automated data analysis, or workflow assistants. Prior experience with agentic LLM frameworks like LangChain and LlamaIndex.

  • Full-Stack Versatility: Skills in JavaScript, SQL, or UI/UX design for tool interfaces.