Nvidia Deep Learning Performance Architect Intern - China, Shanghai
356484183

02.07.2025

China, Shanghai

China, Beijing

time type: Full time

posted on: 25 days ago

job requisition id

We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, spanning pre-silicon exploration to post-silicon

What you’ll be doing:

Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance perf debug and profiling capabilities.
AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate perf optimization guidance, and improve user experience of profiling infrastructure.
Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to co-design performance-centric solutions.
End-to-End Optimization: Create benchmarks to validate performance improvements across AI/HPC workloads and present actionable insights.

What we need to see:

BS/MS+ in relevant discipline (CS, EE, Math)
Proficiency in C/C++ (performance-critical coding) and Python (automation/scripting and AI/ML frameworks)
Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals
Understanding of machine learning and data analysis basics, and of LLM techniques such as prompt engineering, fine-tuning, and vector databases
Experience with performance modeling, architecture simulation, profiling, and analysis.
Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:

Experience developing HW performance debugging and analysis tools
Familiarity with the system software stack (such as the CUDA driver) and CUDA kernel optimization, and an understanding of GPU architecture
Familiarity with GPU performance profiling tools such as Nsight Systems and Nsight Compute
Practical experience or projects demonstrating LLM-based code generation, automated data analysis, or workflow assistants. Prior experience with agentic LLM frameworks such as LangChain and LlamaIndex.
Full-Stack Versatility: Skills in JavaScript, SQL, or UI/UX design for tool interfaces.
