
Nvidia Deep Learning Performance Architect - Perf Tools 
China, Shanghai 
442494315

15.10.2025
Time type: Full time
Posted on: 26 Days Ago

Join us to shape the performance analysis infrastructure for GPUs. We build analysis tools and visualization frameworks that empower engineers to [...] GPU performance for Deep Learning and HPC [...] the next-gen GPU architectures.


What you'll be doing:

  • Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.

  • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance perf debug and profiling capabilities.

  • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate perf optimization guidance, and improve the user experience of the profiling infrastructure.

  • Cross-Stack Collaboration

What we need to see:

  • BS+ in Computer Science, Electronic Engineering, or a related field (or equivalent experience)

  • 4+ years of software development experience

  • Strong software skills in design, coding (C++ and Python), analysis, and debugging of low-level programs

  • Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:

  • Experience building performance debugging and analysis tools on silicon and simulators. Experience developing application snapshot and replay tools is a big plus.

  • Familiarity with the CUDA system software stack (e.g., CUDA Driver/Runtime APIs), CUDA kernel optimization, and GPU architecture

  • Familiarity with GPU performance profiling tools like Nsight Systems, Nsight Compute, NVTX, etc., or experience developing similar tools for other processors.

  • Practical experience or projects demonstrating AI/ML-based code generation, automated data analysis, or workflow assistants.