Expoint – all jobs in one place
Finding the best job has never been easier

Deep Learning Performance Architect jobs at Nvidia in China, Shanghai

Discover your perfect match with Expoint. Search for job opportunities as a Deep Learning Performance Architect in China, Shanghai and join the network of leading companies in the high tech industry, like Nvidia. Sign up now and find your dream job with Expoint.
80 jobs found
16.11.2025

Nvidia Senior Deep Learning Compiler Engineer - CUDA China, Shanghai

Limitless High-tech career opportunities - Expoint
Description:
Location: China, Shanghai; China, Beijing
Time type: Full time
Posted: 5 days ago
Job requisition ID:

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.

What you'll be doing:
  • Design and implement the DSL and the core compiler of a tile-aware GPU programming model for emerging GPU architectures

  • Continuously innovate and iterate on the core architecture of the compiler to consistently optimize performance

  • Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack

  • Analyze performance on emerging AI/LLM workloads and integrate with AI/ML frameworks

What we need to see:
  • Master's or PhD, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)

  • 4+ years of relevant work experience

  • Excellent C/C++ programming and software engineering skills; an ACM background is a plus

  • Good fundamental knowledge of computer architecture

  • Strong ability to abstract problems and a sound methodology for resolving them

  • A strong compiler background, including MLIR/TVM/Triton/LLVM, is desired

  • Good knowledge of GPU architecture and fast kernel programming skills are a plus

  • Knowledge of LLM algorithms or a certain HPC domain is a plus

  • Knowledge of multi-GPU distributed communication is a plus

  • Excellent oral communication in English is a plus

16.11.2025

Nvidia LLM Reinforcement Learning Algorithm Engineer China, Shanghai

Description:
Location: China, Shanghai
Time type: Full time
Posted: 2 days ago
Job requisition ID:

What you'll be doing:

  • Developing and introducing groundbreaking reinforcement learning algorithms tailored for LLM applications.

  • Collaborating with a world-class team of engineers and researchers to integrate these algorithms into applied scenarios.

  • Using your extensive expertise in math and AI to improve the reasoning capabilities of our models.

  • Engaging in rigorous testing and refinement processes to ensure flawless performance and reliability.

  • Contributing to our collective goal of delivering industry-leading AI solutions, strictly adhering to NVIDIA's high standards.

What we need to see:

  • Proficient in C++/Python programming.

  • 3+ years of work experience.

  • BS or MS (or equivalent experience) in CS, CE, EE, or a related field.

  • Proven experience in reinforcement learning and its application to large language models.

  • Strong background in mathematics and AI algorithms, with a focus on reinforcement learning.

  • Demonstrated history of applying reinforcement learning algorithms in practical scenarios.

  • Understanding of GPU architecture is a huge plus.

  • Excellent problem-solving skills and the ability to work collaboratively in a dynamic team environment.

  • A passion for innovation and a dedication to achieving outstanding results.

15.11.2025

Nvidia Senior Software Architect Humanoid Robotics China, Shanghai

Description:
Location: China, Shanghai
Time type: Full time
Posted: 2 days ago
Job requisition ID:

What you will be doing:

  • Develop and optimize the control stack, including locomotion, manipulation, and whole-body control algorithms;

  • Deploy and evaluate neural network models in physics simulation and on real humanoid hardware;

  • Design and maintain teleoperation software for controlling humanoid robots with low latency and high precision;

  • Implement tools and processes for regular robot maintenance, diagnostics, and troubleshooting to ensure system reliability;

  • Monitor teleoperators at the lab and develop quality assurance workflows to ensure high-quality data collection;

  • Collaborate with researchers on model training, data processing, and MLOps lifecycle.

What we need to see:

  • Bachelor’s degree in Computer Science, Robotics, Engineering, or a related field;

  • 3+ years of full-time industry experience in robotics hardware or software full-stack;

  • Hands-on experience with deploying and debugging neural network models on robotic hardware;

  • Ability to implement real-time control algorithms, teleoperation stack, and sensor fusion;

  • Proficiency in languages such as Python and C++, and experience with robotics frameworks (ROS) and physics simulation (Gazebo, MuJoCo, Isaac, etc.).

  • Experience in maintaining and troubleshooting robotic systems, including mechanical, electrical, and software components.

  • Physically work on-site on all business days.

Ways to stand out from the crowd:

  • Master’s or PhD degree in Computer Science, Robotics, Engineering, or a related field;

  • Experience at humanoid robotics companies on real hardware deployment;

  • Experience in robot hardware design;

  • Demonstrated Tech Lead experience, coordinating a team of robotics engineers and driving projects from conception to deployment.

15.11.2025

Nvidia Senior AI Performance Efficiency Engineer China, Shanghai

Description:
Location: China, Shanghai
Time type: Full time
Posted: 4 days ago
Job requisition ID:

What you will be doing:

  • Collaborate closely with our AI/ML researchers to make their ML models more efficient, leading to significant productivity improvements and cost savings

  • Build tools, frameworks, and apply ML techniques to detect & analyze efficiency bottlenecks and deliver productivity improvements for our researchers

  • Work with researchers on a variety of innovative ML workloads across robotics, autonomous vehicles, LLMs, video, and more

  • Collaborate across the engineering organizations to deliver efficiency in our usage of hardware, software, and infrastructure

  • Proactively monitor fleet-wide utilization patterns, analyze existing inefficiency patterns, or discover new patterns, and deliver scalable solutions to solve them

  • Keep up to date with the most recent developments in AI/ML technologies, frameworks, and successful strategies, and advocate for their integration within the organization.

What we need to see:

  • BS or similar background in Computer Science or related area (or equivalent experience)

  • 8+ years of experience designing and operating large-scale compute infrastructure

  • Strong understanding of modern ML techniques and tools

  • Experience investigating and resolving training and inference performance issues end to end

  • Debugging and optimization experience with Nsight Systems and Nsight Compute

  • Experience with debugging large-scale distributed training using NCCL

  • Proficiency in programming & scripting languages such as Python, Go, Bash, as well as familiarity with cloud computing platforms (e.g., AWS, GCP, Azure) in addition to experience with parallel computing frameworks and paradigms.

  • Dedication to ongoing learning and staying updated on new technologies and innovative methods in the AI/ML infrastructure sector.

  • Excellent communication and collaboration skills, with the ability to work effectively with teams and individuals of different backgrounds

Ways to stand out from the crowd:

  • Background with NVIDIA GPUs, CUDA Programming, NCCL and MLPerf benchmarking

  • Experience with Machine Learning and Deep Learning concepts, algorithms and models

  • Familiarity with InfiniBand with IBOP and RDMA

  • Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads

  • Familiarity with deep learning frameworks like PyTorch and TensorFlow

15.11.2025

Nvidia LLM Reinforcement Learning Algorithm Engineer - New College ... China, Shanghai

Description:
Location: China, Shanghai
Time type: Full time
Posted: 2 days ago
Job requisition ID:

What you'll be doing:

  • Developing and introducing groundbreaking reinforcement learning algorithms tailored for LLM applications.

  • Collaborating with a world-class team of engineers and researchers to integrate these algorithms into applied scenarios.

  • Using your extensive expertise in math and AI to improve the reasoning capabilities of our models.

  • Engaging in rigorous testing and refinement processes to ensure flawless performance and reliability.

  • Contributing to our collective goal of delivering industry-leading AI solutions, strictly adhering to NVIDIA's high standards.

What we need to see:

  • Master's or PhD, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)

  • Proficient in C++/Python programming.

  • Proven experience in reinforcement learning and its application to large language models.

  • Strong background in mathematics and AI algorithms, with a focus on reinforcement learning.

  • Demonstrated history of applying reinforcement learning algorithms in practical scenarios.

  • Understanding of GPU architecture is a huge plus.

  • Excellent problem-solving skills and the ability to work collaboratively in a dynamic team environment.

  • A passion for innovation and a dedication to achieving outstanding results.

15.11.2025

Nvidia Deep Learning Performance Architect - Intern China, Shanghai

Description:
Location: China, Shanghai
Time type: Full time
Posted: 6 days ago
Job requisition ID:

What you’ll be doing:

  • Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.

  • Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.

  • Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.

  • Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.

What we need to see:

  • BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).

  • Strong programming skills in Python, C, C++.

  • Strong background in computer architecture.

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Prior experience with LLM or generative AI algorithms.

Ways to stand out from the crowd:

  • GPU Computing and parallel programming models such as CUDA and OpenCL.

  • Architecture of or workload analysis on other deep learning accelerators.

  • Deep neural network training, inference and optimization in leading frameworks (e.g. PyTorch, TensorRT-LLM, vLLM, etc.).

  • Open-source AI compilers (OpenAI Triton, MLIR, TVM, XLA, etc.).


Find your dream job in the high tech industry with Expoint. With our platform you can easily search for Deep Learning Performance Architect opportunities at Nvidia in China, Shanghai. Whether you're seeking a new challenge or looking to work with a specific organization in a specific role, Expoint makes it easy to find your perfect job match. Connect with top companies in your desired area and advance your career in the high tech field. Sign up today and take the next step in your career journey with Expoint.