Expoint – all jobs in one place
Finding a high-tech job at the best companies has never been easier

Accelerated Compute Systems Performance Architect jobs at Nvidia in China, Shanghai

Find your perfect match with Expoint! Search for job opportunities as an Accelerated Compute Systems Performance Architect in China, Shanghai, and join the network of leading companies in the high-tech industry, such as Nvidia. Sign up now and find your dream job with Expoint!
62 jobs found
24.11.2025

Nvidia Deep Learning Performance Architect - New College Grad China, Shanghai

Limitless High-tech career opportunities - Expoint
Description:
China, Shanghai
China, Beijing
Time type: Full time
Posted: 4 Days Ago

What you’ll be doing:

  • Analyze state of the art DL networks (LLM etc.), identify and prototype performance opportunities to influence SW and Architecture team for NVIDIA's current and next gen inference products.

  • Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.

  • Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.

  • Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.

What we need to see:

  • BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).

  • Strong programming skills in Python, C, C++.

  • Strong background in computer architecture.

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Prior experience with LLM or generative AI algorithms.

Ways to stand out from the crowd:

  • GPU Computing and parallel programming models such as CUDA and OpenCL.

  • Architecture of or workload analysis on other deep learning accelerators.

  • Deep neural network training, inference, and optimization in leading frameworks (e.g. PyTorch, TensorRT-LLM, vLLM, etc.).

  • Open-source AI compilers (OpenAI Triton, MLIR, TVM, XLA, etc.).


24.11.2025

Nvidia Performance Engineer Intern Deep Learning HPC - China, Shanghai

Description:
China, Shanghai
Time type: Full time
Posted: 4 Days Ago

You will be part of the global Performance Lab team, improving our capacity to expertly and accurately benchmark state-of-the-art datacenter applications and products. We also develop infrastructure and solutions that enhance the team’s ability to gather data through automation and by designing efficient processes for testing a wide variety of applications and hardware. The data that we collect drives marketing and sales collateral as well as engineering studies for future products. You will have the opportunity to work with multi-functional teams in a dynamic environment where multiple projects are active at once and priorities may shift frequently.

What you’ll be doing:

  • Benchmark, profile, and analyze the performance of AI workloads specifically tailored for large-scale LLM training and inference, as well as High-Performance Computing (HPC) on NVIDIA supercomputers and distributed systems.

  • Aggregate and produce written reports with the testing data for internal sales, marketing, SW, and HW teams.

  • Develop Python scripts to automate the testing of various applications.

  • Collaborate with internal teams to debug and improve performance issues.

  • Assist with the development of tools and processes that improve our ability to perform automated testing.

  • Set up and configure systems with appropriate hardware and software to run benchmarks.

What we need to see:

  • Currently pursuing a bachelor's degree (or higher) in Computer Science, Electrical Engineering, or a related field.

  • Experienced in programming and debugging with scripting languages such as Python or Unix shell.

  • Strong data analysis skills and the ability to summarize findings in a written report.

  • Hands-on experience with Linux based systems. Familiarity using a container platform such as Docker or Singularity. Experience with compiling and running software from source code.

  • Good English verbal and written skills to improve collaboration with coworkers.

  • Ability to learn quickly and independently.

Ways to stand out from the crowd:

  • Experience with CI/CD pipelines and modern DevOps practices. Familiar with cloud provisioning and scheduling tools (Kubernetes, SLURM).

  • Curiosity about GPUs, TPUs, cloud and performance benchmarking.

  • Familiar with ML/DL techniques, algorithms, and frameworks like TensorFlow or PyTorch. Experience deploying AI models for inference and launching training jobs.

  • Background in system-level problem solving.


More jobs that may interest you

15.11.2025

Nvidia Senior Software Architect Humanoid Robotics China, Shanghai

Description:
China, Shanghai
Time type: Full time
Posted: 2 Days Ago

What you will be doing:

  • Develop and optimize the control stack, including locomotion, manipulation, and whole-body control algorithms;

  • Deploy and evaluate neural network models in physics simulation and on real humanoid hardware;

  • Design and maintain teleoperation software for controlling humanoid robots with low latency and high precision;

  • Implement tools and processes for regular robot maintenance, diagnostics, and troubleshooting to ensure system reliability;

  • Monitor teleoperators at the lab and develop quality assurance workflows to ensure high-quality data collection;

  • Collaborate with researchers on model training, data processing, and MLOps lifecycle.

What we need to see:

  • Bachelor’s degree in Computer Science, Robotics, Engineering, or a related field;

  • 3+ years of full-time industry experience in robotics hardware or software full-stack;

  • Hands-on experience with deploying and debugging neural network models on robotic hardware;

  • Ability to implement real-time control algorithms, teleoperation stack, and sensor fusion;

  • Proficiency in languages such as Python and C++, and experience with robotics frameworks (ROS) and physics simulation (Gazebo, MuJoCo, Isaac, etc.);

  • Experience in maintaining and troubleshooting robotic systems, including mechanical, electrical, and software components.

  • Physically work on-site on all business days.

Ways to stand out from the crowd:

  • Master’s or PhD degree in Computer Science, Robotics, Engineering, or a related field;

  • Experience at humanoid robotics companies on real hardware deployment;

  • Experience in robot hardware design;

  • Demonstrated Tech Lead experience, coordinating a team of robotics engineers and driving projects from conception to deployment.


15.11.2025

Nvidia Senior AI Performance Efficiency Engineer China, Shanghai

Description:
China, Shanghai
Time type: Full time
Posted: 4 Days Ago

What you will be doing:

  • Collaborate closely with our AI/ML researchers to make their ML models more efficient leading to significant productivity improvements and cost savings

  • Build tools, frameworks, and apply ML techniques to detect & analyze efficiency bottlenecks and deliver productivity improvements for our researchers

  • Work with researchers on a variety of innovative ML workloads across robotics, autonomous vehicles, LLMs, video, and more

  • Collaborate across the engineering organizations to deliver efficiency in our usage of hardware, software, and infrastructure

  • Proactively monitor fleet wide utilization patterns, analyze existing inefficiency patterns, or discover new patterns, and deliver scalable solutions to solve them

  • Keep up to date with the most recent developments in AI/ML technologies, frameworks, and successful strategies, and advocate for their integration within the organization.

What we need to see:

  • BS or similar background in Computer Science or related area (or equivalent experience)

  • 8+ years of experience designing and operating large-scale compute infrastructure

  • Strong understanding of modern ML techniques and tools

  • Experience investigating and resolving training and inference performance issues end to end

  • Debugging and optimization experience with Nsight Systems and Nsight Compute

  • Experience with debugging large-scale distributed training using NCCL

  • Proficiency in programming & scripting languages such as Python, Go, Bash, as well as familiarity with cloud computing platforms (e.g., AWS, GCP, Azure) in addition to experience with parallel computing frameworks and paradigms.

  • Dedication to ongoing learning and staying updated on new technologies and innovative methods in the AI/ML infrastructure sector.

  • Excellent communication and collaboration skills, with the ability to work effectively with teams and individuals of different backgrounds

Ways to stand out from the crowd:

  • Background with NVIDIA GPUs, CUDA Programming, NCCL and MLPerf benchmarking

  • Experience with Machine Learning and Deep Learning concepts, algorithms and models

  • Familiarity with InfiniBand with IBOP and RDMA

  • Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads

  • Familiarity with deep learning frameworks like PyTorch and TensorFlow


15.11.2025

Nvidia Deep Learning Performance Architect - Intern China, Shanghai

Description:
China, Shanghai
Time type: Full time
Posted: 6 Days Ago

What you’ll be doing:

  • Analyze state of the art DL networks (LLM etc.), identify and prototype performance opportunities to influence SW and Architecture team for NVIDIA's current and next gen inference products.

  • Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.

  • Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.

  • Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.

What we need to see:

  • BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).

  • Strong programming skills in Python, C, C++.

  • Strong background in computer architecture.

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Prior experience with LLM or generative AI algorithms.

Ways to stand out from the crowd:

  • GPU Computing and parallel programming models such as CUDA and OpenCL.

  • Architecture of or workload analysis on other deep learning accelerators.

  • Deep neural network training, inference, and optimization in leading frameworks (e.g. PyTorch, TensorRT-LLM, vLLM, etc.).

  • Open-source AI compilers (OpenAI Triton, MLIR, TVM, XLA, etc.).



15.11.2025

Nvidia Performance Engineering Intern - China, Shanghai

Description:
China, Shanghai
Time type: Full time
Posted: 2 Days Ago

What you’ll be doing:

  • Identify and run graphics, studio, and WinAI benchmarks across servers, PCs, workstations, and laptops.

  • Compose competitive analysis reports for internal and external customers to position NVIDIA products appropriately using their evaluation.

  • Develop and maintain automation scripts for games/studio/WinAI performance and system monitoring data collection on Windows and Linux to speed up providing business and engineering insights.

  • Develop, implement and maintain tools to improve testing efficiency.

What we need to see:

  • Pursuing BS in Computer Science or similar computer discipline.

  • Good knowledge of Python or other scripting languages.

  • Experienced with and passionate about PC games or content creation.

  • Linux and Windows knowledge.

  • Good knowledge of PC systems and components.

  • Capability to work with a lot of data.

  • Good organizational, time management and task prioritization skills.

Ways to stand out from the crowd:

  • Familiarity with GenAI or LLMs is a plus.

  • Knowledge of OpenGL, DirectX, and D3D.

  • Knowledge of Data Visualization.

  • Good knowledge of NVIDIA GeForce and RTX PRO series.

Show more

משרות נוספות שיכולות לעניין אותך

10.11.2025

Nvidia Senior Software Engineer - Performance China, Shanghai

Description:
China, Shanghai
Time type: Full time
Posted: 11 Days Ago

Intelligent machines powered by Artificial Intelligence computers that can learn, reason and interact with people are no longer science fiction. Today, a self-driving car powered by AI can meander through a country road at night and find its way. An AI-powered robot can learn motor skills through trial and error — this is truly an extraordinary time and the era of AI has begun.

What you’ll be doing:
  • Develop, maintain and optimize performance KPIs necessary to deliver NVIDIA’s L2/L3/L4 autonomous driving solutions

  • Devise acceleration strategies and patterns to improve software architecture and its efficiency on our computers with multiple heterogeneous hardware engines while meeting or exceeding product goals

  • Develop highly efficient product code in C++, making use of algorithmic parallelism offered by GPGPU programming (CUDA)/ARM NEON while following quality and safety standards such as defined by MISRA

  • Diagnose and fix performance issues reported on our target platform including on-road & simulation

What we need to see:
  • BS/MS or higher in computer science or a related engineering discipline

  • Excellent C and C++ programming skills

  • 10+ years of relevant industry experience

  • Strong knowledge of programming and debugging techniques, especially for parallel architectures

  • Good understanding of System SW / Operating Systems and Computer architecture

  • Experience with performance analysis, optimizations and benchmarking

  • You have excellent analytical, written and verbal interpersonal skills

Ways to stand out from the crowd:
  • Understanding of Embedded architectures and Real-time operating systems & scheduling

  • Strong mathematical fundamentals, including linear algebra and numerical methods

  • Experience implementing algorithms in Robotics, Computer Vision and/or Machine Learning

  • Software development experience with CUDA/GPGPU or any data parallel architectures


Come find your dream high-tech job with Expoint. With our platform you can easily search for Accelerated Compute Systems Performance Architect opportunities at Nvidia in China, Shanghai. Whether you are looking for a new challenge or want to work with a specific organization in a particular role, Expoint makes it easy to find the perfect job match for you. Connect with leading companies in your area today and advance your high-tech career! Sign up today and take the next step in your career journey with Expoint.