

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
What you'll be doing:
Design and implement the DSL and core compiler of a tile-aware GPU programming model for emerging GPU architectures (a minimal sketch follows this list)
Continuously innovate and iterate on the compiler's core architecture to keep improving performance
Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack
Analyze performance on emerging AI/LLM workloads and integrate with AI/ML frameworks
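The posting describes a tile-aware GPU programming model; purely as a hedged illustration, here is a minimal tile-style kernel written in the open-source Triton DSL, which is one example of such a model rather than the DSL this role builds:

```python
# Minimal tile-style vector-add kernel in the open-source Triton DSL.
# Used only to illustrate "tile-aware" GPU programming; the DSL referenced
# in the role is not necessarily Triton.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program instance per tile
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y are assumed to be CUDA tensors of the same shape
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                    # number of tiles to launch
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```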
What we need to see:
Master’s or PhD degree, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
4+ years of relevant work experience
Excellent C/C++ programming and software engineering skills; a competitive programming (ACM) background is a plus
Solid fundamental knowledge of computer architecture
Strong ability to abstract problems and a methodical approach to solving them
A strong compiler background, including MLIR/TVM/Triton/LLVM, is desired
Good knowledge of GPU architecture and skill at writing fast kernels are a plus
Knowledge of LLM algorithms or of a specific HPC domain is a plus
Knowledge of multi-GPU distributed communication is a plus
Excellent oral communication in English is a plus

What you'll be doing:
Developing and introducing groundbreaking reinforcement learning algorithms tailored for LLM applications (a minimal sketch follows this list).
Collaborating with a world-class team of engineers and researchers to integrate these algorithms into applied scenarios.
Using your extensive expertise in math and AI to improve the reasoning capabilities of our models.
Engaging in rigorous testing and refinement processes to ensure flawless performance and reliability.
Contributing to our collective goal of delivering industry-leading AI solutions, strictly adhering to NVIDIA's high standards.
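Purely as an illustrative sketch of this family of methods (not NVIDIA's algorithm), a policy-gradient update for an LLM policy with one scalar reward per generated sequence and a simple batch-mean baseline might look like the following; the tensor names and shapes are assumptions:

```python
# Illustrative REINFORCE-style loss for LLM fine-tuning, assuming per-token
# log-probabilities under the current policy and one scalar reward per sequence.
# Names and shapes are hypothetical; this is not NVIDIA's production method.
import torch

def policy_gradient_loss(token_logprobs: torch.Tensor,       # [batch, seq_len]
                         rewards: torch.Tensor,              # [batch]
                         mask: torch.Tensor) -> torch.Tensor:  # [batch, seq_len], 1 = real token
    baseline = rewards.mean()                                 # batch-mean baseline to reduce variance
    advantages = (rewards - baseline).unsqueeze(1)            # broadcast one advantage over tokens
    per_token = -advantages * token_logprobs * mask           # REINFORCE: -A * log pi(a_t | s_t)
    return per_token.sum() / mask.sum().clamp(min=1)

# Toy usage with random tensors standing in for model outputs.
B, T = 4, 16
logprobs = -torch.rand(B, T)          # fake per-token log-probs (log p <= 0)
rewards = torch.randn(B)              # e.g., scores from a reward model
mask = torch.ones(B, T)
print(float(policy_gradient_loss(logprobs, rewards, mask)))
```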
What we need to see:
Proficient in C++/Python programming.
3+ years of work experience.
BS or MS (or equivalent experience) in CS, CE, EE, or a related field.
Proven experience in reinforcement learning and its application to large language models.
Strong background in mathematics and AI algorithms, with a focus on reinforcement learning.
Demonstrated history of applying reinforcement learning algorithms in practical scenarios.
Understanding of GPU architecture is a huge plus.
Excellent problem-solving skills and the ability to work collaboratively in a dynamic team environment.
A passion for innovation and a dedication to achieving outstanding results.

What you'll be doing:
Define a clear vision and roadmap for productivity-improvement solutions in alignment with business needs, and drive execution from design through delivery.
Lead cross-functional engineering teams to deliver on project commitments that streamline the system design and verification process and workflow.
Partner with global automation and infrastructure teams to design, build, and maintain large-scale cloud-based and on-premises infrastructure.
Stay hands-on technically, providing architectural guidance on complex infrastructure challenges.
Collaborate cross-functionally with ASIC, SW, System Design, Product, Security, and Operations teams to ensure reliability, scalability, and performance, fostering a culture of technical excellence, collaboration, and ownership.
Continuously improve processes to ensure efficiency, reliability, and adaptability.
What we need to see:
MS in EE, CE, CS, or Systems Engineering (or equivalent experience).
10+ years of experience in large-scale software development and framework architecture design.
Strong skills in Python and JavaScript.
Experience with HTML5, CSS, Node.js, or React.
Hands-on experience in AI/ML and data analysis, preferably with exposure to large-scale datasets.
Good communication and organization skills, with a logical approach to problem-solving and task prioritization.
Experience working effectively with global cross-functional teams, principal engineers, and architects across organizational boundaries.
Ways to stand out from the crowd:
Knowledge of System Design, CPU and/or GPU architecture.
Background with large-scale full-stack development.
Experience with software design flow management or project management.
Demonstrable knowledge of the Elastic Stack (Elasticsearch, Logstash, etc.) and Kafka.
Knowledge of telemetry, observability, or monitoring frameworks (e.g., Kibana, Grafana, OpenTelemetry); a minimal sketch of this kind of telemetry plumbing follows this list.
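As a hedged illustration of the telemetry plumbing mentioned above, the snippet below publishes a JSON job-status event to a Kafka topic using the open-source kafka-python client; the broker address, topic name, and event fields are assumptions rather than details from this role:

```python
# Hypothetical telemetry event producer using the open-source kafka-python
# client. Broker address, topic name, and event schema are illustrative
# assumptions, not details taken from the role description.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                        # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # JSON-encode payloads
)

event = {
    "job": "regression_nightly",      # hypothetical workflow name
    "status": "pass",
    "duration_s": 812,
    "timestamp": time.time(),
}

producer.send("design-telemetry", value=event)   # hypothetical topic
producer.flush()                                  # block until the event is delivered
```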

What you will be doing:
Develop and optimize the control stack, including locomotion, manipulation, and whole-body control algorithms;
Deploy and evaluate neural network models in physics simulation and on real humanoid hardware;
Design and maintain teleoperation software for controlling humanoid robots with low latency and high precision (a minimal sketch follows this list);
Implement tools and processes for regular robot maintenance, diagnostics, and troubleshooting to ensure system reliability;
Monitor teleoperators at the lab and develop quality assurance workflows to ensure high-quality data collection;
Collaborate with researchers on model training, data processing, and MLOps lifecycle.
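Purely as an illustration of a teleoperation-style command stream (assuming ROS 2 and its Python client rclpy, plus a conventional /cmd_vel topic; the posting does not specify the middleware), a minimal publisher might look like this:

```python
# Minimal ROS 2 (rclpy) node that streams velocity commands at 50 Hz.
# The /cmd_vel topic and Twist message are common conventions but are
# assumptions here; real humanoid teleoperation uses richer interfaces.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist


class TeleopPublisher(Node):
    def __init__(self):
        super().__init__("teleop_publisher")
        self.pub = self.create_publisher(Twist, "/cmd_vel", 10)
        self.timer = self.create_timer(0.02, self.tick)   # 50 Hz control tick

    def tick(self):
        cmd = Twist()
        cmd.linear.x = 0.2      # placeholder forward velocity (m/s)
        cmd.angular.z = 0.0     # placeholder yaw rate (rad/s)
        self.pub.publish(cmd)


def main():
    rclpy.init()
    node = TeleopPublisher()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == "__main__":
    main()
```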
What we need to see:
Bachelor’s degree in Computer Science, Robotics, Engineering, or a related field;
3+ years of full-time industry experience in full-stack robotics hardware or software;
Hands-on experience with deploying and debugging neural network models on robotic hardware;
Ability to implement real-time control algorithms, teleoperation stack, and sensor fusion;
Proficiency in languages such as Python and C++, and experience with robotics frameworks (ROS) and physics simulation (Gazebo, MuJoCo, Isaac, etc.).
Experience in maintaining and troubleshooting robotic systems, including mechanical, electrical, and software components.
Ability to work on-site on all business days.
Ways to stand out from the crowd:
Master’s or PhD degree in Computer Science, Robotics, Engineering, or a related field;
Experience with real-hardware deployment at humanoid robotics companies;
Experience in robot hardware design;
Demonstrated Tech Lead experience, coordinating a team of robotics engineers and driving projects from conception to deployment.

What you'll be doing:
• Research, design, and implement software features that help our customers meet their performance targets and build unique value
• World-model and NVIDIA Cosmos related development and bug fixes
• Develop solutions for DNN model acceleration, optimization, and deployment (a minimal sketch follows this list).
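As a small, hedged example of one common first step in DNN deployment (the model, input shape, and file names are placeholders, and TensorRT is only one possible downstream engine builder, not necessarily this team's workflow), a PyTorch model can be exported to ONNX like this:

```python
# Illustrative deployment step: export a PyTorch model to ONNX. The model,
# input shape, and file names are placeholders. The resulting ONNX file
# could then be built into an optimized inference engine, for example with
# TensorRT's trtexec CLI: `trtexec --onnx=resnet18.onnx --fp16`.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # placeholder network
dummy_input = torch.randn(1, 3, 224, 224)                 # assumed input shape

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)
```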
What we need to see:
• Flexibility to take on different tasks and challenges, and to be reassigned to different work as needed
• Self-motivated attitude, drive to make things succeed, and eagerness to learn
• BS/MS degree in Computer Science, EE, or a related field
• Proven fundamentals in C++/Python programming, with strong software design and debugging skills
• Strong knowledge of ML/DL techniques for computer vision and autonomous driving
• Belief that experimentation, not argument or arbitrary guesswork, is the way to find the truth
• Located in Shanghai or willing to work in Shanghai
Ways to stand out from the crowd:
• Experience in DNN development and network acceleration is highly desired
• Familiarity with GPU computing/NVIDIA CUDA/NVIDIA TensorRT

What you'll be doing:
Developing and introducing groundbreaking reinforcement learning algorithms tailored for LLM applications.
Collaborating with a world-class team of engineers and researchers to integrate these algorithms into applied scenarios.
Using your extensive expertise in math and AI to improve the reasoning capabilities of our models.
Engaging in rigorous testing and refinement processes to ensure flawless performance and reliability.
Contributing to our collective goal of delivering industry-leading AI solutions, strictly adhering to NVIDIA's high standards.
What we need to see:
Master’s or PhD degree, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
Proficient in C++/Python programming.
Proven experience in reinforcement learning and its application to large language models.
Strong background in mathematics and AI algorithms, with a focus on reinforcement learning.
Demonstrated history of applying reinforcement learning algorithms in practical scenarios.
Understanding of GPU architecture is a huge plus.
Excellent problem-solving skills and the ability to work collaboratively in a dynamic team environment.
A passion for innovation and a dedication to achieving outstanding results.

What you’ll be doing:
Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-generation inference products.
Develop analytical models of state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency (a minimal sketch follows this list).
Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
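Purely as an illustration of the simplest kind of analytical performance model referenced above (a roofline-style bound for a single GEMM), the sketch below uses placeholder peak-compute and bandwidth figures that are assumptions, not the specifications of any NVIDIA processor:

```python
# Roofline-style back-of-the-envelope model for one FP16 GEMM layer.
# Peak compute and bandwidth figures are placeholder assumptions, not the
# specifications of any particular processor.
def gemm_roofline(m, n, k, bytes_per_elem=2,
                  peak_tflops=100.0, peak_bw_gbs=1000.0):
    flops = 2 * m * n * k                                   # multiply-accumulates
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C (ignores reuse nuances)
    intensity = flops / bytes_moved                         # FLOPs per byte
    t_compute = flops / (peak_tflops * 1e12)                # seconds if compute-bound
    t_memory = bytes_moved / (peak_bw_gbs * 1e9)            # seconds if bandwidth-bound
    bound = "compute" if t_compute > t_memory else "memory"
    return intensity, max(t_compute, t_memory), bound

# Example: a 4096 x 4096 x 4096 GEMM under the assumed peaks.
ai, t, bound = gemm_roofline(4096, 4096, 4096)
print(f"arithmetic intensity = {ai:.1f} FLOP/B, est. time = {t * 1e3:.2f} ms, {bound}-bound")
```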
What we need to see:
BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).
Strong programming skills in Python, C, C++.
Strong background in computer architecture.
Experience with performance modeling, architecture simulation, profiling, and analysis.
Prior experience with LLM or generative AI algorithms.
Ways to stand out from the crowd:
GPU Computing and parallel programming models such as CUDA and OpenCL.
Architecture of or workload analysis on other deep learning accelerators.
Deep neural network training, inference and optimization in leading frameworks (e.g. Pytorch, TensorRT-LLM, vLLM, etc.).
Open-source AI compilers (OpenAI Triton, MLIR, TVM, XLA, etc.).