

What you’ll be doing:
Run test cases to validate NVIDIA GPU communication libraries (NCCL, NVSHMEM, UCX, GDRCopy, GPUDirect RDMA, etc.).
Automate test cases and maintain the automation scripts.
Collaborate with development, PM, marketing, and engineering teams on crafting test plans and implementing validation.
Assist in architecting, crafting, and implementing SWQA test frameworks.
Improve code coverage and reduce code complexity.
What we need to see:
BS or higher degree in CS/EE/CE or equivalent experience
5+ years of relevant experience
Seasoned software QA or software testing background, with test infrastructure experience and strong analysis skills
Proficiency in a scripting language (Python, Perl, Bash)
Solid experience with AI development tools for test development and automation
Knowledge of basic networking concepts
UNIX/Linux experience is required
Experience in C/C++ is required
Ability to work independently, leadership skills, and experience using a quality mindset to drive improvements
Proficiency in oral and written English
Ways to stand out from the crowd:
Experience with CUDA programming and NVIDIA GPUs
Knowledge of high-performance networks like InfiniBand, RoCE, etc.
Experience with CSPs (AWS, Google Cloud, Oracle Cloud Infrastructure, Microsoft Azure), HPC clusters, Slurm, Ansible, etc.
Prior experience with virtualization technologies (KVM, Hyper-V, VMware, OpenStack, Docker, Kubernetes)
Experience with deep learning frameworks such as PyTorch and TensorFlow
More jobs that may interest you

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
What you'll be doing:
Design and implement the DSL and the core compiler of a tile-aware GPU programming model for emerging GPU architectures
Continuously innovate and iterate on the core architecture of the compiler to consistently optimize performance
Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack
Analyze performance on emerging AI/LLM workloads and integrate with AI/ML frameworks
What we need to see:
Master's or PhD, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
4+ years of relevant work experience
Excellent C/C++ programming and software engineering skills; ACM background is a plus
Good fundamental knowledge of computer architecture
Strong ability to abstract problems and a sound methodology for resolving them
Strong compiler background, including MLIR/TVM/Triton/LLVM, is desired
Good knowledge of GPU architecture and fast kernel programming skills are a plus
Knowledge of LLM algorithms or a certain HPC domain is a plus
Knowledge of multi-GPU distributed communication is a plus
Excellent oral communication in English is a plus

What you will be doing:
Design and implement functional/performance tests for CUDA products, such as drivers and libraries.
Automate CUDA tests, design test plans, and integrate them into the automated testing infrastructure.
Triage test results, root-cause test failures or performance drops, and drive bugs through to a fix.
Develop scripts/tools and optimize workflow to improve efficiency and productivity.
What we need to see:
MS or PhD degree from a leading university in computer science or a related field.
At least 3 years of relevant professional experience.
Excellent QA sense, knowledge, and experience in software testing.
Rich experience in test case development, test automation, and failure analysis.
Proficient programming and debugging skills in C/C++ and Python.
Comprehensive knowledge of Linux and Windows operating systems.
Experience in using AI development tools for test plan creation, test case development, and test case automation.
Ways to stand out from the crowd:
Excellent English communication and collaboration skills.
Strong understanding of CUDA, HPC, Gcov, VectorCAST, Coverity.

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people.
What you'll be doing:
Establish integrations with NVIDIA Cloud Partners, enabling global developers to easily access GPU-optimized virtual machines.
Craft and implement IaaS API integrations, collaborating with external engineering teams to ensure reliable, scalable, and consistent connectivity across diverse cloud environments.
Shape integration strategies, develop stateful workflow orchestration, and drive improvements in testing, observability, and automation to ensure high-quality, fault-tolerant solutions.
Develop the two-sided marketplace, including the integration of compute providers and crafting discovery and bidding experiences to match supply with demand.
What we need to see:
5+ years of experience in developing software infrastructure for large-scale AI systems, with a proven track record of impact.
Expertise in software engineering with Kubernetes, including cluster operations, operator development, node health monitoring, and GPU resource scheduling.
Familiarity with setting up cloud infrastructure environments (VMaaS, VPCs, RDMA, shared file systems).
Proven ability to handle third-party API integrations, including communication with external teams, writing API clients, and improving integration reliability.
Comfort in a fast-paced environment, with the ability to collaborate and debug integrations with external engineering teams.
Strong technical knowledge, including proficiency in a systems programming language (preference for Go) and a solid understanding of software design patterns for stateful workflow orchestration.
BS in Computer Science, Engineering, Physics, Mathematics, or a comparable degree or equivalent experience.
, distributed systems, and API development.

What you'll be doing:
Work on NVIDIA's next generation of Video Decoder and Encoder hardware architecture.
Research and study new video compression technology, specifications, papers, etc.
Develop a C-model for algorithm study, hardware simulation, and verification.
Define the test plan, write architecture documents, verify the C-model, and improve model coverage.
What we need to see:
Master's degree or above in Computer Science or Electronic Engineering.
Minimum of 3 years' experience in video technology, including codecs, implementation, pre/post-processing, rate control, etc.
Good programming skills and C/C++ coding ability.
Fluent English (both written and spoken) and good communication skills.
Ways to stand out from the crowd:
Project experience in video encoders, decoders, or computer vision.
Experience with video codecs such as H.264, HEVC, VP9, AV1, or VVC.
Experience with DL video/image processing.
Creativity and strong analysis, design, and debugging skills.

What you will be doing:
Develop and optimize the control stack, including locomotion, manipulation, and whole-body control algorithms;
Deploy and evaluate neural network models in physics simulation and on real humanoid hardware;
Design and maintain teleoperation software for controlling humanoid robots with low latency and high precision;
Implement tools and processes for regular robot maintenance, diagnostics, and troubleshooting to ensure system reliability;
Monitor teleoperators at the lab and develop quality assurance workflows to ensure high-quality data collection;
Collaborate with researchers on model training, data processing, and MLOps lifecycle.
What we need to see:
Bachelor’s degree in Computer Science, Robotics, Engineering, or a related field;
3+ years of full-time industry experience in robotics hardware or software full-stack;
Hands-on experience with deploying and debugging neural network models on robotic hardware;
Ability to implement real-time control algorithms, teleoperation stack, and sensor fusion;
Proficiency in languages such as Python and C++, and experience with robotics frameworks (ROS) and physics simulation (Gazebo, MuJoCo, Isaac, etc.).
Experience in maintaining and troubleshooting robotic systems, including mechanical, electrical, and software components.
Physically work on-site on all business days.
Ways to stand out from the crowd:
Master’s or PhD degree in Computer Science, Robotics, Engineering, or a related field;
Experience at humanoid robotics companies on real hardware deployment;
Experience in robot hardware design;
Demonstrated Tech Lead experience, coordinating a team of robotics engineers and driving projects from conception to deployment.

What you will be doing:
Collaborate closely with our AI/ML researchers to make their ML models more efficient leading to significant productivity improvements and cost savings
Build tools, frameworks, and apply ML techniques to detect & analyze efficiency bottlenecks and deliver productivity improvements for our researchers
Work with researchers on a variety of innovative ML workloads across robotics, autonomous vehicles, LLMs, video, and more
Collaborate across the engineering organizations to deliver efficiency in our usage of hardware, software, and infrastructure
Proactively monitor fleet-wide utilization patterns, analyze existing inefficiency patterns or discover new ones, and deliver scalable solutions to address them
Keep up to date with the most recent developments in AI/ML technologies, frameworks, and successful strategies, and advocate for their integration within the organization.
What we need to see:
BS or similar background in Computer Science or related area (or equivalent experience)
8+ years of experience designing and operating large-scale compute infrastructure
Strong understanding of modern ML techniques and tools
Experience investigating and resolving training and inference performance issues end to end
Debugging and optimization experience with Nsight Systems and Nsight Compute
Experience with debugging large-scale distributed training using NCCL
Proficiency in programming & scripting languages such as Python, Go, Bash, as well as familiarity with cloud computing platforms (e.g., AWS, GCP, Azure) in addition to experience with parallel computing frameworks and paradigms.
Dedication to ongoing learning and staying updated on new technologies and innovative methods in the AI/ML infrastructure sector.
Excellent communication and collaboration skills, with the ability to work effectively with teams and individuals of different backgrounds
Ways to stand out from the crowd:
Background with NVIDIA GPUs, CUDA Programming, NCCL and MLPerf benchmarking
Experience with Machine Learning and Deep Learning concepts, algorithms and models
Familiarity with InfiniBand with IBOP and RDMA
Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads
Familiarity with deep learning frameworks like PyTorch and TensorFlow
