

What you will be doing:
Driving end-to-end responsibility for implementing and integrating major new features into the core storage solutions.
Maintaining and improving the SPDK project (https://spdk.io/), serving the broader open-source storage community.
Collaborating closely with hardware and other software teams to expose new, groundbreaking hardware capabilities.
Implementing critical performance improvements and low-latency optimizations across the storage stack.
Participating actively in the design and refinement of our next-generation storage solutions.
What we need to see:
A degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
8+ years of experience as a C/C++ software engineer, with a strong emphasis on systems-level programming.
Deep knowledge of Linux and Networking stack fundamentals.
Demonstrated leadership skills, including quickly adapting to new technical environments and providing clear technical guidance to peers.
Proven familiarity with storage protocols (e.g., NVMe).
A commitment to high-quality code through rigorous testing, code reviews, and robust design practices.
Ways to stand out from the crowd:
Extensive experience with RDMA (Remote Direct Memory Access) and its application in low-latency systems.
A deep understanding of SoC (System-on-a-Chip) hardware design and its influence on software performance.
Serving as a core maintainer or significant contributor to a widely-used, high-visibility open-source project.
Prior work with kernel bypass techniques and user-mode driver architectures.

What you’ll be doing:
You will apply knowledge of compute programming models and compute architecture to build tools that provide actionable feedback to compute developers. You should be comfortable working in existing driver code and application code as well as writing new shared libraries and targeted performance tests, and have an eagerness to learn about new compute and graphics drivers, GPU architectures and operating systems.
Develop the Compute Sanitizer suite of memory-checking tools for GPUs running on Linux, Windows, and embedded operating systems.
Work with tools, compiler, architecture and driver teams to design, implement and verify new features in the Compute Sanitizer stack.
Work closely with internal and external partners including other peer organizations within NVIDIA.
Effectively estimate and prioritize tasks in order to create a realistic delivery schedule.
Write fast, effective, maintainable, reliable and well-documented code.
Provide peer reviews to other engineers, including feedback on performance, scalability and correctness.
Document requirements and designs, and review documents with teams throughout NVIDIA.
Mentor junior engineers.
What we need to see:
BS or MS in Computer Science or equivalent experience
5+ years of experience
Strong programming ability in C, C++, Assembly Language and scripting languages
Excellent knowledge of computer architecture of x86 or ARM CPUs
Strong problem solving and debugging skills
Familiar with low-level programming using assembly languages
Source control understanding (git, Perforce, etc.)
Ability to self-manage, communicate, and adapt in a fast paced, high demand environment with changing priorities and direction
Excellent communication skills, written and verbal
Ways to stand out from the crowd:
CUDA/OpenCL knowledge
Experience with code patching
ELF/DWARF knowledge

We are looking for a Senior Deep Learning Engineer to help bring Cosmos World Foundation Models from research into efficient, production-grade systems. You’ll focus on optimizing and deploying models for high-performance inference on diverse GPU platforms. This role sits at the intersection of deep learning, systems, and GPU optimization, working closely with research scientists, software engineers, and hardware experts.
What you'll be doing:
Improve inference speed for Cosmos WFMs on GPU platforms.
Effectively carry out the production deployment of Cosmos WFMs.
Profile and analyze deep learning workloads to identify and remove bottlenecks.
What we need to see:
5+ years of experience.
MSc or PhD in CS, EE, or CSEE or equivalent experience.
Strong background in Deep Learning.
Strong programming skills in Python and PyTorch.
Experience with inference optimization techniques (such as quantization) and with at least one inference optimization framework: TensorRT, TensorRT-LLM, vLLM, or SGLang.
Ways to stand out from the crowd:
Familiarity with deploying Deep Learning models in production settings (e.g., Docker, Triton Inference Server).
CUDA programming experience.
Familiarity with diffusion models.
Proven experience in analyzing, modeling, and tuning the performance of GPU workloads, both inference and training.

What you will be doing:
Running a lot of builds and tests on a lot of architectures, operating systems, and devices.
Collecting a lot of data and working collaboratively to brainstorm and build infrastructure and tools to make sense of it all.
Building relationships that allow us to work together as a team, not a group.
Working in a highly dynamic environment where we have to think on our feet.
What we need to see:
6+ years of relevant industry experience.
Proficient with Linux.
Bachelor's degree in a related area of study or equivalent experience.
Expert with scripting in one or more of Python, Perl, shell, Groovy, etc.
Strong background with deploying, configuring, and debugging distributed systems.
Familiarity with the software build process (compiling C++ code with GNU Make, CMake, Visual Studio, MSBuild, etc.).
Background with some form of source control management (SCM), preferably git.
Familiar with containers.
Ways to stand out from the crowd:
Experience with HPC hardware systems such as compute clusters, and with HPC software performance benchmarking on such systems.
System administrator level experience with multi-user Linux servers.
Background with GPU accelerated systems.
Experience working in an environment where Agile processes and methodologies are used.

What you’ll be doing:
Implement deep learning models from multiple data domains (CV, NLP/LLMs, ASR, TTS, RecSys, and others) in multiple DL frameworks (PyTorch, JAX, TF2, DGL, and others).
Implement and test new SW features (e.g., graph compilation, reduced-precision training) that use the most recent HW capabilities.
Analyze, profile, and optimize deep learning workloads on state-of-the-art hardware and software platforms.
Collaborate with researchers and engineers across NVIDIA, providing guidance on improving the design, usability and performance of workloads.
Lead best-practices for building, testing, and releasing DL software.
Contribute to the creation of a large-scale benchmarking system capable of testing thousands of models across a wide variety of hardware and software stacks.
What we need to see:
3+ years of experience in DL model implementation and SW Development.
BSc, MS or PhD degree in Computer Science, Computer Architecture or related technical field.
Excellent Python programming skills.
Extensive knowledge of at least one DL framework (PyTorch, TensorFlow, JAX, MXNet), with practical experience in PyTorch required.
Strong problem solving and analytical skills.
Solid grasp of algorithms and DL fundamentals.
Docker containerization fundamentals.
Ways to stand out from the crowd:
Experience in performance measurements and profiling.
Experience with containerization technologies such as Docker.
GPU programming experience (CUDA or OpenCL) is a plus but not required.
Knowledge of and enthusiasm for DevOps/MLOps practices for developing Deep Learning-based products.
Experience with CI systems (preferably GitLab).

Shape the future of AI by contributing to software used by the global community. Collaborate with top-tier software engineers to develop a comprehensive toolset that rigorously tests deep learning models and frameworks on the most powerful computers. The ability to work in a multifaceted, fast-paced environment is required, as well as strong interpersonal skills.
What you’ll be doing:
Automating and optimizing testing of deep learning models and AI services from different domains, with a focus on inference.
Developing shared utilities for setting up systems, running tests, and recording results.
Configuring, maintaining, and building upon deployments of industry-standard tools.
Leading best practices for building, testing, releasing, and documenting software, including AI services and Deep Learning models.
Identifying infrastructure needs and translating them into action.
Building automatic content-generation tools that save dozens of engineering hours.
What we need to see:
BSc or MS degree in Computer Science, Software Architecture or related engineering field.
5+ years of work experience in software development.
Excellent Python programming and system design skills.
Understanding of Deep Learning foundations, enabling benchmarking of DL models and AI services.
Strong analytical and problem-solving skills, and a proactive, data-driven approach.
Effective time-management and organizational skills for coordinating multiple initiatives and priorities and for integrating new technologies and products into complex projects.
Effective communication, an open-minded attitude, and comprehensive documentation practices.
Ways to stand out from the crowd:
Proficiency in Linux environments and containerization.
Expertise in Continuous Integration/Deployment (CI/CD) and large-scale automation.
Familiarity with front-end and back-end Python frameworks.
Experience with High-Performance Computing (HPC) clusters and orchestration solutions such as Slurm and Kubernetes.
Understanding of cloud services, MLOps, DevOps, SRE, and AI agentic tools.

What you will be doing:
Understand, analyze, profile, and optimize deep learning training and inference workloads on state-of-the-art hardware and software platforms.
Collaborate with researchers and engineers across NVIDIA, providing guidance on improving the performance of workloads.
Implement production-quality software across NVIDIA's deep learning platform stack.
Build tools to automate workload analysis, workload optimization, and other critical workflows.
What we want to see:
5+ years of experience.
MSc or PhD in CS, EE or CSEE or equivalent experience.
Strong background in deep learning and neural networks, both training & inference.
Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
Proven experience analyzing, modeling and tuning application performance.
Programming skills in C++ and Python.
Ways to stand out from the crowd:
Experience with modern LLM inference frameworks (TRT-LLM, vLLM, Ollama, etc.)
Strong fundamentals in algorithms.
Experience with production deployment of Deep Learning models.
Proven experience with processor and system-level performance modelling.
GPU programming experience (CUDA or OpenCL) is a strong plus but not required.