

In this role you will be interacting with internal partners, users, and members of the open source community to analyze, define and implement highly optimized algorithms and DL frameworks. The scope of these efforts includes a combination of performance tuning and analysis, defining APIs, analyzing functionality coverage, implementing new algorithms and frameworks, and other general software engineering work.
What you’ll be doing:
Research, analyze, and document state-of-the-art algorithms
Design and implement a deep learning framework for model optimization
Develop algorithms for deep learning, data analytics, machine learning, or scientific computing
Analyze performance of GPU implementations
Benchmark software stacks across training and inference scenarios
Evaluate and understand capabilities of frontier models
Collaborate with team members and other partners
What we need to see:
Pursuing MSc or PhD in Computer Science, Artificial Intelligence, Applied Math, or related field
Excellent Python programming, debugging, performance analysis, and test design skills
Strong algorithmic and mathematical fundamentals
Good understanding of Deep Learning fundamentals
Ability to work independently and manage your own development effort
Good communication and documentation habits
Ways to stand out from the crowd:
Deep Learning experience
Experience with DL frameworks (PyTorch preferred) and Large Language Models
Experience with model compression techniques such as pruning, NAS, distillation, and quantization (a minimal quantization sketch follows this list)
Knowledge of CPU and/or GPU architecture
First-author publication in a top-tier deep learning or AI conference
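To make the model-compression bullet above concrete, here is a minimal sketch of one of the listed techniques, post-training dynamic quantization in PyTorch. The model and layer choices are placeholders for illustration only and are not tied to any specific codebase used in this role.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic  # torch.quantization on older releases

# Small placeholder model; any nn.Module containing nn.Linear layers works the same way.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Post-training dynamic quantization: Linear weights are stored in int8 and
# activations are quantized on the fly at inference time.
quantized = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)  # same interface as the original model, smaller weights
```

Dynamic quantization typically shrinks the model and speeds up CPU inference with little accuracy loss, which is why it is often tried before pruning, distillation, or NAS.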

In this role you will be part of our team responsible for the development of libraries that provide groundbreaking functionality and performance. The internship may include extending the capabilities of existing libraries as well as building new ones that will be used in various AI and HPC applications. It will involve working with senior software engineers who will provide mentorship and guidance. The project will include implementing new algorithms, defining APIs, analyzing performance, finding appropriate solutions for difficult numerical corner cases, and other general software engineering work.
What you’ll be doing:
Collaborate with team members to understand software use cases and requirements
Analyze the performance of GPU or CPU implementations and find opportunities for improvements
Prototype and develop algorithms for single-node and multi-GPU clusters
What we need to see:
Studying towards an MS or PhD degree in Computational Science, Computer Science, Applied Mathematics, Engineering, or a related field
Programming skills (C/C++, Python)
Parallel or GPU programming experience (AVX, NEON, OpenMP, MPI, SHMEM, CUDA or OpenCL)
Ways to stand out from the crowd:
Exposure to floating-point arithmetic and numerical error analysis (see the summation sketch after this list)
Knowledge of GPU/CPU and network hardware architecture
Understanding of composability and fusion, compilers, and programming language implementation
Experience implementing sparse or dense linear algebra algorithms
Experience with domain-specific language design and compiler optimizations, in particular sparse compilers (e.g., MLIR or TACO)
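To illustrate the kind of numerical corner case the floating-point bullet above refers to, here is a small Python sketch comparing naive float32 accumulation with Kahan (compensated) summation. The input values are arbitrary and chosen only to expose rounding error; this is an illustration, not code from the libraries in question.

```python
import numpy as np

def naive_sum(xs):
    total = np.float32(0.0)
    for x in xs:
        total += x  # each add rounds to the precision of the growing total
    return total

def kahan_sum(xs):
    # Compensated summation: carry the low-order bits lost at each step.
    total = np.float32(0.0)
    comp = np.float32(0.0)
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total

# One million small float32 values: the naive loop visibly drifts.
xs = np.full(10**6, np.float32(0.1), dtype=np.float32)
print("naive :", naive_sum(xs))
print("kahan :", kahan_sum(xs))
print("float64 reference:", np.sum(xs, dtype=np.float64))
```

The naive loop drifts because once the running total is large, each added 0.1 is rounded to the total's coarser precision; the compensated version carries the lost low-order bits forward and stays close to the float64 reference.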

What you’ll be doing:
Research, prototype, develop and optimize solutions, tools and libraries for deep learning, data analytics, machine learning, or scientific computing
Analyze, influence, and improve deep learning library and framework standards and APIs according to good engineering practices
Collaborate with team members and other partners
What we need to see:
Excellent Python and C/C++ programming knowledge
8+ years of work experience in software development
Experience in the design and implementation of complex systems with decoupled dependencies (see the sketch after this list)
Knowledge of design patterns and software engineering principles
Strong analytical skills and knowledge of algorithms and data structures
Strong time-management and organization skills for coordinating multiple initiatives, priorities, and the integration of new technology and products into very complex projects
Good communication and documentation habits
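As a minimal sketch of the "decoupled dependencies" bullet above, the component below depends only on a small interface (a typing.Protocol) rather than a concrete backend, so implementations can be swapped in tests or production without touching the caller. All names here are illustrative, not part of any real product.

```python
from typing import Protocol

class Storage(Protocol):
    """The narrow interface the pipeline depends on, not a concrete backend."""
    def save(self, key: str, value: bytes) -> None: ...
    def load(self, key: str) -> bytes: ...

class InMemoryStorage:
    """One interchangeable implementation; a database-backed one would fit the same slot."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def save(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def load(self, key: str) -> bytes:
        return self._data[key]

class ResultPipeline:
    def __init__(self, storage: Storage) -> None:
        # The dependency is injected, so tests and production can use
        # different Storage implementations without changing this class.
        self._storage = storage

    def run(self, key: str, payload: bytes) -> bytes:
        self._storage.save(key, payload)
        return self._storage.load(key)

pipeline = ResultPipeline(InMemoryStorage())
print(pipeline.run("job-1", b"result"))
```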

What you will be doing:
Driving end-to-end responsibility for implementing and integrating major new features into the core storage solutions.
Maintaining and improving the SPDK project (https://spdk.io/), serving the broader open-source storage community.
Collaborating closely with hardware and other software teams to expose new, groundbreaking hardware capabilities.
Implementing critical performance improvements and low-latency optimizations across the storage stack.
Participating actively in the design and refinement of our next-generation storage solutions.
What we need to see:
A degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
8+ years of experience as a C/C++ software engineer, with a strong emphasis on systems-level programming.
Deep knowledge of Linux and networking stack fundamentals.
Demonstrated leadership skills; quickly adapting to new technical environments and providing clear technical guidance to peers.
Proven familiarity with storage protocols (e.g., NVMe).
A commitment to high-quality code through rigorous testing, code reviews, and robust design practices.
Ways to stand out from the crowd:
Extensive experience with RDMA (Remote Direct Memory Access) and its application in low-latency systems.
A deep understanding of SoC (System-on-a-Chip) hardware design and its influence on software performance.
Serving as a core maintainer or significant contributor to a widely-used, high-visibility open-source project.
Prior work with kernel bypass techniques and user-mode driver architectures.

What you’ll be doing:
You will apply knowledge of compute programming models and compute architecture to build tools that provide actionable feedback to compute developers. You should be comfortable working in existing driver code and application code as well as writing new shared libraries and targeted performance tests, and have an eagerness to learn about new compute and graphics drivers, GPU architectures and operating systems.
Develop Compute Sanitizer, a suite of memory-checker tools for GPUs running on Linux, Windows, and embedded operating systems (a small test-harness sketch follows this list).
Work with tools, compiler, architecture and driver teams to design, implement and verify new features in the Compute Sanitizer stack.
Work closely with internal and external partners including other peer organizations within NVIDIA.
Effectively estimate and prioritize tasks in order to create a realistic delivery schedule.
Write fast, effective, maintainable, reliable and well-documented code.
Provide peer reviews to other engineers, including feedback on performance, scalability and correctness.
Document requirements and designs, and review documents with teams throughout NVIDIA.
Mentor junior engineers.
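As a rough illustration of how a memory-checking tool fits into a test flow (referenced in the first bullet above), the Python harness below runs the compute-sanitizer memcheck tool over a CUDA test binary and fails if any errors are reported. The binary path is a placeholder, and the flag names and output format are assumptions that should be checked against the installed CUDA toolkit.

```python
import subprocess
import sys

# Placeholder: replace with a real CUDA test application.
TEST_BINARY = "./my_cuda_test"

# Run the memcheck tool from Compute Sanitizer over the binary.
cmd = ["compute-sanitizer", "--tool", "memcheck", TEST_BINARY]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)

# memcheck prints an error summary line; treat anything other than zero errors as a failure.
if "ERROR SUMMARY: 0 errors" not in result.stdout:
    print("compute-sanitizer reported memory errors", file=sys.stderr)
    sys.exit(1)
```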
What we need to see:
BS or MS in Computer Science or equivalent experience
5+ years of experience
Strong programming ability in C, C++, assembly language, and scripting languages
Excellent knowledge of x86 or ARM CPU architecture
Strong problem solving and debugging skills
Familiar with low-level programming using assembly languages
Source control understanding (git, Perforce, etc.)
Ability to self-manage, communicate, and adapt in a fast paced, high demand environment with changing priorities and direction
Excellent communication skills, written and verbal
Ways to stand out from the crowd:
CUDA/OpenCL knowledge
Experience with code patching
ELF/DWARF knowledge

We are looking for a Senior Deep Learning Engineer to help bring Cosmos World Foundation Models from research into efficient, production-grade systems. You’ll focus on optimizing and deploying models for high-performance inference on diverse GPU platforms. This role sits at the intersection of deep learning, systems, and GPU optimization, working closely with research scientists, software engineers, and hardware experts.
What you'll be doing:
Improve inference speed for Cosmos WFMs on GPU platforms.
Effectively carry out the production deployment of Cosmos WFMs.
Profile and analyze deep learning workloads to identify and remove bottlenecks (see the profiling sketch just below).
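As a minimal sketch of the profiling step mentioned above, the snippet below uses torch.profiler to surface the most expensive operators; the model and input shapes are placeholders rather than an actual Cosmos WFM.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Placeholder model and input; in practice this would be the deployed network.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
x = torch.randn(64, 1024)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

# Profile a few inference steps and report where the time goes.
with torch.no_grad(), profile(activities=activities, record_shapes=True) as prof:
    for _ in range(10):
        model(x)

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

Sorting the table by self CPU or CUDA time is usually enough to identify the first bottleneck worth removing.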
What we need to see:
5+ years of experience.
MSc or PhD in CS, EE, or CSEE or equivalent experience.
Strong background in Deep Learning.
Strong programming skills in Python and PyTorch.
Experience with inference optimization techniques (such as quantization) and with at least one inference optimization framework: TensorRT, TensorRT-LLM, vLLM, or SGLang.
Ways to stand out from the crowd:
Familiarity with deploying Deep Learning models in production settings (e.g., Docker, Triton Inference Server).
CUDA programming experience.
Familiarity with diffusion models.
Proven experience in analyzing, modeling, and tuning the performance of GPU workloads, both inference and training.

