

What you’ll be doing:
Write safe, scalable, modular, and high-quality (C++/Python) code for our core backend software for LLM inference.
Perform benchmarking, profiling, and system-level programming for GPU applications.
Provide code reviews, design docs, and tutorials to facilitate collaboration among the team.
Conduct unit tests and performance tests for different stages of the inference pipeline.
What we need to see:
Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent experience.
Strong coding skills in Python and C/C++.
Knowledge of and passion for machine learning and performance engineering.
Proven project experience building software where performance is a core offering.
Ways to stand out from the crowd:
Solid fundamentals in machine learning, deep learning, operating systems, computer architecture and parallel programming.
Research experience in systems or machine learning.
Project experience in modern DL software such as PyTorch, CUDA, vLLM, SGLang, and TensorRT-LLM.
Experience with performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.
We strongly encourage you to include sample projects (e.g., GitHub) that demonstrate the qualifications above.
You will also be eligible for equity and benefits.

What you'll be doing:
In this position, you will be responsible for the verification of high-speed coherent interconnect designs, architecture, and golden models.
You will verify the micro-architecture using sophisticated verification methodologies.
As a member of our verification team, you'll understand the design and implementation, define the verification scope, develop the verification infrastructure (testbenches, BFMs, checkers, monitors), complete test/coverage plans, and verify the correctness of the design. In this role, you will collaborate with architects, designers, emulation, and silicon verification teams to accomplish your tasks.
What we need to see:
Bachelor's or Master's degree (or equivalent experience).
3+ years of relevant verification experience.
Experience in architecting test bench environments for unit level verification.
Background in verification using random stimulus along with functional coverage and assertion-based verification methodologies.
Prior design or verification experience with coherent high-speed interconnects.
Knowledge of industry-standard interconnect protocols such as PCIe, CXL, and CHI will be useful.
A strong background in developing testbenches from scratch using SystemVerilog and UVM methodology is desired.
Experience with the C++ programming language, scripting ability, and expertise in SystemVerilog.
Exposure to design and verification tools (VCS or equivalent simulation tools, debug tools like Debussy, GDB).
Strong debugging and analytical skills.
Strong communication and interpersonal skills are required. A history of mentoring junior engineers and interns is a huge plus.
You will also be eligible for equity and benefits.

What you’ll be doing:
Designing, implementing, and maintaining creative software solutions.
Working closely with other teams on new projects, features, and improvements of existing products.
Creatively applying fundamentals of AI, data engineering, data science and data visualization to enhance efficiency in generating performance data.
What we need to see:
Bachelor's degree in Computer Science.
8+ years of experience.
Desire to improve code quality by learning and applying computer science fundamentals, algorithms, and data structures.
Comfort with teamwork, collaboration, and a desire to reach across functional borders to develop new partnerships.
Active experience with Python.
Ways to stand out from the crowd:
Background in data science and data engineering.
Experience leveraging large language models for services that generate analytics or code.
Comfort with training, testing and evaluating machine learning models.
Experience using SQL and NoSQL database technologies.
Passion for spectacular visual experiences from computer graphics.
You will also be eligible for equity and benefits.

What you will be doing:
Implement language and multimodal model inference as part of NVIDIA Inference Microservices (NIMs).
Contribute new features, fix bugs, and deliver production code to TensorRT-LLM (TRT-LLM), NVIDIA's open-source inference serving library.
Profile and analyze bottlenecks across the full inference stack to push the boundaries of inference performance.
Benchmark state-of-the-art offerings for inference of various DL models and perform competitive analysis of the NVIDIA SW/HW stack.
Collaborate heavily with other SW/HW co-design teams to enable the creation of the next generation of AI-powered services.
What we want to see:
PhD in CS, EE, or CSEE, or equivalent experience.
3+ years of experience.
Strong background in deep learning and neural networks, in particular inference.
Experience with performance profiling, analysis and optimization, especially for GPU-based applications.
Proficiency in C++ and in PyTorch or equivalent frameworks.
Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
Ways to stand out from the crowd:
Proven experience with processor and system-level performance optimization.
Deep understanding of modern LLM architectures.
Strong fundamentals in algorithms.
GPU programming experience (CUDA or OpenCL) is a strong plus.
You will also be eligible for equity and benefits.

What you’ll be doing:
Develop verification infrastructure (testbenches, BFMs, checkers, monitors, random stimulus generators).
Create, review, and drive test plan execution for planned features.
Understand the performance requirements of your IP, and create, review, and drive its performance test plan.
Ensure code and functional coverage of all the RTL you verify.
Work with and enable FPGA and software teams to ensure that software is tested.
Plan for and be involved with post-silicon verification and debug.
What we need to see:
BS / MS or equivalent experience.
3+ years of ASIC verification experience on complex design units, demonstrating strong attention to detail, teamwork, problem solving, and proven success.
Exposure to design and verification tools (VCS or equivalent simulation tools, debug tools like Debussy, GDB).
Background with SystemVerilog and UVM-based methodology for ASIC verification.
Ways to stand out from the crowd:
Strong C/C++ programming experience.
Prior design or verification experience with dynamic memory controllers (DDR2/3/4/5, LPDDR2/3/4/5/6).
Strong debugging and problem solving skills.
Scripting knowledge (Python/Perl/shell).
Good interpersonal skills and the ability and desire to work as part of a team.
You will also be eligible for equity and benefits.

What you will be doing:
Taking part in the development of NVIDIA's AI platform for training, fine-tuning, and serving the latest and greatest AI models with the best performance and efficiency.
Designing and building solutions for scheduling large-scale AI training and inference workloads on GPU clusters across many cloud infrastructures.
Exploring and finding solutions to open problems such as industry-scale resource management, GPU scheduling, performance prediction, and live workload migration.
Working with and contributing to adjacent teams such as the TensorRT/Dynamo inference engine, ML compiler, KAI/Grove scheduler, and Lepton cloud.
What we need to see:
Bachelor's degree or equivalent experience in Computer Science, Computer Engineering, or a relevant technical field.
5+ years of experience.
Experience building large scale systems from scratch. Prior experience in container-based deployment systems like Kubernetes is beneficial.
Strong coding skills in programming languages like Python, Go, Rust and/or C/C++.
Solid foundation in other computer science and computer engineering topics: algorithms and data structures, operating systems, computer architecture, etc. Strong understanding of AI and related technologies is a huge plus.
Most importantly, ability to quickly grasp new concepts and thrive in evolving situations.
Ways to stand out from the crowd:
Graduate-level education or relevant practical background, particularly in research, is beneficial.
Practical experience in building and optimizing AI applications is highly desired.
Proficiency in container software such as containerd, CRI-O, Linux namespaces, and CRIU, and in NVIDIA GPU technologies such as CUDA graphs and the driver/runtime, is greatly advantageous.
You will also be eligible for equity and benefits.

What you’ll be doing:
Write safe, scalable, modular, and high-quality (C++/Python) code for our core backend software for LLM inference.
Perform benchmarking, profiling, and system-level programming for GPU applications.
Provide code reviews, design docs, and tutorials to facilitate collaboration among the team.
Conduct unit tests and performance tests for different stages of the inference pipeline.
What we need to see:
Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent experience.
Strong coding skills in Python and C/C++.
2+ years of industry experience in software engineering or equivalent research experience.
Knowledge of and passion for machine learning and performance engineering.
Proven project experience building software where performance is a core offering.
Ways to stand out from the crowd:
Solid fundamentals in machine learning, deep learning, operating systems, computer architecture and parallel programming.
Research experience in systems or machine learning.
Project experience in modern DL software such as PyTorch, CUDA, vLLM, SGLang, and TensorRT-LLM.
Experience with performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.
We strongly encourage you to include sample projects (e.g., GitHub) that demonstrate the qualifications above.
You will also be eligible for equity and benefits.
