

You will build observability systems for data centers enabling EDA workloads. You will develop, deploy, and maintain observability solutions for multiple CPU and GPU clusters.
What You'll Be Doing:
Collaborate with HW and SW engineering teams to deliver observability solutions that meet their needs in EDA clusters.
Develop, test, and deploy data collectors, pipelines, visualization and retrieval services.
Define data collection and retention policies to balance network bandwidth, system load, and storage capacity costs with data analysis requirements.
Work in a diverse team to provide operational and strategic data to empower our engineers and researchers to improve performance, productivity, and efficiency.
Continuously improve quality, workloads, and processes through better observability.
What We Need to See:
Experience developing large scale, distributed observability systems.
Ability to collaborate with data scientists, researchers, and engineering teams to identify high value data for collection and analysis.
Experience with turning raw data into actionable reports
Experience with observability platforms such as Apache Spark, Elasticsearch/OpenSearch, Grafana, Prometheus, and other similar open-source tools
Python programming experience, including making API calls (see the sketch after this list)
Passion for improving the productivity of others
Excellent planning and interpersonal skills
Flexibility/adaptability working in a dynamic environment with changing requirements
MS (preferred) or BS in Computer Science, Electrical Engineering, or related field or equivalent experience.
8+ years of proven experience.
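To give a concrete flavor of the data-retrieval and reporting work described in this list, here is a minimal Python sketch that queries a Prometheus server over its standard HTTP API and prints a one-line availability report. The server URL, job label, and metric are illustrative assumptions, not a description of any particular cluster.

"""Minimal sketch: pull a metric from a Prometheus server and turn it into a
small availability report. Endpoint and labels are hypothetical."""
import requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # hypothetical endpoint
QUERY = 'up{job="eda-cluster"}'                              # hypothetical metric/labels

def fetch_instant_vector(query: str) -> list[dict]:
    """Run an instant PromQL query via the /api/v1/query endpoint."""
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]

def availability_report(samples: list[dict]) -> str:
    """Summarize how many scraped targets are currently up."""
    total = len(samples)
    up = sum(1 for s in samples if s["value"][1] == "1")
    return f"{up}/{total} targets up ({100.0 * up / total:.1f}%)" if total else "no targets found"

if __name__ == "__main__":
    print(availability_report(fetch_instant_vector(QUERY)))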
Ways To Stand Out from The Crowd:
Background in computer science, EDA software, open-source software, infrastructure technologies, and GPU technology.
Prior experience in infrastructure software, production application software development, release and support methodology, and DevOps
Experience in the management of datacenters and large-scale distributed computing
Experience working with EDA developers
Consistent track record of driving process improvements and measuring efficiency, along with a passion for sharing knowledge and driving complex projects end-to-end.
You will also be eligible for equity and benefits.

What you'll be doing:
Lead, mentor, and scale a high-performing engineering team focused on deep learning inference and GPU-accelerated software.
Drive the strategy, roadmap, and execution of NVIDIA’s inference frameworks engineering, focusing on SGLang.
Partner with internal compiler, libraries, and research teams to deliver end-to-end optimized inference pipelines across NVIDIA accelerators.
Oversee performance tuning, profiling, and optimization of large-scale models for LLM, multimodal, and generative AI applications.
Guide engineers in adopting best practices for CUDA, Triton, CUTLASS, and multi-GPU communications (NIXL, NCCL, NVSHMEM).
Represent the team in roadmap and planning discussions, ensuring alignment with NVIDIA’s broader AI and software strategies.
Foster a culture of technical excellence, open collaboration, and continuous innovation.
What we need to see:
MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a related field.
6+ years of software development experience, including 3+ years in technical leadership or engineering management.
Strong background in C/C++ software design and development; proficiency in Python is a plus.
Hands-on experience with GPU programming (CUDA, Triton, CUTLASS) and performance optimization.
Proven record of deploying or optimizing deep learning models in production environments.
Experience leading teams using Agile or collaborative software development practices.
Ways to Stand Out from The Crowd:
Significant open-source contributions to deep learning or inference frameworks such as PyTorch, vLLM, SGLang, Triton, or TensorRT-LLM.
Deep understanding of multi-GPU communications (NIXL, NCCL, NVSHMEM) and distributed inference architectures.
Expertise in performance modeling, profiling, and system-level optimization across CPU and GPU platforms.
Proven ability to mentor engineers, guide architectural decisions, and deliver complex projects with measurable impact.
Publications, patents, or talks on LLM serving, model optimization, or GPU performance engineering.
You will also be eligible for equity and benefits.

As a Research Scientist specializing in Generative AI for Physical AI, you'll be at the forefront of developing next-generation algorithms that bridge the gap between virtual and physical realms. You'll work with state-of-the-art technology and have access to massive computational resources to bring your ideas to life.
What you'll be doing:
Pioneer revolutionary generative AI algorithms for physical AI applications, with a focus on advanced video generative models and video-language models
Architect and implement sophisticated data processing pipelines that produce premium-quality training data for Generative AI and Physical AI systems
Design and develop cutting-edge physics simulation algorithms that enhance Physical AI training
Scale and optimize large-scale training systems to efficiently harness the power of 20,000+ GPUs for training foundation models
Author influential research papers to share your groundbreaking discoveries with the global AI community
Drive innovation through close collaboration with research teams, diverse internal product groups, and external researchers
Build lasting impact by facilitating technology transfer and contributing to open-source initiatives
What we need to see:
PhD in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
Deep expertise in PyTorch and related libraries for Generative AI and Physical AI development
Strong foundation in diffusion, vision-language, and reasoning models and their applications
Proven experience with reinforcement learning algorithms and implementations
Robust knowledge of physics simulation and its integration with AI systems
Demonstrated proficiency in 3D generative models and their applications
Ways to stand out from the crowd:
Publications or contributions to major AI conferences (ICLR, NeurIPS, ICML, CVPR, ECCV, SIGGRAPH, ICCV, etc.)
Experience with large-scale distributed training systems
Background in robotics or physical systems
Open-source contributions to prominent AI projects
History of successful research-to-product transitions
You will also be eligible for equity and benefits.

You will evaluate and improve state-of-the-art performance techniques in production Large Language Model deployments.
What you’ll be doing:
Develop innovative GPU and system architectures to extend the state of the art in AI Inference performance and efficiency
Model, analyze and prototype key deep learning algorithms and applications
Understand and analyze the interplay of hardware and software architectures on future algorithms and applications
Write efficient software for AI Inference, including CUDA kernels, framework level code, and application level code
Collaborate across the company to guide the direction of AI, working with software, research and product teams
What we need to see:
An MS or PhD in a relevant discipline (CS, EE, Math) or equivalent experience, with 5+ years of relevant experience
Strong mathematical foundation in machine learning and deep learning
Expert programming skills in C, C++, and Python
Familiarity with GPU computing (CUDA or similar) and HPC (MPI, OpenMP)
Strong knowledge and coursework in computer architecture
Ways to stand out from the crowd:
Background with systems-level performance modeling, profiling, and analysis
Experience in characterizing and modeling system-level performance, executing comparison studies, and documenting and publishing results
Experience in optimizing AI Inference workloads with CUDA kernel development
You will also be eligible for equity and benefits.

What you'll be doing:
Design, build and optimize agentic AI systems for the CUDA ecosystem.
Co-design agentic system solutions with software, hardware and algorithm teams; influence and adopt new capabilities as they become available.
Develop reproducible, high-fidelity evaluation frameworks covering performance, quality, and developer productivity (a toy harness sketch follows this list).
Collaborate across the AI stack, from hardware through compilers/toolchains, kernels/libraries, frameworks, distributed training, and inference/serving, and with model/agent teams.
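As a toy illustration of the reproducible evaluation frameworks mentioned above, the sketch below runs a stubbed inference function over a fixed prompt set with a fixed seed and reports latency percentiles plus a trivial quality score. The stub and workload are illustrative assumptions, not a real serving stack.

"""Toy sketch of a reproducible evaluation harness: fixed seed, fixed prompt set,
latency percentiles, and a trivial quality check."""
import random
import statistics
import time

random.seed(0)  # fixed seed so reruns produce identical behavior

PROMPTS = [f"prompt-{i}" for i in range(32)]  # hypothetical workload

def run_inference(prompt: str) -> str:
    """Stand-in for a real model or agent call."""
    time.sleep(random.uniform(0.001, 0.005))  # simulate variable latency
    return prompt.upper()

def evaluate(prompts: list[str]) -> dict:
    latencies, correct = [], 0
    for p in prompts:
        start = time.perf_counter()
        out = run_inference(p)
        latencies.append(time.perf_counter() - start)
        correct += int(out == p.upper())  # trivial "quality" check
    return {
        "p50_ms": 1000 * statistics.median(latencies),
        "p95_ms": 1000 * statistics.quantiles(latencies, n=20)[18],
        "quality": correct / len(prompts),
    }

if __name__ == "__main__":
    print(evaluate(PROMPTS))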
What we need to see:
Bachelor’s degree in Computer Science, Electrical Engineering, or related field (or equivalent experience); MS or PhD preferred.
5+ years of industry or academic experience in AI systems development; exposure to building foundation models, agents, or orchestration frameworks; hands-on experience with deep learning frameworks and modern inference stacks.
Strong C/C++ and Python programming skills; solid software engineering fundamentals.
Experience with GPU programming and performance optimization (CUDA or equivalent).
Ways To Stand Out From The Crowd:
Track record building/evaluating deep learning models, coding agents and developer tooling.
Demonstrated ability to optimize and deploy high-performance models, including on resource-constrained platforms.
Deep expertise in GPU performance optimizations, evidenced by benchmark wins or published results.
Publications or open-source leadership in deep learning, multi-agent systems, reinforcement learning, or AI systems; contributions to widely used repos or standards.
You will also be eligible for equity and benefits.

The Deep Learning Software Team is seeking a Senior Technical Program Manager to lead software initiatives and develop Gen AI models enabling NVIDIA’s most advanced AI researchers and engineers to create the future of computing. This leader will guide engineering programs using the best industry processes well suited to our fast pace and rapidly expanding roadmap.
What You'll Be Doing:
Engage with cross-company partners to plan programs and coordinate teams to meet key business objectives
Guide engineering programs in all aspects of program management – planning, forecasting, documenting, scheduling, effective meetings, multi-faceted prioritization, management of dependencies, reporting, and effective handling of critical and blocking issues
Guide engineering teams in the use of agile methodologies
Develop and implement metrics for measuring program effectiveness and identifying improvement areas; collect and analyze data in support of planning and data-driven decisions
Report on overall program status, providing insights and recommendations to senior management
Drive organizational alignment and efficiency by coordinating with multi-functional leads and streamlining processes
Work with multi-functional matrixed teams
Guide teams designing for advanced, complex, competing and often conflicting customer requirements
Moderate technical discussions to successful conclusions
Act as liaison between developers and customers, between technical and non-technical audiences
Cultivate a culture of continuous improvement, finding opportunities for process enhancements
What We Need To See:
Bachelor’s degree (or equivalent experience) in a related field with demonstrated program management expertise and mastery of technical and management practices
10+ years of program management experience, including proven ability managing global projects across multiple time zones
Demonstrated skill in engaging engineering partners and vendors and guiding those engagements to successful outcomes
Exceptional communication and presentation skills for diverse technical and non-technical audiences with strong problem-solving and conflict management skills
In-depth understanding of software engineering principles and quality requirements in enterprise systems
Strong multitasking abilities with a focus on thoroughness and rapid context switching
Knowledge of agile methodologies, project planning, and task-tracking tools
Experience in AI training environments and resource capacity planning
Proactive in identifying and implementing efficient changes in software engineering and release management
Excellent organizational skills and ability to use project management tools (e.g. Jira, Aha!, Confluence) and distributed version control systems (e.g. Git)
Ways To Stand Out From The Crowd:
Background in computer science, machine learning, deep learning, open source software, and GPU technology
Prior experience in production application software development, release and support methodology and DevOps
Prior experience in the management of customer workflows using large scale distributed computing and working with AI researchers or directly training and evaluating AI models
Consistent track record of driving process improvements and measuring efficiency
A passion for sharing knowledge and experience driving complex projects end-to-end
You will also be eligible for equity and benefits.

What you will be doing:
Build end-to-end agentic AI applications that solve real-world enterprise problems across various industries.
Serve as the primary technical domain expert for partners pre- and post-sale, embedding deeply with them to design and deploy Generative AI solutions at scale. Maintain strong relationships with leadership and technical teams to drive adoption and successful utilization of NVIDIA GenAI platforms.
Accelerate partner/customer time to value by providing repeatable reference architecture guidance, building hands-on prototypes, and advising on standard methodologies for scaling solutions to production.
Establish the scope, success metrics, and evaluation criteria for partner-led customer projects, ensuring alignment to standardized and reproducible GPU-accelerated workflows.
Enable strategic partners to build their own Professional Services, platforms, and products by integrating NVIDIA technologies to accelerate high-impact customer workloads. Proactively find opportunities to drive deeper adoption and utilization of NVIDIA's Generative AI products.
Codify knowledge and operationalize technical success practices to help partners scale impact across industries and workloads.
What we need to see:
MS or PhD degree in Computer Science/Engineering, Machine Learning, Data Science, Electrical Engineering or a closely related field (or equivalent experience).
5+ years of meaningful work experience in deploying AI models at scale as a Software Engineer or Deep Learning engineer.
Consistent track record of building enterprise-grade agentic AI systems using open-source models and solid foundation in deep learning, with a particular emphasis on LLM and VLM.
Hands-on experience with LLM and agentic frameworks (NeMo Agent Toolkit, LangChain, Semantic Kernel, Crew.ai, AutoGen) and with evaluation and observability platforms. Comfortable building prototypes or proofs of concept (a toy agent-loop sketch follows this list).
Strong coding skills and proficiency in Python, C++, and deep learning frameworks (PyTorch or TensorFlow).
Excellent communication and presentation skills to collaborate effectively with internal executives, partners, and customers.
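To illustrate the prototype-building and agent-framework experience listed above, here is a toy Python sketch of the basic agent loop: a model (stubbed here) proposes a tool call, the loop executes it, and the observation is fed back until the model answers. The stubbed model and calculator tool are illustrative assumptions and do not reflect any specific framework's API.

"""Toy sketch of a minimal tool-calling agent loop."""

def calculator(expression: str) -> str:
    """A hypothetical tool the agent can call."""
    return str(eval(expression, {"__builtins__": {}}))  # toy only; never eval untrusted input

TOOLS = {"calculator": calculator}

def fake_llm(history: list[str]) -> dict:
    """Stand-in for a real model call; decides on a tool or a final answer."""
    if not any(h.startswith("observation:") for h in history):
        return {"action": "calculator", "input": "6 * 7"}
    return {"action": "final", "input": history[-1].split(":", 1)[1].strip()}

def run_agent(question: str, max_steps: int = 5) -> str:
    history = [f"question: {question}"]
    for _ in range(max_steps):
        step = fake_llm(history)
        if step["action"] == "final":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])  # execute the chosen tool
        history.append(f"observation: {observation}")       # feed the result back
    return "gave up"

if __name__ == "__main__":
    print(run_agent("What is 6 * 7?"))  # -> 42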
Ways to stand out from the crowd:
Demonstrated expertise in building applications and systems using NeMo Framework, Nemotron, Dynamo, TensorRT-LLM, NIMs, and AI Blueprints, along with active contributions to the open-source community.
Take end-to-end ownership of projects, proactively acquiring new skills or knowledge as needed to drive success.
Excel in fast-paced environments, adeptly managing multiple workstreams and prioritizing for the highest customer impact.
Understanding of different advanced agent architectures and emerging communication protocols (MCP, OpenAI Agentic SDK, or Google A2A).
Experience with NVIDIA GPUs and system software stacks (e.g., NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink, and others.
You will also be eligible for equity and benefits.
