

What you’ll be doing:
Design, build, and run in-scope cloud infrastructure services to meet our business goals, performing integrations, migrations, bring-ups, updates, and decommissions as necessary.
Participate in the definition of our internal-facing service level objectives and error budgets as part of our overall observability strategy.
Eliminate toil, or automate it where the ROI of building and maintaining the automation justifies the effort.
Practice sustainable, blameless incident prevention and incident response as a member of an on-call rotation.
Consult with and advise peer teams on systems design best practices.
Participate in a supportive culture of values-driven introspection, communication, and self-organization.
What we need to see:
Proficiency in at least one of the following programming languages: Python or Go.
BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics) or equivalent experience.
5+ years of relevant experience in infrastructure and fleet management engineering.
Experience with infrastructure automation and distributed systems design, including developing tools for running large-scale private or public cloud systems that require fully automated management and are under active customer use in production.
A track record demonstrating a mix of initiating your own projects, convincing others to collaborate with you, and collaborating well on projects initiated by others.
In-depth knowledge of one or more of the following: Linux, Slurm, Kubernetes, local and distributed storage, and systems networking.
Ways to stand out from the crowd:
A systematic problem-solving approach, coupled with clear communication skills and a willingness to take ownership and get results, such as experience driving a build/reuse/buy decision.
Experience working with or developing bare-metal-as-a-service (BMaaS) systems, for example vending BMaaS, running Slurm on containers, or vending Kubernetes clusters.
Experience working with or developing multi-cloud infrastructure services.
Experience teaching reliability engineering (e.g., SRE) and/or other scale-oriented cloud systems practices to peers and/or other companies (e.g., CRE).
Experience running private or public cloud systems based on one or more of Kubernetes, OpenStack, Docker, or Slurm.
Experience with accelerated compute and communications technologies such as BlueField networking, InfiniBand topologies, NVMesh, and/or the NVIDIA Collective Communications Library (NCCL).
Experience working with a centralized security organization to prioritize and mitigate security risks.
Prior experience in an ML/AI-focused role or on a team matching specific keywords is welcome but not required.
You will also be eligible for equity and benefits.

What you’ll be doing:
100% kernel coding role
Own end-to-end design and development, challenging existing paradigms and exploring innovative approaches for RDMA and high-speed TCP-based networks.
Collaborate closely with cross-functional teams to define and implement robust networking algorithms, data management strategies, and distributed systems principles.
Contribute to architecture, integration, and alignment with both on-prem and cloud-native platforms.
Optimize system performance and reliability through in-depth analysis and low-level tuning.
Stay up to date with the latest industry trends and contribute to open-source projects.
What we need to see:
B.S. or M.S. degree in Computer Science or Electrical Engineering (or equivalent experience).
12+ years of development experience.
Proven professional experience designing and developing distributed systems; experience with block storage and networking systems is an advantage, as is experience with cloud environments.
Strong proficiency in C/C++ programming. Experience with Linux kernel internals, including the block subsystem, I/O stack, memory management, and scheduling.
Familiarity with storage protocols and standards, especially NVMe.
Knowledge of networking fundamentals and experience in Linux-based networking environments.
Familiarity with RDMA technologies, including InfiniBand, RoCE, or iWARP, and experience with RDMA programming models and control and data paths.
Knowledge of cloud computing concepts, including virtualization, scalability, and data management.
Ways To Stand Out From The Crowd:
Excellent communication skills and a collaborative mindset.
Perseverance and determination in debugging complex problems.
You will also be eligible for equity and benefits.

What you'll be doing:
Define and Prove: Collect insights and build market analyses that provide the data needed to enter new markets or change existing ones through a clear value proposition for physical AI reasoning capabilities.
Market intelligence: Understand robotic, autonomous vehicle, and industrial automation developers, partners, and the physical AI ecosystem
Build and Deliver: Collaborate with research and engineering to advocate for customers' needs by setting feature priorities across roadmaps.
Sense and Respond: Work closely with customers, build surveys, present at conferences, understand product quality and supervise critical features, improvements, and bugs.
Marketing content creation: Work with marketing to define positioning that enables the creation of technical content, including blog posts, webinars, developer tutorials, and more to communicate the product value proposition
Product launches: Define the go-to-market strategy and guide the cross-functional implementation of the plan across marketing, public relations, and sales.
What we need to see:
BS or MS in Computer Science, Engineering, or other technical field (or equivalent experience).
12+ years of product management or similar experience at a technology company.
Passionate about working at the intersection of groundbreaking research and practical product development, driving teams to translate breakthrough innovations into shippable software, services, libraries, and SDKs.
Proven track record of collaboration across teams, customers, and partners.
Extensive experience working in highly matrixed environments, able to lead through influence and to corral and align many teams around a common vision.
Outstanding interpersonal and public communication skills, with a demonstrated ability to articulate a value proposition to technical and non-technical audiences.
Ways to stand out from the crowd:
Experience working with multimodal Large Language Models, specifically with curating data, training or fine-tuning, and using reinforcement learning
Thorough understanding of physics, robotics and/or autonomous vehicles
Extensive knowledge of 3D graphics, computer vision, spatial computing pipelines, and related tools/ecosystems (e.g., game engines, VFX tools, simulation platforms)
Background in motion capture, virtual production, or real-time rendering workflows
You will also be eligible for equity and benefits.

What you’ll be doing:
Lead positioning, messaging and launches for key data science and processing products
Develop technical content such as blogs, videos, and presentations to help data scientists embrace NVIDIA’s SDKs
Support go-to-market efforts for NVIDIA hardware platforms for data processing as part of a cross-functional team
Conduct competitive and ecosystem analysis to develop positioning and focus promotional efforts
Engage with the data science community directly and present key technologies at events
Derive insights from customer adoption trends and share with internal teams
What we need to see:
8+ years of experience launching technical products, taking them to market, and scaling their adoption among data scientists or developers.
Bachelor’s degree in a scientific or technical field (or equivalent experience).
Knowledge of data science packages and machine learning frameworks.
You love working in a cross-functional organization and thrive by collaborating with peers across teams and functions.
Excellent written and spoken communication skills.
Experience presenting at meetings, conferences, and webinars.
Strong grasp of content marketing and social media methodologies.
Comfortable reviewing Python code used in presentations, demos, and articles.
Ways to stand out from the crowd:
Clearly communicate your understanding of NVIDIA’s strategy and technology.
Measurable impact (metrics, awards, key accomplishments) you can share.
Growing portfolio of blogs and social posts with links to your relevant work, including technical writing, social media presence, videos, and research papers.
You will also be eligible for equity and benefits.

What You Will Be Doing:
Maintain and develop Kubernetes operators and our Container Storage Interface (CSI) plugin.
Develop a web-based solution that manages, operates and monitors our distributed storage.
Work closely with other teams to define and implement new APIs.
What We Need to See:
B.Sc., M.Sc. or Ph.D. in Computer Science, or related discipline, or equivalent experience.
8+ years of experience in web development (both client and server)
Proven experience with Kubernetes (K8s), including developing or maintaining operators and/or CSI plugins.
Experience scripting with Python, Bash, or similar.
Experience with Node.js is a must.
At least 5 years of experience working in a Linux OS environment
You’re smart and a quick learner
You do what it takes to get the job done
Passionate about coding and big challenges
Ways to stand out from the crowd:
Node.js for the server side (dominant modules: async and Express), Kafka, MongoDB, K8s
JavaScript frameworks: React, jQuery, C3.js
HTML5, CSS3, Bootstrap, and WebSockets; scripting (Python and Bash), as well as Git and Linux
You will also be eligible for equity and benefits.

What you’ll be doing:
Understand and Supervise Our Performance Measurement Needs: Work with customers, sales, and field teams to understand key established and potential workloads for our accelerated platforms, as well as the areas and market segments that need to be more competitive. Translate these to
Become an Expert on Performance Comparison: Advise performance teams on how to develop like-for-like workload comparisons. Ensure methodologies are robust, accurate, and relevant, leading to reliable performance benchmarks.
Tell Our Story, from Marketing to R&D: Work with technical marketing and develop internal reporting content that effectively communicates performance outcomes. Ensure content is clear, accurate, and tailored to the audience, whether it be internal partners or potential clients.
Understand Our Customers and Help Them Succeed: Effectively engage with key customers and engineers to conduct performance and workload studies. Work closely with customers to understand their needs and provide insights and solutions based on performance data.
Help the Team Grow and Scale: Supervise current performance measurement and collection processes. Advise on and implement strategies to make these processes more scalable and efficient, ensuring they can adapt to evolving workload demands and technology changes. Keep data fresh and current.
Understand the Competitive Landscape: Stay informed of competing products and SKUs, modeling their KPIs. Keep a pulse on industry trends and advancements in silicon technology, compilers, software, libraries, and performance measurement. Use this knowledge to advise strategies and practices within the team.
Keep Our Projects Running Efficiently: Prioritize the resources used for performance measurement across the company, and identify gaps in coverage.
Share Your Knowledge: At the intersection of compilers, tools, silicon, and platform architecture, build detailed technical documentation to address common performance concerns and optimizations.
What we need to see:
Bachelor’s Degree in Computer Science, Computer Engineering, or equivalent experience.
10+ years of product management or product-adjacent experience.
Organized and methodical; able to define key product metrics, structure and track them, and keep business data up-to-date.
Ability to produce clear, concise communication appropriate to various audiences: distill sophisticated technical performance results into brief updates for management, customer stories for marketing, or deep analyses that inform engineering changes.
Proven understanding of software optimization, profiling, and performance analysis.
Experience in engaging with customers for proof-of-concept and pre-sales activities.
Strong project management skills with an emphasis on scalability and organization.
Excellent communication and teamwork skills, with the ability to translate technical concepts for a non-technical audience.
Strong analytical abilities.
You will also be eligible for equity and benefits.

What you'll be doing:
Create products to help researchers and production model builders
Develop product strategy, roadmaps, and go-to-market plans
Collaborate with internal and external customers to build product-based roadmaps for training/post-training software
Work with leadership to align with and drive company strategy
What we need to see:
Experience with training/post-training and optimization software (e.g., PyTorch Distributed, torchtitan, VeRL, NeMo Framework)
Demonstrable knowledge of GenAI or machine learning concepts, particularly around model training, performance optimization, and software development and delivery
Experience with large scale distributed systems
BS or MS degree in Computer Science, Computer Engineering, or a similar field (or equivalent experience)
15+ years of technical product management or similar experience at a technology company
Strong communication and interpersonal skills
Ways to Stand Out from the crowd:
Experience leading GenAI/RecSys research to production at scale
Experience working on open-source, GitHub-first developer products with deep customer interactions
Knowledge of GPU architecture, HW/SW co-design, and performance profiling
You will also be eligible for equity and benefits.