

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
What you will be doing:
You will bring together and understand internal and external customer requirements to improve AI cluster resiliency and design AIOps-based solutions that address these needs.
Develop automated workflows for issue detection and root cause analysis, and collaborate closely with operators to debug sophisticated, full-stack AI cluster problems. The findings will feed directly into product improvements.
Deliver compelling technical presentations and lead hands-on demos or training. You'll also handle evaluation deployments (POC/POV) and ensure smooth, reliable installations by staying engaged and encouraging throughout the customer journey.
What we need to see:
Bachelor of Science or equivalent experience
8+ years of networking experience in enterprise or service provider environments, with strong hands-on expertise in routing and switching.
Proficient in scripting and automation using Python or similar languages, with strong Linux expertise.
Proven experience working directly with customers to resolve issues and ensure success in Systems Engineer or SRE roles.
Exceptional oral, written, and presentation skills for clearly communicating complex technical topics.
Demonstrated ability to collaborate effectively across teams, partnering with operations, engineering, and product development
Ways to stand out from the crowd:
Experience with data center infrastructure and cloud architectures
Background in network performance monitoring or observability
Previous experience working at a technological start-up

You will define how AI models are deployed and scaled in production using the NVIDIA Spectrum-X Networking Platform, influencing decisions from inter-node communication and
What You Will Be Doing:
Lead research and development of end-to-end networking solutions for distributed AI training and inference at scale, with a focus on job completion time, failure resiliency, telemetry, scheduling, and placement.
Analyze current deployments, develop prototypes, and recommend architectural improvements.
Stay abreast of the latest research; become the team’s authority in emerging networking techniques and technologies.
Design, simulate, and validate new systems using NSX, a novel, scalable network simulator.
Develop and test prototypes on large-scale GPU clusters (e.g., Israel-1).
Collaborate across hardware, firmware, and software teams to translate ideas into real networking product features.
Publish patents and present research at leading conferences.
What We Need to See:
M.Sc. or PhD (preferred) in Computer Science, Electrical/Computer Engineering, or a related field, or a B.Sc. with research experience and publications.
5+ years of relevant experience.
Deep expertise in networking and communication internals (NCCL, RDMA, congestion control, routing).
Strong software engineering skills in C++ and/or Python.
Excellent system-level design and problem-solving abilities.
Outstanding communication and collaboration skills across technical domains.
Ways to Stand Out from the Crowd:
Proven passion for solving sophisticated technical problems and delivering impactful solutions.
Record of publications in top-tier conferences.
Experience in designing and building large-scale AI training clusters.
Post-PhD research experience
Practical understanding of deep learning systems, GPU acceleration, and AI model execution flows.

What you will be doing:
Develop a technical strategy to drive adoption of NVIDIA platforms in selected industries, focusing on your priority industries and use cases.
Establish relationships with technical leaders in organizations and communities, specifically developers, startups and ISVs within the industries.
Evangelize and develop our leadership position by accelerating the availability of GPU-accelerated AI and data science applications in the specified market, helping developers understand the value of our hardware products and SDKs in addressing critical development opportunities.
Lead participation in targeted customer and industry developer events and activities. Chair technical activities spanning product divisions and sales geographies, particularly with solution architects, software developers, engineering resources, and developer marketing. Contribute to our local ecosystem strategy and to the development of our value messaging for your customers.
Be the key advisor on how we are differentiated. You will become an NVIDIA technology mentor and focal point for the software developer community.
What we need to see:
Bachelor's degree in Engineering, Science, or another related technical discipline, or equivalent experience. A Master's or Ph.D. is preferred. Intellectual curiosity and passion for innovation.
Experience in several verticals/industries, with good knowledge of industry trends.
Expertise in CUDA programming, GPU platforms and Deep Learning and Machine Learning frameworks.
A deep understanding of whom to engage, and how, within developers' product and engineering organizations, backed by at least 8 years of related experience.
5+ years of experience in an AI and ML software development environment or working with developers in these areas, and at least 3 years of experience in business development activities.
Able to work independently and possess excellent communication skills to drive customer and internal engagements.
Demonstrated ability to influence, evangelize, and persuade at both operational and executive levels (including engineering/product management) to achieve a targeted outcome.
Execute and accelerate strategic decisions
Ability to effectively deliver value propositions for specific and targeted industries.
Ensure a positive experience for external customers and partners while working cross functionally within our organization.
Ways to stand out from the crowd:
Experience working on AI deep learning and machine learning applications, AI model training/inferencing, and other GPU-related technologies and application domains.
Strong technical understanding of data analytics, generative AI, and embedded systems/Jetson.
Experience with network communication protocols is an added advantage.
Strong analytical, problem solving, and negotiation skills and the ability to use data analysis to support strategic decisions
Excellent organizational, planning, and execution skills

What you’ll be doing:
Crafting and developing enterprise-grade systems with a strong focus on scalability, reliability, and performance.
Building and optimizing microservices-based architectures using Kubernetes and cloud-native technologies.
Collaborating closely with backend engineers, product managers, and other partners to deliver impactful solutions.
Writing clean, maintainable, and testable code in Go, contributing to our CI/CD pipelines.
Conducting code and build reviews to uphold high-quality standards and mentor team members.
Leading the development and implementation of advanced identity management systems that secure NVIDIA’s innovative AI and GPU cloud.
Developing scalable multi-tenant solutions that allow our diverse clientele to harness the power of NVIDIA’s platforms securely and efficiently.
Collaborating with multi-functional teams to integrate identity and access management features seamlessly into our products, from cloud services to edge computing devices.
What we need to see:
B.Sc. in Computer Science or a related field (or equivalent experience).
5+ years of experience
Experience in backend software development, including system design and architecture.
Proficiency in at least one backend programming language (Go preferred).
Strong knowledge in microservices architecture, RESTful APIs, and relational databases.
Proficient knowledge of security guidelines and experience applying them in large-scale systems.
Expertise in implementing OAuth, OIDC, SAML, and other modern authentication protocols is an advantage.
Ways to stand out from the crowd:
Expertise in Kubernetes internals and advanced cloud-native technologies.
Experience working in Linux environments with knowledge of networking, security, and virtualization.
Contributions to open-source projects or active participation in tech communities.
Agile approach and familiarity with standard methodologies.

What you'll be doing:
You will be part of the NVIDIA AIR team, which is building the SaaS/IaaS platform for digital twins of AI data centers.
You will be responsible specifically for the DevOps, infrastructure, and Site Reliability Engineering (SRE) requirements of AIR.
Focus on efficiency by automating repetitive workflows.
Working on a microservices-based architecture.
Deploying and troubleshooting non-disruptive cloud operations with an emphasis on secure production infrastructure.
Continuously evaluating existing systems and driving improvements.
Managing deployments and upgrades for operating systems, Kubernetes (k8s) clusters, and/or other orchestration tools.
Day-to-day support for engineering activities with CI/CD tools like Git and Jenkins.
Multitasking across different tracks to address evolving priorities efficiently.
What we need to see:
BSc in Engineering, relevant certifications, or equivalent experience.
5+ years of experience in complex microservices-based architectures.
Highly skilled in Kubernetes and Docker
Experience in IaaS environments: deploying, configuring, and administering Linux-based bare-metal servers.
Strong networking background (VLANs, routing, VPNs)
Experience with relational databases (MySQL) and SQL.
Experienced with modern deployment architectures for non-disruptive cloud operations, including blue-green and canary rollouts.
Infrastructure as code (IaC) skills in frameworks like Ansible & Terraform
Expert in AWS
Knows best practices and discipline of managing and monitoring a highly available and secure production infrastructure
Ways to stand out from the crowd:
Strong expertise in Infrastructure as a Service (IaaS)
Skills in Linux/Unix Administration
Experience with Prometheus/Grafana.
Experience with APM tools like Dynatrace, Datadog, AppDynamics, New Relic, etc.
Implemented robust metrics collection and alerting

What you'll be doing:
Work in a combined design and verification team which develops some of the switch silicon core units.
Build reference models, verify and simulate chip blocks/entities according to specifications.
Work closely with multiple teams within the organization, such as Architecture, Micro-Architecture, and FW.
What we need to see:
4+ years of experience in RTL design or RTL verification.
Previous experience in networking is an advantage.
B.Sc. in Electrical Engineering or Computer Engineering.
A team player with good communication and interpersonal skills.

What you’ll be doing:
Enhance NVIDIA's GPU Networking offerings for accelerating AI workloads, such as NVIDIA Dynamo or NVIDIA NIXL.
Develop and evaluate new technologies and innovations relevant to scientific, deep learning, and data-intensive workloads.
Create proofs of concept to evaluate and drive such new technologies.
Work on impactful projects involving state-of-the-art high-performance computing software and hardware.
Design and implement services, runtime systems, and applications over the SDK.
Partner and collaborate with other forward-thinking team members and external researchers.
What we need to see:
B.Sc., M.Sc., or Ph.D. in Computer Science or Electrical or Computer Engineering from a leading university.
0-2 years of industry experience (or equivalent) in system programming or related fields.
Background in algorithm design, system programming, and computer architecture.
Strong programming and software development skills.
A teammate with a can-do attitude, high energy and excellent interpersonal skills.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Proven research track record.
Experience with and passion for system architecture (CPU, GPU, memory, storage, networking).
Stellar communication skills.
Knowledge of deep learning frameworks and AI communication libraries (NCCL, UCX, MPI, and equivalents).
