

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
What will you be doing:
You will bring together and understand internal and external customer requirements to improve AI cluster resiliency and design AIOps-based solutions that address these needs.
Develop automated workflows for issue detection and root cause analysis, and collaborate closely with operators to debug sophisticated, full-stack AI cluster problems. You will apply the findings to drive product improvements.
Deliver compelling technical presentations and lead hands-on demos or training. You'll also handle evaluation deployments (POC/POV) and ensure smooth, reliable installations by staying engaged and encouraging throughout the customer journey.
What we need to see:
Bachelor of Science or equivalent experience
8+ years of networking experience in enterprise or service provider environments, with strong hands-on expertise in routing and switching.
Proficient in scripting and automation using Python or similar languages, with strong Linux expertise.
Proven experience working directly with customers to resolve issues and ensure success in Systems Engineer or SRE roles.
Exceptional oral, written, and presentation skills for clearly communicating complex technical topics.
Demonstrated ability to collaborate effectively across teams, partnering with operations, engineering, and product development
Ways to stand out from the crowd:
Experience with data center infrastructure and cloud architectures
Background in network performance monitoring or observability
Previous experience working at a technological start-up

What you’ll be doing:
Work on NVIDIA's current and next-generation networking devices and GPUs.
Build and create firmware PHY solutions for our new SerDes and physical link-up flow.
Work closely with our customer support team to build solutions for our customers.
Work with the architecture, HW, and SW design teams
Debug and screen HW/FW/SW issues
Lead data-driven discussions about the product functionality and areas for improvement
Define, implement, and maintain FW algorithms to control the silicon.
Take an active part in silicon bring-up and SW development phases
What we need to see:
B.Sc. or M.Sc. in Electrical or Computer Engineering
3+ years of relevant experience
Proficient programming in C
Experience working in Git.
Debugging experience and ability to investigate and triage difficult problems in embedded FW
Good communication skills and the ability to work with people across several countries
Ability to handle interruptions and work in a dynamic environment with a positive attitude.
Excellent English verbal and written communication skills
Ways to stand out from the crowd:
Proficient in Python
Good understanding of SerDes operation
Experience with developing the physical layer of communication protocols
Knowledge of the hardware/software development process
Strong collaborative and interpersonal skills, with an ability to successfully guide and influence

What you’ll be doing:
Crafting and developing enterprise-grade systems with a strong focus on scalability, reliability, and performance.
Building and optimizing microservices-based architectures using Kubernetes and cloud-native technologies.
Collaborating closely with backend engineers, product managers, and other partners to deliver impactful solutions.
Writing clean, maintainable, and testable code in Go, contributing to our CI/CD pipelines.
Conducting code and build reviews to uphold high-quality standards and mentor team members.
Leading the development and implementation of advanced identity management systems that secure NVIDIA’s innovative AI and GPU cloud.
Developing scalable multi-tenant solutions that allow our diverse clientele to harness the power of NVIDIA’s platforms securely and efficiently.
Collaborating with multi-functional teams to integrate identity and access management features seamlessly into our products, from cloud services to edge computing devices.
What we need to see:
B.Sc. in Computer Science or a related field (or equivalent experience).
5+ years of experience
Experience in backend software development, including system design and architecture.
Proficiency in at least one backend programming language (Go preferred).
Strong knowledge in microservices architecture, RESTful APIs, and relational databases.
Proficient knowledge of security guidelines and experience applying them in large-scale systems.
Expertise in implementing OAuth, OIDC, SAML, and other modern authentication protocols (an advantage).
Ways to stand out from the crowd:
Expertise in Kubernetes internals and advanced cloud-native technologies.
Experience working in Linux environments with knowledge of networking, security, and virtualization.
Contributions to open-source projects or active participation in tech communities.
Agile approach and familiarity with standard methodologies.

What you'll be doing:
You will be part of the NVIDIA AIR team, which is building the SaaS/IaaS platform for digital twins of AI data centers.
You will be responsible specifically for the DevOps, infrastructure, and Site Reliability Engineering (SRE) requirements for AIR.
Focus on efficiency by automating repetitive workflows.
Working on a microservices-based architecture.
Deploying and troubleshooting non-disruptive cloud operations with an emphasis on secure production infrastructure.
Continuously evaluating existing systems and driving improvements.
Managing deployments and upgrades for operating systems, Kubernetes (k8s) clusters, and/or other orchestration tools.
Day-to-day support for engineering activities with CI/CD tools like Git and Jenkins.
Multitasking efficiently across different tracks to address evolving priorities.
What we need to see:
BSc in Engineering, relevant certifications, or equivalent experience.
5+ years of experience in complex microservices-based architectures
Highly skilled in Kubernetes and Docker
Experience in IaaS environment - deploying, configuring, and administering Linux-based bare metal servers
Strong networking background (VLANs, routing, VPNs)
Experience with relational databases (MySQL) and SQL.
Experienced with modern deployment architectures for non-disruptive cloud operations, including blue-green and canary rollouts
Infrastructure as code (IaC) skills in frameworks like Ansible & Terraform
Expert in AWS
Knows best practices and discipline of managing and monitoring a highly available and secure production infrastructure
Ways to stand out from the crowd:
Strong expertise in Infrastructure as a Service (IaaS)
Skills in Linux/Unix Administration
Experience with Prometheus/Grafana.
Experience with APM tools like Dynatrace, Datadog, AppDynamics, New Relic, etc.
Implemented robust metrics collection and alerting

What you will be doing:
STA analysis of blocks and top level according to specifications under challenging constraints, targeting the best power, area, and performance.
Gain exposure to and work on a variety of challenging designs (including high cell count and HS blocks), resolving complex timing and congestion problems.
Daily work involves all aspects of static timing analysis: constraints, environment, model generation, and timing ECO generation at block level and full-chip level.
Taking part in flow development.
What we need to see:
B.Sc. in Electrical Engineering/Computer Engineering.
2-5 years of experience as an STA engineer.
Ability to quickly adapt to new technology and go deep into new areas
Strong communication skills
Great teammate.
Drive new solutions based on any issues that arise
Ways to Stand Out From the Crowd:
Knowledge in physical design flows and methodologies (PNR, STA, physical verification).
Familiarity with physical design EDA tools (such as Synopsys, Cadence, etc.).

What you'll be doing:
Work in a combined design and verification team which develops some of the switch silicon core units.
Build reference models, verify and simulate chip blocks/entities according to specifications.
Work closely with multiple teams within the organization, such as Architecture, Micro-Architecture, and FW.
What we need to see:
4+ years of experience in RTL design or RTL verification.
Previous experience in networking - an advantage.
B.Sc. in Electrical Engineering or Computer Engineering.
A team player with good communication and interpersonal skills.

What you’ll be doing:
Enhance NVIDIA's GPU Networking offerings for accelerating AI workloads, such as NVIDIA Dynamo or NVIDIA NIXL.
Develop and evaluate new technologies and innovations relevant to scientific, Deep Learning, and data-intensive workloads.
Create proofs of concept to evaluate and drive such new technologies.
Work on impactful projects involving state-of-the-art high-performance computing software and hardware.
Design and implement services, runtime systems, and applications over the SDK.
Partner and collaborate with other forward-thinking team members and external researchers.
What we need to see:
Hold a B.Sc. or M.Sc. or Ph.D. in Computer Science, Electrical or Computer Engineering from a leading university.
0-2 years of industry experience (or equivalent) in system programming or related fields.
Background in algorithm design, system programming, and computer architecture.
Strong programming and software development skills.
A teammate with a can-do attitude, high energy and excellent interpersonal skills.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Proven research track record.
Experience with and passion for system architecture: CPU/GPU/Memory/Storage/Networking.
Stellar communication skills.
Knowledge in Deep Learning frameworks and AI communication libraries (NCCL, UCX, MPI and equivalents).
