What you'll be doing:
Design and implement changes in the NVIDIA software stack to enhance system-level resiliency and reliability at datacenter scale with thousands of GPUs, focusing on features that bolster system-level availability, early fault detection, and faster recovery.
You will follow the devices all the way through the development process to datacenter systems, customer desktops, notebooks, workstations, and gaming console products that are used throughout the world.
Be heavily involved in the architecture definition, early modeling, and simulation required to create our groundbreaking products
Collaborate and communicate effectively with teams from all around the globe
What we need to see:
BS or MS degree in Computer Engineering, Computer Science, or equivalent experience
Background in solving problems that apply to large complex systems deployed at scale.
Strong C/C++ programming skills and demonstrated initiative in pursuing independent coding projects
Familiarity with computer system architecture, microprocessor, and microcontroller fundamentals (caches, buses, memory controllers, DMA, etc.)
Strong operating system fundamentals with kernel experience on Linux or Windows systems
8+ years of meaningful software development experience
Ways to stand out from the crowd:
Strong background in sophisticated system-level debugging
Experience working on system level reliability and resiliency features.
Familiarity with system level security concepts
Experience with embedded system SW concepts.
You will also be eligible for equity and .