Expoint - all jobs in one place

Nvidia Senior Resiliency Safety Architect 
United States, California 
80412419

01.12.2024

What you'll be doing:

  • Collaborate with architects, unit designers and software engineers to ensure alignment on verification requirements.

  • Develop and implement comprehensive architecture verification test plans for resiliency and functional safety features.

  • Execute the architecture test plan by developing test content and working with Software and Architecture teams to enable, run, and debug tests on architecture models. Support test debug on RTL, emulation, and silicon.

  • Run simulations to analyze Architectural Vulnerability Factor (AVF), liveness of on-die memory, and fault injection.

  • Develop diagnostics software components for Resiliency and Safety to run on NVIDIA GPUs.

  • Develop and automate fault models to simulate various fault types (e.g., transient faults, stuck-at faults) in both RTL and gate-level netlists.

  • Collaborate with safety engineering teams to define metrics that ensure adherence to functional safety standards.

  • Optimize hardware and software features to improve system robustness, performance, and security.

  • Model and analyze RAS metrics such as Failures in Time (FIT) and Availability, and safety metrics such as Diagnostic Coverage and PMHF (Probabilistic Metric for random Hardware Failures).

What we need to see:

  • Master’s or PhD degree in Computer Engineering, Electrical Engineering, or a closely related field, or equivalent experience.

  • At least 5 years of relevant experience.

  • Familiarity with computer system architecture, microprocessors, and microcontroller fundamentals (caches, coherence, buses, direct memory access, etc.).

  • Strong knowledge and industry expertise in multiple aspects of GPU/SoC architecture definition: Clocks, Resets, Boot Sequence, Power Management, Interrupts, Memory Controller, Virtualization, Security, System Performance, IO technologies, High Speed links like PCIe/CXL/USB/Networking, Camera Interfaces, etc., and Multimedia accelerator pipelines.

  • Proficiency in Verilog/SystemVerilog RTL simulations and debug. Ability to set up a testbench and integrate various components.

  • Scripting and automation with Python or similar.

  • Proficiency in C/C++.

  • Excellent interpersonal skills and ability to collaborate with on-site and remote teams.

  • Strong debugging and analytical skills.

  • Self-driven and results-oriented.

Ways to stand out from the crowd:

  • Experience with resiliency and functional safety.

  • Exposure to fault injection simulations.

  • Familiarity with GPU and SoC architectures and Machine Learning/Deep Learning concepts.

  • Programming with CUDA.

You will also be eligible for equity.