What You'll Be Doing:
Innovate & Build – Design and implement novel test plans, tools, and automation frameworks to validate GPU functionality, performance, and reliability in complex datacenter environments.
Safeguard Data Integrity – Develop groundbreaking stress tests and methodologies to detect, characterize, and eliminate silent data errors.
Build the Future of Hardware – Partner with architecture and silicon design teams to influence system- and chip-level features that improve diagnostics, debuggability, and root-cause analysis.
Deep Dive Debugging – Analyze test results, investigate complex failures, and drive solutions in close collaboration with design, firmware, and software teams.
Lead & Mentor – Provide technical leadership, guide junior engineers, and shape validation strategy across datacenter product lines.
What We Need to See:
BS/MS in Electrical Engineering, Computer Engineering, Computer Science, or related field (or equivalent experience).
8+ years of experience in hardware validation, test development, or datacenter hardware engineering.
Expert programming skills in Python and/or C/C++ for automation and tool development.
Deep Linux/Unix expertise, including advanced shell scripting.
Strong knowledge of server architecture: CPUs, GPUs, PCIe, networking, and storage.
A hard-working, proactive approach with a proven ability to own and deliver complex projects.
Ways to Stand Out From the Crowd:
Hands-on experience with NVIDIA GPU architecture (Hopper, Ampere) and software stack (CUDA, NCCL).
Experience testing high-speed interconnects such as NVLink or InfiniBand.
Familiarity with AI/ML or HPC benchmarking and stress-testing tools.
Proven track record of identifying and resolving critical bugs in pre-production hardware.
You will also be eligible for equity and benefits.