Collaborating with your peers across various engineering groups, you will launch new boards for NVIDIA GPU Accelerated Server Platforms (HGX/DGX) into production. These purpose-built systems are optimized for growing Deep Learning, Artificial Intelligence, and Analytics environments. With world-class technology enabling never-before-seen performance levels, NVIDIA's HGX/DGX portfolio is arguably the most complex server platform ever developed. This product family is the company's fastest-growing line of business and its largest total available market opportunity. You will apply your knowledge of server architectures, CPU baseboards, and GPU technology to productize new GPU boards for GPU-accelerated server clusters. Your responsibilities will include planning and establishing processes, defining test requirements, and optimizing the production line to deliver new NVIDIA GPU boards. You will also help the team achieve best-in-class cost and quality metrics.
What you will be doing:
Leverage your in-depth experience with high-speed signals to plan and develop new diagnostic tests and debug procedures for next-generation products.
Use your knowledge of system power-up and handshakes during boot to debug complex interactions between HW, FW and SW on faulty boards.
Recommend, drive, and ensure compliance with DFx requirements for robust signal integrity performance as it relates to layout, mechanical components, assembly procedures, etc.
Own a product or series of products end-to-end through the entire product lifecycle, ensuring successful production ramps by working across a large matrixed team.
Develop and deliver test specs for system-level manufacturing screens for all new products that meet the required HW coverage, quality, and product requirements of various business units.
Collaborate with the contract manufacturer (CM) to define the product assembly line, the number of test stations, and the number of assembly fixtures, optimized for cost and throughput.
Craft creative solutions and workarounds (WARs) through volume data analysis and lab experimentation to solve challenging yield and test problems seen on the production floor.
Lead optimization and continuous improvement of the production screen spec definition process to minimize waste and meet test-time, yield, and DPPM requirements.
Support customer-facing and quality teams during customer escalations to understand issues and close identified coverage gaps.
What we need to see:
BS/MS in EE/CE/CS or equivalent experience
8+ years of experience in HW design, diagnostics/validation, or manufacturing test of PCIe IPs, chips, or systems
Proficient in HW interfaces, including PCIe (Gen4+), InfiniBand, I3C/I2C, SPI, USB, etc.
In-depth understanding of HPC server architecture and out-of-band management
Strong problem-solving and troubleshooting expertise, including institutionalizing root-cause analysis
Experience in defining test and validation specifications for complex HW systems
Motivated to continually improve/optimize processes
Self-motivated, with strong interpersonal skills and the flexibility to adapt to new technologies
Ways to stand out from the crowd:
Prior experience in HW board/system electrical design, HW device drivers or HW diagnostics software development
Hands-on experience debugging and triaging HW faults using test equipment and Linux commands/tools
Proficient in Python or Shell scripting for HW testing automation and log parsing
Familiar with FPGA implementation, FW secure boot, and encrypted images
You will also be eligible for equity and benefits.