Nvidia Senior Datacenter Product Development Engineer 
United States, California 
401389636

24.06.2024

Collaborating with your peers across various engineering groups, you will launch new boards for NVIDIA GPU Accelerated Server Platforms (HGX/DGX) into production. These purpose-built systems are optimized for the growing Deep Learning, Artificial Intelligence, and Analytics environments. With world-class technology enabling never-before-seen performance levels, NVIDIA's HGX/DGX portfolio is arguably the most complex server platform ever developed. This product family represents the company's fastest-growing line of business as well as its largest total available market opportunity. You will apply your knowledge of server architectures, CPU baseboards, and GPU technology to productize new GPU boards for server architectures with GPU-accelerated clusters. Your responsibilities will include planning and establishing processes, defining test requirements, and optimizing the production line to deliver new NVIDIA GPU boards. You will also be instrumental in helping the team achieve best-in-class cost and quality metrics.


What you will be doing:

  • Leverage your in-depth experience with high-speed signals to plan and develop new diagnostic tests and debug procedures for next-generation products.

  • Use your knowledge of system power-up and handshakes during boot to debug complex interactions between HW, FW and SW on faulty boards.

  • Recommend, drive, and ensure compliance with DFx requirements for robust signal integrity performance as related to layout, mechanical components, assembly procedures, etc.

  • Own a product or series of products end-to-end through the entire product lifecycle, ensuring successful production ramps by working across a large matrixed team.

  • Develop and deliver test specs for system-level manufacturing screens for all new products to meet the required HW coverage, quality, and product requirements of various business units.

  • Collaborate with the contract manufacturer (CM) to define the product assembly line, number of test stations, and number of assembly fixtures, optimized for cost and throughput.

  • Craft creative solutions and workarounds (WARs) through volume data analysis and lab experimentation to solve challenging yield and test problems seen on the production floor.

  • Lead optimization and continuous improvement efforts on the production screen spec definition processes to minimize waste and meet test time, yield, and DPPM (defective parts per million) requirements.

  • Support customer-facing and quality teams during customer escalations to understand issues and close the gaps identified in test coverage.

What we need to see:

  • BS/MS in EE/CE/CS or equivalent experience

  • 8+ years of experience in HW design, diagnostics/validation, or manufacturing test of PCIe IPs, chips, or systems

  • Proficient in HW interfaces, including PCIe (Gen4+), InfiniBand, I3C/I2C, SPI, USB, etc.

  • In-depth understanding of HPC server architecture and out-of-band management

  • Strong problem-solving and troubleshooting expertise, with experience institutionalizing root-cause analysis

  • Experience in defining test and validation specifications for complex HW systems

  • Motivated to continually improve/optimize processes

  • Self-motivated, with strong interpersonal skills and the flexibility to adapt to new technologies

Ways to stand out from the crowd:

  • Prior experience in HW board/system electrical design, HW device drivers or HW diagnostics software development

  • Hands-on experience in debugging and triaging HW faults using test equipment and Linux commands/tools

  • Proficient in Python or Shell scripting for HW testing automation and log parsing

  • Familiarity with FPGA implementation, FW secure boot, and encrypted images

You will also be eligible for equity and benefits.