What you’ll be doing:
Develop and execute NVIDIA HGX/DGX platform test plans, based on design documents, covering the OS, FW, and CUDA SW stack.
Install and test operating systems (including Windows and Linux), system firmware, and software stacks on a variety of systems.
Drive root cause analysis of reliability and validation test failures to identify the root cause(s) and achieve mitigation.
Leverage AI (language model) skills to build front-end and back-end automation frameworks that can interact with humans.
Review partner and supplier test results and prescribe additional reliability testing on components, systems, and packaging as needed.
Work in an agile software development team with very high production quality standards.
Manage the bug lifecycle and collaborate across groups to drive solutions.
What we need to see:
Bachelor’s Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field with 2+ years of proven experience, or a Master’s Degree.
2+ years of meaningful work experience
Proven automation experience using Python, shell scripting, Ansible, and Jenkins.
Strong OS (Ubuntu, RedHat, CentOS, SuSE, Fedora, Windows, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware environments.
Experience using AI development tools for test plan creation, test case development, and test case automation.
Ability to write test plans focusing on functional, performance, stress and negative testing.
Experience developing CI/CD automation processes and contributing to DevOps, with a real passion for automation. Good teamwork skills and the ability to work independently.
Ways to stand out from the crowd:
Experience working with NVIDIA GPU hardware is a strong plus.
Experience implementing error handling for x86-based servers, as well as online and offline health monitoring tools.
Experience developing in x86/ARM-based environments.
Background in parallel programming, ideally CUDA/OpenCL, is a plus.
Strong experience with FW, BMC/OpenBMC, SBIOS, network protocols, enterprise storage devices, and Redfish is a huge plus.