Nvidia Senior Software Technical Program Manager - GPU Communication Libraries 
United States, California 
292609221

Location: US, CA, Santa Clara
Time type: Full time
Posted: 19 days ago

What you will be doing:

In this GPU Communication Libraries role, you will collaborate closely with SW Development Managers, Engineers, Product Marketing, Customer Program Management, Quality Assurance, and other logistics personnel to establish and implement streamlined processes for developing advanced Compute Software solutions for cloud service providers and OEM customers. You will collect requirements, help define priorities, remove blockers, and drive planning and scheduling for all phases of the software development lifecycle. You will also be responsible for continuously improving and maintaining all processes related to enterprise support, and for establishing processes for next-generation architecture and feature engagements so that opportunities to influence HW architecture changes are not missed. You will have the opportunity to partner with diverse technical groups at all organizational levels.

  • Lead status meetings, proactively address challenges and customer concerns, and serve as the primary point of contact (POC) for building and upholding prioritized release schedules and plans.

  • Strategically plan and partner across Nvidia teams to drive software objectives, maintain schedules, and formulate mitigation strategies for risks identified across multiple parallel work streams.

  • Lead enhancements to existing product development and software release processes, collaborating with engineering management to optimize development workflow and efficiency.

  • Translate customer requirements into actionable internal milestones and tasks, keeping customers continually informed of issue status.

  • Drive virtual reviews and establish continuous feedback loops by communicating benchmarking results and customer insights to product and engineering leadership.

  • Track and report large-scale performance benchmarking across all clusters, and build performance dashboards and reporting processes to monitor KPIs and surface performance trends.

  • Collaborate across internal teams and third-party partners across time zones, as necessary, to resolve customer issues and oversee customer releases.

  • Partner with Customer Program Managers to address software issues, including technical feedback from OEMs, CSPs, and partners.

What we need to see:

  • 12+ years of overall experience in the software industry, with specialization in HPC networking or system software.

  • 6+ years of program management experience in a similar or related role.

  • BS, MS, or Ph.D. in CS, CE, EE, or a related technical field, or equivalent experience.

  • Hands-on experience with software development for hardware platforms, communication runtimes, or high-performance networking, with demonstrated success delivering these complex products to customers.

  • Proficiency in Agile software development methodologies.

  • Proven ability to creatively resolve technical and resource issues, and to think both strategically and tactically while building consensus to ensure program success.

  • Comprehensive understanding of software engineering principles, including experience with widely adopted configuration management tools, productivity-enhancing tools, and automation processes.

  • Exceptional attention to detail and a demonstrated capacity for multitasking in a dynamic environment with shifting priorities and changing requirements.

  • Strong communication and technical presentation skills, and the ability to work independently and proactively with minimal guidance.

  • Previous experience coordinating activities between HW and SW organizations.

Ways to stand out from the crowd:

  • Solid understanding of the deep learning framework ecosystem for training and inference.

  • Solid understanding of operating systems, datacenter servers, graphics principles and standards.

  • Background with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).

  • Knowledge of a modern programming language, as well as depth in HPC and ML/DL fundamentals.

  • Background with RDMA, high-performance networking technologies (InfiniBand, RoCE, Ethernet, EFA), network architecture and network topologies.

You will also be eligible for equity and benefits.