NVIDIA Senior Solutions Architect, Continuous Bringup Optimization - NVIS
Japan, Yokohama 
143885386

Today
Japan, Remote
Time type: Full time
Posted: 5 days ago

What you'll be doing:

  • Lead the hands-on analysis, optimization, and performance tuning of complex GPU-accelerated systems and AI workloads, ensuring high availability and efficiency across customer data centers.

  • Engage with NVIDIA strategic customers to drive AI infrastructure initiatives, support deployment success, and influence long-term platform adoption.

  • Serve as a senior technical authority on NVIDIA GPU, DPU, and networking technologies, contributing to architecture reviews and guiding infrastructure decisions at scale.

  • Collaborate with internal Engineering, Product, and Sales teams to align customer deployments with NVIDIA’s technology roadmap and business objectives.

  • Establish and refine monitoring and optimization methodologies using analytics, telemetry, and automation to detect bottlenecks and improve infrastructure resiliency.

  • Participate in post-deployment reviews, incident retrospectives, and strategic planning sessions to shape the customer experience and feed insights into NVIDIA’s infrastructure strategy.

  • Lead and deliver complex technical projects from initial design through implementation and continuous improvement, ensuring alignment with SLAs and mitigating technical risks.

  • Support business growth by identifying AI infrastructure opportunities in cloud and enterprise environments and driving technical initiatives that showcase NVIDIA’s leadership in this space.

What we need to see:

  • 10+ years of experience in large-scale data center service operations with a focus on infrastructure performance, backed by a Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.

  • Strong analytical, problem-solving, and decision-making skills, with the ability to identify root causes, drive continuous improvement, and deliver resilient technical solutions.

  • Strong communication, time management, and organizational skills, with the ability to lead complex projects, guide technical teams, and meet important metrics.

  • Preferred certifications in data center, server, or networking technologies, and a willingness to travel up to 25% for customer engagements and team collaboration.

  • Proficiency in system-level technologies, including operating systems, Linux kernel drivers, GPUs, NICs, and hardware architecture.

  • Demonstrated expertise in cloud orchestration software and job schedulers, including platforms like Kubernetes, Docker Swarm, and HPC-specific schedulers such as Slurm.

  • Solid familiarity with cloud-native technologies and their integration with traditional infrastructure.

  • Proficiency in both Japanese and English, with the ability to communicate complex technical topics clearly across multicultural teams and with customers.

Ways to stand out from the crowd:

  • Deep familiarity with AI infrastructure and workflows, including training/inference pipelines, MLOps/DevOps tools, containerization (Docker, Kubernetes), and large-scale system deployments.

  • Knowledge of data center infrastructure operations, including safety, security, environmental controls, and standard operating procedures.

  • Proven expertise in scaling complex systems, with deep experience in automation, orchestration, and performance optimization across compute, storage, and networking layers.

  • Good interpersonal and collaboration skills, with the ability to lead discussions, influence outcomes, and build positive relationships with both internal and external collaborators.