NVIDIA Solutions Architect - AI Factory
Seoul, South Korea
Job ID: 330283183
Posted: 02.07.2025
Time type: Full time
What you will be doing:

  • Maintain an up-to-date understanding of the philosophy, architecture, and deployment methods of various evolving NVIDIA Reference Architectures—e.g., NVIDIA DGX SuperPOD Reference Architecture, NVIDIA Cloud Partner Reference Architecture, and NVIDIA Enterprise Reference Architecture.

  • Analyze and understand the requirements of customer-initiated AI training or inference clusters.

  • Identify the NVIDIA Reference Architecture that best matches customer needs and effectively communicate its value proposition to collaborators.

  • Facilitate seamless communication between NVIDIA's internal deployment teams and customers during the implementation of AI clusters based on Reference Architectures.

  • Provide hands-on technical support to developers after the AI Factory has been deployed, ensuring that AI training and inference workloads run effectively on the infrastructure.

What we need to see:

  • Bachelor’s degree or higher in Computer Science, Computer Engineering, or a related technical field.

  • Solid understanding of basic principles behind cluster orchestration, such as compute resource provisioning and dynamic prioritization based on user demand.
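The orchestration principles named in this bullet (resource provisioning plus demand-driven prioritization) can be illustrated with a toy scheduler. This is a minimal sketch, not any real orchestrator: the job names, priority scheme, and fixed GPU capacity are all invented for illustration.

```python
import heapq

def schedule(jobs, available_gpus=8):
    """Pop jobs in priority order (lower number = higher priority).

    `jobs` is a list of (priority, name, gpus_requested) tuples. A real
    cluster orchestrator (e.g. Kubernetes with priority classes) does far
    more; this only illustrates demand-driven ordering against a fixed
    capacity. Jobs that do not fit the remaining capacity are skipped.
    """
    heap = list(jobs)
    heapq.heapify(heap)
    order = []
    while heap and available_gpus > 0:
        prio, name, gpus = heapq.heappop(heap)
        if gpus <= available_gpus:
            available_gpus -= gpus
            order.append(name)
    return order
```

For example, a high-priority inference job is admitted before a lower-priority batch job, and an oversized request is skipped once capacity runs out.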

  • Minimum of 3 years of hands-on experience operating AI training or inference clusters that leverage Kubernetes with NVIDIA GPUs.

  • Proficiency in key technologies including: Container Runtime Interface (CRI), Container Network Interface (CNI), Calico, NVIDIA GPU Operator, NVIDIA Network Operator, and Kubeflow Training Operator.
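To make the role of the NVIDIA GPU Operator concrete: it installs the device plugin that exposes `nvidia.com/gpu` as a schedulable Kubernetes resource. A minimal sketch of a Pod manifest requesting that resource, built as a plain Python dict (the pod name and image are illustrative placeholders, not part of this posting):

```python
def gpu_pod_manifest(name, image, gpus=1):
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs.

    Assumes the NVIDIA GPU Operator (or its device plugin) is installed on
    the cluster, which is what makes the `nvidia.com/gpu` resource
    schedulable in the first place.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,  # e.g. a CUDA base image; placeholder here
                "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
            }],
        },
    }
```

Serialized to YAML or JSON, this is the shape of manifest such clusters schedule onto GPU nodes.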

Ways to stand out from the crowd:

  • Foundational knowledge and experience with network technologies—such as InfiniBand and Ethernet—in AI cluster environments, including compute fabric interconnects between GPU servers, storage fabric integration, and in-band networks for system administration.

  • Familiarity with the role of storage in AI training/inference clusters, including hands-on experience with vector databases and leading commercial storage solutions.

  • Experience integrating MLOps platforms into Kubernetes environments, such as deploying Airflow for orchestrating distributed training workloads.
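The Airflow-style orchestration mentioned in the last bullet comes down to executing tasks in dependency (DAG) order. A minimal stdlib sketch of that idea, using Python's `graphlib` as a stand-in for Airflow, with an invented training pipeline:

```python
from graphlib import TopologicalSorter

# Hypothetical training pipeline: each task maps to its upstream
# dependencies, the same graph shape an Airflow DAG expresses.
pipeline = {
    "preprocess": [],
    "train_shard_0": ["preprocess"],
    "train_shard_1": ["preprocess"],
    "merge_checkpoints": ["train_shard_0", "train_shard_1"],
    "evaluate": ["merge_checkpoints"],
}

def run_order(dag):
    """Return one valid execution order: every task after its dependencies."""
    return list(TopologicalSorter(dag).static_order())
```

In a real deployment, Airflow additionally handles scheduling, retries, and workers; the DAG ordering above is only the core contract.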