NVIDIA Solutions Architect, Inference Deployments
United States, California 
206717865

Location: US, CA, Santa Clara
Time type: Full time
Posted: Yesterday

What you'll be doing:

  • Help customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on Kubernetes for large language models (LLMs) and generative AI workloads.

  • Tune inference performance using TensorRT/TensorRT-LLM.

  • Collaborate with multi-functional teams (engineering, product) and offer technical mentorship to customers implementing AI at scale.

  • Architect zero-downtime deployments, autoscaling (e.g., Kubernetes Horizontal Pod Autoscaler (HPA) or an equivalent driven by custom metrics), and integration with cloud-native tools (e.g., OpenTelemetry, Prometheus, Grafana).

What we need to see:

  • 5+ years in solutions architecture, with a proven track record of moving AI inference from proof of concept (POC) to production on Kubernetes.

  • Experience architecting GPU allocation with the NVIDIA GPU Operator and NVIDIA NIM Operator, including troubleshooting complex GPU orchestration, optimizing with Multi-Instance GPU (MIG), and ensuring efficient GPU utilization in Kubernetes environments.

  • Proficiency with TensorRT-LLM, Triton Inference Server, and TensorRT for model optimization and serving.

  • A track record of optimizing LLMs for low-latency inference in enterprise environments.

  • BS in Computer Science or Engineering, or equivalent experience.

Ways to stand out from the crowd:

  • Prior experience deploying NVIDIA NIM microservices for multi-model inference.

  • Knowledge of serverless inference and FaaS patterns (e.g., Google Cloud Run, AWS Lambda, NVCF) with NVIDIA GPUs.

  • NVIDIA Certified AI Engineer or similar.

  • Active contributions to Kubernetes SIGs or AI inference projects (e.g., KServe, Dynamo, SGLang or similar).

  • Familiarity with the networking and orchestration concepts that support multi-node inference, such as MPI, LeaderWorkerSet (LWS), or similar.

You will also be eligible for equity and benefits.