Expoint – all jobs in one place

NVIDIA Software Architect, NIM Factory
United States, Texas 
395394408

Locations: US, CA, Santa Clara; US, SC, Remote
Time type: Full time
Posted: 2 days ago

What you'll be doing:

  • Define the end-to-end technical architecture for the NIM Factory, from container build systems and CI/CD to Kubernetes deployment patterns and runtime optimization.

  • Drive technical strategy and roadmap, making high-impact decisions on frameworks, technologies, and standards that empower dozens of engineering teams.

  • Architect and influence the design of workflow orchestration systems that underpin the NIM Factory.

  • Guide and support senior engineers throughout the organization in building a culture centered on technical excellence and innovation.

  • Advocate for guidelines in software development, encompassing API composition, automation, observability, and secure supply chain management.

  • Collaborate with leadership across research, backend, SRE, and product to align technical vision with product goals and influence technical roadmaps.

What we need to see:

  • 15+ years of experience building large-scale, production distributed systems.

  • Consistent track record in a technical leadership or architect role, setting technical direction and carrying it through to implementation.

  • Deep architectural expertise in cloud-native technologies, including Kubernetes, containers, and microservices.

  • Exceptional ability to mentor and grow senior engineers, with a passion for raising the technical bar of the entire organization.

  • Proficiency in languages like Python for building tooling and services.

  • Experience architecting solutions for GPU-accelerated or other high-performance computing workloads.

  • Excellent communication and collaboration skills, with the ability to articulate complex technical concepts to diverse audiences and drive consensus.

  • A degree in Computer Science, Computer Engineering, or a related field (BS or MS) or equivalent experience.

Ways to stand out from the crowd:

  • Hands-on experience with LLM inference stacks (Triton Inference Server, TensorRT-LLM, vLLM).

  • Experience optimizing large-model serving (KV cache sharding/paging, tensor/sequence parallelism, speculative decoding, dynamic batching).

  • Experience architecting next-generation container build systems or CI/CD platforms at scale.

  • Background with workflow orchestration engines (e.g., Temporal, Airflow) for complex, distributed processes.

  • Expertise in designing multi-tenant, multi-cluster, or edge/air-gapped deployment architectures.

You will also be eligible for equity.