Expoint – all jobs in one place

NVIDIA System Software Engineer, Content Delivery
United States, Texas 
698759361

Locations: US, CA, Santa Clara; US, CA, Remote; US, DC, Remote
Time type: Full time
Posted: 8 days ago


What you’ll be doing:

  • Evolve the learning platform by enabling new features and workflows, translating pedagogical and user experience visions into technical implementations and systemic improvements, delivering a fantastic learning experience.

  • Collaborate with course designers, defining capabilities and practices and assessing technology for diverse hands-on courses and labs, building new development environments, and optimizing cost-effectiveness and reliability.

  • Extend and develop the course content publishing pipeline, improving automated content validation, guiding testing efforts, and tailoring content distribution systems to ensure the integrity, quality, and performance of learning materials.


What we need to see:

  • Bachelor’s degree in Computer Science, a related technical field, or equivalent experience.

  • Over 8 years of DevOps experience optimizing, deploying, and running containerized applications (Docker, Kubernetes) across AWS, Azure, and GCP, including hands-on work with EKS, AKS, and GKE.

  • Practical experience building, automatically testing, and deploying GPU-accelerated software to diverse environments, including SCM automation.

  • Proficiency in Python and Linux shell scripting for automation, application development, system administration, and problem resolution/triage.

  • Validated experience architecting, implementing, and managing cloud infrastructure using Terraform.

  • Demonstrated ability as a meticulous problem-solver with strong analytical skills, capable of diagnosing and resolving complex technical challenges.

  • Excellent communication, teamwork, and collaboration skills, with an ability to articulate technical concepts clearly to diverse audiences and lead technical responses during incidents.


Ways to stand out from the crowd:

  • Proven experience designing and implementing event-driven architectures using pub/sub patterns (e.g., AWS SNS/SQS, Google Pub/Sub, Azure Service Bus).

  • Knowledge of generative AI architectures (LLMs, diffusion models) and concepts such as Retrieval-Augmented Generation (RAG) and vector databases.

  • Hands-on experience with the NVIDIA AI stack (NeMo, Triton Inference Server, TensorRT); production experience with NVIDIA NIM is a strong plus.

  • Experience in applying SRE principles to automate, enhance reliability, and improve performance in managed software development environments.

  • Familiarity with Python applications and libraries such as PyTorch and pandas, with common error behaviors in GPU compute environments and how they manifest in common frameworks, and with services commonly used in DLI training, including Build and Triton.

You will also be eligible for equity and .