Nvidia Senior Software Engineer - NIM Production Automation 
Vietnam, Thái Nguyên Province, Thái Nguyên 
782921877

Today
Locations: Vietnam, Ho Chi Minh City; Vietnam, Hanoi
Time type: Full time
Posted: 5 days ago

You will apply your expertise to develop highly available services that make effective use of the thousands of GPUs involved in this operation. Your services will provide best-in-class performance, accuracy, and availability. We are looking for technical talent to design, build, operate, and improve our capabilities to produce NIMs at scale, including the underlying infrastructure, pipelines, inference backends, Docker builds, test harnesses, metrics, performance engineering, log ingestion, and more.

What you'll be doing:

  • Design, build, and optimize containerized inference execution for various AI applications, ensuring efficiency and scalability. These applications may run in container orchestration platforms like Kubernetes to enable scalable and robust deployment.

  • Develop and deploy automation applications and microservices (e.g., in Python, Go) supporting the NIM factory.

  • Ensure the performance, scalability, and availability of NIMs and the automation infrastructure through comprehensive performance measurement, monitoring, and optimization.

  • Implement and manage CI/CD pipelines for automated testing and deployment.

  • Apply container and orchestration expertise (Docker, Kubernetes) to create and optimize the basic building blocks of NIMs and automation tooling.

  • Collaborate, brainstorm, and improve the designs of inference solutions with a broad team of software engineers, researchers, SREs, and product management.

  • Mentor and collaborate with team members and other teams to foster growth and development. Demonstrate a history of learning and enhancing both personal skills and those of colleagues.

What we need to see:

  • A history of using advanced programming skills (e.g., Python, Go) to build distributed compute systems, backend services, microservices, and cloud technologies.

  • Experience productionizing and deploying various types of AI models (e.g., foundation models, computer vision, speech recognition).

  • Experience implementing robust CI/CD pipelines for automated testing and deployment.

  • Experience working effectively with multi-functional teams, principals, and architects across organizational boundaries.

  • Mentorship experience and the ability to grow teams and team members.

  • Deep technical expertise in distributed containerized applications using Docker, Kubernetes, Cloud Endpoints, Helm, and Prometheus.

  • Passion for building scalable and performant microservice applications.

  • Excellent interpersonal skills and the flexibility to lead multi-functional efforts.

  • Proven experience debugging and analyzing the performance of distributed microservices or cloud systems.

  • A degree in Computer Science, Computer Engineering, or a related field (BS or MS) or equivalent experience.

  • 6+ years of demonstrated experience developing performant microservices, cloud software, and/or tooling.

Ways to stand out from the crowd:

  • Experience with multiple container engines and the internals of container images and runtimes.

  • Prior experience building and deploying containers for microservices, cloud, and on-prem deployments.

  • Background with large-scale full-stack development.

  • Experience delivering event-driven applications using services such as Temporal, Kafka, Redis, or similar.

  • Experience deploying AI inference workloads and benchmarking and testing AI models.