
Nvidia Senior Software Engineer - Inference Service
Job requisition ID: 452006747
Location: US, CA, Santa Clara (United States, California)
Time type: Full time
Posted: 26.08.2025 (3 days ago)
What you'll be doing:

  • Contribute to the design and development of a scalable, robust, and reliable platform for serving AI models for inference as a service.

  • Develop and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads.

  • Build and maintain the core infrastructure, including load balancing and rate limiting, to ensure the stability and high availability of inference services.

  • Implement APIs for model deployment, monitoring, and management for a seamless user experience.

  • Collaborate with engineering teams to integrate deployment, monitoring, and performance telemetry into our CI/CD pipelines.

  • Build tools and frameworks for real-time observability, performance profiling, and debugging of inference services.

  • Work with architects to define and implement best practices for long-term platform evolution.

  • Contribute to NVIDIA's AI Factory initiative by building a foundational platform that supports model serving needs.
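As a rough illustration of the rate-limiting work described in the responsibilities above (not part of the original posting, and the class and parameter names here are hypothetical), a minimal token-bucket limiter in Python might look like this:

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter: admits roughly `rate` requests
    per second, with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# With a burst capacity of 2, a tight loop of 4 calls admits the first
# two requests and rejects the rest (refill is negligible at this scale).
bucket = TokenBucket(rate=5.0, capacity=2.0)
results = [bucket.allow() for _ in range(4)]
```

A production inference service would typically enforce limits per tenant or per API key and share state across replicas (e.g., via Redis), but the core accounting is the same.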

What we need to see:

  • BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or another Engineering or related field (or equivalent experience).

  • 12+ years of software engineering experience with expertise in distributed systems or large-scale backend infrastructure.

  • Strong programming skills in Python, Go, or C++ with a track record of building production-grade, highly available systems.

  • Proven experience with container orchestration technologies like Kubernetes.

  • A strong understanding of system architecture for high-performance, low-latency API services.

  • Experience in designing, implementing, and optimizing systems for GPU resource management.

  • Familiarity with modern observability tools (e.g., Datadog, Prometheus, Grafana, OpenTelemetry).

  • Demonstrated experience with deployment strategies and CI/CD pipelines.

  • Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.

Ways to stand out from the crowd:

  • Experience with specialized inference serving frameworks.

  • Open-source contributions to projects in the AI/ML, distributed systems, or infrastructure space.

  • Hands-on experience with performance optimization techniques for AI models, such as quantization or model compression.

  • Expertise in building platforms that support a wide variety of AI model architectures.

  • Strong understanding of the full lifecycle of an AI model, from training to deployment and serving.

You will also be eligible for equity and .