
Nvidia Senior DL Algorithms Engineer - Cosmos

Locations: US, CA, Santa Clara; US, CA, Remote
Time type: Full time
Posted: 7 Days Ago
Job requisition id: 589185754
What you will be doing:

  • Optimize deep learning models for low-latency, high-throughput inference, with a focus on LLMs, VLMs, diffusion models, and World Foundation Models (WFMs) designed for physical AI applications.

  • Convert, deploy, and optimize models for efficient inference using frameworks such as TensorRT, TensorRT-LLM, vLLM, and SGLang.

  • Understand, analyze, profile, and optimize the performance of deep learning and physical AI workloads on state-of-the-art NVIDIA GPU hardware and software platforms.

  • Implement and refine components and algorithms for efficient serving of LLMs, VLMs, and WFMs at datacenter scale, leveraging technologies like Dynamo.

  • Collaborate with research scientists, software engineers, and hardware specialists to ensure seamless integration of cutting-edge AI models from training to deployment.

  • Contribute to the development of automation and tooling for NVIDIA Inference Microservices (NIMs) and inference optimization, including creating automated benchmarks to track performance regressions.

What we want to see:

  • Master’s or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).

  • 3+ years of professional experience in deep learning, applied machine learning, or physical AI development.

  • Strong foundation in deep learning algorithms, including hands-on experience with LLMs, VLMs, and multimodal generative models such as World Foundation Models.

  • Deep understanding of transformer architectures, attention mechanisms, and inference bottlenecks.

  • Proficient in building, optimizing, and deploying models using PyTorch or TensorFlow in production-grade environments.

  • Solid programming skills in Python and C++.

  • Experience with model quantization and modern inference optimization techniques (e.g., KV cache, in-flight batching, parallelization mapping).

  • Strong fundamentals in GPU performance analysis and profiling tools (e.g., NVIDIA Nsight Systems/nsys).

  • Familiarity with serving models using Triton Inference Server and PyTriton via Docker.
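Of the inference-optimization techniques named above, KV caching is the most self-contained to illustrate. The following is a minimal, framework-free sketch of the idea (all names and the fake projection are illustrative, not from TensorRT-LLM or any real serving stack): in autoregressive decoding, caching each token's key/value projection avoids reprojecting the entire prefix at every step, turning quadratic work into linear work.

```python
# Toy illustration of KV caching in autoregressive decoding.
# All names and numbers are illustrative, not from any real framework.

def project(token):
    """Stand-in for the per-token key/value projection (the expensive step)."""
    return (token * 2, token * 3)  # fake "key" and "value"

def decode_without_cache(tokens):
    """Reprojects every prefix token at every step: O(n^2) projections."""
    ops = 0
    for step in range(1, len(tokens) + 1):
        kv = [project(tok) for tok in tokens[:step]]  # whole prefix redone
        ops += step
    return ops

def decode_with_cache(tokens):
    """Projects each token once and appends it to the cache: O(n) projections."""
    ops = 0
    cache = []
    for tok in tokens:
        cache.append(project(tok))  # only the new token is projected
        ops += 1
    return ops

tokens = list(range(16))
print(decode_without_cache(tokens))  # 136 projections (16 * 17 / 2)
print(decode_with_cache(tokens))     # 16 projections
```

In-flight (continuous) batching applies the same logic across requests: because each sequence keeps its own cache, finished sequences can be evicted and new ones admitted mid-batch without recomputing the survivors' prefixes.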

Ways to stand out from the crowd:

  • Proven experience deploying LLMs, VLMs, diffusion models, or World Foundation Models (WFMs) at scale in real-world applications, especially for robotics or autonomous vehicles.

  • Hands-on experience with model optimization and serving frameworks such as TensorRT, TensorRT-LLM, vLLM, SGLang, and ONNX.

  • Direct experience with NVIDIA Cosmos, Isaac Sim, Isaac Lab, or Omniverse platforms for synthetic data generation and physical AI simulation.

  • Experience with data curation pipelines and tools like NVIDIA NeMo Curator for large-scale video data processing and model post-training.

  • Deep understanding of distributed systems for large-scale model inference and serving.

You will also be eligible for equity and benefits.