
Nvidia DL Performance Software Engineer - LLM Inference 
Canada, Ontario, Old Toronto 
398590704

31.08.2025
Canada, Toronto
Time type: Full time
Posted on: 2 Days Ago

What you’ll be doing:

  • Write safe, scalable, modular, and high-quality C++/Python code for our core backend software for LLM inference.

  • Perform benchmarking, profiling, and system-level programming for GPU applications.

  • Provide code reviews, design docs, and tutorials to facilitate collaboration among the team.

  • Conduct unit tests and performance tests for different stages of the inference pipeline.

What we need to see:

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent experience.

  • Strong coding skills in Python and C/C++.

  • 2+ years of industry experience in software engineering or equivalent research experience.

  • Knowledgeable and passionate about machine learning and performance engineering.

  • Proven project experience building software where performance is a core offering.

Ways to stand out from the crowd:

  • Solid fundamentals in machine learning, deep learning, operating systems, computer architecture and parallel programming.

  • Research experience in systems or machine learning.

  • Project experience in modern DL software such as PyTorch, CUDA, vLLM, SGLang, and TensorRT-LLM.

  • Experience with performance modelling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.


We strongly encourage you to include sample projects (e.g. GitHub) that demonstrate the qualifications above.

You will also be eligible for equity.

Applications for this job will be accepted at least until September 2, 2025.