
Nvidia DevTech Engineer - Windows LLM GenAI Open-Source Ecosystem 
Germany, North Rhine-Westphalia 
338407283

01.12.2024

For our team in Wuerselen, we are now looking for a Developer Technology Engineer to ...

  • contribute to the LLM & GenAI open-source ecosystem, equipping Windows AI enthusiasts and developers with innovative models and functionality as well as speed-of-light performance on RTX.

  • engage with our strategic partners and internal teams to overcome the challenges that arise when deploying modern LLM & GenAI architectures on local workstations.

What you’ll be doing:

  • Improve the Windows LLM & GenAI user experience on NVIDIA RTX by working on feature and performance enhancements for open-source projects such as PyTorch, llama.cpp, and ComfyUI.

  • Engage with internal product teams and external OSS maintainers to align on and prioritize OSS enhancements.

  • Work closely with internal engineering teams and external app developers to solve local end-to-end LLM & Generative AI GPU deployment challenges, using techniques such as quantization or distillation.

  • Apply profiling and debugging tools to analyze the most demanding GPU-accelerated end-to-end AI applications and detect insufficient GPU utilization that results in suboptimal runtime performance.

  • Conduct hands-on training sessions, develop sample code, and host presentations that provide guidance on efficient end-to-end AI deployment targeting optimal runtime performance.

  • Guide AI application developers in applying methodologies for efficient adoption of DL frameworks, targeting maximal utilization of GPU Tensor Cores for the best possible inference performance.

  • Collaborate with GPU driver and architecture teams as well as NVIDIA Research to influence next-generation GPU features by providing real-world workflows and giving feedback on partner and customer needs.

What we need to see:

  • 5+ years of professional experience in local GPU deployment, profiling, and optimization.

  • BS or MS degree in Computer Science, Engineering, or a related field.

  • Strong proficiency in C/C++ and Python, as well as in software design and programming techniques.

  • Familiarity with and development experience on the Windows operating system.

  • Proven theoretical understanding of Transformer architectures, specifically LLMs and Generative AI, as well as of convolutional neural networks.

  • Experience working with open-source LLM and GenAI software, e.g. PyTorch or llama.cpp.

  • Experience with CUDA and NVIDIA's Nsight GPU profiling and debugging suite.

  • Strong verbal and written communication skills in English, strong organizational skills, and a logical approach to problem solving, time management, and task prioritization.

  • Excellent interpersonal skills.

  • Some travel is required for conferences and for on-site visits with external partners.

Ways to stand out from the crowd:

  • Experience with GPU-accelerated AI inference driven by NVIDIA APIs, specifically cuDNN, CUTLASS, TensorRT.

  • Confirmed expert knowledge of Vulkan and/or DX12.

  • Familiarity with WSL2 and Docker.

  • Detailed knowledge of the latest-generation GPU architectures.

  • Experience with AI deployment on NPUs and ARM architectures.