What you'll be doing:
Performance optimization, analysis, and tuning of DL models across domains such as LLMs, recommender systems, GNNs, and generative AI.
Scale the performance of DL models across different NVIDIA accelerator architectures and types.
Contribute features and code to NVIDIA's inference benchmarking frameworks, TensorRT, Triton, and LLM software solutions.
Work with cross-functional teams across generative AI, automotive, image understanding, and speech understanding to develop innovative solutions.
What we need to see:
Master's or PhD degree, or equivalent experience, in a relevant field (Computer Engineering, Computer Science, EECS, AI).
At least 3 years of relevant software development experience.
You'll need excellent C/C++ programming and software design skills. Agile software development skills are helpful, and Python experience is a plus.
Prior experience training, deploying, or optimizing the inference of DL models in production is a plus.
Prior experience with performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
GPU programming experience (CUDA or OpenCL) is a plus.
You will also be eligible for equity and benefits.