What you’ll be doing:
Implement deep learning models from multiple data domains (CV, NLP/LLMs, ASR, TTS, RecSys and others) in multiple DL frameworks (PyT, JAX, TF2, DGL and others).
Implement and test new SW features (Graph Compilation, reduced precision training) that use the most recent HW functionalities.
Analyze, profile, and optimize deep learning workloads on state-of-the-art hardware and software platforms.
Collaborate with researchers and engineers across NVIDIA, providing guidance on improving the design, usability and performance of workloads.
Lead best practices for building, testing, and releasing DL software.
Contribute to the creation of a large-scale benchmarking system capable of testing thousands of models on a vast diversity of hardware and software stacks.
What we need to see:
3+ years of experience in DL model implementation and SW Development.
BSc, MS or PhD degree in Computer Science, Computer Architecture or a related technical field.
Excellent Python programming skills.
Extensive knowledge of at least one DL framework (PyTorch, TensorFlow, JAX, MXNet), with practical experience in PyTorch required.
Strong problem solving and analytical skills.
Knowledge of algorithms and DL fundamentals.
Knowledge of Docker containerization fundamentals.
Ways to stand out from the crowd:
Experience in performance measurements and profiling.
Experience with containerization technologies such as Docker.
GPU programming experience (CUDA or OpenCL) is a plus but not required.
Knowledge of and love for DevOps/MLOps practices in the development of Deep Learning-based products.
Experience with CI systems (preferably GitLab).