What you’ll be doing:
Conduct applied research and design innovative algorithms in the space of image/video and vision-language foundation models.
Stay abreast of the latest research papers and breakthroughs in multimodal and foundation model research, and implement them to iteratively improve NVIDIA models.
Develop AI infrastructure for large-scale training and evaluation pipelines for foundation models.
Drive the data-gathering, dataset-building, and auto-labeling pipelines used to annotate datasets for training domain-specific SOTA VLMs and FMs.
Develop, train, fine-tune, and deploy vision-language and visual foundation models for varied use cases, including smart cities, industrial manufacturing, and gaming.
Apply alignment techniques such as instruction tuning and reinforcement learning from human feedback (RLHF), as well as parameter-efficient fine-tuning methods such as p-tuning, adapters, and LoRA, to improve model performance for specific use cases.
Collaborate closely with Research teams to develop and bring SOTA models to product.
Mentor and guide junior team members, encouraging a collaborative and innovative team culture.
What we need to see:
MS or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field, with a focus on Deep Learning, Machine Learning, and Computer Vision, or equivalent experience.
5+ years of algorithm development experience and excellent implementation skills in the Gen AI domain.
Experience training vision foundation models such as ViT, LLaVA, CLIP, diffusion models, and VLMs.
Hands-on experience with deep learning frameworks (e.g., TensorFlow, PyTorch) and proficiency in modern software development practices (version control, testing, CI/CD).
Publications in top-tier AI conferences or contributions to open-source projects are a plus.
Excellent communication skills and the ability to collaborate efficiently in a cross-functional, distributed team environment.
You will also be eligible for equity and benefits.