
NVIDIA Solutions Architect - Generative AI
United States, California 
548493734

02.07.2025
US, CA, Santa Clara
Time type: Full time
Posted on: 30+ days ago

What You Will Be Doing:

  • Architect end-to-end generative AI applications for the Digital Marketing Organization with a focus on LLM deployment and RAG workflows.

  • Get hands-on and use advanced Python programming knowledge to make valuable contributions at both the application and infrastructure levels.

  • Provide technical leadership and guidance on standard methodologies for training LLMs and implementing RAG-based solutions.

  • Work with our primary collaborators, NVIDIA’s Marketing Team, to understand their requirements and deliver solutions tailored to their requests. Partner with the Digital Marketing Org’s AI Development Team and other development resources to complete projects.

  • Collaborate closely with our globally dispersed development, MLOps, product, engineering, and business teams.

  • Develop strategies for implementing AI workflows and agents efficiently, achieving optimal performance on NVIDIA’s hardware and software platforms.

  • Lead workshops and design sessions with our Digital Marketing Development Teams to define and refine generative AI solutions focused on LLMs and RAG workflows.

  • Design and implement RAG-based workflows to enhance content generation and information retrieval.

  • Work closely with NVIDIA engineering and product teams to provide feedback and contribute to the evolution of generative AI software.

  • Work closely with the Digital Marketing Org’s Web and Platform Teams to integrate RAG workflows into their applications and systems.

What We Need To See:

  • Master's or Ph.D. in Computer Science, Artificial Intelligence, or a related field; or equivalent experience in building and deploying AI-powered solutions at scale.

  • 8+ years of hands-on experience in a technical role, including experience with generative AI.

  • Advanced proficiency in Python programming, with the ability to contribute at both the application and infrastructure levels.

  • Knowledge of building agentic frameworks and multi-agent applications using tools such as LangChain and LangGraph.

  • Hands-on experience with, or understanding of, NVIDIA’s hardware and software technologies (e.g., CUDA, Triton, TensorRT, NeMo, RAPIDS).

  • Proven record of successfully deploying and optimizing LLMs for inference in production environments.

  • In-depth understanding of state-of-the-art language models, such as modern open models (e.g. Llama, Mistral) and proprietary APIs (e.g. ChatGPT, Claude, Gemini).

  • Expertise in training and fine-tuning LLMs using NVIDIA NeMo Framework and other popular frameworks.

  • Strong knowledge of cloud and datacenter GPU systems.

  • Excellent communication and collaboration skills with the ability to articulate complex technical concepts to both technical and non-technical team members.

  • Experience leading workshops, training sessions, and communicating technical solutions to diverse audiences.

Ways To Stand Out From The Crowd:

  • Experience deploying LLMs in cloud environments (e.g., AWS, Azure, GCP) and on on-premises infrastructure.

  • Experience working with any agentic models/frameworks.

  • Working experience with observability and evaluation tools.

  • Familiarity with containerization technologies (e.g., Docker) and orchestration tools (e.g., ECS, Kubernetes) for scalable and efficient model deployment.

  • Hands-on experience with NVIDIA GPU technologies and GPU cluster management, and the ability to design and implement scalable, efficient workflows for LLM training and inference on GPU clusters.

You will also be eligible for equity and .