Expoint – all jobs in one place

NVIDIA AI Application Developer
China, Shanghai 
83574980

Time type: Full time
Posted: 4 days ago

We are seeking a skilled developer to build production-grade AI applications, with a focus on LLM-based agents and tool-using systems. You will integrate large language models (LLMs), retrieval-augmented generation (RAG), and external tools/APIs on GPU-accelerated stacks, and enhance agent frameworks for reliability, scalability, and safety.

What you’ll be doing:

  • Design, implement, and deploy AI-powered features using LLMs, including autonomous and multi-agent workflows.

  • Build agent toolchains, including planning, tool/function calling, memory management, RAG integration, and enterprise API connectivity.

  • Enhance agent frameworks with custom planners, routers, concurrency control, state management, and retry mechanisms.

  • Develop evaluation and observability systems to monitor agent performance (success rates, tool-call accuracy, latency, cost, traces).

  • Implement safety and compliance measures, including content filtering, PII handling, and policy enforcement using guardrail frameworks.

  • Optimize inference pipelines for GPU performance, latency, and cost; deploy via microservices and APIs.

  • Manage CI/CD, containerization, and deployment; maintain monitoring, logging, and alerting; and produce clear documentation.

What we need to see:

  • BS or MS in Computer Science, Electrical/Computer Engineering, or a related field.

  • 2–3 years of experience building AI applications, with at least 1 year focused on developing LLM-based agents (e.g., tool use, function calling, ReAct-style reasoning, RAG integration).

  • Strong programming skills in Python and one of C++, JavaScript, or TypeScript.

  • Experience with an agent framework such as LangChain Agents/LangGraph, AutoGen, CrewAI, Semantic Kernel, or Haystack Agents.

  • Proficiency in creating custom tools/functions, integrating external APIs, and working with async workflows and retries.

  • Practical experience with privacy, responsible AI practices, prompt engineering, and content filtering.

  • Familiarity with PyTorch or TensorFlow, REST/gRPC, Docker, cloud platforms (AWS, Azure, or GCP), and databases (SQL/NoSQL, vector DBs).

Ways to stand out from the crowd:

  • Experience customizing agent frameworks (e.g., planners, routers, memory, tool management, conversation state machines).

  • Expertise in multi-agent systems, workflow orchestration, or event-driven designs; familiarity with structured output (e.g., JSON schema, OpenAPI).

  • Knowledge of evaluation and guardrail systems (e.g., NeMo Guardrails, Guardrails AI, custom evaluation harnesses, A/B testing, telemetry).

  • Experience optimizing GPU inference with tools such as NVIDIA Triton Inference Server or TensorRT, or experience with RAG tooling.

  • Skills in retrieval efficiency (e.g., approximate nearest neighbor (ANN) search, indexing), caching, or cost-aware inference.