Nvidia AI Application Developer
China, Shanghai
83574980

15.10.2025

time type: Full time

posted on: 4 days ago

We are seeking a skilled developer to build production-grade AI applications, focusing on LLM-based agents and tool-using systems. You will integrate large language models (LLMs), retrieval-augmented generation (RAG), and external tools/APIs on GPU-accelerated stacks, enhancing agent frameworks for reliability, scalability, and safety.

What you’ll be doing:

Design, implement, and deploy AI-powered features using LLMs, including autonomous and multi-agent workflows.
Build agent toolchains, including planning, tool/function calling, memory management, RAG integration, and enterprise API connectivity.
Enhance agent frameworks with custom planners, routers, concurrency control, state management, and retry mechanisms.
Develop evaluation and observability systems to monitor agent performance (success rates, tool-call accuracy, latency, cost, traces).
Implement safety and compliance measures, including content filtering, PII handling, and policy enforcement using guardrail frameworks.
Optimize inference pipelines for GPU performance, latency, and cost; deploy via microservices and APIs.
Manage CI/CD, containerization, and deployment; maintain monitoring, logging, and alerting; and produce clear documentation.

What we need to see:

BS or MS in Computer Science, Electrical/Computer Engineering, or a related field.
2–3 years of experience building AI applications, with at least 1 year focused on developing LLM-based agents (e.g., tool use, function calling, ReAct-style reasoning, RAG integration).
Strong programming skills in Python and one of C++, JavaScript, or TypeScript.
Experience with an agent framework such as LangChain Agents/LangGraph, AutoGen, CrewAI, Semantic Kernel, or Haystack Agents.
Proficiency in creating custom tools/functions, integrating external APIs, and working with async workflows and retries.
Practical experience with privacy, responsible AI practices, prompt engineering, and content filtering.
Familiarity with PyTorch or TensorFlow, REST/gRPC, Docker, cloud platforms (AWS, Azure, or GCP), and databases (SQL/NoSQL, vector DBs).

Ways to stand out from the crowd:

Experience customizing agent frameworks (e.g., planners, routers, memory, tool management, conversation state machines).
Expertise in multi-agent systems, workflow orchestration, or event-driven designs; familiarity with structured output (e.g., JSON schema, OpenAPI).
Knowledge of evaluation and guardrail systems (e.g., NeMo Guardrails, Guardrails AI, custom evaluation harnesses, A/B testing, telemetry).
Experience optimizing GPU inference with tools like NVIDIA Triton Inference Server or TensorRT, or hands-on experience with RAG tooling.
Skills in retrieval efficiency (e.g., approximate nearest neighbor (ANN) search, indexing), caching, or cost-aware inference.
