Help customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on Kubernetes for large language models (LLMs) and generative AI workloads. Enhance performance tuning using TensorRT/TensorRT-LLM. Collaborate with cross-functional teams (engineering,...