Design and evolve scalable architectures for multi-node LLM inference across GPU clusters. Develop infrastructure to optimize latency, throughput, and cost-efficiency of serving large models in production. Collaborate with model, systems, compiler, and...