Design and prototype scalable software systems that optimize distributed AI training and inference, with a focus on throughput, latency, and memory efficiency. Develop and evaluate enhancements to communication libraries such as NCCL, UCX, and UCC, tailored...