Analyze state-of-the-art AI models to identify key performance bottlenecks and optimization opportunities at the kernel level. Develop, optimize, and evaluate both hand-tuned and compiler-generated kernels for inference workloads, balancing speed and flexibility....