Bachelor’s/Master’s in Computer Science, Electronics Engineering, Mathematics, or a related field with 6–12 years of experience.
Knowledge of ML/DL is a must, including LLM architecture, Transformers, attention, low-precision data types (FP8/FP4), quantization techniques, open-source upstreaming, and inference serving.
Hands-on experience with ML/DL models and distributed training/inference using PyTorch, TensorFlow, vLLM/SGLang, or similar frameworks.
Strong skills in performance debugging, numerical analysis, and regression tracking in validation environments.
Strong skills in validation framework design and test case development for kernel providers such as CUTLASS and Triton.
Strong C++ development expertise (C++17/STL, GoogleTest).
Solid Linux development and debugging skills.
Understanding of distributed systems, HPC/GPU scaling, MPI, torchrun, Fully Sharded Data Parallel (FSDP), Tensor Parallel, and high-performance networking (Ethernet/InfiniBand).
Skilled in Docker/Kubernetes, virtualization, performance benchmarking, and automation.
Strong analytical, problem-solving, and communication skills, with the ability to work across architecture, development, and validation teams.