Bachelor’s/Master’s in Computer Science, Electronics Engineering, Mathematics, or a related field, with 6–12 years of experience.
Strong knowledge of ML/DL is a must, including LLM architectures, transformers, attention mechanisms, low-precision data types (FP8/FP4), quantization techniques, open-source upstreaming, and inference serving.
Hands-on experience with ML/DL models and distributed training/inference using PyTorch, TensorFlow, vLLM/SGLang, or similar frameworks.
Strong skills in performance debugging, numerical analysis, and regression tracking in validation environments.
Strong skills in validation framework design and test-case development for high-level serving frameworks such as SGLang/vLLM.
Proficiency in Python (NumPy, SciPy, pandas, pytest).
Solid Linux development/debugging experience (git, cmake, gdb, strace, perf) and familiarity with Git/GitHub/Gerrit workflows and CI/CD automation.
Understanding of distributed systems, HPC/GPU scaling, parallelism and launch approaches (MPI, torchrun, Fully Sharded Data Parallel, Tensor Parallel), and high-performance networking (Ethernet/InfiniBand).
Skilled in Docker/Kubernetes, virtualization, performance benchmarking, and automation.
Strong analytical, problem-solving, and communication skills, with the ability to work across architecture, development, and validation teams.