What you will do:
Design performance test plans for new MLPerf Training and Inference suites (datacenter and edge), primarily covering LLMs.
Execute accuracy, performance, and power submissions; triage regressions and drive improvements using LoadGen logs, traces, and other profiling tools.
Build and maintain highly optimized container-based harnesses (Podman/Kubernetes) for benchmark execution, dataset pre-processing, and compliance checks, enabling reproducible CI execution.
Profile kernels, GPU runtimes, and distributed collectives; propose patches to the Red Hat AI software stack to remove bottlenecks revealed by MLPerf™ results.
Represent Red Hat in MLCommons™ working groups; upstream fixes and new benchmark proposals.
Present results and best practices at premier open source and industry conferences; author technical blogs and white papers that translate benchmark data into customer value.
What you will bring:
5+ years of relevant industry experience in performance engineering or ML infrastructure
Hands-on MLPerf (Training or Inference) harness work
Strong Python & Bash proficiency plus one systems language (Go/Rust/C++)
Expert Linux skills (cgroups, scheduler, perf, NUMA, GPU drivers, etc.)
Experience with container orchestration (Kubernetes, OpenShift)
Performance-profiling literacy (nvprof, Nsight Systems, perf, eBPF)
Clear written & spoken English
The following will be considered a plus:
Master’s or PhD in Computer Science, AI, or a related field
Prior MLPerf submission ownership or MLCommons working-group membership
Deep knowledge of LLM inference runtimes
Contributions to PyTorch or TensorFlow performance patches