What you will do:
Execute performance and scalability benchmarks against various components of the RHEL/OpenShift AI platform to drive improvements and detect regressions through data analysis and visualization
Develop tools and automation to aid the performance benchmarking work
Collaborate with other teams to resolve performance issues
Triage, debug, and solve customer cases related to AI performance
Submit performance benchmarking results to industry consortia
Publish results, conclusions, recommendations, and best practices via internal test reports, presentations, and external blogs to support our partners and customers
Participate in internal and external conferences about your work and results
What you will bring:
Extensive experience with AI technologies and frameworks (PyTorch, vLLM, Transformers, etc.)
Experience in running performance tests, data capture, data analysis, and data visualization
Experience with systems performance engineering and metrics collection tools such as iostat, vmstat, sar, perf, PCP, Prometheus, etc.
Programming experience in Python
Experience working with the Linux operating system (RHEL, Fedora or CentOS preferred)
Excellent written and verbal communication skills in English
The following are considered a plus:
Master's degree or PhD in Computer Science or a related field
Experience with container technologies (Podman, Kubernetes)
Knowledge of AI benchmarking suites such as MLPerf
Experience working with hardware accelerators such as GPUs
Experience working on an MLOps platform
Experience managing deep learning infrastructure