What You Will Do
Work closely with management, product owners, developers, and quality engineers to understand product requirements and build suitable test plans to verify the performance and scale of OpenShift features and solutions for running AI workloads, such as Kubernetes Dynamic Resource Allocation (DRA), autoscaling, and operators for detection, configuration, and management of AI accelerators.
Develop sophisticated tests that simulate user workloads through comprehensive end-to-end automation, leveraging custom-built and state-of-the-art open-source tools and frameworks.
Investigate performance issues in depth to identify their root causes in complex distributed systems.
Design and develop monitoring and reporting tools for performance and scale tests and analysis.
Document your research and results clearly and concisely, and communicate findings both internally and externally.
Engage in upstream communities to help test performance and scale early and influence design and development decisions.
Triage, debug, and identify the root causes of customer issues related to OpenShift performance and scale.
Present your work and findings at internal and external conferences.
What You Will Bring
Master’s Degree in Computer Science or a related field with 1-2 years of relevant experience, or a Bachelor’s Degree in Computer Science or a related field with 3+ years of relevant experience.
Demonstrable experience in, understanding of, and passion for performance engineering.
Working knowledge of Kubernetes or OpenShift.
Strong programming, debugging, and profiling skills in Python and/or Golang.
Hands-on experience with performance measurement, analysis, and optimization.
Experience with distributed systems.
Very strong Linux system administration and system engineering skills.
Solid scripting skills, particularly with Bash, Python, or Ansible.
Experience working with public clouds like AWS, Azure, GCP, or IBM Cloud, as well as bare metal environments.
Experience analyzing and interpreting large volumes of test results and succinctly communicating findings through easy-to-understand graphs/charts.
Experience with collaborative software development methodologies, tools, and version control.
Knowledge of statistical analysis and experimental design techniques.
Excellent communication and interpersonal skills.
Ability to work independently and proactively seek collaboration.
The Following Are Considered a Plus:
Experience with container technologies like Podman or Docker, and familiarity with building container images.
Experience with system performance engineering and metrics collection tools like iostat, vmstat, sar, perf, and Prometheus.
Experience with monitoring and dashboarding tools like Prometheus and Grafana.
Experience with AI accelerators and tools for monitoring/managing their usage.
A demonstrated history of contributing to open-source projects.
Presentation skills and public speaking abilities for conferences and demonstrations.
The salary range for this position is $104,080.00 - $166,320.00. Actual offer will be based on your qualifications.
Pay Transparency
● Comprehensive medical, dental, and vision coverage
● Flexible Spending Account - healthcare and dependent care
● Health Savings Account - high deductible medical plan
● Retirement 401(k) with employer match
● Paid time off and holidays
● Paid parental leave plans for all new parents
● Leave benefits including disability, paid family medical leave, and paid military leave