Red Hat Principal Software Engineer - OpenShift AI Model Training
Ireland 
893056222

26.06.2024

Job Summary:

The future of the AI industry is open, and Red Hat OpenShift AI (RHOAI) is a strategic investment area for Red Hat. You'll join an ecosystem that fosters continuous learning, career growth, and professional development. This hands-on role gives you first-hand exposure to the AI landscape.

What you will do

  • Lead Red Hat’s participation in machine-learning-related upstream communities to ensure that the technologies work on OpenShift and can be integrated with RHOAI

  • Architect and lead the implementation of scalable open source solutions that let data scientists use distributed computing capabilities to train their machine learning models on OpenShift

  • Act as an MLOps subject matter expert (SME) within Red Hat by supporting customer-facing discussions, presenting at technical conferences, and evangelizing OpenShift AI within the internal community of practice

  • Architect and design new features for open source communities such as Kueue (https://github.com/kubernetes-sigs/kueue), KubeRay (https://github.com/ray-project/kuberay), PyTorch, Kubeflow (https://www.kubeflow.org), and CodeFlare (https://github.com/project-codeflare/codeflare)

  • Provide technical vision and leadership on critical, high-impact projects

  • Mentor, influence, and coach a distributed team of engineers

  • Present at OpenShift/Kubernetes and AI/ML-related technology conferences, and internally within the AI/ML communities of practice

What you will bring

  • Existing contributions to one or more MLOps open source projects such as Ray/KubeRay, Kubeflow, PyTorch, or Katib (https://github.com/kubeflow/katib)

  • Experience training and tuning ML models using tools like Ray, Kubeflow Training Operator (https://github.com/kubeflow/training-operator), Katib, MLflow (https://github.com/mlflow/mlflow), or similar

  • Advanced experience with Kubernetes

  • Advanced knowledge of, and development experience in, Go or Python

  • Excellent system understanding and troubleshooting capabilities

  • Solid innovation skills and a passion for technology

  • Technical leadership acumen in a global team environment, plus mentorship experience

  • Passion for writing and maintaining reliable code

  • Excellent written and verbal communication skills; good English language skills

The following will be considered a plus:

  • Bachelor's degree in statistics, mathematics, computer science, operations research, or a related quantitative field, or equivalent expertise; a Master's degree or PhD is a further plus

  • Experience in engineering, consulting, or another field related to distributed model training or data processing, either in a customer environment or supporting a data science team

  • Extensive experience with OpenShift