Qualcomm MLOps/DevOps Engineer - ML Platform Cork 
Ireland, Cork 
38067521

30.08.2024

QT Technologies Ireland Limited

Job Area:

Engineering Group > Systems Engineering

About The Role

You will work closely with cross-functional teams, including data scientists, software engineers, and infrastructure specialists, to ensure the smooth operation and scalability of our ML infrastructure. Your expertise in MLOps, DevOps, and knowledge of GPU clusters will be vital in enabling efficient training and deployment of ML models.

Responsibilities will include:

  • Architect, develop, and maintain the ML & Data platform to support training and inference of ML models.

  • Design and implement scalable and reliable infrastructure solutions for NVIDIA clusters, both on premises and in the AWS Cloud.

  • Collaborate with data scientists and software engineers to define requirements and ensure seamless integration of ML and Data workflows into the platform.

  • Optimize the platform’s performance and scalability, considering factors such as GPU resource utilization, data ingestion, model training, and deployment.

  • Monitor and troubleshoot system performance, identifying and resolving issues to ensure the availability and reliability of the ML platform.

  • Implement and maintain CI/CD pipelines for automated model training, evaluation, and deployment using technologies like ArgoCD and Argo Workflows.

  • Implement and maintain a monitoring stack using Prometheus and Grafana to ensure the health and performance of the platform.

  • Manage AWS services including EKS, EC2, VPC, IAM, S3, and EFS to support the platform.

  • Implement logging and monitoring solutions using AWS CloudWatch and other relevant tools.

  • Stay updated with the latest advancements in MLOps, distributed computing, and GPU acceleration technologies, and proactively propose improvements to enhance the ML platform.

What we are looking for:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

  • Proven experience as an MLOps/DevOps Engineer or in a similar role, with a focus on large-scale ML and/or Data infrastructure and GPU clusters.

  • Strong expertise in configuring and optimizing NVIDIA DGX clusters for deep learning workloads.

  • Proficient in using the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana.

  • Solid programming skills in languages such as Python and Go, and experience with relevant ML frameworks (e.g., TensorFlow, PyTorch).

  • In-depth understanding of distributed computing, parallel computing, and GPU acceleration techniques.

  • Familiarity with containerization technologies such as Docker and orchestration tools.

  • Experience with CI/CD pipelines and automation tools for ML workflows (e.g., Jenkins, GitHub, ArgoCD).

  • Experience with AWS services such as EKS, EC2, VPC, IAM, S3, and EFS.

  • Experience with AWS logging and monitoring tools.

  • Strong problem-solving skills and the ability to troubleshoot complex technical issues.

  • Excellent communication and collaboration skills to work effectively within a cross-functional team.

We would love to see:

  • Experience with training and deploying models for Automated Driving.

  • Knowledge of ML model optimization techniques and memory management on GPUs.

  • Familiarity with ML-specific data storage and retrieval systems.

  • Understanding of security and compliance requirements in ML infrastructure.

Where you will be working

A gateway to Europe, Cork airport provides access to almost 50 international destinations including transatlantic air routes.

What's on Offer

Apart from working in an open, relaxed and collaborative space, you will enjoy:

  • Salary, stock and performance-related bonus

  • Maternity/Paternity Leave

  • Employee stock purchase scheme

  • Matching pension scheme

  • Education Assistance

  • Relocation and immigration support (if needed)

  • Life, Medical, Income and Travel Insurance

  • Subsidised memberships for physical and mental well-being

  • Bicycle purchase scheme

  • Employee-run clubs, including running, football, chess, badminton and many more

Minimum Qualifications:

  • Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 2+ years of Systems Engineering or related work experience.

  • Master's degree in Engineering, Information Systems, Computer Science, or related field and 1+ year of Systems Engineering or related work experience.

  • PhD in Engineering, Information Systems, Computer Science, or related field.

*References to a particular number of years of experience are for indicative purposes only. Applications from candidates with equivalent experience will be considered, provided that the candidate can demonstrate an ability to fulfill the principal duties of the role and possesses the required competencies.

Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.