Expoint - all jobs in one place

SAP DataLake AI Platform Operation Engineer 
China, Shanghai 
373524615

04.07.2024

What You'll Do:

----------------

  • Infrastructure Operation: Utilize OpenStack-based IaaS resources and optimize their provisioning to ensure efficient infrastructure operations.
  • Cross-Node Resource Management: Manage Kubernetes clusters across different regions and availability zones, ensuring optimal performance for use-cases and shared services while minimizing resource consumption.
  • Logging, Auditing, and Metrics: Implement distributed logging solutions using Loki and OpenSearch. Configure auditing for each use-case and collect Prometheus-based metrics from both platform services and use-cases.
  • Dashboarding and Monitoring: Develop dashboards tailored to specific needs and monitor the platform using the dashboard tools you create.
  • Support Platform Use-Cases: Assist use-case development teams in maximizing the platform's capabilities for their projects.
  • TCO Management: Automate the calculation of the total cost of ownership for platform infrastructure and licenses, and allocate these costs to each specific use-case.
  • Collaboration, Documentation, and Training: Collaborate with peers across regions to support various projects, document new changes, and provide training to platform users.

What You Bring:

----------------

  • Bachelor's degree in Computer Science, Engineering, or a related field; advanced degrees are a plus.
  • Basic understanding of GPU-based computing concepts, and familiarity with AI/ML frameworks and tools such as CUDA, Kubeflow, Spark, or PyTorch.
  • Solid knowledge of Kubernetes and container orchestration concepts.
  • Proficiency in programming and scripting languages (e.g., Python, Go, Shell) for automation and infrastructure management.
  • Proven experience in infrastructure and operations management for cloud service solutions.
  • Strong problem-solving skills and the ability to diagnose and resolve complex technical issues.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Strong attention to detail and the ability to manage multiple priorities in a fast-paced environment.