Expoint - all jobs in one place

Apple AIML - Site Reliability Engineer ML Platform & Technology 
Singapore 
263508220

01.06.2024
Description
  • Monitor production, staging and development environments for a myriad of services in an agile and dynamic organization.
  • Employ metrics for data-driven solutions for reliability, performance and service insights.
  • Design, implement, and extend automation tools for monitoring, logging, ML and data processing pipelines.
  • Address future capacity needs and investigate new features and products.
  • Strong problem-solving ability will be used daily; a successful engineer will take the initiative to isolate issues and resolve root causes through investigative analysis.
  • Responsible for writing justifications, incident reports, best practices documentation and solution specifications.
Key Qualifications
  • 2 or more years of experience in a Site Reliability Engineering, observability or ML Ops focused role supporting internet services and distributed systems
  • Proficiency in using Go, Python or other higher-level languages for automation, observability and infrastructure management
  • Experience building and supporting telemetry, observability and logging solutions for incident, cost and performance management
  • Experience with infrastructure or dashboards as code and provisioning tools for Kubernetes and cloud-based services
  • Working knowledge of open source or commercial monitoring and observability frameworks and platforms such as ELK, Splunk, OpenCensus, Datadog
  • Working knowledge of ML Ops systems and tools is advantageous
  • Good interpersonal skills shown through previous projects or assignments
Education & Experience
Bachelor's degree in Computer Science or Computer Engineering, or equivalent