Expoint - all jobs in one place

SAP DevOps Observability Engineer 
Mexico, Nuevo León 
930413584

25.07.2024

Cloud Systems Integration Engineer - Observability Specialist


About the Role:
Join our global team specializing in Reliability Engineering and Services, collaborating with DevOps Engineers and Operational Experts. As a Cloud Systems Integration Engineer - Observability Specialist, you will play a crucial role in developing and implementing Observability solutions powered by Big Data, streaming pipelines, Machine Learning, and Large Language Models. You will focus on integrating public and private cloud solutions into the SAP ecosystem and on optimizing alerts, metrics, and logs using AI-driven Observability solutions.

You will start on an implementation project focused on integration and operationalization, and will ultimately join the core SRE teams that support these environments. You will be expected to bridge the gap between Infrastructure, Platform, and Application Observability from the point of first contact. You will also support troubleshooting during major incidents affecting our global cloud infrastructure, ensuring excellence in triage and resolution. Using these methods, you will help the team improve critical KPIs such as MTTD/MTTR and signal-to-noise ratio.


Key Responsibilities:

  1. Collaborate with cross-functional teams following Agile methodologies such as Scrum.
  2. Prioritize and deliver high-quality developments within tight timelines.
  3. Build expertise in hyperscaler provider architectures and API integration models.
  4. Ensure seamless operations and maximum uptime for our services.
  5. Participate in the on-call rotation, including weekends and holidays.
  6. Share knowledge and drive hyperscaler adoption and integration.
  7. Support ongoing Observability and Monitoring enhancements/development across the SAP Cloud Ecosystem.
  8. Assist SRE teams in Reliability Services.

Required Skills:

  1. Rapid adoption of cutting-edge technologies.
  2. Advanced analytical and problem-solving abilities.
  3. Strong team player with exceptional communication skills.
  4. Self-driven with a sense of urgency to resolve issues efficiently.
  5. Proficient in spoken and written English.

Experience:

  • Development: 4+ years of professional or enterprise development experience.
      • Strong knowledge of Python & JavaScript programming.
      • Experience in REST API implementation (Flask or FastAPI).
      • Microservice-based development expertise.
  • DevOps:
      • CI/CD pipelines using Azure, Jenkins, or similar tools.
      • Hands-on experience with Docker containers & Kubernetes.
      • Public cloud environments (GCP/AWS/Azure).
      • Solid grasp of JSON, YAML, & GitHub.
      • Enterprise/Service Provider Data Center Architecture.
      • Familiarity with Fault Monitoring and Performance Management tools.
  • Hyperscalers:
      • Certifications with public cloud providers (GCP, AWS, Azure, IBM, Alibaba Cloud, etc.).
      • Adoption and integration methodologies between cloud solutions.
      • Working knowledge of Observability data ingestion and pipelines.
      • Algorithms, data structures & patterns.
  • Preferred:
      • Experience with Elasticsearch, Splunk, or similar platforms.
      • Knowledge of web development frameworks.
      • Familiarity with Terraform, Helm charts, Ansible, or similar tools.
      • Understanding of Kubeflow, MLflow, Dataflow, or similar technologies.

Education:
Bachelor's degree or equivalent in Software Engineering, Computer Science, or a related field.
Industry Technical Certifications (CCNA, CKA, RHCE, AZ-900, etc.) and ITIL courseware are beneficial.