Ford Site Reliability Engineer 
Mexico, State of Mexico, Nezahualcóyotl 
415005127

11.09.2024

The SRE Software Engineer is responsible for the following:

MAJOR RESPONSIBILITIES

  • Use Observability and Monitoring tools to detect and resolve issues that affect the user experience
  • The engineer will also be responsible for automating alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime
  • Use the Splunk query language to monitor database connection health with Splunk DB Connect health dashboards; perform log parsing and complex Splunk searches, including external table lookups; and work with Splunk data flow, components, features, and product capabilities.
  • Observability: Implement comprehensive monitoring and alerting solutions using GCP monitoring services and external services
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding
  • Build efficient tooling that lowers the barrier to entry for engineering teams, so they can plug in and benefit from observability-focused reliability practices.
  • Configure dashboards, alerts, and notifications to ensure timely identification and resolution of issues.
  • Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions
  • Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability
  • Develop and integrate tools for logging, monitoring, and alerting to enhance visibility into system performance
  • Participate in strategic planning for the technology roadmap, including scalability, cost-effectiveness, and risk management considerations related to observability infrastructure

BACKGROUND REQUIREMENTS

  • 6+ years of SRE observability engineering experience
  • 6+ years of experience applying observability best practices with Dynatrace or similar APM suites (New Relic, Datadog, AppDynamics, etc.), delivering solutions across all environments, and integrating platforms and applications with monitoring and APM tools.
  • Knowledge of CI/CD and infrastructure-automation tools such as Jenkins, Puppet, Terraform, and Ansible
  • Minimum of 4 to 5 years of working experience with OpenShift and Docker/Kubernetes (K8s)
  • Proficiency in implementing monitoring and observability solutions using GCP monitoring services such as Cloud Monitoring, Cloud Logging, and Cloud Trace
  • Deep understanding of IT infrastructure monitoring and observability best practices
  • Experience gathering and organizing large amounts of data for instrumentation in an enterprise monitoring solution.
  • Experience recommending baseline monitoring thresholds, performance monitoring KPIs, and SLAs
  • 4+ years of experience developing Grafana dashboards and standardizing metrics, metric collection, and dashboards; Grafana experience is a must
  • 3-5 years of experience with SQL and familiarity with at least one managed Kubernetes platform (EKS, AKS, GKE)
  • Strong background in software engineering, with expertise in relevant programming languages (like Python, Java, Go) and cloud platforms (like AWS, GCP, Azure)
  • Experience with container orchestration tools like Kubernetes

COMPETENCIES AND SKILLS

  • Strong interpersonal and organizational skills
  • Strong verbal and written communication skills
  • Attention to detail
  • Excellent time management
  • Extraordinary teamwork and collaboration skills (Own Working Together)