Applied Materials Lead Architect – SRE & Observability 
India, Karnataka 
164677264

As a Lead Architect – SRE & Observability, you will play a key leadership role in designing, scaling, and governing monitoring and observability platforms, while ensuring the reliability of infrastructure and application services. You will lead cross-functional initiatives, establish technical standards, and drive automation, telemetry, and incident response maturity across the enterprise.

Key Responsibilities:

Monitoring & Observability (CAMO Focus):

  • Architect and lead end-to-end observability strategies (logs, metrics, traces) across on-premises, private, and public cloud environments.
  • Manage and mature enterprise observability solutions across complex architectures.
  • Define standards for telemetry data collection, correlation, and alerting for distributed systems.
  • Collaborate with application and infrastructure teams to ensure instrumentation coverage and SLO/SLI definition.
  • Lead the migration and consolidation of legacy monitoring platforms to modern observability stacks.
  • Enable proactive problem detection, root cause analysis, and capacity forecasting using analytics and AI/ML insights.

Site Reliability Engineering (SRE Focus):

  • Define and implement SRE principles (SLIs/SLOs, error budgets, chaos testing, postmortems, etc.) across supported services.
  • Design and manage infrastructure automation, CI/CD pipelines, AI/ML solutions, runbooks, and self-healing systems.
  • Lead incident response coordination during major outages and drive post-incident analysis and systemic fixes.
  • Collaborate with DevOps, Cloud, and Security teams to enforce resiliency, observability, and reliability as core design principles.
  • Mentor junior SREs and CAMO engineers to grow technical and operational expertise.

Technical Skills:

  • Expertise in designing and implementing observability frameworks including logs, metrics, and traces across hybrid environments (on-premises, private cloud, public cloud).
  • Strong understanding of distributed systems, microservices architecture, and telemetry pipelines.
  • Proficiency in infrastructure automation and configuration management using tools like Terraform, Ansible, and scripting languages (Python, Shell, etc.).
  • Experience with CI/CD pipelines, incident response automation, and self-healing systems.
  • Familiarity with container orchestration platforms (e.g., Kubernetes) and virtualization technologies.

Functional Knowledge:

  • Experience in implementing cyber asset management and security observability principles.
  • Familiarity with AIOps, ITSM, and CAASM tools, and with configuration management databases (CMDBs).
  • Exposure to compliance and governance frameworks such as CIS and NIST for cyber resilience, observability, and alerting.
  • Relevant certifications in observability, cloud platforms, SRE, or security domains.

Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 10-15 years of experience in IT Operations, SRE, DevOps, or Monitoring Engineering roles.
  • Strong expertise in modern observability platforms and telemetry pipelines.
  • Experience with hybrid environments including virtualization, container orchestration, and cloud platforms.
  • Proven track record in automation, telemetry governance, and infrastructure as code.
  • Excellent incident management, communication, and stakeholder engagement skills.

Interpersonal Skills:

  • Communicates difficult concepts and negotiates with others to adopt a different point of view.

Full time

Assignee / Regular