Microsoft Senior Site Reliability Engineer 
India, Telangana, Hyderabad 
901900740

10.12.2024

M365's COSMIC team designs, builds, and operates a global-scale managed runtime environment based on Azure Kubernetes Service for the benefit of the Microsoft Substrate service and its developers. COSMIC can be thought of as a ‘Kubernetes PaaS’. Our charter is to build and maintain solutions that let Substrate service teams onboarding to the COSMIC Linux platform focus on their own scenarios and business requirements rather than on common infrastructure concerns such as deployment, upgrades, security, observability, and debuggability.

Required Qualifications:

  • 6+ years of technical experience in software engineering, network engineering, or systems administration
    • OR a Bachelor's Degree in Computer Science, Information Technology, or a related field AND 3+ years of technical experience in software engineering, network engineering, or systems administration
    • OR a Master's Degree in Computer Science, Information Technology, or a related field AND 2+ years of technical experience in software engineering, network engineering, or systems administration.
  • Experience with or exposure to Agile and iterative development processes.

Other Requirements:

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Cloud and services experience, including experience with Azure.
  • Working knowledge of Kubernetes.

Responsibilities:

  • Keep platform components up to date, incorporating dependencies from other applications and tech stacks, and debug any issues arising from such upgrades and updates.
  • Continuously improve the platform by identifying patterns in service alerts and incidents and building auto-remediation solutions.
  • Build dashboards and alerts to identify issues faster and keep system health in check.
  • Collaborate with cross-functional teams to define, design, and ship new features that keep platform health stable.