
Microsoft Senior Site Reliability Engineer
United States, Washington
723209312

10.09.2024

Microsoft is looking for a Senior Site Reliability Engineer (SRE) to support and expand Viva Engage. Viva Engage (formerly Yammer) is the industry-defining social network for the enterprise.

is responsible for keeping the services reliable as we scale and modernize our tech stack. We need an SRE who knows how to manage the conflicting priorities of keeping things running today while making sure we have the architecture we need for the future.

technology, outsized individual impact - with the advantages of working for one of the most successful software companies in the world. We believe in mission-driven work and in this post-Covid world, our platform has become more indispensable than ever as it fosters connection and a sense of belonging among remote teams.

Required Qualifications:

6+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.

Other Requirements:

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local (or applicable country) United States government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport.

Preferred Qualifications:

Experience applying SRE principles in a large production environment.
Proficiency in cloud computing platforms (e.g., AWS, Azure, GCP) and related services (e.g., EC2, S3, VPC, IAM, Lambda).
Expertise in automation tools and frameworks (e.g., Terraform, Ansible, Chef, Puppet) and scripting languages (e.g., Python, Bash).
Deep understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) and incident response processes.
Problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Responsibilities

Participate in on-call rotations and incident responses throughout product development and operations cycles. On-call will require responding to support requests after normal business hours, including weekends and/or holidays, in a designated Microsoft office.
Monitor system performance and proactively identify and resolve issues to ensure high availability and performance.
Develop and maintain automation tools and processes for deployment, monitoring, and configuration management.
Apply troubleshooting skills and debugging tools, and examine logs, telemetry, and other sources to verify assumptions and customer impact. Proactively and reactively address findings with customer and/or service engineering efficiently via written and verbal communications.
Lead blameless postmortems for root cause and production resiliency.
Consult with developers to design services that scale in Azure.
Mentor team members and contribute to the overall growth and development of the SRE team.
Stay current with industry trends, emerging technologies, and best practices in site reliability engineering and cloud computing.

Embody our
