Expoint - all jobs in one place

Microsoft Senior Site Reliability Engineer 
India, Karnataka, Bengaluru 
900478595

17.07.2024

As an SRE, you will be part of a team that ensures the buildout of [...] and data plane services and tools for powering cloud-scale networking. You will work with a breadth of partners across Microsoft, including developers, datacenter technicians, supply chain managers, and business leaders, to rapidly debug and resolve issues that delay the buildout process. You will also drive continuous improvements with these teams to prevent repeats and address common classes of issues across the Azure software stack through design reviews and problem management.

Required/Minimum Qualifications:

  • Bachelor's Degree in Computer Science, Information Technology, or a related field and 8+ years of technical experience in software engineering, network engineering, service engineering, or systems engineering, or equivalent experience.
  • Excellent communication skills and team player with ability to build solid relationships at all levels, across teams, geographies, and business functions.
  • Analytical skills and experience solving broad business problems.
  • Excellent judgment, decision-making skills, and the ability to work under continual deadline pressure when the situation is ambiguous.
  • Strong negotiation and conflict management skills.

Additional or Preferred Qualifications:

  • 8+ years of Automation and/or Development experience.
  • 5+ years configuring, managing, and/or operating cloud services and networking technologies.
  • Excellent customer service, organizational, prioritization, multitasking, communication, and leadership skills.

Responsibilities:
  • Develop foundational understanding of service and system design, technology interactions, infrastructure functions, and dependencies at scale. Contribute to identifying optimal technology configurations and assist in implementing reliable, scalable, and high-performance solutions to build and operate the service.
  • Monitor and analyze telemetry data to identify failures affecting system availability, reliability, performance, and efficiency. Use insights to drive proactive adjustments.
  • Lead and coordinate troubleshooting efforts with peers and service teams to resolve incidents and problems promptly. Take ownership of critical incidents, ensuring rapid resolution and minimizing impact, while serving as an escalation point for specific technical areas. Lead post-mortem analyses to identify root causes, propose preventive measures, and drive continuous improvement.
  • Lead reliability projects to improve system performance, such as optimizing the network, implementing auto-scaling, and enhancing monitoring. Proactively adapt to new trends and technologies to improve service availability, reliability, and performance, while maintaining consistency in monitoring and operations at scale.
  • Proactively analyze capacity needs and scale systems to accommodate growth beyond routine assessments. Contribute to system stability and scalability by aligning infrastructure with strategic business goals.
  • Actively seek opportunities to automate repetitive tasks, such as provisioning resources, managing configurations, or handling deployments, and drive efficiency through automation. Develop scripts, tools, and processes to prevent issues, boost user productivity, and automate tasks for long-term service enhancements.
  • Guide junior team members, ensuring effective communication and timely resolution.
  • Collaborate with engineering, program management, and operations teams to optimize the network and evolve services.