4+ years of experience (4-6 years) in Azure cloud engineering, Azure networking, Azure compute, Azure Storage services, and Azure DevOps CI/CD.
Advanced skills in PowerShell scripting, Azure PowerShell, and YAML. Expertise in Log Analytics and data analysis with Kusto.
Proficiency in Azure Virtual Desktop (AVD) infrastructure, virtual desktop, and virtual app management.
Strong knowledge of identity and security, particularly with Azure Active Directory, Active Directory, Federation Services, and Replication.
Familiarity with Agile methodologies and Site Reliability Engineering (SRE) principles.
Experience with automated deployments and building pipelines.
Ability to develop tools, automation scripts, and enhancements for services/products to manage software in production.
Solid understanding of monitoring, alerting, and observability philosophies and best practices. Experience troubleshooting with diagnostic tools such as WinDbg and Sysinternals Process Monitor (ProcMon).
Strong collaboration skills with the ability to influence cross-team cooperation.
Excellent verbal and written communication skills.
Responsibilities
Provide technical engineering for a cross-functional, high-visibility operations team supporting the virtualization platform for Microsoft’s corporate and partner networks.
Identify and drive the implementation of automation opportunities to improve service health, manageability, reliability, and telemetry.
Own, triage, investigate, and resolve service issues, emphasizing broad communication, learning, and teaching throughout the process.
Read, write, configure, design, and script end-to-end service telemetry, alerting, and self-healing capabilities for platforms.
Author functional and technical documentation, communicating at a deep technical level with product engineering, project management, and operations teams to optimize products, improve infrastructure, and evolve services.
Stay current on new technologies, methods, and procedures, including Test-Driven Development, Continuous Integration, and Continuous Deployment practices.
Participate in on-call rotations to resolve live site incidents, minimize customer impact, and document solutions and insights that inform ongoing improvements to infrastructure, code, tools, and processes to prevent similar issues from recurring.