Job Responsibilities:
- Lead Major Incidents or Problem Calls by delegating tasks, escalating when needed, and documenting actions taken
- Develop, communicate, and present results to senior leadership both verbally and in writing on a proactive basis
- Provide customers with status updates and give them confidence we are managing the incident
- Lead post-event analysis with detailed documentation, prevention plans, and action item follow-ups
- Assist engineers with troubleshooting incidents of all technical levels
- Manage day-to-day operations for incidents (analyze and track) while summarizing findings to leadership and holding peers accountable for delivery dates
- Manage escalations and create self-service opportunities for our clients to improve MTTR
- Work with peers and the team for technical and professional growth
- Work with our clients, customers, and stakeholders to improve MTTR and quality of work and to build lasting relationships
- Shift preference is Tuesday – Saturday, but negotiable for the right candidate.
Required Qualifications, Capabilities, and Skills:
- Formal training or certification on infrastructure engineering concepts and 5+ years applied experience.
- Minimum of 10 years in a senior technical leadership role leading engineering, SRE, or operations teams.
- Expertise in several of the infrastructure technologies below:
  - Data centers – Cisco ACI or VXLAN with Cisco and/or Arista
  - Server load balancing with F5 – local or global traffic management
  - Firewalls – Fortinet
  - Presenting metrics and accomplishments using data analytics
  - Network management and tooling – SevOne, Splunk, Cisco Nexus Dashboard and/or DNA (Digital Network Architecture), SNMP (Simple Network Management Protocol)
- Able to read and explain TCP/IP traces.
- Ability to organize work for teams of employees, so there is clearly documented accountability.
- Strong understanding of infrastructure components and how they are tracked and managed in various systems.
- Ability to facilitate postmortem meetings and document next steps for problems.
- Experience with data analytics, reporting, and looking for patterns in incidents.
- Experience with technological, organizational, and/or operational change management.
Preferred Qualifications, Capabilities, and Skills:
- Strong communication skills to lead major incidents with clients, engineers, and vendors.
- Proven ability to drive results with Agile methodologies and Jira (Align).
- SD-WAN solutions or service provider level WAN experience.
- CI/CD pipelines and automation with Python and Ansible
- ITIL and operational governance
- Experience with AWS, Azure, or GCP operations.