You will lead the charge in resolving network incidents, ensuring that our customers experience minimal disruption and maintain confidence in our services. You will be adept at managing high-stakes situations, leading incident resolution calls with precision and urgency, and communicating effectively with senior leadership. Your commitment to follow-through will guarantee that incidents are thoroughly addressed and that preventative measures are implemented to avoid recurrence.
Job responsibilities
- Manages technical areas within infrastructure engineering, collaborates with others across different technical areas, and anticipates the needs of multiple disciplines within the function
- Manages and implements corporate and divisional operational plans and provides broad direction to teams, team leads, and supervisors
- Directs and manages major incident or problem calls by delegating tasks, escalating when necessary, and documenting actions taken, ensuring timely resolution and minimal business impact
- Leads post-event analysis with detailed documentation, develops prevention plans, and ensures action-item follow-ups are completed
- Manages escalations and creates self-service opportunities for clients to improve Mean Time to Repair (MTTR)
- Manages key senior stakeholders and ensures teams deliver in accordance with compliance standards, risk and security, service level agreements, and business requirements
- Drives multiple complex projects, processes, and initiatives
- Provides high-level technical guidance and support to engineers during incident troubleshooting, ensuring best practices are followed
- Mentors and coaches junior engineers and technologists
- Champions the firm’s culture of diversity, equity, inclusion, and respect for team members, and prioritizes diverse representation
Required qualifications, capabilities, and skills
- Formal training or certification in infrastructure engineering concepts with 10+ years of applied experience. In addition, 5+ years of experience leading technologists to manage, anticipate and solve complex technical items within your domain of expertise.
- Experience managing cross-functional teams of infrastructure engineers
- Experience hiring, developing, and recognizing talent
- In-depth knowledge and expertise in multiple infrastructure technologies, including:
  - Data centers: Cisco (ACI) or VXLAN with Cisco and/or Arista
  - Server load balancing with F5: local or global traffic management
  - Firewalls: Fortinet
  - Presenting metrics and accomplishments using data analytics
  - Network management and tooling: SevOne, Splunk, Cisco Nexus Dashboard, and/or DNA (Digital Network Architecture), SNMP (Simple Network Management Protocol)
- Ability to read and explain TCP/IP traces.
- Strong understanding of infrastructure components and their management in various systems.
- Experience facilitating postmortem meetings and documenting next steps for problem resolution.
- Proficiency in data analytics, reporting, and identifying patterns in incidents.
Preferred qualifications, capabilities, and skills
- Strong communication skills to lead major incidents with clients, engineers, and vendors.
- Proven ability to drive results using Agile methodologies and Jira (Align).
- Knowledge of CI/CD pipelines, Python programming, and Ansible automation.
- Familiarity with ITIL and operational governance.
- Experience with AWS, Azure, or GCP operations.