As a Principal Site Reliability Engineer, you will:
Partner with leaders in product, engineering, business, and operations to identify and address risks, vulnerabilities, and limits in our end-to-end systems
Technically lead and mentor the SRE team with a focus on improving the availability, reliability, and observability of Nike’s digital platforms while reducing the burden of toil through tooling, automation, or process change
Use your technical expertise to identify training and up-skilling opportunities, monitor industry trends, and define new reliability patterns for the broader organization
Influence systems design decisions and patterns across business-value engineering teams, infrastructure teams, and architecture
Make the life of on-call engineers safe by delivering deep observability, actionable alerts and runbooks, and iterative Service Level Objectives that truly align with consumer experience
Strategically define a multi-year roadmap in collaboration with peer engineering teams, geo partners, and product management teams
Identify, curate, implement, and adapt key metrics for end-to-end system health and performance
WHO YOU WILL WORK WITH
The Principal Site Reliability Engineer will work alongside a talented team of Site Reliability Engineers focused on delivering reliable and observable software used by millions of athletes* around the world. You will be part of the Resilience Engineering organization, which includes Site Reliability Engineering, Quality & Release Engineering, Accessibility Engineering, and High Availability/Disaster Recovery. This role reports to the Senior Director, Reliability Engineering.
To deliver Reliability Engineering goals, you will partner with and influence people at multiple levels, not only within Global Technology (Director up to CTO) but also across business units and geographical locations.
WHAT YOU BRING
12+ years of combined work experience as a software engineer, team lead/principal engineer, or manager leading distributed teams
Deep understanding of how to deliver large-scale software with modern reliability and resilience concepts (multi-region, multi-cloud, active/active, canary deploys, synthetic testing, containers, etc.)
Hands-on experience architecting, deploying, and operating software using modern cloud-based distributed system techniques, micro-service architecture patterns, and DevOps processes
Expertise in data structures, algorithms, and complexity analysis. Experience with AIOps and AI/ML is a plus
Ability to build strong relationships with partners/stakeholders and use technical credibility and influence to drive positive outcomes
Demonstrated experience implementing Service Level Objectives, error budgets, and the associated cultural change
A history of finding and reducing toil within complex systems and processes
Experience with modern observability tooling, processes, and mindset (Splunk, SignalFx, New Relic, Catchpoint, etc.). Bonus points for experience with open-source observability stacks
A passion for learning, teaching, and mentoring
A strong desire for building and motivating teams focused on data-driven continuous improvement