How will you make an impact?
- Analyze system reliability and performance to address and prevent issues.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation and evolve systems by making code and configuration changes that improve reliability and velocity.
- Participate in the on-call rotation for service disruptions.
- Identify and diagnose infrastructure issues in a live production environment.
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
- Practice sustainable incident response, blameless postmortems, and root cause analysis.
- Perform other duties as assigned by your manager.
Have you got what it takes?
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
- 8+ years of experience designing, analyzing and troubleshooting large-scale distributed systems.
- Sustained track record of delivering major improvements to the stability, security, performance, and scalability of large business-critical systems.
- Experience in one or more of the following: Java, Python, C#, or JavaScript.
- Excellent communication, analytical, and troubleshooting skills.
- Ability to work independently, as well as part of a team, on multiple competing projects.
- Ability to debug, profile, and optimize code and automate routine tasks.
- Ability to facilitate cross-team work effectively and to exert influence well beyond your own group.
- Strong sense of ownership.
- Lifelong learner able to quickly pick up new frameworks, architectures, and languages.
You will have an advantage if you also have:
- Experience running production systems on AWS
- A deep understanding of REST and network programming
- Experience scaling high-traffic SaaS applications
- Deep knowledge of Kubernetes
- Experience with application monitoring and metrics tools (AWS X-Ray, CloudWatch, Datadog, etc.)