Job Purpose
This SRE role assists with day-to-day activities supporting ESRE services, particularly those related to incidents. The role builds actionable alerts and automation to prevent incidents, detect performance bottlenecks, and identify maintenance activities.
Responsibilities
- Employ deep troubleshooting skills to improve the availability, performance, and security of IMT Services.
- Collaborate with Product and Support teams to plan and deploy product releases.
- Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams.
- Ensure services are designed for 24/7 availability, operational readiness, and rigor.
- Implement proactive monitoring, alerting, trend analysis, and self-healing systems.
- Contribute to product development and engineering as needed to ensure the quality of service of highly available services.
- Resolve product and service defects, and carry out design, infrastructure, or operational changes.
- Implement automated tests, automated deployments, and operational tools.
Knowledge and Experience
- 3+ years of relevant experience as an SRE in a production support services environment
- BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
- Excellent troubleshooting skills, utilizing a systematic problem-solving approach
- Experience with elastic scalability, fault tolerance, and other cloud architecture patterns
- Experience operating on AWS (both PaaS and IaaS offerings)
- Experience with both Windows Server (2016 or later) and Linux
- Experience with Continuous Integration and Continuous Delivery concepts
- Nice to have: experience with containerization technologies such as Docker