Job Purpose
The Site Reliability Engineer will assist with day-to-day incident-related activities supporting ESRE services, and will build actionable alerts and automation for preventing incidents, detecting performance bottlenecks, and identifying maintenance activities.
Responsibilities
- Employ deep troubleshooting skills to improve the availability, performance, and security of IMT Services.
- Collaborate with Product and Support teams to plan and deploy product releases.
- Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams.
- Ensure services are designed for 24/7 availability, operational readiness, and rigor.
- Implement proactive monitoring, alerting, trend analysis, and self-healing systems.
- Contribute to product development and engineering as needed to ensure the quality of service of highly available services.
- Resolve product/service defects and handle design, infrastructure, or operational changes.
- Implement automated tests, automated deployments, and operational tools.
 
Knowledge and Experience
- 3+ years of relevant experience as an SRE in a production support services environment
- BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
- Excellent troubleshooting skills, utilizing a systematic problem-solving approach
- Experience with elastic scalability, fault tolerance, and other cloud architecture patterns
- Experience operating on AWS (both PaaS and IaaS offerings)
- Experience in both Windows (2016 R2+) and Linux
- Experience with Continuous Integration and Continuous Delivery concepts
- Nice to have: experience with containerization technologies such as Docker
 


