Job responsibilities
Required qualifications, capabilities, and skills
- Formal training or certification on site reliability engineering concepts and 2+ years applied experience
- Ability to code in at least one programming language
- Familiar with site reliability concepts, principles, and practices
- Familiar with observability practices such as white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Experience maintaining AWS cloud-based infrastructure (Lambda, S3, Route 53, Redshift, ECS, EC2, CloudWatch, Auto Scaling)
- Familiarity with containers or a common server OS such as Linux or Windows
- Knowledge of software, applications, and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android, incident and change management processes, Google SRE process)
- Knowledge of tools and concepts such as Control-M, AutoSys, Postman/Swagger, ServiceNow, SQL (Oracle, MySQL), Python, PowerShell, and OOP
- Knowledge of continuous integration and continuous delivery (CI/CD) tools such as Jenkins, GitLab, Terraform, GitHub, or Bitbucket
- Knowledge of common networking protocols such as SSH, SMTP, SFTP, and HTTPS
Preferred qualifications, capabilities, and skills
- Experience with Docker/fluent
- ITIL version 3.0 or later
- General knowledge of the financial industry