Job responsibilities
- Implement systems that are highly available, scalable, and self-healing
- Design, manage, and maintain tools to automate operational processes
- Automate security controls, governance processes, and compliance validation
- Collaborate with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
- Work toward becoming an expert on the applications and platforms in your remit while understanding their interdependencies and limitations
- Evolve and debug critical components of applications and platforms
- Provide comprehensive and ongoing guidance, tools, and solutions to support the firm's growth
- Define and deploy monitoring, metrics, and logging systems on AWS, and implement and enhance infrastructure automation via infrastructure as code (IaC) using Terraform
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience
- Proficient in programming languages such as Python and Java
- Proficient in infrastructure as code (IaC) tooling, specifically Terraform
- Proficient with DevOps practices and CI/CD pipelines
- Experience with AWS monitoring and logging services such as CloudWatch
- Practical cloud-native experience
- Excellent problem-solving and troubleshooting skills
- Ability to tackle design and functionality problems independently with little to no oversight
Preferred qualifications, capabilities, and skills
- Experience deploying and maintaining machine learning models in a cloud environment