10+ years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role
3+ years of experience leading and managing high-performance SRE teams
Proven track record of leading sophisticated SRE projects and enterprise services at large scale
Strong analytical, troubleshooting, and problem-solving skills
Good knowledge of at least one object-oriented programming language (preferably Java or Python)
Experience with Unix performance monitoring and tuning
Good understanding of database concepts, PL/SQL, and NoSQL technologies
Hands-on experience with monitoring and data analysis tools (e.g., Prometheus, Splunk, Grafana, CloudWatch)
Experience building and operating container orchestration systems such as Kubernetes or EKS
Deep understanding of security concepts and protocols: authentication, authorization, signing, encryption, SSL/TLS, SSH/SFTP, PKI, X.509 certificates, and PGP
Good fundamentals in release management and continuous integration
Familiarity with modern web services architectures, cloud platforms such as AWS, GCP, and Azure, and distributed storage systems (ScaleIO, Amazon S3)
Ability to communicate with large cross-functional teams about engineering topics such as system architecture, detailed design, APIs, and project schedules
Ability to make the right trade-offs when dealing with functional complexity, conflicting priorities, and aggressive schedules
Ability to represent the team and remove hurdles so that each team member can operate at the highest level of efficiency and productivity
Ability to hire, mentor and manage the performance of a large team
Ability to connect with senior executives and business stakeholders
A learning attitude and a drive to continuously improve oneself, the team, and the organisation
Ability to work under pressure and manage difficult situations in a fast-paced work environment
Bachelor's or Master's degree, or equivalent experience, in Computer Science or a related field
Preferred Qualifications
Experience with runtime configuration and troubleshooting of Java and JVM technologies is a plus
Good fundamentals in data modelling and machine learning algorithms
Strong knowledge of securing applications and a thorough understanding of the OWASP Top 10 risks and their solutions