Job responsibilities
- Guides and assists others in building appropriate-level designs and gaining consensus from peers where appropriate
- Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
- Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
- Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
- Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
- Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
- Supports the adoption of site reliability engineering best practices within your team
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience
- Proficient in site reliability culture and principles, and familiar with how to implement site reliability within an application or platform
- Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET
- Proficient in software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android)
- Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
- Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
- Familiarity with containers and container orchestration technologies such as ECS, Kubernetes, and Docker
- Familiarity with troubleshooting common networking technologies and issues
- Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
- Ability to proactively recognize roadblocks and demonstrate interest in learning technology that facilitates innovation
- Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
- Ability to initiate and implement ideas to solve business problems
Preferred qualifications, capabilities, and skills
- Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm
- Adept in the development of automated tools, systems, and services in multiple technology domains
- Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Excellent debugging and troubleshooting skills
- Proficiency in making service-level changes to a system and troubleshooting its components
- Experience using monitoring and log analysis tools to manage operations
- Experience managing and/or influencing infrastructure services to ensure application service uptime and user experience
- Experience with performance fine-tuning, including the ability to understand and optimize existing logic
- Knowledge of best practices in infrastructure and application logging, monitoring, intelligent alerting, and automated self-healing