Job responsibilities
- Guides and assists others in building appropriate-level designs and gaining consensus from peers where appropriate
- Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
- Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
- Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
- Troubleshoots major incidents, facilitates blameless post-mortems, and ensures incidents do not recur through action-oriented resolution of ServiceNow platform and application issues.
- Designs, codes, tests, and delivers software that automates manual operational tasks to improve the availability and stability of ServiceNow services.
- Analyzes database and system performance issues and provides solutions to improve service availability.
- Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers.
- Supports the adoption of site reliability engineering best practices within your team.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience
- Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform
- Proficiency in at least one programming language such as Python, Java/Spring Boot, or .NET
- Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android)
- Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Experience supporting ServiceNow Platform or other enterprise products and services for internal or external clients.
- Excellent debugging and troubleshooting skills for ServiceNow services or other highly complex systems or applications.
- Demonstrates strong experience with relational databases (e.g. MySQL, Oracle).
- Experience diagnosing performance degradation (e.g. explain plans, database tuning).
- Experience in one or more scripting languages (JavaScript, Python, Perl, Unix shell, Windows shell).
Preferred qualifications, capabilities, and skills
- Ability to solve unique, first-order problems in areas such as compute services, including containers and serverless, as well as many other AWS services.
- A strong understanding of AWS Network Architecture and networking more generally.