About this role:
Site Reliability Engineer
In this role, you will:
- Work alongside developers and business stakeholders to automate acceptance criteria
- Maintain high reliability and availability for software applications
- Automate mundane tasks to reduce human error
- Define SLIs (service level indicators) and SLOs (service level objectives) in collaboration with product owners
- Lead incident response efforts and post-mortem analysis to prevent future occurrences.
- Write incident root cause analyses that identify the underlying cause of each issue and how to prevent it from recurring
- Document procedures, best practices and troubleshooting FAQs.
- Debug the system and fix production-related issues.
- Escalate and follow up on permanent fixes for development-related issues.
- Handle complex operational tasks and recommend process and technology changes.
- Provide global support, including troubleshooting production-related issues and performing checkouts.
- Lead complex initiatives to develop infrastructure to provide solutions for business applications
- Participate in various projects intended to continually improve or upgrade the infrastructure
- Evaluate internal and external software solutions which could be leveraged to meet target state architecture goals
- Review and analyze high-impact outages to ensure the proper processes and procedures are in place to avoid problems in the future
- Design, build, deploy and maintain infrastructure solutions through collaborative efforts with the team and third-party vendors
- Design, code, test, debug and document programs using Agile development practices
- Make decisions on technical designs and implementation plans, and identify project risks and resource requirements
- Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
- Recommend courses of action to maintain cost effectiveness and achieve results
- Collaborate and consult with peers, colleagues and managers to resolve issues and achieve goals
- Interact with customers and vendors
Required Qualifications:
- 5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
- 5+ years of Site Reliability Engineering experience or related experience
- 5+ years of global support including advanced troubleshooting skills to resolve complex production issues
- 5+ years of resolving complex issues utilizing fundamental understanding of system components
- 5+ years of experience in tracing system interactions through various tiers.
Desired Qualifications:
- Strong understanding of REST APIs
- Strong working knowledge of troubleshooting tools such as Splunk, AppDynamics, and Elastic APM
- Strong experience with API management tools such as Apigee
- Working knowledge of databases such as MongoDB and Oracle
- Strong foundation in reliability engineering principles and distributed systems behavior
- Experience defining and implementing SLOs/SLIs and using them to drive system improvements
- Demonstrated ability to design and implement observability solutions that provide actionable insights while minimizing alert fatigue
- Understanding of modern observability practices and experience implementing and maintaining monitoring solutions such as Prometheus/Grafana, Splunk, New Relic, CloudWatch, and ELK in the cloud
- Strong incident response skills with experience leading incident retrospectives and driving improvements
- Excellent problem-solving abilities and experience debugging distributed systems
- Track record of successfully automating operations and reducing toil
- Strong communication skills, with the ability to explain complex technical concepts to varied audiences
- Ability to work both independently and collaboratively (in groups) in an energetic environment.
Job Expectations:
- Ability to work weekends
- Participate in on-call rotations to ensure 24/7 system availability and support.
Pay Range
$119,000.00 - $224,000.00
Wells Fargo provides eligible employees with a comprehensive set of benefits, many of which are listed below. Visit for an overview of the following benefit plans and programs offered to employees.
- Health benefits
- 401(k) Plan
- Paid time off
- Disability benefits
- Life insurance, critical illness insurance, and accident insurance
- Parental leave
- Critical caregiving leave
- Discounts and savings
- Commuter benefits
- Tuition reimbursement
- Scholarships for dependent children
- Adoption reimbursement
17 Jul 2025
Wells Fargo Recruitment and Hiring Requirements:
Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.