Salesforce Senior Site Reliability Engineer
United States, California, San Francisco
631271955

08.05.2025

Job Category

Software Engineering

Job Details

In this role:

You are responsible for the high availability of the microservices supporting the service mesh and ingress gateway on a large fleet of 1,000+ clusters running technologies such as Kubernetes, Docker, network load balancers, and service mesh (Istio). You’ll gain valuable experience troubleshooting real production issues, which will expand your knowledge of the architecture.
You will contribute code that drives availability improvements for services.
You will help improve the platform's visibility by implementing necessary monitoring and metrics with Prometheus, Grafana and other monitoring frameworks.
You will drive automation efforts in Python/Golang/Puppet/Jenkins to eliminate manual work in day-to-day operations.
You will drive improvements to CI/CD pipelines built on Terraform, Spinnaker, and Argo.
You’ll implement AIOps automation, monitoring, and self-healing mechanisms that proactively fix issues, reducing MTTR and operational toil.
You will get a chance to improve your communication and collaboration skills working with various other Infrastructure teams across Salesforce.
You will interact with a highly innovative and creative team of developers and architects.
You will evaluate new technologies to solve problems as needed.

Job Requirements:

3+ years of experience in SRE/DevOps/Systems Engineering roles
Experience operating large-scale cluster management systems (e.g., Kubernetes) for a mission-critical service
Strong working experience with Kubernetes, Docker, container orchestration, service mesh, and ingress gateways
Good knowledge of network technologies such as TCP/IP, DNS, TLS termination, HTTP proxies, load balancers, etc.
Excellent troubleshooting skills in complex distributed systems, with the ability to learn new technologies
Strong experience with observability tools such as Prometheus, Grafana, Splunk, Elasticsearch, etc.
Strong working experience with Linux systems administration; good knowledge of Linux internals
Good experience with scripting/programming languages such as Python, Golang, etc.
Experience with AWS, Terraform, Spinnaker, ArgoCD
Ability to manage multiple projects simultaneously, meet deadlines and adapt to shifting priorities
Excellent problem-solving, analytical and communication skills, with a strong ability to work effectively in a team environment

If you require assistance applying for open positions due to a disability, please submit a request via this.
