Salesforce Senior Site Reliability Engineer 
United States, California, San Francisco 
631271955

Yesterday

Job Category

Software Engineering

Job Details

In this role,

  • You will be responsible for the high availability of the microservices supporting the service mesh and ingress gateway on a large fleet of 1000+ clusters running technologies such as Kubernetes, Docker, network load balancers, service mesh, and Istio. You'll gain valuable experience troubleshooting real production issues, which will expand your knowledge of the architecture.
  • You will contribute code to drive availability improvement for services.
  • You will help improve the platform's visibility by implementing necessary monitoring and metrics with Prometheus, Grafana and other monitoring frameworks.
  • You will drive automation efforts in Python/Golang/Puppet/Jenkins to eliminate manual work in day-to-day operations.
  • You will drive improvements to CI/CD pipelines built on Terraform, Spinnaker and Argo.
  • You'll implement AIOps automation, monitoring and self-healing mechanisms that proactively fix issues, reducing MTTR and operational toil.
  • You will get a chance to improve your communication and collaboration skills working with various other Infrastructure teams across Salesforce.
  • You will interact with a highly innovative and creative team of developers and architects.
  • You will evaluate new technologies to solve problems as needed.


Job Requirements:

  • 3+ years of experience in SRE/DevOps/Systems Engineering roles
  • Experience operating large-scale cluster management systems (e.g. Kubernetes) for a mission-critical service
  • Strong working experience with Kubernetes, Docker, Container Orchestration, Service Mesh, Ingress Gateway
  • Good knowledge of network technologies such as TCP/IP, DNS, TLS termination, HTTP proxies, load balancers, etc.
  • Excellent troubleshooting skills with the ability to learn new technologies in complex distributed systems
  • Strong experience with observability tools such as Prometheus, Grafana, Splunk, Elasticsearch, etc.
  • Strong working experience with Linux systems administration and good knowledge of Linux internals
  • Good experience in scripting/programming languages: Python, Golang, etc.
  • Experience with AWS, Terraform, Spinnaker, ArgoCD
  • Ability to manage multiple projects simultaneously, meet deadlines and adapt to shifting priorities
  • Excellent problem-solving, analytical and communication skills, with a strong ability to work effectively in a team environment

If you require assistance due to a disability when applying for open positions, please submit a request via this.

Posting Statement