
Salesforce Senior Site Reliability Engineer 
Singapore, Singapore 
22226792

Yesterday

Job Category

Software Engineering

Job Details

Our software development focuses on enabling service owners to operate their services safely at scale. We do this through paved-path integrations onto observability frameworks, optimization of existing systems, infrastructure design, and the elimination of manual work through AI/ML. On the SRE team, you'll tackle complex challenges of scale that are unique to Salesforce, drawing on your expertise in coding, algorithms, complexity analysis, and large-scale system design. Experience with AI/ML systems, autonomous agents, or observability for intelligent platforms is a strong plus.

Required Skills

  • 5+ years of experience in Python, Go, or Java for automation, tooling, and integration.

  • Hands-on experience designing, building, and operating large-scale distributed systems, including identifying shortcomings and optimization opportunities.

  • Demonstrated experience in developing and deploying production-grade software applications or services.

  • Engineering resiliency and reliability: experience designing and developing systems, tools, and platforms that strengthen the resiliency and reliability of distributed services.

  • Strong experience with AWS or GCP, including services such as EC2, VPC, IAM, S3, and EKS.

  • Expertise in Kubernetes and modern container orchestration.

  • Deep understanding of SRE principles: SLIs/SLOs, availability, resiliency, and incident metrics (TTD, TTR).

  • Experience with AI/ML platforms, agents, or intelligent observability systems.

  • Familiarity with observability tooling such as Grafana, OpenTelemetry, Zipkin/Jaeger, and time-series databases (TSDBs).

  • Hands-on experience with CI/CD pipelines and Git-based workflows.

  • Experience with infrastructure as code (IaC) and configuration management tools such as Terraform, Helm, Ansible, or Puppet.

  • Strong Linux systems knowledge and troubleshooting skills.

  • Data-driven mindset for identifying systemic issues and improving service reliability.

Responsibilities

  • Design, build, and maintain scalable backend systems and cloud-based services.

  • Write clean, testable, and efficient code following engineering best practices.

  • Develop automation and tooling to reduce manual effort and improve system reliability.

  • Enhance observability through monitoring, logging, and distributed tracing.

  • Support integration of AI-driven automation and observability platforms.

  • Work closely with product and infrastructure teams to ship features and improvements iteratively in an Agile environment.

  • Define and implement SLIs/SLOs with engineering teams, driving reliability into system architecture.

  • Build automation and self-healing capabilities to reduce manual operations.

  • Operate and scale monitoring, alerting, and tracing systems for proactive issue detection.

  • Lead post-incident analysis, conduct postmortems, and ensure effective root-cause resolution.

  • Improve CI/CD practices to accelerate safe, frequent deployments.

  • Use data to uncover trends, inform prioritization, and drive platform improvements.

  • Collaborate on integrating AI-driven automation and observability to enhance reliability.

  • Support and scale multi-cloud, multi-region services.

Desired Skills

  • Knowledge of microservices, service mesh, or zero-trust infrastructure.

  • Experience operating in global, multi-tenant, or compliance-sensitive environments.

  • Strong written and verbal communication, with emphasis on documentation and knowledge sharing.

Unleash Your Potential

When you join Salesforce, you'll be limitless in all areas of your life. Our benefits and resources support you to find balance and be your best, and our AI agents accelerate your impact so you can

If you require assistance applying for open positions due to a disability, please submit a request via this.

Posting Statement