Expoint - all jobs in one place

The place where the best experts and companies meet

JFrog Senior Site Reliability Engineer 
India, Karnataka 
634026447

30.04.2024

In this role you will be part of our Site Reliability Engineering group, implementing and operating robust, scalable and highly available cloud-native systems and services to ensure the reliability, performance and efficiency of JFrog product infrastructure and services. Additionally, you will provide enterprise capability tools and environment automation tools to drive our business.
You should be able to cover on-call duty outside business hours and on weekends as and when required.


As a Sr SRE in JFrog you will…
  • Support all aspects of JFrog Cloud Platform & Product operations on a day-to-day basis and maintain continuous availability, reliability, durability, scale and uptime.
  • Work with Product & Engineering teams to promote best practices for cloud reliability and fault-tolerance enablement.
  • Define the reliability governance and operational transformation strategy, roadmap and enforcement.
  • Define and build innovative solution methodologies and assets around infrastructure, cloud migration, lifecycle and deployment operations at scale.
  • Advocate for and drive adoption of chaos engineering, mentoring others to build tools and strategies for problem prevention, detection and fault mitigation. Plan and coordinate GameDay runs.
  • Participate in NPI applicative & platform services launch readiness, platform management and capacity planning.
  • Create sustainable systems and services through automation and uplift.
  • Balance feature development speed and reliability through well-defined error budget implementations.
  • Develop, test, and deploy automated solutions and automated decision analytics to replace manual processes
  • Gather and analyze metrics from systems and applications to assist in performance tuning and fault finding.
To be a Sr SRE in JFrog you need…
  • A minimum of 5 years of relevant experience.
  • Proven experience as an SRE or in a similar role.
  • Strong understanding of system design principles, distributed systems, and cloud infrastructure.
  • Technology experience, including IaaS/PaaS/Serverless (Azure, GCP and/or Amazon Web Services) based on K8s, infrastructure automation/orchestration technologies, servers, storage and high-availability architecture.
  • Foundational understanding of Application Servers, Web Servers, WAF, networks, storage and databases.
  • Experience with infrastructure automation tools (For example: Terraform) and containerization technologies (For example: Docker, Kubernetes)
  • Proficient in programming and scripting languages (For example: Python/Go/Bash, etc.)
  • Deep knowledge of monitoring and observability tools (For example: Prometheus, Grafana, ELK stack/Coralogix, New Relic, etc.)
  • Hands-on experience with chaos engineering tools (For example: Gremlin/Litmus/Chaos Toolkit, etc.)
  • Hands-on experience with alerting tools (For example: PagerDuty/Opsgenie, etc.)
  • GitOps experience with SCM tools (For example: Git, Bitbucket, etc.)
  • Experience with orchestration tools (For example: Jenkins/Spinnaker/ArgoCD, etc.)
  • Excellent understanding of Scalability/HA processes and techniques in Cloud & K8s.
  • Familiarity with security best practices and experience in designing secure systems.
  • Strong intellectual curiosity and drive for continuous improvement; able to take initiative and learn on the fly. Ability to work independently with minimal supervision.
  • Strong communication and interpersonal skills to collaborate effectively with diverse teams.
  • Excellent problem-solving and troubleshooting skills. Highly team-oriented, treating collaboration as a key to success.
  • Experience working in mission-critical environments and the ability to work well under pressure in a technically challenging environment.
  • Ability to mentor peers & new technical hires