Incident Triage, Escalation and Resolution: Triage site-impacting production issues by quantifying impact, severity, and urgency; analyzing systems for quick remediation; engaging the right teams for recovery [Reduce MTTE – Mean Time to Engage]; and focusing on immediate restoration [Reduce MTTR – Mean Time to Restore] of large-scale enterprise systems.
Alerting, Monitoring and Log Analysis: Detect and analyze monitoring graphs and alerts to identify the systems causing production impact, using tools such as Grafana, Prometheus, MMS, ServiceNow, JIRA, Dynatrace, and Splunk [Reduce MTTD – Mean Time to Detect].
Enhance Alerting Solutions: Design and implement JavaScript integrations between alerting tools and service API endpoints, working with tools such as ServiceNow, Spotlight, Splunk, and xMatters.
Requires knowledge of: monitoring and alerting tools; monitoring metrics and key performance indicators (for example, availability, MTBF, MTTR); SLIs and SLOs (for example, request latency, availability, error rates, saturation); distributed tracing; and alerting logic. Demonstrate awareness of the metrics used to monitor software or system performance.
Follow the steps needed to analyze issues correctly and engage the right teams, including CPC, dependent downstream services, and platform teams.
Handle deployments: streamline the deployment process and own it as a single team. Understand and explore post-deployment validations and back-out steps to make the app more resilient.
Coordinate with platform teams on non-app releases such as VM upgrades, DB maintenance, and other component- and environment-related tasks.
Participate in rotating on-call duties and work across different time zones with a multinational team
Responsible for timely root cause analysis [RCA] of production issues.
Develop reusable tooling and processes to drive and improve customer experience and lower operational costs.
Understand DevOps industry best practices
Help teams build highly observable and resilient systems
Collaborate with developers to capture requirements and understand pain points
Build reusable tools, libraries, and dashboards that can be used across DevOps/SRE teams
What you'll bring:
Bachelor's degree in Computer Science, Engineering or related discipline
3+ years of hands-on experience in SRE, operations, and development, with JavaScript, Java, RESTful services, Git, Maven, Jenkins, DevOps, containerization, Docker, Kubernetes, Azure, Google Cloud, Kafka, Azure Cosmos, Azure SQL, Mega cache, CI/CD, Prometheus, Grafana, Splunk, etc.
Automation and Self-healing: Demonstrate knowledge of scripting and software development for automation and self-healing of multi-cloud environments. Help enhance existing solutions by developing automation with Docker and Kubernetes and by working with DevOps and Engineering partners.
Excellent end-to-end technical understanding of core infrastructure, cloud services, platforms, and microservices.
Ability to triage effectively: detect issues and distinguish symptom from cause.
Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).
Influence the design of system architecture and tactical solutions.
Familiarity with log-centric tooling. Ability to produce time series data and reusable dashboards for use both during and after an event.
Benefits: Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
For information about PTO, see
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms. For information about benefits and eligibility, see
Sunnyvale, California US-04396: The annual salary range for this position is $117,000.00-$234,000.00
Bentonville, Arkansas US-09050: The annual salary range for this position is $90,000.00-$180,000.00
Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include:
- Stock
Minimum Qualifications...