Key job responsibilities
• Production Support & Incident Management:
• Ensure system reliability, manage incidents, troubleshoot issues, and resolve them swiftly to minimize downtime and impact.
• System Monitoring & Observability:
• Implement comprehensive monitoring systems, track performance metrics, and address anomalies proactively.
• Infrastructure Automation & Optimization:
• Automate infrastructure processes, manage continuous deployment, and ensure efficient resource allocation to support large-scale systems.
• Security Compliance:
• Work with security teams to ensure systems meet compliance standards, addressing vulnerabilities promptly.
• Chaos Engineering & Resiliency:
• Implement chaos engineering practices in non-production environments to test and improve system resilience.
• Tool Development & Data Insights:
• Develop internal tools for automation, data analysis, reporting, and publishing insights that drive business decisions.
• Collaboration & Communication:
• Work closely with cross-functional teams (Development, QA, Operations) to ensure smooth production workflows and deliver software improvements.
• Customer Insights & Engagement:
• Create systems to aggregate, analyze, and publish customer feedback, providing actionable insights to engineering and product teams.
• Team Roadmap & Strategy:
• Contribute to the strategic roadmap by identifying, prioritizing, and executing projects that deliver high impact for the team and stakeholders.
• Managing Ambiguity:
• Understand complex technologies and diverse customer use cases to generate solutions that enhance customer experiences.
- Experience in automating, deploying, and supporting large-scale infrastructure
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
- Experience with Linux/Unix
- Experience with CI/CD pipelines and build processes
- Experience automating and configuring systems using Desired State Configuration (DSC) in a large enterprise environment
- Knowledge of systems engineering fundamentals (networking, storage, operating systems)
- Experience with distributed systems at scale
- 3+ years of hands-on systems engineering and administrative experience across networking, storage systems, and operating systems