

Key job responsibilities
- Define and/or refine hardware requirements, participate in the development and delivery of operability-related features such as system health monitoring, diagnostics, repair, and other self-healing automation
- Develop or further existing application and system management tools and processes that reduce manual efforts and increase overall efficiency
- Participate in the design and execution of production acceptance tests and new hardware evaluations
- Monitor the health of the fleet, automating system health, maintenance tasks, and reporting systems as needed
- Participate in “on-call” rotations to resolve incidents occurring out-of-hours
Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Why AWS
Work/Life Balance
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
- 3+ years of experience with Linux, using the command line and basic administration, and computer networking fundamentals
- 3+ years of operations experience working with CI/CD pipelines and deployment systems such as Terraform, GitHub Actions, Jenkins, or others
- Able to troubleshoot at all levels, from network to operating systems to software applications
- 3+ years working in Linux or other UNIX-based operating systems
- Experience supporting cloud systems or other services. Proficient in troubleshooting and anticipating problems that affect the performance, reliability, or availability of software systems
- Experience operating 24x7 high-availability, distributed software applications, including performance tuning and optimizing fleet utilization
- Understanding of network fundamentals (DNS, DHCP, TCP/IP, routing, load balancing, load shedding) and experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic or similar)
- Experience scripting operating system tasks in Bash, Python, etc., and with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar)