· Develop, maintain, and operate software that automates the deployment, scaling, and operations of infrastructure and applications.
· Ensure the reliability, availability, and performance of services.
· Respond to incidents, mitigate and analyze them, and participate in a 24/7 on-call rotation.
· Build sustainable systems and services through automation and continuous improvement.
· Balance feature development speed and reliability with well-defined service level objectives.
· Develop and maintain solutions for operational administration, system/data backup, disaster recovery, and security/performance monitoring.
· Continuously improve system infrastructure and processes to eliminate manual intervention.
· Work closely with development teams to ensure that platforms are designed with "operability" in mind.
· Participate in system design consulting, platform management, and capacity planning.
· Create and maintain documentation on system design and operation.
· Conduct post-incident reviews and drive root cause analysis and resolution.
Minimum Qualifications
· Bachelor's degree in computer science or a related field, and 3-5+ years of relevant work experience
· Familiarity with Python
· Willingness to provide on-call support
· Willingness to travel occasionally for site turn-up or operational work
· Understanding of networking concepts and terminology
· Ability to work in a global team and communicate in English