Hands-on design, analysis, development and troubleshooting of highly distributed large-scale production systems and event-driven, cloud-based services
Primarily Linux administration: managing a fleet of Linux and Windows VMs as part of the application solution
Infrastructure-as-code development with Terraform, shell, and Python
Ensuring the repeatability, traceability, and transparency of our infrastructure automation
Support on-call rotations for operational duties that have not been addressed with automation
Support healthy software development practices, including complying with the chosen software development methodology (Agile or alternatives) and building standards for code reviews, work packaging, etc.
Create and maintain monitoring technologies and processes that improve visibility into our applications' performance and business metrics and keep operational workload in check.
Partnering with security engineers and developing plans and automation to aggressively and safely respond to new risks and vulnerabilities.
Develop, collaborate on, and monitor standard processes to promote the long-term health and sustainability of operational development tasks.
Participate in technical training events, game day scenarios, and professional conferences
YOU MUST HAVE
2+ years of experience in system administration, application development, infrastructure development, or related areas
2+ years of experience in Azure cloud administration and solution design.
2+ years of experience programming in languages like JavaScript, Python, PHP, Go, Java, or Ruby
2+ years of experience reading, understanding, and writing code in those languages
3+ years of experience with, and mastery of, infrastructure automation technologies (like Terraform, CodeDeploy, Puppet, Ansible, Chef)
2+ years of expertise in container and container-fleet orchestration technologies (like Kubernetes, AKS, EKS, Docker, Vagrant, etcd, ZooKeeper)
2+ years of cloud- and container-native Linux administration, build, and management skills
WE VALUE
Versatility in troubleshooting diverse hosting technologies is strongly desired. These include web server platforms, application platforms, operating systems, network components, virtualization technologies, storage, and database platforms.
Expertise with cloud-based, continuous-deployment software development lifecycles (e.g., CI/CD)
Familiarity with site and infrastructure monitoring systems (like ELK, Datadog, AppDynamics, New Relic, Splunk, Sumo Logic, Grafana)
Strong problem solving, root cause analysis and systems engineering skills
Excellent presentation and communication skills
Ability to design and manage escalation response plans that carry an incident from monitoring through reaction, response, remediation, and retrospective in culturally aligned (proactive, customer-focused, collaborative, data-driven) ways.
Demonstrated expertise building and managing highly scaled production infrastructure in the cloud (Azure required; GCP, AWS, OpenStack a plus)
Expertise with SDLC branching, SCM, and code deployment systems (Bitbucket, Git/Gitflow, Jenkins, CircleCI, Travis CI, etc.)
Additional Information
JOB ID: HRD236198
Category: Engineering
Location: 715 Peachtree Street N.E., Atlanta, Georgia 30308, United States
Exempt
Must be a US person or able to obtain export authorization.