Honeywell Site Reliability Engineer
United States
627770977

03.07.2024

JOB DESCRIPTION

Key Responsibilities

Hands-on design, analysis, development and troubleshooting of highly distributed large-scale production systems and event-driven, cloud-based services
Primarily Linux administration: managing a fleet of Linux and Windows VMs as part of the application solution
Infrastructure-as-code development using Terraform, shell, and Python
Ensuring the repeatability, traceability, and transparency of our infrastructure automation
Support on-call rotations for operational duties that have not been addressed with automation
Support healthy software development practices, including complying with the chosen software development methodology (Agile, or alternatives), building standards for code reviews, work packaging, etc.
Create and maintain monitoring technologies and processes that improve visibility into our applications' performance and business metrics and keep operational workload in check
Partner with security engineers to develop plans and automation that respond aggressively and safely to new risks and vulnerabilities
Develop, collaborate on, and monitor standard processes to promote the long-term health and sustainability of operational development tasks
Participate in technical training events, game day scenarios, and professional conferences

YOU MUST HAVE

2+ years of experience in system administration, application development, infrastructure development, or related areas

2+ years of experience programming in languages like JavaScript, Python, PHP, Go, Java, or Ruby
2+ years of experience reading, understanding, and writing code in the same languages
3+ years of mastery of infrastructure automation technologies (like Terraform, CodeDeploy, Puppet, Ansible, Chef)
2+ years of expertise in container and container-fleet orchestration technologies (like Kubernetes, AKS, EKS, Docker, Vagrant, etcd, ZooKeeper)
2+ years of cloud- and container-native Linux administration, build, and management skills

WE VALUE

Versatility with troubleshooting diverse sets of hosting technologies strongly desired. These include web server platforms, application platforms, operating systems, network components, virtualization technologies, storage, and database platforms.
Expertise with cloud-based, continuous-deployment software development lifecycles (e.g. CI/CD)
Cloud database operations and deployment experience (RDS MySQL/Postgres/Aurora) and caching operations and deployment experience (Memcached, Redis)
Expertise with Lean/Agile deployment processes (Blue/Green, ZDT, Canary, load balancer/DNS strategies, A/B testing, feature flagging methodologies)
Familiarity with site and infrastructure monitoring systems (like ELK, Datadog, AppDynamics, New Relic, Splunk, Sumo Logic, Grafana)
Strong problem solving, root cause analysis and systems engineering skills
Excellent presentation and communication skills
Ability to design and manage escalation response plans driven by monitoring, and to react, respond, remediate, and retrospect in culturally aligned (proactive, customer-focused, collaborative, data-driven) ways
Demonstrated expertise building and managing highly scaled production infrastructure in the cloud (Azure required; GCP, AWS, OpenStack a plus)
Expertise with SDLC branching, SCM, and code deployment systems (Bitbucket, Git/Gitflow, Jenkins, CircleCI, Travis CI, etc.)

