Key job responsibilities
Infrastructure System Development & Automation
- Build scalable APIs, microservices, and integration solutions to streamline infrastructure monitoring and control.
- Create automated deployment frameworks for configuring, provisioning, and managing critical infrastructure components.
Monitoring & Data Analytics
- Design and implement real-time dashboards for power monitoring, HVAC performance, and network infrastructure health.
- Develop data analytics pipelines to process and analyze infrastructure telemetry, supporting predictive maintenance and anomaly detection.
Operational Support & Incident Response
- Automate incident detection, alerting, and response workflows to minimize downtime and improve infrastructure reliability.
- Support Sev1/Sev2 incident response, developing automated troubleshooting and remediation tools for infrastructure failures.
- Work with DCIO Engineers to enhance on-call operations through software-defined automation.
- Participate in post-incident reviews (PIRs), implementing software-driven solutions to prevent recurrence.
Infrastructure Integration & Standardization
- Develop and maintain OpenDCIM-based systems to track power, cooling, and network infrastructure assets.
- Support integration of Fault-Managed Power (Project Constellation), Fiber Media Conversion (Project Opti-Bridge), and Split CT Power Monitoring solutions into automated workflows.
- Partner with TPMs to define software solutions for infrastructure lifecycle management, remediation projects, and scalability initiatives.
- Work with DCIE Engineers to implement self-healing infrastructure capabilities through automation and AI-driven insights.
- Continuously improve developer operations (DevOps) and infrastructure automation practices within DCIE.
A day in the life
As a SysDev in DCIE, you’ll begin your day reviewing infrastructure telemetry dashboards and overnight logs to identify anomalies or performance degradations across Amazon Fulfillment Centers. You’ll then sync with Technical Program Managers (TPMs) and DCIE Engineers to prioritize active workstreams, whether it’s scaling power monitoring pipelines, deploying automation for HVAC lifecycle tracking, or integrating new telemetry sources into the DCIM platform. Midday, you’ll be coding: building out RESTful APIs, refining predictive maintenance models, or developing automation scripts for infrastructure deployment. You’ll participate in design reviews, contribute to PIRs, and resolve automation or telemetry-related blockers raised by on-call engineers. Your day ends with sprint planning or roadmap check-ins, driving progress on scalable, self-healing infrastructure systems that improve uptime, efficiency, and global visibility for thousands of Amazon sites.
- Medical, Dental, and Vision Coverage
- Maternity and Parental Leave Options
- Paid Time Off (PTO)
- 401(k) Plan
We enable Operations Technology Solutions (OTS) by delivering high-performance power, cooling, structured cabling, edge compute, and automation solutions that ensure reliable and efficient on-premises hardware operations. Our work spans Demarcation Rooms, MDFs, IDFs, power systems (UPSs, ATSs, PDUs), fault-managed power, cooling and containment, Computers on Wheels (COWs), telecommunications, and distributed edge compute infrastructure to enhance data processing and reduce latency.
- Experience in automating, deploying, and supporting large-scale infrastructure
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
- Experience with Linux/Unix
- Experience with CI/CD pipelines and build processes
- 3+ years of experience in software development, systems engineering, or infrastructure automation.
- Proficiency in Python, Java, Go, or another high-level programming language.
- Experience developing RESTful APIs, microservices, and cloud-based infrastructure solutions.
- Strong background in infrastructure automation (Terraform, Ansible, AWS CloudFormation, etc.).
- Familiarity with database systems (SQL, NoSQL, or Time-Series DBs like InfluxDB, Prometheus, etc.).
- Hands-on experience with monitoring and logging platforms (Grafana, Kibana, Splunk, AWS CloudWatch, etc.).
- Experience with DevOps, CI/CD, and version control (Git, GitHub, Jenkins, etc.).
- Experience with distributed systems at scale
- Experience with data center infrastructure monitoring, automation, and DCIM platforms.
- Knowledge of power systems (UPS, ATS, PDU), HVAC, and structured cabling infrastructure.
- Hands-on experience with IoT, telemetry data processing, and edge computing.
- Experience with AI/ML applications for predictive maintenance and anomaly detection.
- Background in network automation and SDN (Software-Defined Networking).