Envision and implement changes that improve system reliability
Conduct deep investigations into new technologies and resolve unexpected issues that arise during operation
Provide guidance on system architecture and security best practices
Review, digest, and distill complex code and technical topics to ensure clarity and accessibility for all engineers
Provide technical leadership, foster collaboration, and drive key initiatives to completion
Uphold team values, including engineering excellence, curiosity, bias for action, self-awareness, inclusivity, and openness
What You’ll Bring
2+ years of relevant industry experience
Experience developing, scaling, and maintaining infrastructure for distributed systems, including IoT applications
Proficiency in many of the following: Linux, networking, Kubernetes, on-premises data centers, AWS, Terraform, Prometheus, Helm, GitHub Actions, PostgreSQL, and Kafka
Strong understanding of system design principles and the challenges of ensuring availability, reliability, scalability, and security in distributed software systems
Effective verbal and written communication skills
Ability to navigate uncertainty and loosely defined problem statements
Strong analytical and problem-solving skills, with the ability to evaluate trade-offs and make well-reasoned decisions
Collaborative mindset with a willingness to learn, mentor, and engage in open discussions