Position Summary
- We are seeking a highly skilled and experienced Manager, SRE Dev, to join our team. In this role, you will lead our efforts to improve the reliability, scalability, and performance of our systems through sound development practices. Your primary focus will be driving development work within the Site Reliability Engineering (SRE) team, with an emphasis on reducing toil and automating repetitive tasks for the SRE ops organization.
Primary Responsibilities
- Manage the local SRE Dev/DevOps team.
- Define and manage project plans and deliverables for the team.
- Investigate and resolve technical issues.
- Prioritize and scope requirements quickly and accurately.
- Track issues and report status via platforms like GitLab.
- Research relevant tools and technologies and present readouts to teams.
- Guide team members on which tools to use and how to use them.
- Work with the relevant personnel to create, manage, and complete the defined project plans and deliverables.
- Improve our Kubernetes application delivery to production.
- Design procedures for system troubleshooting and maintenance.
- Perform other related duties as assigned.
The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
Knowledge, Skills and Abilities
- Proven expertise in software engineering with a focus on reliability, scalability, and performance.
- Proficient in programming languages such as Python, Go, Java, or similar, and adept with software development frameworks and tools.
- Deep understanding of DevOps and site reliability engineering principles, including DORA, SLOs, SLIs, error budgets, and incident management.
- Strong experience with both private and public cloud computing platforms (e.g., AWS, Google Cloud, Azure).
- Proven experience with infrastructure as code (IaC) tools like Terraform or Ansible, and configuration management systems.
- Excellent knowledge of distributed cloud, Kubernetes, GitOps, CI/CD, and networking.
- Strong experience with observability platforms.
- Architectural experience in constructing and overseeing large-scale cloud-based projects.
- Excellent communication and presentation skills.
- Problem-solving mindset with a demonstrated track record.
- Strong management skills evidenced by building and leading high-performing teams.
- Ability to address technical challenges effectively.
- Excellent communication and interpersonal skills, facilitating collaboration across teams and influencing stakeholders at all organizational levels.
- Commitment to continuous learning and improvement, willing to adapt to evolving technologies and practices.
Qualifications
- 10+ years of engineering experience, including team lead or management experience.
- Extensive software engineering experience with a focus on DevOps/SRE.
- Excellent organizational agility and communication skills across the organization.
Environment
- Abundance of Freedom: Experience a workplace that encourages autonomy, fostering an environment where your ideas can flourish.
- Continuous Learning: Enjoy ample opportunities for professional growth, supported by a team of great mentors with solid backgrounds across various domains.
- Collaborative Team: Join a cohesive and supportive team from day one, ensuring a seamless transition into a workspace that feels like home.