Design and modify complex Terraform deployments using multiple modules, with heavy emphasis on code reuse
Design and modify Ansible roles and playbooks with a focus on idempotency. Build playbooks and roles with flexibility to use in multiple cloud environments across multiple operating systems.
Defend and explain code: collaborate effectively with upstream teams to advocate for and defend the rationale behind your code and design choices, ensuring alignment with broader project goals and standards
Plan multi-tenant and multi-cloud Docker and Kubernetes deployments inside the framework of a cluster management platform
Design, plan, and analyze multi-cloud network connectivity for management and customer traffic
Set standards for documentation, identify areas needing updates or additions, and ensure that documentation of the environment is complete and robust
Ensure that procedures and other technical documentation cover the roles and responsibilities of the program
Develop security strategy based on security policy. Determine how monitoring, logging, and alerting should be implemented across multiple cloud providers to create a cohesive security landscape across all customer and internal environments
Enhance current monitoring and logging design to ensure equal and comprehensive coverage of multiple cloud providers
Design, plan, and implement generative AI integration into the current environment across all cloud providers
KNOWLEDGE AND SKILLS:
A deep understanding of current AWS, Azure, and GCP services and their respective roadmaps. The ability to combine this understanding with internal product development and customer requests to plan future deployments and offerings
Expert-level understanding of HashiCorp Configuration Language (HCL). Ability to analyze a large codebase with multiple module dependencies, and track branch features when subscribing to an upstream repository.
Ownership for best practices in code quality, automated testing, and continuous integration/continuous deployment (CI/CD) practices
A deep syntactical understanding of HCL, Python, and Ansible. Be able to define coding standards and code layout. The ability to turn high-level product requests into Jira tickets for implementation
The ability to analyze the current authentication and authorization infrastructure in a multi-cloud environment and develop a roadmap for its future
Understand the existing multi-cloud, multi-tenant logging infrastructure, and provide recommendations for improvements and best practices
Deep knowledge of networking concepts: routing, firewalling, load balancing, DNS, DHCP, BGP, IPSec VPN
Expert-level knowledge of at least one Linux distribution. Develop a Linux deployment strategy and be the go-to resource for Linux automation questions
Working knowledge of Jira
QUALIFICATIONS:
10+ years of experience with the following technologies:
Experience with Unix/Linux operating system internals and administration (e.g., filesystems, inodes, system calls, hardening) and networking (e.g., TCP/IP, routing, DNS, network topologies, SDN).
PREFERRED QUALIFICATIONS:
Expertise in designing, analyzing and troubleshooting large-scale distributed systems
Ability to debug and optimize code and automate routine tasks
Systematic problem-solving approach coupled with strong communication skills and a sense of ownership and drive
EDUCATION:
Bachelor’s degree in Computer Science or equivalent practical experience