What you will be doing:
Collaborate on translating business objectives into actionable plans.
Address operational challenges, automate processes, and iterate for efficiency.
Tackle systemic reliability issues with multi-functional teams.
Monitor, optimize, and manage system performance and resources.
Institute validated practices for reliability, remediations, and troubleshooting.
Design, deploy, and automate production support, documenting essential knowledge.
Navigate intricate tasks with a deep understanding of SRE principles.
Lead cross-organizational projects from inception to completion.
Mentor and train junior engineers for professional development.
Serve as a subject matter expert in core team functions.
What we need to see:
15+ years of working experience in cloud, platform, or SRE roles.
A Bachelor's or Master's degree in Engineering, Computer Science, or a related field, or equivalent experience.
Proficient in one or more programming languages: Python, Go, Perl, or Ruby.
Hands-on experience handling and scaling distributed systems in public cloud, private cloud, hybrid cloud, or on-prem environments that run 24x7x365.
Has delivered software with a full understanding of deploying applications in Kubernetes clusters, including GPU and CPU pod scheduling (with the ability to understand on-prem environments).
Has maintained and managed microservices for AI platforms (Inference, Training, Evaluation, Ingestion).
Hands-on experience in deploying, supporting, and supervising new and existing services, platforms, and application stacks.
Experience with CI/CD systems such as Jenkins and GitHub Actions.
Background with Infrastructure as Code (IaC) methodologies and relevant tools.
Extensive experience working with MS Windows Server and/or Linux operating systems.
Solid communication skills, demonstrating the ability to comprehend and articulate technical issues to a non-technical audience.
Ways to stand out from the crowd:
Cloud expertise in Azure and AWS.
Passion for and experience with AI methodologies.
Strong background in software design and development.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
You will also be eligible for equity.