You will need a deep understanding of the technology landscape and will evaluate the use of new technologies. You will be influential within your team and work with peers and senior leaders to define and revise the standards for operational excellence across systems. You will consistently tackle abstract issues that span multiple functional areas and drive your team to push for improvements that can scale across other teams, services, and platforms.
Key job responsibilities
Provide support for cluster and node management, ensuring smooth operation of LLM infrastructure.
Continuously improve and automate our cluster/capacity/maintenance upgrades.
Develop automation tools for improving operational excellence.
Work on operations- and maintenance-driven coding projects, primarily in Ruby, Rails, Java, Python, or shell scripts, involving AWS and web technologies.
Hands-on experience with Kubernetes and expertise in a range of AWS services is expected.
Participate in design and code reviews and identify bottlenecks.
Troubleshoot and research root causes thoroughly and resolve defects.
- 3+ years of administrative experience in networking, storage systems, operating systems and hands-on systems engineering experience
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
- Experience with Linux/Unix
- Experience with CI/CD pipelines and build processes
- Experience with distributed systems at scale