Server Management: Oversee daily operations of a large-scale, AWS-hosted Linux server, ensuring optimal performance and uptime.
User Support: Assist users with technical issues, providing timely and effective solutions to enhance their productivity. This includes problems with their Linux environment, difficulties accessing it from their laptops, and other backend software issues.
Custom Tooling: Manage, develop, and maintain custom Python-based tooling for server management, automation, user dashboards, and self-hosted services (such as Jupyter and code-server).
System Maintenance: Perform regular updates, backups, and security checks to maintain system integrity.
Requirements
Minimum of 5 years of demonstrable, hands-on Linux system administration experience.
Proficiency in managing multi-user Linux environments, including both clients and servers.
Strong scripting skills in Python for automation and tooling.
Knowledge of networking fundamentals and security best practices.
Excellent problem-solving abilities and attention to detail.
Strong communication skills with the ability to assist users of varying technical expertise.
Ability to work independently and as part of a collaborative team.
Advantages
Experience in a machine learning or data science environment.
Certifications such as RHCSA, RHCE, or Linux+.
Familiarity with containerization technologies (e.g., Docker, Kubernetes).
Our Values
Our values are at the heart of everything we do. We believe great solutions are built through a great community.
Advance Inclusion
Be Accountable Together: We proudly own our actions and our results, taking initiative to ensure our work gets over the finish line as a team.
Continuously Learn: We challenge ourselves for the sake of getting better as individuals, as teams, and as an organization to deliver for our partners.
Debate and Commit: We respectfully and openly debate to strengthen our ideas and build shared conviction. Once we decide, we go all in, together.
Dream Big and Act: We boldly tackle complex problems, pressure-test solutions in real time, and adapt with speed and energy.