Architect, implement, and maintain advanced automation solutions for provisioning, configuration, and monitoring of engineering tools infrastructure, ensuring scalability, resilience, and high availability
Oversee and support the Atlassian application stack (Jira, Confluence, Bitbucket), GitHub, Artifactory and Polarion, taking ultimate accountability for uptime, performance, and user satisfaction. Manage configurations, OSLC plugin integrations, workflows, reports, templates, permissions, re-indexing, and restoration processes
Lead, mentor, and manage a team of SREs and technical contributors, fostering a culture of accountability, technical excellence, and collaboration. Delegate tasks effectively, set clear goals, and provide guidance to ensure successful project delivery and operational stability
Partner with GitHub, Artifactory, Polarion, Mattermost and Atlassian tool users to promptly address issues, gather feedback, and implement solutions. Work closely with development and operations teams to integrate engineering tools seamlessly into CI/CD pipelines
Participate in and oversee an on-call rotation, driving rapid incident response and resolution to minimize downtime. Lead post-incident reviews to identify root causes and implement preventive measures
Monitor system health, troubleshoot complex issues, and deploy proactive strategies to prevent disruptions. Conduct performance analysis and capacity planning to anticipate future needs and optimize resource utilization
Manage regular backups, upgrades, and patch cycles for engineering tools, ensuring compliance with security standards and operational stability
Develop and maintain comprehensive documentation, runbooks, and best practices. Promote knowledge sharing within the team and across the organization to enhance tool adoption and administration efficiency
Collaborate with leadership to define long-term strategies for tool infrastructure, aligning with organizational growth and technical objectives. Assess and integrate new technologies to enhance reliability and efficiency
Drive the adoption of automation frameworks and modern practices, mentoring the team in scripting and tool development to reduce manual effort and improve system reliability
What You’ll Bring
Bachelor’s Degree in Computer Science, Information Technology, or a related field (or equivalent experience)
Extensive experience in the installation, configuration, development, debugging, support, and upgrades of GitHub Enterprise and Atlassian tools (Jira, Confluence, Bitbucket)
Proficiency in managing and automating Confluence spaces and permissions and Jira projects, with a track record of optimizing user workflows
Deep knowledge of Polarion administration, including templates, workflows, permissions, OSLC integrations, and High Availability setups (HA experience highly desirable)
Strong programming and scripting skills (Python, Shell, Golang), with hands-on experience using automation frameworks such as Ansible for administration and monitoring, and in developing custom plugins and workflows
Expertise in containerization (Docker) and orchestration (Kubernetes), with practical application in production environments
Familiarity with monitoring and logging tools such as Prometheus, Grafana, and Splunk to ensure observability and performance insights
Proven ability to diagnose and resolve complex issues across storage, OS, network, virtualization, and application/database stacks
Demonstrated experience leading technical teams, with a focus on mentoring, coaching, and fostering professional growth
Strong project management skills, with the ability to prioritize tasks, manage resources, and deliver on deadlines in a fast-paced environment