Software engineering is a core discipline at F5 for many roles. As a software engineer specializing in site reliability, you will bring a software engineering and automation mindset to your work.
The Site Reliability Engineer will be responsible for ensuring the reliability, availability, and scalability of critical systems and SaaS platforms. Systems under the care of an SRE must operate effectively and reliably through scalable builds and deployments, frequent releases, and complex architectures that encompass modern technologies. You will work closely with technical and non-technical teams throughout the organization to facilitate the design and implementation of scalable solutions, drive automation initiatives, and monitor and maintain the performance of critical systems.
What You’ll Do
- Apply modern engineering principles and practices to operational functions throughout the full system lifecycle, from initial concept and architecture through deployment, daily operation, and optimization, and use the same practices to refine existing systems.
- Support and maintain technology systems to ensure optimal performance, reliability, and security.
- Scale systems sustainably through mechanisms such as automation and evolve systems by fostering changes that improve velocity.
- Troubleshoot and resolve complex issues, including systems failures, connectivity problems, and performance bottlenecks.
- Partner with cross-functional teams to design and implement scalable and robust system architecture to improve services.
- Investigate open source and proprietary technologies, components, libraries, and tools, and help build a highly available, highly scalable, and easily manageable system.
- Apply observability and data skills to proactively measure system performance, diagnose service needs, and quickly identify solutions.
- Participate in service operation and RCA activities and assist with defining SLOs and SLIs for business stakeholders.
- Implement and enforce security best practices to protect our systems, data, and infrastructure against unauthorized access, cyber threats, and vulnerabilities.
- Create and maintain comprehensive knowledge bases for system documentation, including standard operating procedures, configurations, and troubleshooting guides, to support end-users' ability to use the systems effectively.
- Participate in on-call rotation.
- Uphold F5’s Business Code of Ethics and promptly report violations of the Code or other company policies.
- Perform other related duties as assigned.
The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
This role may require occasional after-hours work to manage system upgrades or respond to critical incidents.
What You’ll Bring
- A security-first approach to managing resources in cloud platforms, SaaS platforms, containerization, and orchestration.
- Proficiency in Agile delivery, DevSecOps principles and associated tools and technologies.
- Knowledge of technology systems, including infrastructure and SaaS platforms.
- Experience with compliance and regulatory guidelines, incident response and reporting, access control and vulnerability management.
- Solid understanding of cybersecurity principles and best practices.
- Demonstrated ability to work both independently and as an integral member of an agile team.
- Experience with observability tooling including logging infrastructure, continuous monitoring, tracing systems, alert definitions, etc.
- Strong communication, planning, organization, problem-solving, and troubleshooting skills, including the ability to identify performance bottlenecks.
- Flexibility to adapt to changing project requirements and timelines.
Qualifications
- Bachelor’s degree in computer science or equivalent experience
- 1-3 years of experience as an SRE or relevant experience in a Windows Administrator role
- Hands-on experience with Windows Server administration, troubleshooting on Linux and Windows, patching, and health monitoring
- Working experience with Azure or another relevant cloud platform (Storage, Compute, Security, Encryption, and IAM services)
- Scripting experience (Bash, Shell and/or Python)
- Experience with virtualization, such as VMware, is desirable
- Nice to have - Experience planning, deploying, and troubleshooting proprietary services on Linux and Windows, such as Cleo, Thales, and Aspera
- Nice to have – Experience in Salesforce Administration
- Willingness to work in shifts and provide weekend on-call support when needed.
- Strong problem-solving and analytical skills.
- Excellent communication and interpersonal skills, with the ability to work cross-functionally.
- Detail-oriented with a focus on operational efficiency and security.
- Ability to manage multiple priorities.