What you'll do... Reporting to the Sr. Manager of Cloud Engineering, this Staff Software Engineer will work closely with the Infrastructure Engineering and Operations teams to provide highly available, fault-tolerant cloud storage services. Using various automated tools, you will provide scalable and reliable storage solutions to manage private clouds across multiple disparate cloud infrastructures.
You will ensure the reliability, performance, and availability of software systems by automating infrastructure tasks, monitoring systems, and improving processes to minimize downtime and keep operations running smoothly. You will focus on the site reliability engineering aspects of the cloud storage platform and will be responsible for responding to and resolving incidents, including diagnosing problems, implementing solutions, and restoring systems to a stable state.
You will define and track SLOs and SLIs to measure the reliability and performance of services, ensuring that they meet business requirements. A critical part of the role is working with customers to deploy cloud-based services and to analyze and improve storage efficiency based on utilization, performance, and TCO. You will identify opportunities in the ecosystem that help us unlock higher value for our service offerings. You will be part of a team of highly motivated engineers who work very well together.
What you’ll do
- Design, build, tune, and troubleshoot large-scale cloud storage infrastructure running on Ceph.
- Site reliability engineering for distributed private cloud storage infrastructure.
- Design automation for storage optimization and other processes on private cloud platforms
- Drive significant architectural decisions about the next iterations of our Private Cloud infrastructure that will improve efficiency and reduce Cloud Storage spend
- Enable Application teams to follow best practices when deploying on Cloud Platforms, optimizing Storage spend and improving efficiency and performance based on utilization with Ceph
- Bridge the communication gap between Infrastructure (Cloud) and Applications
- Participate in a 12x7 on-call rotation
- Create and maintain technical documentation for operational readiness
- Design and maintain cloud storage best practices
- Build monitoring that alerts on symptoms rather than on outages
- Become a solid contributor on our team, and build, extend and maintain some of the key infrastructure that powers our Private Cloud platform
- Provide troubleshooting expertise for storage performance and other issues
- Train and educate others within Technology about Cloud technologies.
- Solve business needs by evaluating different storage technology options and vendor products.
- Develop and integrate provisioning and lifecycle tools for storage services components
- Contribute to an environment that promotes and reinforces the highest standards of integrity and ethics
- Demonstrate creativity and strength in the face of change, obstacles, or adversity
- Adapt to competing demands and shifting priorities
What you’ll bring
In addition to being technically sound and results-driven, with a strong operational background and impressive analytical ability, you will need the following to be successful:
- 3+ years of experience supporting large-scale, highly available production Cloud Storage deployments with Ceph
- Strong familiarity with any combination of OpenStack/Kubernetes/Rook
- Proven work experience as a Site Reliability Engineer or similar role
- Experience in programming, scripting, and development (Python or shell scripting)
- Experience with Linux Administration and System Troubleshooting
- Networking concepts and administration: TCP/IP, routing, switching, VLANs, load balancing
- Experience using source control systems (git)
- Experience in configuration management tools like Ansible/Puppet/Chef
- Experience with Containers (Kubernetes, Docker, etc.)
- Experience with monitoring, reporting tools and data analytics
- Good understanding of clustered/distributed systems
- Ability to apply best practices and team standards while meeting service level objectives
- Experience working with cloud deployments (scaling, resiliency, load balancing, etc.) and a solid understanding of service monitoring, KPIs, SLAs, and disaster recovery
- Deep experience with the Linux ecosystem, automation of common tasks, and configuration of systems monitoring tools
- Experience with capacity/performance management, monitoring and tuning
- Experience with firewalls, VPN, routing, switching, load balancers, monitoring, security and DNS
- Strong interpersonal skills to coordinate with other organizations across the business while managing customer expectations.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable.
For information about PTO, see
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms.
For information about benefits and eligibility, see
Reston, Virginia US-07759/Bellevue, Washington US-11075: The annual salary range for this position is $132,000.00-$264,000.00
Sunnyvale, California US-08479: The annual salary range for this position is $143,000.00-$286,000.00
Bentonville, Arkansas US-09050: The annual salary range for this position is $110,000.00-$220,000.00
Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include:
- Stock
Minimum Qualifications... Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 4 years’ experience in software engineering or related area.
Option 2: 6 years’ experience in software engineering or related area.
Preferred Qualifications... Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
640 W California Avenue, Sunnyvale, CA 94086-4828, United States of America