As a software developer in the cloud storage area, you will implement and consume APIs in the IBM cloud infrastructure environment. You will be a motivated self-starter who loves to solve challenging problems and feels comfortable managing multiple, changing priorities and meeting deadlines in an entrepreneurial environment.
You are highly organized and detail-oriented, have excellent time management skills, and can effectively prioritize tasks in a fast-paced, high-volume, and evolving work environment.
Responsibilities include:
- Designing and developing storage integrations to enable and support cloud platform business efforts
- Participating in troubleshooting and fixing issues in the existing cloud storage environment
- Producing code that is secure, scalable, and reliable, supported by unit tests, functional tests, and technical documentation
- Participating in code reviews for your peers’ development work, triaging and solving live customer issues, and participating in all scrum activities
- Monitoring, measuring, and improving code and data performance for the application you help to develop
- Being available for occasional on-call shifts during daytime hours and weekends
Required Technical and Professional Expertise
- 8-16 years of experience delivering code for active cloud services or projects
- Experience debugging complex problems
- Experience designing, building, and operating large-scale production systems
- Expertise in Ansible, Bash, core Python development, and deployments in production environments is a must.
- Experience automating infrastructure, configuration management, testing, and deployments using tools such as Ansible and Chef, and the ability to explain the Infrastructure as Code paradigm
- A strong understanding of diverse infrastructure platforms and infrastructure concepts is required.
- Systems management experience on Linux/UNIX systems (RHEL preferred)
- Experience in Docker and containerization technologies
- Experience with cloud computing technologies
- Experience with Kubernetes (k8s) CRDs and k8s controller programming using the watcher/informer model
- Solid experience in infrastructure operations automation and IT service management, with hands-on exposure to data centre administration, configuration, incident management, and support.
- Design, develop, and maintain automation tools and scripts to improve operational efficiency and reduce manual errors.
- Implement Infrastructure as Code (IaC) best practices for provisioning and managing infrastructure across various environments.
- Monitor and troubleshoot system performance issues, identify root causes, and implement solutions to prevent future occurrences.
- Participate in incident response activities, diagnose problems, and work towards swift resolution to minimize downtime.
- Collaborate with developers and operations teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
- Continuously improve monitoring and alerting processes to ensure proactive identification and resolution of potential issues.
- Stay up to date with the latest trends and technologies in SRE and cloud computing.
Preferred Technical and Professional Expertise
- Experience with Linux virtualization technologies such as KVM, Xen and QEMU
- Experience with Ceph, NFS, NVMe, or object storage technologies
- Excellent Git skills (merging, rebasing, branching, forking, submodules)
- Experience with Python, Ansible, Terraform, Jenkins
- Experience with application deployment using CI/CD
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana).
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team in a fast-paced environment.
- Experience with chaos engineering principles.
- Experience with performance optimization techniques.