Expoint - all jobs in one place

IBM SRE 
India, Karnataka, Bengaluru 
234918057

04.09.2024

Your Role and Responsibilities
  • Implement and automate infrastructure solutions that support IBM Cloud products and infrastructure
  • Develop and administer CI/CD systems and tools for development and test teams
  • Keep your assigned site or service up and running, or restore it quickly when a failure occurs
  • Work closely with internal partners and teams to ensure that our infrastructure meets security, SLA, and performance requirements
  • Automate work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more
  • Persistently test application and infrastructure resiliency under a variety of error conditions
  • Support the compliance and security integrity of the environment
  • Develop, communicate, and monitor standard processes to promote the long-term health and sustainability of operational development tasks
  • Stand up and maintain pre-production and developer environments to support the entire development organization and improve overall team velocity
  • Use metrics and analytics to determine reliability issues and remove them through automation and tooling
  • Be an advocate for our customers, providing them self-diagnosing tools to resolve common issues that arise in the field
  • Participate in code reviews of your peers’ development work, triage and solve live customer issues, and take part in all scrum activities
  • Monitor, measure, and improve code and data performance for the application you help develop
  • Be available for on-call shifts during daytime hours and weekends
  • All of this will take place in a strong team environment, which necessitates strong communication


Required Technical and Professional Expertise

  • 4-8 years of experience delivering code for active Cloud Services/Projects
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Expertise in Ansible, Bash, core Python development, and deployments in production environments is a must.
  • Experience automating infrastructure, configuration management, testing, and deployments using tools like Ansible and Chef, and the ability to explain the Infrastructure as Code paradigm
  • A strong understanding of diverse infrastructure platforms and infrastructure concepts is required.
  • Systems management experience in Linux/UNIX systems (RHEL preferred)
  • Experience in Docker and containerization technologies
  • Experience with cloud computing technologies
  • Experience with k8s CRDs and k8s controller programming using the watcher/informer model
  • Must have good experience in Infrastructure Operations automation and IT Service Management, with hands-on exposure to data center administration, configuration, incident management, and support
  • 5+ years of working knowledge of one or more operating systems: Ubuntu (preferred), RHEL, CentOS Linux, and Windows Server
  • Strong experience with one or more Virtualization technologies: KVM, Xen, Citrix Hypervisor, VMware vSphere, etc.
  • Working knowledge of one or more programming tools: Bash, PowerShell, Python, Ruby, and Go
  • Strong communication skills


Preferred Technical and Professional Expertise

  • Working knowledge of one or more key infrastructure tools/products: Ansible, Chef, etc.
  • Working knowledge of container technologies: Kubernetes, Docker, etc.
  • Working knowledge of monitoring technologies: Zabbix, Splunk, etc.
  • Working knowledge of ServiceNow, JIRA, Confluent, and GitHub
  • Experience with technologies that enable reliable data processing pipelines, such as Kafka, Elasticsearch, and Splunk, and with database and data visualization technologies for operations, such as SQL databases, InfluxDB, Grafana, and Kibana
  • Experience with event monitoring/management ecosystems like Zabbix, Nagios, Sysdig, LogDNA, ServiceNow.