Expoint - all jobs in one place

IBM Observability Automation Architect 
India, Karnataka, Bengaluru 
472920087

02.12.2024

Your Role and Responsibilities
  • Implement and administer infrastructure and solutions that support the IBM Cloud VPC
  • Support the compliance and security integrity of the environment through your work
  • Partner with other teams, functional managers and program managers to deliver mission-critical services to the market
  • Support development of new capabilities and enhancement of existing ones for our compute, storage and network services
  • Adopt and build on automation solutions governed by SRE principles, including CI/CD pipelines, configuration management, immutable infrastructure deployment, auto-healing systems, etc.
  • Provide technical escalation support for other Infrastructure Operations teams
  • Conceptualize, design, implement and manage reliable, highly performant, scalable automation solutions that build consistency across our infrastructure
  • Work with and adopt open source technologies as well as participate in new IBM innovations across IaaS
  • Bring a self-driven attitude: propose, test and implement solutions and improvements for review and consideration by your peers


Required Technical and Professional Expertise

  • 5+ years of experience in data center infrastructure or relevant work experience
  • 5+ years of experience in large-scale infrastructure design, engineering, and support
  • 5+ years of experience in IT Change, Incident, Problem and Asset management
  • 5+ years of infrastructure engineering with a proven record of delivering high-quality, large-scale solutions, including experience designing architectures for scale and performance
  • 5+ years of practical experience with one or more operating systems: Ubuntu (preferred), CentOS, RHEL or Debian Linux, and Windows Server
  • 5+ years of experience debugging issues across a Linux environment with network, storage, compute and orchestration components. Does not need to be code debugging.
  • Development experience with one or more programming languages: PowerShell, Python (preferred), or Ruby
  • 2+ years of practical experience with orchestration that uses desired-state models and/or finite-state-machine models: Kubernetes (preferred), OpenShift, etc.
  • 5+ years of practical experience with containerization and container orchestration: Docker (preferred), Kubernetes (preferred), OpenShift, Rancher, Docker Swarm, Docker Compose
  • 5+ years of experience with monitoring technologies: Sysdig (preferred), Grafana, Nagios, Zenoss, ELK, Splunk, Zabbix, etc.
  • Familiarity with OpenTelemetry concepts, tracing, metrics, events and other observability principles
  • 2+ years of experience with one or more virtualization technologies: Citrix Xen Hypervisor (preferred), KVM (also preferred), libvirt, QEMU, VMware vSphere, etc.
  • 5+ years of experience with one or more automation and configuration management tools/solutions: Ansible and Terraform (preferred), Chef, Python, Bash, Puppet, Rundeck, etc.
  • 2+ years of experience with version control systems: GitHub (preferred), GitLab, Subversion, etc.
  • Basic experience with databases, both RDBMS such as MySQL or PostgreSQL and other data stores and engines such as etcd, TimescaleDB, InnoDB, etc. Not a DBA role.
  • Working knowledge of network and storage technologies
  • Working knowledge of ServiceNow, JIRA, Confluence, and GitHub
  • ITIL Foundation V4 certification is a plus


Preferred Technical and Professional Expertise

  • Excellent verbal and written communication skills
  • Highly responsible, motivated, able to work with little direction
  • Experience with design and development of complex systems
  • Ability to troubleshoot complex problems and customer issues
  • Working knowledge of Linux clustering, HA, and fault-tolerant system implementations: active/active, active/passive, Pacemaker, Keepalived, HAProxy, Corosync, LVM
  • 2+ years of experience with complex systems and layered architecture models: OSI, Kubernetes, virtualization, TCP/IP, etc.
  • Working knowledge of TCP/IP, BGP, sockets, routing protocols, routes and Keepalived, and of the role they play in debugging and in highly available systems at scale
  • Ability to debug an issue across the entire OSI stack of a typical Linux environment across storage, network, compute, OS, system tuning, orchestration.
  • Ability to follow stack traces to particular libraries in code and identify root causes
  • Working knowledge of message buses and message queues: Kafka (preferred), Spark, RabbitMQ, Redis, etc.
  • Extensive experience with databases and debugging their usage with application stacks
  • Experience with and understanding of the interactions and dependencies in a typical three-tier model of application stacks, as well as cloud