Expoint - all jobs in one place

IBM Site Reliability Engineer - Apptio 
United States, Massachusetts, Lowell 
190929526

25.11.2024
On a typical day in this role, you will work with Kubernetes, Docker, Helm, Elasticsearch, Datadog, Grafana, Sensu, Puppet, Ansible/AWX, AWS, Azure, Python/Bash/PowerShell, and Terraform/Terragrunt. If you don’t know all of these tools, don’t worry: we don’t expect you to know them all, and we understand that technology evolves quickly.

Major Responsibilities:
  • Scale systems sustainably through mechanisms like automation.
  • Own the monitoring system.
  • Maintain services in production by measuring and monitoring availability, latency, and overall system health.
  • Expand applications and scale them horizontally.
  • Work closely with developers, support and QA teams on maintaining and improving the whole lifecycle of services.
  • Practice sustainable incident response and blameless post-mortems.
  • Provide primary operational support and engineering for multiple large distributed software applications.


Required Technical and Professional Expertise

  • Knowledge of configuration management tools (e.g. Ansible or Puppet)
  • Experience with any scripting language (Bash, Python, PowerShell, etc.)
  • Experience with containerization (e.g., Docker, Podman, etc.)
  • Experience with container orchestration tools (e.g., Kubernetes, OpenShift, Docker Swarm, etc.)
  • Experience with database administration and management (MS SQL Server, PostgreSQL, MongoDB)
  • Familiarity with public cloud providers such as AWS, Azure, or IBM Cloud
  • Experience with monitoring, observability, and logging (e.g., Datadog, Prometheus, Grafana, ELK Stack, Loki, etc.)
  • Familiarity with RESTful systems and their APIs
  • Experience with any high-level programming language (Golang, .NET, Java, etc.) is a plus
  • Fluent English language skills


Preferred Technical and Professional Expertise

  • Ability to thrive in autonomy
  • Experience in a large-scale, distributed Linux/Unix or Windows environment is a plus
  • Mentoring peers and sharing skills
  • Great communication skills