Network Observability SRE at IBM, Bengaluru, India (63674698)

Your role and responsibilities

Automation: Develop and maintain automation tools and scripts to streamline the deployment, monitoring, and management of infrastructure and applications.

Monitoring and Alerting: Set up and maintain monitoring and alerting systems to proactively identify and resolve issues before they impact customers or services. This includes participating in on-call rotations to respond promptly to high-priority incidents.

Performance Optimization: Identify opportunities for performance optimization and work with development teams to implement improvements.

Documentation: Maintain up-to-date documentation for the infrastructure, processes, and procedures.

Collaboration: Work closely with development teams, product managers, and other stakeholders to understand requirements and ensure the reliability of the platform.

Continuous Improvement: Participate in post-incident reviews, retrospectives, and other forums to identify areas for improvement and drive continuous improvement initiatives.

Required education

Bachelor's Degree

Preferred education

Master's Degree

Required technical and professional expertise

· Strong Linux systems engineering background with CentOS/RHEL or Debian, including experience building, maintaining, and troubleshooting these systems.

· Automation and Scripting: Strong scripting skills (e.g., Bash, Python) and experience with configuration management tools (e.g., Ansible, Chef, Puppet) to automate deployment and management tasks.

· Excellent Git skills (merging, branching, forking).

· Cloud Platforms: Strong experience with cloud platforms such as IBM Cloud, AWS, Azure, or Google Cloud Platform, including expertise in:

o Deploying and managing services in these environments.

o Managing and troubleshooting containerized applications.

· Troubleshooting and Problem Solving: Strong troubleshooting skills and the ability to quickly identify and resolve complex issues in a production environment, including experience with incident response and post-incident analysis.

Preferred technical and professional experience

· Container Orchestration: Proficiency in container orchestration tools such as Nomad or Kubernetes, including experience with HashiCorp Consul/Vault or equivalents.

· Monitoring and Logging: Experience with monitoring and logging tools (e.g., ELK stack, Grafana, Prometheus) to monitor the health and performance of infrastructure and applications, including experience building and maintaining these tools.

· Security: Knowledge of implementing security best practices and maintaining compliance with standards such as the Center for Internet Security (CIS) Benchmarks and FedRAMP.

· Security: Ability to patch software or adjust configurations to mitigate Common Vulnerabilities and Exposures (CVEs) in a timely fashion.

· Experience with clustered time series databases such as InfluxDB, and with distributed event streaming pipelines built on Kafka and Telegraf.

· CI/CD: Experience with application deployment using CI/CD tools such as Jenkins and Tekton.

· Working knowledge of GitHub, JIRA, Confluence, and ServiceNow.
