The Senior Linux & Cloud Administrator is responsible for the 24x7 availability of SAP systems running in SAP ECS.
Respond to, troubleshoot, and resolve alerts and incidents in the Linux OS or infrastructure layer (IBM Cloud).
Manage IBM services and resources, including virtual machines, storage, and network.
Monitor and manage IBM infrastructure to ensure optimal performance, scalability, and security.
Manage IBM virtual networks, subnets, routing, and network security groups. Implement monitoring solutions and configure alerts to proactively monitor the environment.
Develop and maintain automation scripts (e.g., PowerShell) to streamline routine tasks and optimize processes.
Collaborate with internal teams to understand and address their requirements and challenges.
Enable internal teams and ensure operational readiness.
Maintain detailed documentation of the configurations, procedures, and best practices.
Follow change management processes during service request execution.
Seek opportunities to streamline standard operating procedures through automation.
Be comfortable working in a fast-paced, dynamic, and flexible environment.
Support the 24/7 operations model with on-call, on-duty, and weekend tasks according to the shift schedule.
Apply ITIL incident management processes to ensure timely resolution.
Experience (Role Requirements):
8+ years of professional experience in Linux system administration, with a demonstrated ability to perform troubleshooting and incident handling.
Technical Skills:
Expert-level Linux knowledge: Deep understanding of Linux internals, kernel architecture, process and memory management, filesystems, and system calls.
Cloud Infrastructure: Experience with IBM Cloud, including core services.
Troubleshooting and Diagnostics: Mastery of tools like top, htop, vmstat, iostat, sar, ps, netstat, ss, journalctl, rsyslog, dmesg, strace, lsof, tcpdump, Wireshark, perf, systemd-analyze.
Networking: Advanced understanding of TCP/IP, network interfaces, routing, DNS, DHCP, firewalls, and diagnostic tools.
Backup and Restore:
Infrastructure Services: Knowledge of infrastructure services such as DNS and LDAP.
Security: Strong knowledge of security principles, OS hardening, compliance, and tools for vulnerability scanning and intrusion detection.
Scripting and Automation: Proficiency in Shell scripting, Python, and/or other scripting languages. Experience with Infrastructure-as-Code tools like Ansible, Puppet, Chef, or Terraform.
Virtualization: Familiarity with Docker, Kubernetes, and other container and virtualization technologies.
Soft Skills:
Analytical and Problem-Solving: Exceptional ability to analyze complex issues, identify root causes, and implement effective solutions.
Communication: Excellent written and verbal communication skills, with the ability to explain technical findings clearly to both technical and non-technical audiences.
Documentation: Ability to create clear, concise, and comprehensive technical documentation.
Incident Management: Experience with ITIL or similar incident management frameworks.