Ensure 24/7 availability of SAP systems hosted in SAP ECS.
Respond to, troubleshoot, and resolve alerts and incidents at the Linux OS and infrastructure layers.
Manage Alibaba Cloud services, including virtual machines, storage, and network resources.
Monitor and manage infrastructure to ensure optimal performance, scalability, and security.
Configure and manage virtual networks, subnets, routing, and network security groups.
Implement monitoring tools and configure alerts for proactive system management.
Develop and maintain automation tools to streamline routine tasks and processes.
Collaborate with internal teams to address technical requirements and resolve issues.
Maintain comprehensive documentation of configurations, procedures, and best practices.
Follow ITIL-based change management processes.
Identify and implement automation opportunities in daily operations.
Work comfortably in a fast-paced, flexible, and dynamic environment.
Participate in 24/7 operations support, including on-call, weekend, and shift-based duties.
Apply ITIL incident management principles for effective resolution.
8+ years of hands-on experience in Linux system administration.
Proven expertise in troubleshooting and incident handling.
Linux Expertise: Deep knowledge of Linux internals, kernel architecture, memory management, file systems, and system calls.
Cloud Infrastructure: Experience with Alibaba Cloud or equivalent platforms such as AWS, GCP, or Azure.
Diagnostics & Troubleshooting: Proficiency with tools such as top, htop, vmstat, iostat, sar, ps, netstat, ss, journalctl, rsyslog, dmesg, strace, lsof, tcpdump, wireshark, perf, and systemd-analyze.
Networking: Strong understanding of TCP/IP, DNS, DHCP, subnetting, and related protocols.
Backup & Restore: Familiarity with data protection strategies.
Infrastructure Services: Working knowledge of DNS, LDAP, and other essential infrastructure services.
Security: Understanding of OS hardening, compliance, security best practices, vulnerability scanning, and intrusion detection tools.
Scripting & Automation: Proficiency in Shell, Python, and tools like Ansible, Puppet, Chef, or Terraform.
Virtualization: Familiarity with container and orchestration technologies like Docker and Kubernetes.
Analytical Thinking: Strong analytical and root cause analysis capabilities.
Communication: Excellent verbal and written communication skills, with the ability to articulate technical content to both technical and non-technical stakeholders.
Documentation: Ability to produce clear and concise technical documentation.
Incident Management: Experience with ITIL or similar frameworks.
Learning Mindset: Passion for continuous learning and skill enhancement.
Fluency in English is mandatory.
Monitoring: Prometheus, Grafana
Log Management: Splunk
Diagnostics: As listed under Technical Skills
Job Segment: Cloud, ERP, Operations Manager, System Administrator, Linux, Technology, Operations