"VAST's data management vision is the future of the market."- Forbes
As a SOC Operator, you will be responsible for monitoring and maintaining the health and performance of our fleet of installed clusters. You will work in a 24/7 operations environment, ensuring the availability, reliability, and security of services. This role involves real-time monitoring; incident detection, management, and resolution; and clear written and verbal communication with other teams and stakeholders.
Responsibilities:
- Monitor clusters using internal monitoring tools to detect and troubleshoot issues promptly.
- Respond to alerts and incidents in a timely manner, following standard operating procedures (SOPs) and escalation processes.
- Perform initial investigation and diagnosis of problems, escalating complex issues to Support.
- Document incidents in the incident tracking system, including their details, troubleshooting steps, and resolutions.
- Collaborate with the Support, R&D, and Account teams, as well as customers, to ensure effective incident resolution and communication.
- Conduct routine checks and audits to identify potential problems or vulnerabilities.
- Assist with the implementation of changes and updates to the infrastructure as directed by team leads.
- Participate in shift-based work schedules, including nights, weekends, and holidays, to provide 24/7 coverage in the SOC.
- Maintain up-to-date knowledge of VAST Data Platform technologies via prescribed hands-on training modules.
- Adhere to security protocols and ensure the confidentiality, integrity, and availability of network and system data.
- Provide excellent customer service to internal and external stakeholders during incident resolution and communication.