Our Sr. Director Platform Operations will:
Play a pivotal role in setting the roadmap for, and overseeing the day-to-day management of, our AI and ML platforms, including setting strategy for and overseeing container management in the public cloud (AWS), cloud resource provisioning, low latency and high availability of cloud resources, and cloud optimization.
Maintain a deep understanding of the technical aspects of the platform, including infrastructure, algorithms, APIs, and integrations. Provide operations leadership to the engineering and production teams.
Implement robust processes and operations dashboards to monitor, in real time, platform performance, user feedback, adherence to service level agreements (SLAs), observability, resiliency, and key operational metrics.
Collaborate with cyber, technology risk management, security, and compliance teams to understand the company's cyber, risk, and compliance requirements. Work closely with product and engineering to ensure the platform adheres to industry best practices and to corporate cyber and technology risk management standards.
Implement automation and dashboards to visualize vulnerabilities, platform incidents, cloud controls compliance, cloud resource utilization, and similar data, enabling proactive decision-making and risk mitigation.
Work closely with executive leadership, securing buy-in, to develop a long-term vision and roadmap for platform operations enhancements.
Build a high-performing operations team: recruit world-class SREs, production engineers, and data engineers, and develop and retain talent on the team.
Basic Qualifications
Bachelor's degree
At least 9 years of experience managing platform operations, infrastructure operations, or Site Reliability Engineering in a public cloud environment
At least 7 years of people management experience
Preferred Qualifications
Master's degree in a STEM field (Science, Technology, Engineering, or Mathematics)
5+ years of experience in managing large-scale, high-performance, distributed systems as a Site Reliability Engineer or a product engineer
5+ years of experience setting up and scaling observability platforms, providing monitoring and telemetry support, and creating operational health dashboards
3+ years of experience building systems and solutions within a regulated environment
3+ years of experience in Artificial Intelligence, Machine Learning or Cloud infrastructure
3+ years of experience managing distributed systems, multi-tenant systems, microservices, and container orchestration (Kubernetes)
Experience partnering with technology peers responsible for data architecture and distributed computing infrastructure or platforms
5+ years of experience with the machine learning lifecycle (building and training models, serving models, setting up cloud infrastructure or data pipelines) and familiarity with major machine learning frameworks
Eligibility varies based on full- or part-time status, exempt or non-exempt status, and management level.
If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation, please contact Capital One Recruiting at 1-800-304-9102 or via email at . All information you provide will be kept confidential and will be used only to the extent required to provide needed reasonable accommodations.