As an AIOps Platform Engineer within our platform operations team, you will design, build, and maintain our AIOps solution. This role demands deep knowledge of AI/ML technologies, IT infrastructure, and platform engineering.
Job Responsibilities:
- Design and implement a robust AIOps platform to support the AI/ML and Data organization's operational needs
- Collaborate with data scientists, machine learning engineers, and platform operations teams to integrate AI/ML agents into the platform
- Develop and maintain data pipelines and workflows that collect, process, and analyze data efficiently and provide actionable intelligence for agents
- Implement monitoring and alerting systems to ensure the platform's reliability, availability, and performance
- Automate routine tasks and processes to enhance operational efficiency and reduce manual intervention
- Troubleshoot and resolve platform-related issues, ensuring minimal impact on AI/ML operations
- Stay current with industry trends and advancements in AIOps, AI/ML, and data engineering
- Develop and deploy agentic systems and agents that automate routine operational tasks and processes
Required qualifications, capabilities and skills:
- Bachelor’s degree in Computer Science, Mathematics, or a related field
- Proven experience in platform engineering, with a focus on AI/ML technologies and data operations
- Strong programming skills in languages such as Python, Java, or Scala
- Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes)
- Familiarity with data processing frameworks (e.g., Apache Kafka, Apache Spark) and IT monitoring tools (e.g., Prometheus, Grafana, Datadog)
- Knowledge of machine learning algorithms and data analysis techniques
- Experience working with agentic systems and agents for automation
- Excellent problem-solving skills and the ability to work collaboratively in a fast-paced environment
- Strong communication and interpersonal skills
Preferred qualifications, capabilities and skills:
- Master's degree in a related field
- Certifications in cloud computing, AI/ML, or data engineering (e.g., AWS Certified Machine Learning, Google Professional Data Engineer)
- Experience with DevOps practices and CI/CD pipelines