Key Responsibilities
System Architecture & Design:
- Architect and build end-to-end performance monitoring systems that detect and analyze subtle regressions and performance anomalies in production environments.
- Design scalable solutions that collect, process, and analyze large volumes of performance data from diverse environments (e.g., bare metal, VMs, cloud infrastructure).
- Develop modular systems that integrate statistical techniques and machine learning models to extract actionable insights and drive continuous performance improvements.
Statistical Analysis & Machine Learning:
- Apply advanced statistical methods (e.g., change point detection, trend analysis) to identify subtle performance variations and anomalies in noisy datasets.
- Integrate machine learning techniques, including reinforcement learning and predictive analytics, to optimize resource allocation and proactively detect performance degradations.
- Collaborate with data science teams to refine models and validate findings against production data.
Software Development:
- Write clean, efficient code in languages such as C/C++, Go, or Python, ensuring high performance and low overhead in critical production systems.
Collaboration & Communication:
- Work cross-functionally with infrastructure, product, and operations teams to integrate performance insights into broader system optimization strategies.
- Present data-driven insights and performance recommendations to technical and non-technical stakeholders.
- Mentor junior team members and contribute to best practices in performance engineering and analytics.
Basic Qualifications
- Bachelor’s or higher degree in Computer Science, Data Science, AI/ML, Statistics, Mathematics, or a related technical field.
- Strong grasp of statistical analysis and machine learning techniques and willingness to apply them to the system performance domain.
- 2+ years of experience building production-grade data/ML systems.
- Proficiency in one or more programming languages (e.g., C/C++, Go, Python).
Preferred Qualifications
- PhD in Computer Science, Machine Learning, Statistics, Data Science, or a related field.
- 5+ years of experience in AI/Data Science.
- Experience designing and deploying production systems for performance regression detection or optimization.
- Background in implementing automated root cause analysis, anomaly detection, or predictive modeling using ML frameworks.
- Understanding of containerization (e.g., Docker), orchestration platforms (e.g., Kubernetes), and cloud infrastructure (e.g., AWS, GCP).
- Strong analytical skills, excellent communication abilities, and a passion for solving complex performance problems in dynamic environments.
Strong candidates may also have experience with:
- Modern profiling tools (e.g., perf, eBPF) and techniques for low-level performance measurement and debugging.
What You’ll Achieve
- Create systems that empower teams to identify and address performance anomalies proactively, reducing downtime and resource waste.
- Leverage data-driven insights to drive system optimizations that balance performance, scalability, and cost efficiency.
- Contribute to a culture of continuous improvement, using innovative statistical and machine learning methods to shape the future of performance insights.
* Accommodations may be available based on religious and/or medical conditions, or as required by applicable law. To request an accommodation, please reach out to .