Job responsibilities
- Develop, test, and debug automated tasks for applications, systems, and infrastructure.
- Troubleshoot priority incidents and facilitate blameless post-mortems.
- Work with development teams throughout the software life cycle to ensure sustainable software releases.
- Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions.
- Build automations to reduce manual interventions for production operations.
- Build monitoring and observability tools and processes.
- Build self-healing and resiliency patterns and drive their adoption.
- Lead and participate in performance tests to identify bottlenecks, opportunities for optimization, and capacity demands.
- Participate in 24x7 support coverage.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 2+ years of applied experience.
- Demonstrate strong development skills in PySpark, Python, or Scala.
- Possess experience in data preprocessing, ETL processes, and data pipeline creation.
- Have experience with data storage solutions and cloud services such as AWS EMR, EC2, and S3.
- Utilize logging and monitoring tools such as Splunk, Dynatrace, and CloudWatch.
- Implement CI/CD processes using tools like Jenkins and Spinnaker.
- Develop and manage ML models in production, including model versioning and rollback strategies.
- Optimize ML models and infrastructure for performance and cost-efficiency.
- Conduct A/B testing and evaluate model performance.
- Exhibit strong analytical and troubleshooting skills for ML pipelines and production systems.
- Communicate effectively and collaborate with data scientists, engineers, and global support teams.
Preferred qualifications, capabilities, and skills
- Bachelor’s degree (with 4+ years of experience) or master’s degree (with 2+ years of experience) in Computer Science, Data Science, Engineering, or a related field.
- Experience with AWS cloud services (e.g., EMR, ECS/EKS, EC2, S3).
- Relevant certifications in cloud platforms or DevOps (e.g., AWS).