Job responsibilities
- Executes creative software solutions, design, development, and technical troubleshooting, with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems
- Develops secure high-quality production code, and reviews and debugs code written by others
- Identifies opportunities to eliminate or automate remediation of recurring issues to improve overall operational stability of software applications and systems
- Leads communities of practice across Software Engineering to drive awareness and use of new and leading-edge technologies
- Adds to team culture of diversity, equity, inclusion, and respect
- Collaborates with cross-functional teams, including data scientists and software engineers, to understand model requirements and integrate models into applications
- Develops and implements strategies for deploying machine learning models into production, ensuring scalability, reliability, and efficiency
- Designs and maintains continuous integration and continuous deployment (CI/CD) pipelines to automate the testing, deployment, and updating of machine learning models
- Manages and optimizes the infrastructure required for running machine learning models, including cloud services, containerization (e.g., Docker), and orchestration tools (e.g., Kubernetes)
- Implements monitoring and logging solutions to track model performance, detect anomalies, and ensure models are operating as expected in production
- Responds to incidents and troubleshoots issues related to model performance, data quality, and infrastructure
Required qualifications, capabilities, and skills
- Formal training or certification on security engineering concepts and 5+ years applied experience
- Hands-on practical experience delivering system design, application development, testing, and operational stability
- Advanced proficiency in one or more programming languages
- Proficient in all aspects of the Software Development Life Cycle
- Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile)
- Practical cloud native experience
- Strong expertise in deploying and managing machine learning models in production environments
- Proficiency in building and maintaining CI/CD pipelines for machine learning workflows
- Expertise in cloud platforms (e.g., AWS, Google Cloud, Azure) and in containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack)
- Advanced Python programming skills, including pandas, NumPy, and scikit-learn; strong SQL skills are a plus
Preferred qualifications, capabilities, and skills
- Proven experience in deploying and managing large-scale machine learning models in production environments
- Strong ability to monitor ML models in production, addressing model performance and data quality issues effectively
- Working knowledge of security best practices and compliance standards for Machine Learning systems
- Experience with infrastructure optimization techniques to enhance performance and efficiency
- Experience developing REST APIs using frameworks such as Flask or FastAPI for seamless integration into business solutions
- Familiarity with creating and utilizing synthetic datasets to improve model training and evaluation
- Bachelor's degree in Computer Science, Engineering, or a related field, with relevant experience in MLOps or related roles