Job responsibilities
- Architects and implements distributed ML infrastructure, including inference, training, scheduling, orchestration, and storage.
- Develops advanced monitoring and management tools for high reliability and scalability.
- Optimizes system performance by identifying and resolving inefficiencies and bottlenecks.
- Collaborates with product teams to deliver tailored, technology-driven solutions.
- Drives decisions that influence product design, application functionality, and technical operations and processes.
- Integrates Generative AI within the ML Platform using state-of-the-art techniques.
- Adds to the team culture of diversity, equity, inclusion, and respect.
- Brings hands-on experience analyzing, writing, developing, testing, and releasing products using Python on AWS.
- Adheres to the organization's in-office presence policies, which are subject to change; this is a hybrid role with 3 days a week in the office.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 5+ years of applied experience.
- Deep expertise in AWS / Azure and the Kubernetes ecosystem, including EKS, Helm, custom operators, and Terraform.
- Advanced proficiency in Python.
- Background in High Performance Computing, ML Hardware Acceleration (e.g., GPU, TPU, RDMA), or ML for Systems.
- Strong coding skills and experience in developing large-scale ML systems.
- Extensive hands-on experience with ML frameworks (TensorFlow, PyTorch, JAX, scikit-learn).
- Proven track record in contributing to and optimizing open-source ML frameworks.
- Strategic thinker with the ability to craft and drive a technical vision for maximum business impact.
- Demonstrated leadership in working effectively with engineers, data scientists, and ML practitioners.
- Proven ability to identify trade-offs, clarify project ambiguities, and drive decision-making.
- Ability to tackle design and functionality problems independently with little to no oversight.
Preferred qualifications, capabilities, and skills
- Excellent problem-solving and analytical skills.
- Ability to work independently and in a team.
- Passion for innovation and continuous learning.
- Experience with Java is a plus.