Job responsibilities
- Develops cutting-edge AI/ML platform solutions using LLMs, public cloud, and modern standards and patterns.
- Develops and implements state-of-the-art generative AI services leveraging Azure OpenAI models and the AWS Bedrock service.
- Develops solutions using AWS cloud services for compute, storage, databases, and security, as well as Azure services.
- Develops advanced monitoring and management tools for high reliability and scalability.
- Architects and implements distributed ML infrastructure, including inference, training, scheduling, orchestration, and storage.
- Optimizes system performance by identifying and resolving inefficiencies and bottlenecks.
- Works closely with the Product team to design, build, and deliver capabilities in agile sprints.
- Collaborates with cross-functional teams, including data scientists, software engineers, and designers, to integrate generative AI into various applications and products.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 5+ years of applied experience.
- Strong coding skills and experience in developing large-scale ML systems.
- Deep expertise in AWS/GCP and the Kubernetes ecosystem, including EKS, Helm, and custom operators.
- Hands-on experience with ML frameworks (TensorFlow, PyTorch, JAX, scikit-learn).
- Hands-on experience developing AWS cloud-based applications using EC2, EKS, Lambda, SQS, SNS, RDS Aurora (MySQL and Postgres), DynamoDB, and Kinesis.
- Deep expertise across application, data, security, and infrastructure disciplines.
- Experience in setting up public cloud infrastructure using Terraform.
- Experience working with containerized services on Kubernetes or ECS.
- Experience with Python, Java, and REST APIs.
- Solid understanding of how to debug and resolve backend performance bottlenecks.
- Experience with application production readiness, production monitoring, and production issue triaging.
Preferred qualifications, capabilities, and skills
- Knowledge of AWS SageMaker and data analytics tools.
- Knowledge of new model architectures and of optimization techniques such as quantization and pruning.
- Ability to adapt to new technologies and learn quickly in a fast-paced environment.
- Knowledge of or experience working with Azure.