

We're looking for candidates who combine strong software engineering fundamentals with practical ML system development experience. You'll need to demonstrate expertise in building scalable, fault-tolerant distributed systems, with a track record of shipping production services that handle large-scale workloads. While ML engineering skills are important, we prioritize candidates who understand professional software engineering practices across the full development lifecycle - from system design and coding standards to testing, deployment, and operational excellence.
Key job responsibilities
- Design, develop and maintain ML model serving infrastructure to enable high-throughput, low-latency inference in production environments
- Develop efficient data processing pipelines to handle large-scale training and inference data
- Support experimentation and A/B testing infrastructure to evaluate model improvements
- Participate in code reviews, technical design discussions, and sprint planning to ensure high quality software delivery
- Strong understanding of ML fundamentals and common optimization techniques
- Experience with data processing and ETL pipelines at scale
- 3+ years of non-internship professional software development experience
- Bachelor's degree in computer science or equivalent
- 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Proven experience designing and implementing scalable, fault-tolerant distributed systems with a focus on performance, reliability, and operational excellence
- Master's degree in computer science or equivalent
- Strong background in event-driven architectures, distributed caching, REST APIs, vector search, and data processing patterns (CDC, streaming)
- Deep experience with core AWS services including DynamoDB, ElastiCache, Lambda, S3, OpenSearch, and infrastructure-as-code (CDK/CloudFormation)