As an SDE on our team, you will drive the development of custom Large Language Models (LLMs) across languages, domains, and modalities. You will be responsible for fine-tuning state-of-the-art LLMs for diverse use cases while optimizing models for high-performance deployment on AWS’s custom AI accelerators. This role offers an opportunity to innovate at the forefront of AI, tackling end-to-end LLM training pipelines at massive scale and delivering next-generation AI solutions for top AWS clients.
Key job responsibilities
• Large-Scale Training Pipelines: Design and implement distributed training pipelines for LLMs using tools such as Fully Sharded Data Parallel (FSDP) and DeepSpeed, ensuring scalability and efficiency
• LLM Customization & Fine-Tuning: Adapt LLMs for new languages, domains, and vision applications through continued pre-training, fine-tuning, and Reinforcement Learning from Human Feedback (RLHF)
• Model Optimization on AWS Silicon: Optimize AI models for deployment on AWS Inferentia and Trainium, leveraging the AWS Neuron SDK and developing custom kernels for enhanced performance
• Customer Collaboration: Work with enterprise customers and foundation model providers to understand their business and technical challenges, and co-develop tailored generative AI solutions
- 5+ years of non-internship professional software development experience
- 5+ years of programming experience with at least one software programming language
- Experience as a mentor or tech lead, or leading an engineering team
- 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Hands-on experience with deep learning and machine learning methods (e.g., for training, fine-tuning, and inference)
- Experience with design, development, and optimization of generative AI solutions, algorithms, or technologies
- Bachelor's degree in Computer Science or equivalent
- Hands-on experience with at least one ML library or framework
- 2+ years of experience in developing, deploying or optimizing ML models
- 5+ years of experience in the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations