Key job responsibilities
- Lead the definition, design, architecture quality, implementation, and delivery of the most advanced, difficult, cross-cutting, and ambiguous challenges across our ML infrastructure.
- Actively mentor Senior and Principal engineers, and scale yourself by developing and institutionalizing best practices in AI/ML infrastructure and distributed computing across the organization.
- 8+ years of professional software development experience in distributed systems with emphasis on ML infrastructure
- 8+ years of current programming experience building ML infrastructure using languages such as Python, C++ or Rust
- Hands-on experience with parallel computing platforms such as CUDA or OpenMP
- Deep understanding of AI frameworks such as PyTorch, TensorFlow, and JAX, and of the demands they place on underlying compute infrastructure, memory bandwidth, network interconnect, and storage as scale increases
- Knowledge of emerging AI hardware accelerators and architectures
- Experience with containerization and orchestration technologies (Docker, Kubernetes)
- Experience with cloud computing platforms (AWS, Azure, GCP) and their offerings
- 5+ years of non-internship professional software development experience
- 5+ years of programming experience with at least one programming language
- 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Experience as a mentor, tech lead or leading an engineering team
- 5+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Bachelor's degree in computer science or equivalent