Key job responsibilities
• Develop high-performance inference software for a diverse set of neural models, typically in C/C++
• Design, prototype, and evaluate new inference engines and optimization techniques
• Participate in deep-dive analysis and profiling of production code
• Optimize inference performance across various platforms (on-device, cloud-based CPU, GPU, proprietary ASICs)
• Collaborate closely with research scientists to bring next-generation neural models to life
• Partner with internal and external hardware teams to maximize platform utilization
• Work in an Agile environment to deliver high-quality software against aggressive schedules
• Hold a high bar for technical excellence within the team and across the organization
Basic qualifications
• 3+ years of non-internship professional software development experience
• 2+ years of non-internship experience designing or architecting new and existing systems (design patterns, reliability, scaling)
• Experience programming in at least one programming language
• Strong C/C++ programming skills
• Solid understanding of deep learning architectures (CNNs, RNNs, Transformers, etc.)
• 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
• Bachelor's degree in computer science or equivalent
Preferred qualifications
• Experience with inference frameworks such as PyTorch, TensorFlow, ONNX Runtime, TensorRT, llama.cpp, etc.
• Proficiency in performance optimization for CPUs, GPUs, or AI accelerators
• Experience with latency-sensitive optimizations and real-time inference
• Understanding of resource constraints on mobile/edge hardware
• Knowledge of model compression techniques (quantization, pruning, distillation, etc.)
• Strong communication skills and ability to work in a collaborative environment
• Passion for solving complex problems and driving innovation in AI technology