

Key job responsibilities
• Develop high-performance inference software for a diverse set of neural models, typically in C/C++
• Design, prototype, and evaluate new inference engines and optimization techniques
• Participate in deep-dive analysis and profiling of production code
• Optimize inference performance across various platforms (on-device, cloud-based CPU, GPU, proprietary ASICs)
• Collaborate closely with research scientists to bring next-generation neural models to life
• Partner with internal and external hardware teams to maximize platform utilization
• Work in an Agile environment to deliver high-quality software
• Hold a high bar for technical excellence within the team and across the organization
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience in design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Experience programming in at least one programming language
- Bachelor's degree in Computer Science, Computer Engineering, or related field
- Strong C/C++ programming skills
- Solid understanding of deep learning architectures (CNNs, RNNs, Transformers, etc.)
- 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience with inference frameworks such as PyTorch, TensorFlow, ONNX Runtime, TensorRT, llama.cpp, etc.
- Proficiency in performance optimization for CPU, GPU, or AI hardware
- Proficiency in kernel programming for accelerated hardware using programming models such as (but not limited to) CUDA, OpenMP, OpenCL, Vulkan, and Metal
- Experience with latency-sensitive optimizations and real-time inference
- Understanding of resource constraints on mobile/edge hardware
- Knowledge of model compression techniques (quantization, pruning, distillation, etc.)
- Experience with LLM efficiency techniques like speculative decoding and long context
- Strong communication skills and ability to work in a collaborative environment
- Passion for solving complex problems and driving innovation in AI technology

Position Responsibilities:
- Participate in the design, development, evaluation, deployment, and updating of data-driven models and analytical solutions for machine learning (ML) and/or natural language (NL) applications.
- Routinely build and deploy ML models on available data.
- 3+ years of experience building machine learning models for business applications
- PhD, or Master's degree and 6+ years of applied research experience
- Experience programming in Java, C++, Python, or a related language
- Experience with neural deep learning methods and machine learning
- Experience with modeling tools such as R, scikit-learn, Spark MLlib, MXNet, TensorFlow, NumPy, SciPy, etc.
- Experience with large-scale distributed systems such as Hadoop, Spark, etc.

- Pioneer new approaches to foundation models
- Publish and present research at top-tier conferences and journals
- Work with state-of-the-art LLMs and multi-modal foundation models
- Access to substantial computational resources for research

Key job responsibilities
- Research and develop novel techniques for efficient runtime inference (low latency, high throughput)
- Design and evaluate efficient foundation model architectures
- Create new methods for improving training efficiency
- Conduct experimental studies to validate efficiency improvements
- Write high-quality Python code to implement research ideas
- Author technical documentation and research papers
- Present findings to technical and non-technical stakeholders
A day in the life
Your day might start with a team stand-up to discuss ongoing projects and brainstorm solutions to technical challenges. You'll spend time implementing and testing new efficiency optimization techniques in Python, analyzing performance metrics, and iterating on approaches. You'll collaborate with team members to review code and research results, participate in technical discussions about architecture designs, and engage with other AGI teams to understand their efficiency needs. You might end your day analyzing experimental results or writing up findings for a research paper. Throughout the week, you'll have opportunities to present your work to stakeholders and contribute to the team's research roadmap.
- 3+ years of experience building models for business applications
- PhD, or Master's degree and 4+ years of experience in CS, CE, ML, or a related field
- Experience programming in Java, C++, Python, or a related language
- Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing