Responsibilities:
- Develop an in-depth understanding of ML workloads, and develop algorithms and optimization techniques to drive PPA (Performance, Power, Area) on current and future Arm platforms
- Implement performance-critical, machine-specific kernels or Arm assembly code on dedicated hardware accelerators and CPUs to drive algorithm and architecture exploration
- Collaborate with the ML algorithm development team on performance-critical analysis
- Develop internal tooling capabilities to support algorithmic and architecture exploration
Required Skills and Experience:
- 3+ years of experience developing performance-critical kernels on dedicated accelerators, GPUs, or CPUs
- Graduate students in Computer Engineering, Electrical Engineering, Computer Science or other related technical fields
- Deep knowledge of machine learning, deep learning, and neural network design, optimization, and compression techniques
- High-level knowledge of computer architecture, systems, and HW-SW co-design
- Ability to develop and work with large software systems in programming languages such as Python
- Knowledge of cutting-edge deep learning libraries such as TensorFlow and PyTorch
- Willingness to learn and to train large deep learning models on GPU-based systems
“Nice to Have” Skills and Experience:
- Experience with ML model design, optimization, and HW-SW co-development methodology
- ML model optimization techniques targeting the PPA (Performance, Power, and Area) of neural networks on Arm compute platforms
- Adaptability to the fast-moving ML industry and willingness to learn new technology in a very dynamic environment
Salary Range: $191,100 to $258,500 per year