We write performant, scalable frameworks (in Swift and C++) that distribute and coordinate ML inference tasks across hardware acceleration IP blocks on different SoCs. You will integrate inference code into a full service stack to ensure user traffic is served reliably and with high performance, with a strong focus on writing code that is easy and safe to develop, update, and monitor.