M.S. or Ph.D. in Electrical Engineering/Computer Science or a related field (mathematics, physics, or computer engineering), with a focus on computer vision and/or machine learning
Extensive experience in video machine learning covering one of the following topics: Video Understanding / Video Foundation Models / Multi-modal LLMs
Proven prototyping skills and proficiency in coding (C, C++, Python)
Excellent written and verbal communication skills, comfort presenting research to large audiences, and the ability to work hands-on in multi-functional teams
Preferred Qualifications
Publication record in relevant venues (e.g. NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, SIGGRAPH)
Industry experience with multi-modal foundation models and frameworks
Knowledge and understanding of generative AI, multi-modal large language models, and video captioning
Solid understanding of the state of the art in Video Understanding and familiarity with the challenges of developing algorithms that run efficiently on resource-constrained platforms
Team-oriented, results-oriented, and self-motivated
Apple is an equal opportunity employer that is committed to inclusion and diversity, and thus we treat all applicants fairly and equally. Apple is committed to working with and providing reasonable accommodation to applicants with physical and mental disabilities.