M.S. or Ph.D. in Electrical Engineering/Computer Science or a related field (mathematics, physics, or computer engineering), with a focus on computer vision and/or machine learning
Extensive experience in video machine learning covering one of the following topics: Video Understanding / Video Foundation Models / Multi-modal LLMs
Proven prototyping skills and coding proficiency (C, C++, Python)
Excellent written and verbal communication skills, comfort presenting research to large audiences, and the ability to work hands-on in multi-functional teams
Preferred Qualifications
Publication record in relevant venues (e.g. NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, SIGGRAPH)
Industry experience with multi-modal foundation models and frameworks
Knowledge and understanding of generative AI, multi-modal large language models, and video captioning
Solid understanding of the state of the art in Video Understanding and familiarity with the challenges of developing algorithms that run efficiently on resource-constrained platforms
Team-oriented, results-oriented, and self-motivated