Upcoming Lectures
Lecture 1 Efficient Multi-modal Analysis and its Applications
Date: 2022-09-07
Speaker Bio
Yi Yang is a Distinguished Professor of Computer Science at Zhejiang University. He completed his Ph.D. in Computer Science at Zhejiang University in 2010. He was a Discovery Early Career Researcher Award (DECRA) Fellow at the University of Queensland from 2013 to 2015, and then moved to the University of Technology Sydney, where he became an Associate Professor in 2016 and a full Professor in 2017. Prof. Yang served as an Area Chair at ICCV 2019, ICCV 2021, and CVPR 2021. His awards include the Google Faculty Research Award (2016), the Digital Innovation Award from the Australian Computer Society (2017), and the AWS Machine Learning Award (2020).
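Yi Yang is an expert under a national major talent recruitment program and a Qiushi Chair Professor at Zhejiang University. He currently serves as Vice Dean of the College of Computer Science at Zhejiang University, Director of the Microsoft-Ministry of Education Key Laboratory of Visual Perception, and Deputy Director of the Province-Ministry Co-constructed Collaborative Innovation Center for Artificial Intelligence. He has received more than twenty international awards in artificial intelligence, including the Ministry of Education's National Excellent Doctoral Dissertation award, the First Prize of the Zhejiang Provincial Natural Science Award, an Australian lifetime achievement award for research, the Australian Research Council early career award, the Australian Computer Society's gold award for disruptive innovation, the Google Faculty Research Award, and the AWS Machine Learning Research Award. His work has more than 40,000 citations on Google Scholar, with an H-index of 105. In the 2021 AI 2000 list of the world's most influential scholars in artificial intelligence, he was named among the global top 100 in four fields: classical AI, multimedia, computer vision, and databases. He has been a Clarivate Analytics Highly Cited Researcher for the past four consecutive years. He has won more than 40 awards in international research competitions, including 20 first-place finishes. He serves as an associate editor or area editor for 7 major international journals, and has held senior roles at major international conferences such as CVPR, ICCV, IJCAI, and ACM MM a total of 19 times.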
Abstract
In this talk, I will introduce our recent work on large-scale video scene understanding. First, I will explain the major challenges for multi-scene analysis, introduce solutions for large-scale video semantic understanding, and discuss the impact of temporal context information on performance. Second, I will introduce an efficient method for modeling video data. It effectively handles sequential video data and provides a feasible approach to action understanding. Finally, I will discuss methods for improving model efficiency when dealing with multi-modal data (such as audio and text), and demonstrate potential multi-modal video applications in the real world.
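Large-scale, multi-scene, multi-modal intelligent analysis faces many challenges. This talk will first discuss the bottlenecks facing current multi-scene visual analysis, outline progress and challenges in object and action perception for intelligent vision systems, and introduce design strategies for efficient visual perception models along with examples of visual perception algorithms applied in real-world scenarios. Second, it will discuss methods for temporal modeling of video and, in the context of tasks such as video classification, localization, and segmentation, introduce cutting-edge techniques for efficient video analysis. Finally, it will introduce joint training methods for multi-modal data and discuss how to give multi-modal algorithms stronger matching, fusion, and reasoning capabilities.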