News Video Annotation and Retrieval Based on Multi-modal Association Analysis

	基于多模态关联分析的新闻视频标注与检索
Alternative title	News Video Annotation and Retrieval Based on Multi-modal Association Analysis
	张师林
	2012-12-05
Degree type	Doctor of Engineering
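Chinese abstract	With advances in multimedia technology and the growth of the Internet, the volume of video data has increased rapidly, and video has become an important form of information representation in information systems. News video, a representative kind of video media, reports accurately and promptly on politics, the economy, military affairs, entertainment, and other matters around the world, and is becoming a primary channel through which people obtain news. News video differs from ordinary video in that it contains concrete elements such as time, place, people, and events, which carry a great deal of valuable information. Viewers follow the news that interests them and want to see follow-up reports on those stories. When a major event is reported, such as “the 18th National Congress of the Communist Party of China” or “Zhang Lili, the most beautiful teacher”, related reports from many different sources appear quickly. However, information about such an event is usually scattered in isolation across the broadcasts of different stations and appears at different times. People would like a way to detect and track the news reported by the major media over a period of time and to aggregate information on related events automatically for reference. At present, news video is mostly processed manually: dedicated staff watch and log, around the clock, the news broadcasts of the various television stations received by satellite each day, which is time-consuming and laborious work. Because it depends on people, items may be missed and real-time processing may not be possible. To make full use of the received news video, speed up processing, extract more valuable information, and improve timeliness, the collected video needs to be analyzed and processed by machines automatically and with a certain level of intelligence; annotation and retrieval of news video are the two most basic tasks. News video annotation has potential market prospects and broad user demand. Research on the key technologies of news video annotation based on multi-modal association analysis, and the development of related service products, has significant application value. Progress in this technology can drive research on and integration of intelligent search technologies, and has industrial value in major application areas such as e-commerce, education, tourism, intelligent transportation, the military, and national security. This thesis studies news video annotation in depth. The main contributions are as follows: 1) To address the still relatively high error rates of text recognition and speech recognition in news video, this thesis proposes an automatic news video annotation method based on fusing multi-modal information. The method exploits the literal and semantic correspondence between text recognition results and speech recognition results to improve annotation accuracy. In addition, using named entity recognition, the method can automatically annotate the people, locations, and topic words of the news events in a news video. Experimental results show that the method achieves higher annotation accuracy than comparable methods and is not restricted by a vocabulary. 2) Because the semantic information contained in news video itself is limited, the proposed method draws on the massive media available on the Internet and uses Web news, which carries semantic information, to annotate news video. The Internet holds a vast amount of data, with billions of web pages in various media forms, and is itself an enormous knowledge base. By making full use of this media database, annotation and indexing of news video can be achieved. Because this mode of application requires retrieving Web content from news video, then extracting summaries from Web content of various forms and propagating them back to the news video, it is cross-media in nature. With such cross-media applications, people's ability to obtain information is strengthened, and much valuable information can be obtained from media on the Internet. Exploring the semantic links between news video and Web content will make automatic annotation and indexing of news video possible, and, for massive news video data, ...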
English abstract	With the rapid development of multimedia and network technology, various types of media, such as videos and Flash animations, have become the mainstream of information exchange. News video is a representative media type among all video forms; it reports news on politics, economics, military affairs, and entertainment accurately and in a timely manner. News is the major means by which people acquire useful information. News videos supply abundant information, and people want to watch the news that matches their interests. When an important event occurs, audiences care about the follow-up reports. For example, for “The Eighteenth Congress of the Communist Party” and “the most beautiful teacher Zhang Lili”, related reports spring up from different media sources. But the valuable information is dispersed across many places, and people cannot easily detect and track the news that interests them. At present, most news content processing is done by hand. Even automatic methods require human involvement, because automated news annotation is prone to error. Human labor is time-consuming, and the annotation results are subjective. News video annotation therefore calls for an efficient and automatic machine method. An automated news video management system is needed by the media market. News video annotation based on cross-media association analysis is useful for accomplishing this task. The development of the technique lays a foundation for e-business, education, tourism, intelligent transportation, and national security. The main contributions of this thesis are as follows. Firstly, we present a novel approach that fuses multi-modal information to annotate news video. Because speech recognition and optical character recognition on news video have low accuracy, the two recognition results are used to complement each other when annotating the news video. Furthermore, by named entity recognition, we can find the person names, event places, and key words of the news video contents. Experiments on the CCTV news set show that the annotation accuracy of the new approach is superior to that of the other approaches. Secondly, we propose a novel method for news video annotation that leverages Web news information. Because the information contained in the news video is limited, we explore Web news and find the corresponding news on the Web that is most similar to the news video. By mining the relationship, we can annotate the news video ...
Keywords	Cross Media Association Analysis; Video Retrieval; Key Frame; Visual Bag-of-Words Model; Speech Recognition; Optical Character Recognition; Multi-modal Information Fusion; Graph Model; Named Entity
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6489
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	张师林. 基于多模态关联分析的新闻视频标注与检索[D]. 中国科学院自动化研究所. 中国科学院大学,2012.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20091801462909 (11266 KB)			Not yet open	CC BY-NC-SA