A multimodal scheme for program segmentation and representation in broadcast video streams
Wang, Jinqiao (1); Duan, Lingyu (2); Liu, Qingshan (1); Lu, Hanqing (1); Jin, Jesse S. (3)
Source Publication: IEEE TRANSACTIONS ON MULTIMEDIA
Date: 2008-04-01
Volume: 10  Issue: 3  Pages: 393-408
Subtype: Article
Abstract: With the advance of digital video recording and playback systems, there is a clear need to manage recorded TV programs efficiently so that users can readily locate and browse their favorite programs. In this paper, we propose a multimodal scheme to segment and represent TV video streams. The scheme aims to recover the temporal and structural characteristics of TV programs from visual, auditory, and textual information. For visual cues, we develop a novel concept named program-oriented informative images (POIM) to identify candidate points correlated with the boundaries of individual programs. For audio cues, a multiscale Kullback-Leibler (K-L) distance is proposed to locate audio scene changes (ASC), which are then aligned with video scene changes to represent candidate program boundaries. In addition, latent semantic analysis (LSA) is adopted to calculate the textual content similarity (TCS) between shots, modeling the intra-program similarity and inter-program dissimilarity of speech content. Finally, we fuse the multimodal POIM, ASC, and TCS features to detect program boundaries, including those of individual commercials (spots). Towards effective program guides and attractive content browsing, we propose a multimodal representation that summarizes each program with POIM images, key frames, and textual keywords. Extensive experiments on the open TRECVID 2005 benchmark corpus have yielded promising results. Compared with the electronic program guide (EPG), our solution provides a more generic approach to determining the exact boundaries of diverse TV programs, including even dramatic spots.
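The multiscale K-L distance for audio scene change detection can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes per-frame audio feature vectors (for example MFCCs) have already been extracted, fits a diagonal Gaussian to the frames in each of two adjacent windows, and scores every frame by the symmetric K-L distance between the two fits. The window lengths in `scales`, the averaging of max-normalized per-scale curves, and the `threshold`/`min_gap` peak picking are hypothetical choices, and the alignment of ASC with video scene changes is not shown.

```python
import numpy as np


def sym_kl_diag(a: np.ndarray, b: np.ndarray, eps: float = 1e-6) -> float:
    """Symmetric K-L distance between diagonal Gaussians fitted to frame sets a and b.

    a, b: arrays of shape (num_frames, num_features).
    """
    m0, m1 = a.mean(axis=0), b.mean(axis=0)
    v0, v1 = a.var(axis=0) + eps, b.var(axis=0) + eps  # eps avoids division by zero
    # KL(p||q) + KL(q||p) for diagonal Gaussians; the log-determinant terms cancel.
    return 0.5 * float(np.sum(v0 / v1 + v1 / v0 - 2.0
                              + (m0 - m1) ** 2 * (1.0 / v0 + 1.0 / v1)))


def multiscale_kl_curve(feats: np.ndarray, scales=(20, 40, 80)) -> np.ndarray:
    """Score each frame t by comparing feats[t-w:t] with feats[t:t+w] for every w.

    Each per-scale curve is divided by its maximum and the curves are averaged
    (a hypothetical fusion rule). Frames too close to either end for a scale score 0 there.
    """
    n = len(feats)
    combined = np.zeros(n)
    for w in scales:
        curve = np.zeros(n)
        for t in range(w, n - w + 1):
            curve[t] = sym_kl_diag(feats[t - w:t], feats[t:t + w])
        if curve.max() > 0:
            curve /= curve.max()
        combined += curve
    return combined / len(scales)


def detect_changes(curve: np.ndarray, threshold: float = 0.5, min_gap: int = 20) -> list:
    """Greedy peak picking: keep the highest scores above threshold, at least min_gap apart."""
    peaks = []
    for t in np.argsort(curve)[::-1]:
        if curve[t] < threshold:
            break
        if all(abs(int(t) - p) >= min_gap for p in peaks):
            peaks.append(int(t))
    return sorted(peaks)


if __name__ == "__main__":
    # Synthetic stand-in for audio features: two "scenes" with different means.
    rng = np.random.default_rng(0)
    feats = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                       rng.normal(5.0, 1.0, (200, 2))])
    print(detect_changes(multiscale_kl_curve(feats)))
```

Averaging across scales is one simple way to favor changes that stand out at both short and long windows; the paper's own multiscale combination rule may differ.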
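Textual content similarity via LSA can be sketched in the same spirit. This minimal example assumes each shot already has a speech transcript (for example from ASR or closed captions), builds a raw term-count matrix with whitespace tokenization, projects the shots into a k-dimensional latent space with a truncated SVD, and compares them by cosine similarity. The term weighting, the tokenization, the value of `k`, and the sample transcripts are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from collections import Counter


def term_doc_matrix(docs):
    """Raw term counts: rows are vocabulary terms, columns are shots (documents)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w, c in Counter(d.lower().split()).items():
            A[index[w], j] = c
    return A, vocab


def lsa_doc_vectors(A, k=2):
    """Truncated SVD A ~ U_k S_k V_k^T; each shot is represented by a row of V_k S_k."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return Vt[:k].T * s[:k]


def cosine_sim_matrix(X, eps=1e-12):
    """Pairwise cosine similarity between the rows of X."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)
    return Xn @ Xn.T


if __name__ == "__main__":
    # Hypothetical per-shot transcripts: two weather shots followed by two finance shots.
    shots = [
        "rain forecast cloudy weather tomorrow",
        "weather forecast rain and wind",
        "stock market shares fell today",
        "market shares rose stock index",
    ]
    A, vocab = term_doc_matrix(shots)
    S = cosine_sim_matrix(lsa_doc_vectors(A, k=2))
    # Similarity of each shot with the next one; a dip suggests a topic change.
    print([round(float(S[i, i + 1]), 3) for i in range(len(shots) - 1)])
```

In such a setup, shots from the same program would be expected to score higher than adjacent shots on either side of a program boundary; how the paper turns these scores into TCS features for fusion is not reproduced here.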
Keywords: Broadcast Video; Latent Semantic Analysis; Multimodal Fusion; TV Program Segmentation
WOS Headings: Science & Technology; Technology
WOS Keyword: RETRIEVAL
Indexed By: SCI
Language: English
WOS Research Area: Computer Science; Telecommunications
WOS Subject: Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications
WOS ID: WOS:000258767100009
Document Type: Journal Article
Identifier: http://ir.ia.ac.cn/handle/173211/9521
Collection: Pre-2009 Research Outputs
Affiliations:
1. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100080, Peoples R China
2. Inst Infocomm Res, Singapore 119613, Singapore
3. Univ Newcastle, Sch Design Commun & Informat Technol, Callaghan, NSW 2308, Australia
Recommended Citation
GB/T 7714: Wang Jinqiao, Duan Lingyu, Liu Qingshan, et al. A multimodal scheme for program segmentation and representation in broadcast video streams[J]. IEEE Transactions on Multimedia, 2008, 10(3): 393-408.
APA: Wang, J., Duan, L., Liu, Q., Lu, H., & Jin, J. S. (2008). A multimodal scheme for program segmentation and representation in broadcast video streams. IEEE Transactions on Multimedia, 10(3), 393-408.
MLA: Wang, Jinqiao, et al. "A multimodal scheme for program segmentation and representation in broadcast video streams." IEEE Transactions on Multimedia 10.3 (2008): 393-408.