CASIA OpenIR > Graduates > Master's Theses
Alternative Title: The Research and Implementation of Intelligent Case Merging System Based on Multimedia Data
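Thesis Advisor: 李子青 (Li Ziqing)
Degree Grantor: 中国科学院大学 (University of Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Discipline: Computer Technology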
Keywords: case merging, LDA model, text modeling, image retrieval, similarity
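Abstract: With the continued build-out of public security information systems, police case databases have accumulated massive amounts of data, including text, images, and more. Traditional case-merging systems can usually merge cases based on a single data type (text) only, and cannot analyze the latent key information in the data. This thesis addresses how to use these different types of data to analyze the intrinsic links between cases, mine case data more deeply and accurately, and help police personnel quickly and efficiently find similar cases in a massive case database so that they can be merged.

The work of this thesis covers three aspects:

First, it describes in detail the key technologies used in case-merging systems, such as text preprocessing, text modeling, image retrieval, and topic identification, and summarizes their strengths and weaknesses as well as research progress in related fields.

Second, it introduces the LDA (Latent Dirichlet Allocation) topic model into the case-merging field, modeling case texts with LDA to mine their latent semantic information and improve case-merging quality. Building on this and on an image retrieval algorithm, it proposes a case-merging method that fuses LDA text information with image information, improving the accuracy of the merging results.

Third, using the SharePoint and SQL Server 2013 development platform, it integrates the above algorithms into a case-merging system, uses a web crawler to collect case data for validating the algorithms, and delivers the result as an application for users.

Experimental results on case data show that the LDA-based case-merging method outperforms the traditional bag-of-words method, reaching an accuracy of 72%, and that fusing in the image retrieval algorithm improves the results by a further 1% to 4%. This shows that both the LDA topic model algorithm and the fusion algorithm are reasonable and effective for case merging. The finished system covers case statistics, case merging, and data crawling and storage, offers good interactivity, and has all the elements of a complete system.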
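The abstract does not say which LDA inference method the thesis uses or how topic vectors of two cases are compared. Purely as an illustration, the sketch below assumes collapsed Gibbs sampling (a standard LDA inference method) and ranks cases by Hellinger similarity between their document-topic distributions; the function names, the hyperparameters (`alpha`, `beta`, iteration count), the similarity measure and the toy data are all hypothetical, not taken from the thesis.

```python
import math
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]       # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]   # occurrences of word v assigned to topic k
    n_k = [0] * K                        # total tokens assigned to topic k
    z = []                               # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, old = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][old] -= 1
                n_kw[old][v] -= 1
                n_k[old] -= 1
                # p(z_i = t | all other assignments), up to a constant factor.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                new = rng.choices(range(K), weights=weights)[0]
                z[d][i] = new
                n_dk[d][new] += 1
                n_kw[new][v] += 1
                n_k[new] += 1

    # Smoothed document-topic distributions (theta); each row sums to 1.
    return [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
            for d, doc in enumerate(docs)]


def hellinger_similarity(p, q):
    """1 minus the Hellinger distance between two discrete distributions (range 0..1)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))       # clamp guards against rounding error


def rank_similar_cases(query_idx, thetas, top_n=3):
    """Return (similarity, case index) pairs for the cases closest to the query case."""
    scores = [(hellinger_similarity(thetas[query_idx], th), j)
              for j, th in enumerate(thetas) if j != query_idx]
    return sorted(scores, reverse=True)[:top_n]


# Toy, already-tokenized case descriptions (Chinese text would first need word segmentation).
docs = [
    ["burglary", "window", "night", "jewelry", "window"],
    ["burglary", "night", "window", "cash", "jewelry"],
    ["fraud", "phone", "bank", "transfer", "phone"],
    ["fraud", "bank", "transfer", "online", "phone"],
]
thetas = lda_gibbs(docs, num_topics=2, iters=300, seed=42)
print(rank_similar_cases(0, thetas, top_n=2))
```

A production system would more likely use a library LDA implementation after proper text preprocessing; the point of the sketch is only that cases are compared in a low-dimensional topic space rather than on raw word counts, as a bag-of-words baseline does.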
Other Abstract: With the construction of public security information systems, public security databases have accumulated massive amounts of data, including text, images, and so on. Traditional case-merging systems are usually designed for text data only and cannot analyze the latent information in case data. Investigators need to find helpful data among a large number of cases quickly and efficiently. How to use these different types of data for latent semantic analysis, and how to mine case data more deeply and accurately, has become the problem to be solved.

The main contributions of this thesis fall into the following three aspects:

First, it investigates the key technologies in the field of case merging, such as text preprocessing, text modeling, image retrieval, and topic recognition, and summarizes their advantages, disadvantages, and research progress.

Second, it introduces the LDA model to the field of case merging, which helps improve the quality of case merging. Building on an image retrieval algorithm, the author proposes a case-merging method that combines text and image information to improve case-merging accuracy.

Finally, the author uses SharePoint Server and SQL Server to integrate these algorithms into one system, and uses a web spider to collect case data, both to validate the algorithms and to make the system available to users.

Experimental results on case data show that the LDA model outperforms the traditional bag-of-words method, achieving an accuracy of 72%. After the image retrieval algorithm is integrated, accuracy rises by a further 1% to 4%. These results show that the proposed algorithms are effective on case data. The completed system provides case statistics, case merging, and data capture functions, and interacts well with users.
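The abstract states that text and image information are combined but does not give the fusion rule. One common option, assumed here only for illustration, is score-level (late) fusion: a weighted linear combination of a text similarity and an image similarity. The weight `text_weight=0.7`, the use of cosine similarity over image feature vectors, and every name and number below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_score(text_sim, image_sim, text_weight=0.7):
    """Weighted linear (late) fusion; falls back to the text score when there is no image."""
    if image_sim is None:
        return text_sim
    return text_weight * text_sim + (1.0 - text_weight) * image_sim


def rank_cases(query_image, candidates, text_weight=0.7, top_n=5):
    """candidates: list of (case_id, text_sim, image_vector or None); best matches first."""
    scored = []
    for case_id, text_sim, image_vec in candidates:
        image_sim = None
        if query_image is not None and image_vec is not None:
            image_sim = cosine_similarity(query_image, image_vec)
        scored.append((fused_score(text_sim, image_sim, text_weight), case_id))
    scored.sort(reverse=True)
    return scored[:top_n]


# Hypothetical text similarities (e.g. from LDA topic vectors) and image feature vectors.
candidates = [
    ("case-A", 0.80, [0.9, 0.1, 0.0]),
    ("case-B", 0.78, [0.0, 0.2, 0.9]),
    ("case-C", 0.60, None),  # case without an image
]
print(rank_cases([1.0, 0.0, 0.0], candidates, top_n=2))  # case-A ranks first, then case-C
```

Here case-B's near-tie on text is broken by its dissimilar image. Cases without images simply keep their text score, which is a simplification: such scores are not strictly comparable with fused ones, so a real system would need to tune the weight and this fallback on labeled case pairs.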
Other Identifier: 2011E8009061001
Document Type: Thesis
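Recommended Citation (GB/T 7714):
吴迪. 基于多媒体数据的案件智能串并系统的研究与实现[D]. 中国科学院自动化研究所. 中国科学院大学, 2014.
(Wu Di. The Research and Implementation of Intelligent Case Merging System Based on Multimedia Data [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2014.)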
Files in This Item:
File name (size): CASIA_2011E800906100 (2804 KB)
Access: Not yet openly available
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.