综合集成研讨厅中若干非结构化信息处理技术的研究
Alternative Title: Research on Unstructured Information Processing in Cyberspace for Workshop of Metasynthetic Engineering
王艾
Subtype: Doctor of Engineering
Thesis Advisor: 戴汝为
2010-05-29
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Cyberspace for Workshop of Metasynthetic Engineering; Latent Semantic Model; Cross-Language Information Retrieval; Collective Interaction; Text Summarization
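Abstract: The Cyberspace for Workshop of Metasynthetic Engineering (CWME) is an original innovation proposed by Chinese scientists. Its purpose is to solve problems related to open complex giant systems. When the framework is applied to such problems, experts exchange domain knowledge and state their views and opinions through online discussion. The large volume of web resources brought in by the Internet-based system architecture, together with the qualitative expert knowledge produced during discussion, stimulates the creative thinking of the participating experts, deepens their understanding of the problem, and leads to effective decision proposals. The qualitative knowledge produced during discussion and the web resources brought in are both unstructured data. Processing these unstructured data effectively, and fusing unstructured data of different types, is of great significance for solving major decision problems.
Starting from the view that CWME is a system for knowledge production and service, this thesis applies information technology to the generation of new knowledge, the distillation of knowledge, and the evaluation of knowledge quality in CWME, and carries out research on cross-language retrieval, summary generation for discussion text, and expert authority analysis. Specifically, the main work comprises the following parts:
(1) A cross-language retrieval method based on latent semantic models is proposed. In a discussion, extracting relevant information for experts from massive multilingual information, while also crossing the semantic "gap", has practical and theoretical significance for helping experts organize and use multilingual data resources. To address this problem, this thesis proposes a cross-language retrieval method based on latent semantic models. The method estimates the semantic association between words of different languages by computing the co-occurrence relations between different words within the same language and the co-occurrence relations between words in different languages that share the same meaning (sense). Two methods are implemented: cross-language retrieval based on the LDA model and cross-language retrieval based on PLSI. Both are built on the semantics of the different languages and require no translation of query terms or documents, which avoids problems such as polysemy and ambiguity in translation and crosses the semantic "gap". Experiments show that both methods are effective and that their retrieval performance in practical use is good.
(2) A summary generation method for the discussion environment is proposed. In the CWME environment, the professional knowledge that experts exchange and the views and opinions they state are mostly qualitative knowledge. Analyzing and condensing this knowledge so that the evolution of discussion topics is shown clearly, and so that participating experts and decision makers receive the corresponding information support, has practical and theoretical significance for solving major decision problems. This thesis proposes a summary generation method for the discussion environment. The method uses a probabilistic mixture model to extract the set of topics in expert speech and judges how adjacent topics change, so that it can both detect trends in topic shift and generate summaries that follow the evolution of topics and are provided to the experts. The automatically generated summaries help strengthen constructive interaction among experts and stimulate expert thinking, and can also assist in producing decision proposals and meeting summaries. Experiments show that the proposed method is reasonable and effective.
(3) A new method for computing expert authority is proposed. In CWME, the expert group is the most proactive member. Experts speak freely in discussion, express their views fully, and question and debate at any time, which improves the understanding of complex problems. Therefore, how to measure the reasonableness of expert opinions, compute the expert authority that emerges during discussion, and characterize the interaction relations and structure within the expert group, so that discussion proceeds smoothly and efficiently, is an important problem in the practice and application of CWME. For the problem of evaluating expert authority in the metasynthetic discussion environment, this thesis proposes an expert authority computation method based on Semantic-PageRank. The method considers both...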
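The judgment of change between adjacent topics described in part (2) can be illustrated with a minimal sketch. It assumes each discussion segment has already been reduced to a topic distribution by some mixture model. The Jensen-Shannon divergence, the threshold value, and the names `js_divergence` and `topic_shifts` are illustrative assumptions and are not taken from the thesis.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def topic_shifts(segments, threshold=0.3):
    """Indices i where segment i differs from segment i-1 by more than threshold.

    The threshold is a hypothetical tuning parameter, not a value from the thesis.
    """
    return [i for i in range(1, len(segments))
            if js_divergence(segments[i - 1], segments[i]) > threshold]

# Toy topic distributions for four consecutive discussion segments.
segments = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
print(topic_shifts(segments))  # -> [2]
```

In this toy input the discussion moves from the first topic to the third at segment 2, which is the only boundary flagged. A summary for each stretch between flagged boundaries could then be drawn from the speech in that stretch.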
Other Abstract: As an original innovation proposed by Chinese scientists, the Cyberspace for Workshop of Metasynthetic Engineering (CWME) aims to solve complex problems related to Open Complex Giant Systems (OCGSs). In CWME, experts express their opinions and exchange domain knowledge through online discussion, which produces a large amount of qualitative knowledge. In addition, many web sources are brought into CWME to help experts expand their knowledge and to inspire creative thinking. Both the qualitative knowledge and the web sources are unstructured data. Handling these unstructured data effectively is very important for the emergence of collective wisdom in CWME. As a system for knowledge generation and knowledge service, CWME needs information technology to support knowledge generation, knowledge summarization, and knowledge quality evaluation. We therefore study Cross-Language Information Retrieval (CLIR), summary generation for expert speech, and expert authority analysis to meet the requirements of CWME. Specifically, this thesis addresses the following issues:
1. This thesis proposes a cross-language retrieval method based on latent semantic models. In CWME, extracting relevant information from a huge volume of multilingual information and providing it to experts is an important problem for making use of multilingual resources. To solve this problem, the proposed method measures the semantic relation between words of different languages by computing the co-occurrences of different words in the same language and the co-occurrences of different words in different languages. The method is realized in two approaches: an LDA-based cross-language retrieval approach and a PLSI-based cross-language retrieval approach. Both approaches are built on the semantics shared among different languages and do not rely on word-by-word translation of queries or documents, which avoids polysemy and ambiguity in translation and bridges the semantic "gap". Experimental results show that both approaches are effective and achieve good retrieval performance.
2. This thesis proposes a summarization approach for CWME discussions based on a probabilistic mixture model. In CWME, experts express their opinions and exchange professional knowledge, both of which are qualitative knowledge. Ho...
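The translation-free retrieval idea in issue 1 can be illustrated with a minimal PLSI sketch. It assumes one common construction, which may differ from the thesis: each training document is a parallel English and Chinese pair merged into one bag of words, so that co-occurring words from both languages share latent topics. A monolingual query and monolingual candidate documents are then folded into the topic space and compared there. The function names, toy corpus, topic count, and iteration counts are all hypothetical.

```python
import math
import random

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def train_plsi(docs, n_topics, n_iter=50, seed=0):
    """Fit PLSI by EM on tokenized docs; returns (vocab, P(w|z), P(z|d))."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_words = len(vocab)
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in docs]
    for _ in range(n_iter):
        # Small constants keep every probability strictly positive.
        acc_wz = [[1e-9] * n_words for _ in range(n_topics)]
        acc_zd = [[1e-9] * n_topics for _ in docs]
        for d, doc in enumerate(docs):
            for w in doc:
                j = vocab[w]
                # E-step: posterior over topics for this token.
                post = normalize([p_z_d[d][z] * p_w_z[z][j]
                                  for z in range(n_topics)])
                for z in range(n_topics):
                    acc_wz[z][j] += post[z]
                    acc_zd[d][z] += post[z]
        # M-step: renormalize the expected counts.
        p_w_z = [normalize(r) for r in acc_wz]
        p_z_d = [normalize(r) for r in acc_zd]
    return vocab, p_w_z, p_z_d

def fold_in(tokens, vocab, p_w_z, n_iter=30):
    """Estimate P(z|text) for unseen text while keeping P(w|z) fixed."""
    k = len(p_w_z)
    ids = [vocab[w] for w in tokens if w in vocab]
    theta = [1.0 / k] * k
    for _ in range(n_iter):
        acc = [1e-9] * k
        for j in ids:
            post = normalize([theta[z] * p_w_z[z][j] for z in range(k)])
            for z in range(k):
                acc[z] += post[z]
        theta = normalize(acc)
    return theta

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank(query, candidates, vocab, p_w_z):
    """Candidate indices ordered by topic-space similarity to the query."""
    q = fold_in(query, vocab, p_w_z)
    scored = [(cosine(q, fold_in(c, vocab, p_w_z)), i)
              for i, c in enumerate(candidates)]
    return [i for _, i in sorted(scored, reverse=True)]

# Toy parallel corpus: each entry merges an English text with its Chinese counterpart.
training = [
    ["retrieval", "query", "index", "检索", "查询", "索引"],
    ["query", "search", "retrieval", "查询", "搜索", "检索"],
    ["summary", "topic", "speech", "摘要", "话题", "发言"],
    ["topic", "summary", "discussion", "话题", "摘要", "研讨"],
]
vocab, p_w_z, _ = train_plsi(training, n_topics=2)
chinese_docs = [["摘要", "话题"], ["检索", "索引"]]
print(rank(["query", "retrieval"], chinese_docs, vocab, p_w_z))
```

The English query is never translated. It is matched to Chinese documents only through the topic distributions, which is the property the abstract attributes to both the LDA-based and PLSI-based approaches.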
Shelf Number: XWLW1477
Other Identifier: 200718014628017
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6262
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
王艾. 综合集成研讨厅中若干非结构化信息处理技术的研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2010.
Files in This Item:
File Name/Size: CASIA_20071801462801 (723KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.