Research on Neural Network Methods for Sequence Tagging

	Research on Neural Network Methods for Sequence Tagging (序列标注中的神经网络方法研究)
	Wu Huijia (吴惠甲)
	2017-05
Degree type	Doctor of Engineering
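Abstract (translated from Chinese)	Sequence tagging is an important fundamental task in natural language processing and machine learning, and has been an active research topic in recent years. This thesis studies neural network methods for sequence tagging, which is of both theoretical and practical value. The main work and contributions are summarized as follows: 1. A sequential decoding method based on the multilayer perceptron. The method first represents words, part-of-speech tags, and label information as pretrained real-valued vectors, then introduces the labels of preceding words and uses beam search during decoding to capture, to some extent, the dependencies between labels. Experiments on Chinese and English supertagging tasks show that the method effectively improves supertagging accuracy. 2. A dynamic window sequence tagging model. A multilayer perceptron needs a fixed window to obtain context features, but the window size is hard to choose, and different words may need different window information; this thesis designs a dynamic window model to address the problem. Borrowing the gating mechanism of long short-term memory (LSTM) networks, the model uses purpose-designed filter gates to dynamically extract information from the context window, and a position-based attention mechanism to focus on the window contents. Experiments show that the model effectively improves the performance of existing neural network methods on the supertagging task. 3. A shortcut-stacked bidirectional LSTM tagging model. To address the difficulty of training networks with too many layers, the model borrows the idea of residual networks and introduces shortcut connections to speed up convergence; it also proposes three basic ways of adding shortcut connections to bidirectional networks in order to find the best transmission path. In addition, because shortcut connections add computation to each iteration, a shortcut block is designed to optimize the cross-layer information flow: the self-connected part of the LSTM cell is replaced by shortcut connections, so the self-connection no longer needs to be kept, which reduces the per-iteration computation. Experiments on supertagging and POS tagging confirm that the shortcut block outperforms ordinary shortcut connections, and on the supertagging task it achieves the best results reported at the time.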
Abstract (English)	Sequence tagging is an important fundamental task in natural language processing and machine learning. In this thesis, we study neural network methods for sequence tagging. The main contributions are summarized as follows: 1. We propose a sequential decoding model based on the multilayer perceptron, using pretrained word, POS tag, and category embeddings to improve their representations. Furthermore, we use previous tags in the encoding and decoding steps to capture the relations between tags. Experimental results show that our model improves performance on the supertagging task. 2. We propose a dynamic window model for supertagging. Multilayer perceptrons use a fixed window to capture the local context, but the proper window size is hard to choose, and different words may require different context window sizes for tagging. To solve this problem, we add a mechanism that dynamically selects the context within a fixed window. Specifically, following the gating mechanism in LSTM, we use logistic gates to control the items in the local context, and we use local attention to focus on the information in the context window. Experimental results show that our approach effectively improves the performance of existing approaches on the supertagging task. 3. We propose a shortcut-stacked LSTM model for sequence tagging. A network with many stacked layers is hard to train. Following the design of residual blocks, we use shortcut connections to speed up convergence. We investigate three kinds of shortcut connections between two bidirectional LSTM blocks to find the optimal structure for passing information. Furthermore, the computational cost becomes high with many stacked layers, so we design a new architecture called the shortcut block to pass information across layers. Our method replaces the self-connected parts in LSTM cells with gated shortcuts. The per-iteration computational cost is reduced because we do not need to preserve the internal states. Experiments on POS tagging and supertagging show both the efficiency and the effectiveness of shortcut blocks compared with traditional shortcut connections, achieving state-of-the-art results on the supertagging task.
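Keywords	sequence tagging; supertagging; dynamic window; shortcut connections; long short-term memory network
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/14649
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Wu Huijia. Research on Neural Network Methods for Sequence Tagging (序列标注中的神经网络方法研究)[D]. Beijing: Graduate University of Chinese Academy of Sciences (中国科学院研究生院), 2017.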

Files in this item
File name/size	Document type	Version type	Access type	License
吴惠甲-博士论文.pdf (1455KB)	Thesis		Restricted access	CC BY-NC-SA
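
Illustrative note on the first contribution: the abstract says that decoding uses beam search, with the previous word's tag as an input, to capture dependencies between tags. The sketch below shows that general idea only and is not the thesis's implementation. The names `beam_search` and `toy_score`, the two-tag set `D`/`N`, and all probabilities are hypothetical, and a small lookup table stands in for the multilayer perceptron scorer.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer interface: (words, position, previous tag) -> {tag: log-probability}.
Scorer = Callable[[Sequence[str], int, str], Dict[str, float]]


def beam_search(words: Sequence[str], score: Scorer, beam_size: int = 2,
                start_tag: str = "<s>") -> Tuple[List[str], float]:
    """Return the best tag sequence found and its total log-probability."""
    beam: List[Tuple[float, List[str]]] = [(0.0, [])]
    for i in range(len(words)):
        candidates: List[Tuple[float, List[str]]] = []
        for logp, tags in beam:
            prev = tags[-1] if tags else start_tag
            # Extend every partial hypothesis with every possible next tag.
            for tag, tag_logp in score(words, i, prev).items():
                candidates.append((logp + tag_logp, tags + [tag]))
        # Keep only the beam_size highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    best_logp, best_tags = beam[0]
    return best_tags, best_logp


def toy_score(words: Sequence[str], i: int, prev_tag: str) -> Dict[str, float]:
    """Made-up stand-in for a neural scorer; it depends only on the previous tag."""
    if prev_tag == "D":
        probs = {"D": 0.1, "N": 0.9}    # after "D", tag "N" is strongly preferred
    else:
        probs = {"D": 0.45, "N": 0.55}  # otherwise "N" is slightly preferred
    return {tag: math.log(p) for tag, p in probs.items()}


greedy_tags, _ = beam_search(["the", "dog"], toy_score, beam_size=1)
beam_tags, beam_logp = beam_search(["the", "dog"], toy_score, beam_size=2)
```

With these made-up numbers, a beam of size 1 (greedy decoding) commits to `N` at the first word and ends with `["N", "N"]`, whereas a beam of size 2 keeps the `D` hypothesis alive and recovers the higher-scoring sequence `["D", "N"]`. This is the sense in which a wider beam can capture tag dependencies that greedy decoding misses.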