Semantic Role Labeling (SRL) is a shallow semantic analysis technique. Given a sentence, it does not attempt a deep semantic analysis; it only labels the arguments related to each predicate in the sentence. SRL can supply useful semantic information to applications such as information retrieval, question answering, and machine translation. In practice, however, SRL is not robust: good results are obtained only on narrow, specific text domains, and results on general text are usually poor.

There are two main reasons. First, SRL takes syntactic parsing results as input and therefore relies heavily on syntactic parsing. Second, SRL performs very badly on out-of-domain test data. The most commonly used SRL corpus is PropBank, which consists mostly of economic news from the Wall Street Journal (WSJ); on texts from other genres, SRL performance drops significantly. Moreover, because the training data available for SRL is very limited, using additional linguistic knowledge to help SRL is very important, and how to use such knowledge to improve the robustness of SRL is itself an important research problem.

The work in this thesis aims to enhance the robustness of SRL. Focusing on the problems above, it presents research on three aspects:

1. This thesis proposes a Minimum Error Weighting (MEW) combination strategy for SRL to reduce SRL’s reliance on a single syntactic parsing result. System combination is an effective way to reduce this reliance, but traditional combination methods trust all of the SRL results being combined equally. Different systems have different properties, however, and it is reasonable to trust systems with better overall results more than the others. The proposed strategy therefore assigns different weights to the results of different systems, and these weights are trained by minimizing an error function on a development set.
This thesis also introduces a training algorithm for MEW that places no requirement on the form of the error function, so the error function can be defined freely as needed. With the proposed method, this thesis achieves the best SRL result reported to date on the commonly used Chinese PropBank data set.

2. This thesis proposes a model based on a Deep Belief Network (DBN) that learns a Latent Feature Representation (LFR) for domain adaptation of SRL. Because of SRL’s reliance on syntactic parsing, it is di...
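The weighted combination idea behind MEW in item 1 can be illustrated with a minimal sketch. The abstract does not specify the thesis's training algorithm, so an exhaustive grid search stands in for it here; it is chosen only because, like the described algorithm, it treats the error function as a black box. The function names (`combine`, `error_rate`, `train_weights`), the span-to-label data layout, and the toy data are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from itertools import product

def combine(outputs, weights):
    """Weighted vote. outputs[i] maps a candidate span to system i's role label."""
    scores = defaultdict(lambda: defaultdict(float))
    for out, w in zip(outputs, weights):
        if w <= 0:          # a zero-weight system contributes nothing
            continue
        for span, label in out.items():
            scores[span][label] += w
    # sorted() makes tie-breaking deterministic (alphabetical by label)
    return {s: max(sorted(v), key=v.get) for s, v in scores.items()}

def error_rate(predicted, gold):
    """Example error function: 1 - F1 over (span, label) pairs."""
    pred, ref = set(predicted.items()), set(gold.items())
    correct = len(pred & ref)
    if correct == 0:
        return 1.0
    p, r = correct / len(pred), correct / len(ref)
    return 1.0 - 2 * p * r / (p + r)

def train_weights(dev_outputs, dev_gold, error_fn=error_rate, grid=(0.0, 0.5, 1.0)):
    """Assumed stand-in optimizer: try every weight vector on a small grid.
    error_fn is only ever called, never differentiated, so any callable works."""
    best_w, best_e = None, float("inf")
    for w in product(grid, repeat=len(dev_outputs)):
        e = error_fn(combine(dev_outputs, w), dev_gold)
        if e < best_e:
            best_w, best_e = w, e
    return best_w, best_e

# Toy development set (invented): spans are (sentence_id, start, end).
gold = {(0, 0, 1): "A0", (0, 2, 4): "A1", (1, 0, 2): "A0"}
outputs = [
    {(0, 0, 1): "A0", (0, 2, 4): "A1", (1, 0, 2): "A0"},  # strong system
    {(0, 0, 1): "A1", (0, 2, 4): "A1", (1, 0, 2): "A0"},  # weaker system
    {(0, 0, 1): "A1", (0, 2, 4): "A1", (1, 0, 2): "A0"},  # weaker system
]
best_weights, best_error = train_weights(outputs, gold)
```

With equal weights, the two weaker systems outvote the strong one on the first span; the search finds a weighting that trusts the strong system more. A real SRL combiner would also have to handle overlapping argument candidates, which this sketch ignores.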