With the spread of computers and smartphones, Internet users are generating large amounts of unstructured text. This text contains a great deal of knowledge, but it is not easy to use directly. Information extraction aims to extract that knowledge from unstructured text so that downstream tasks such as question answering can benefit from it. As an important and challenging subtask of information extraction, relation extraction aims to automatically recognize a pair of entities and determine the semantic relation between them.
In this dissertation, we focus on predefined relations between two entities, a setting called predefined binary relation extraction. Relation extraction can also be categorized along other dimensions, such as whether the entities are given or how many relational facts a sentence contains. Because natural language is flexible, a semantic relation can be expressed in many different ways, which makes extraction difficult. Relation extraction has attracted attention from both researchers and industry: related research papers are published every year, and some companies have already applied relation extraction in their products.
This dissertation focuses on relation extraction from unstructured text. The main contributions are as follows:
1. Supervised methods are not directly suitable for distantly supervised datasets because the individual sentences in such datasets are not labeled. To address this problem, we apply reinforcement learning to sentence-level relation classification on a distantly supervised dataset. The bag relation prediction process is cast as a reinforcement learning process. Given a bag, a classifier based on convolutional neural networks predicts the relation of each sentence separately. We then combine the per-sentence predictions to predict the bag relation. The reward, which is used to judge the performance of the relation classifier, is based on a comparison between the predicted bag relation and the gold relation. We conduct two different types of experiments, and our method achieves better performance than the baseline methods.
2. To address the overlapping problem in the extraction of multiple relational facts, we propose a sequence-to-sequence model with a copy mechanism. This model generates relational facts directly. When generating a triplet, the model first generates the relation, then copies the first entity from the source sentence, and finally copies the second entity from the source sentence. Because entities are produced by the copy mechanism, the same entity can participate in other relational facts when necessary, which resolves the overlapping problem. Two different strategies are used when generating the relational facts. Experiments show that our method achieves performance comparable to the baseline methods when a sentence contains only one relational fact, and significantly outperforms them when a sentence contains multiple relational facts.
3. To address the extraction order problem in the extraction of multiple relational facts, we use reinforcement learning to guide the extraction. When a sentence contains multiple relational facts, the extraction order can affect extraction performance: some relational facts are easier to extract, and they can help the extraction of the others. A supervised model requires a predefined extraction order for each sentence, but it is difficult to assign the best extraction order to each sentence manually. We use reinforcement learning to train a sequence-to-sequence model and tie the reward to the number of correctly extracted relational facts. To achieve the highest reward, the model learns to extract relational facts in their best order automatically. Extensive experiments verify the effectiveness of this method.
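To make contributions 2 and 3 more concrete, the following is a minimal Python sketch, not the implementation used in this dissertation. It assumes a hypothetical flat output layout in which every three decoding steps hold a relation label, the source position of the first entity, and the source position of the second entity. The function names `decode_triples` and `order_free_reward`, the relation labels, and the example sentence are all illustrative assumptions. The reward here is simply the count of distinct correct triplets, which is one possible way to tie the reward to the number of correctly extracted relational facts.

```python
from typing import List, Sequence, Set, Tuple

# (relation, first entity, second entity)
Triple = Tuple[str, str, str]


def decode_triples(tokens: Sequence[str], outputs: Sequence) -> List[Triple]:
    """Group a flat decoder output into triplets.

    Assumed layout (hypothetical): each group of three steps is
    relation label, position of first entity, position of second entity.
    Positions index into `tokens`, which mimics copying from the source.
    Trailing steps that do not form a full group are ignored.
    """
    triples = []
    usable = len(outputs) - len(outputs) % 3
    for i in range(0, usable, 3):
        relation, first_pos, second_pos = outputs[i:i + 3]
        triples.append((relation, tokens[first_pos], tokens[second_pos]))
    return triples


def order_free_reward(predicted: Sequence[Triple], gold: Set[Triple]) -> int:
    """Count distinct predicted triplets that appear in the gold set.

    The count ignores the order of extraction, so no gold order is needed.
    """
    return len(set(predicted) & gold)


# Illustrative sentence in which both entities take part in two facts.
tokens = ["Paris", "is", "the", "capital", "of", "France"]
gold = {("capital_of", "Paris", "France"), ("contains", "France", "Paris")}

# The same two facts emitted in two different orders.
outputs_a = ["capital_of", 0, 5, "contains", 5, 0]
outputs_b = ["contains", 5, 0, "capital_of", 0, 5]

reward_a = order_free_reward(decode_triples(tokens, outputs_a), gold)
reward_b = order_free_reward(decode_triples(tokens, outputs_b), gold)
print(reward_a, reward_b)
```

Because entities are referenced by source position, "Paris" and "France" can each appear in more than one triplet, and because the reward is computed on a set, both extraction orders receive the same reward. This leaves the model free to settle on whichever order is easiest for it to produce.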