As carriers for recording and disseminating information, documents have always been of great significance to human life and culture. In recent years, digitizing historical paper documents so that computers can process and understand them has become a trend. As two key technologies in document digitization, layout analysis and character recognition have attracted researchers' attention for many years and have been widely used in applications. It is therefore of both theoretical significance and practical value to study how to analyze document layout and recognize document contents accurately. In practice, different kinds of documents call for different layout analysis and character recognition techniques. This thesis focuses on the analysis and recognition of historical document images, with the aim of improving the performance of layout analysis and character recognition. The main contributions are summarized as follows:
1. Layout analysis for document images based on fully convolutional networks. Compared with contemporary documents, historical documents pose many new challenges for layout analysis, and existing methods often fail to handle them well. We therefore propose a document image layout analysis method based on a single-task fully convolutional network (FCN). This method uses a fully convolutional network to perform pixel-level prediction on historical document images and obtains accurate layout analysis results. To improve the efficiency of layout analysis, we also propose a method based on a multi-task fully convolutional network. This method solves multiple layout analysis tasks simultaneously, such as document image binarization, page segmentation, text line segmentation and baseline detection. The multi-task network not only improves processing efficiency, but also improves the accuracy of layout analysis through information interaction between the tasks. Experiments on the document image binarization dataset and the medieval manuscripts dataset demonstrate the effectiveness and superiority of the proposed methods.
2. Layout analysis and database construction for Chinese ancient documents. For Chinese ancient document images, we propose a layout analysis method based on a multi-task fully convolutional network, which simultaneously solves multiple layout analysis tasks such as document image binarization, text line segmentation and character segmentation. Based on this method, we design interactive annotation software and use it to carry out document image binarization, text line segmentation, character segmentation and character class annotation for a large number of Chinese ancient document images from the Complete Library in Four Sections and Ancient Scriptures, building a large database of Chinese ancient document images. The resulting database contains more than 10,000 pages of Chinese ancient document images with labels for binarization, text line segmentation, character segmentation and character classification, which makes it suitable for a variety of research problems. This thesis presents basic evaluation metrics and experimental results to provide baselines for research in the field.
3. Class-incremental learning on large class sets based on the convolutional prototype network. When recognizing ancient Chinese characters, it is often difficult to know all character classes in advance and learn them in a single batch, because of the large number of rare and variant characters. The model should therefore be able to extend its classification capability to new classes continuously, that is, to perform class-incremental learning. We propose a class-incremental learning method for large class sets based on the convolutional prototype network (CPN) for Chinese ancient character recognition. After explaining the inherent advantages of the convolutional prototype network over the traditional convolutional neural network (CNN) in open-world problems such as class-incremental learning, we propose enhancing the feature extraction ability and robustness of the network through strategies such as adding an unsupervised reconstruction loss and adding pre-training classes, thereby improving its class-incremental learning performance. We also propose a method based on prototype regularization in an invariable feature space, and a method based on prototype and network parameter regularization in a variable feature space. Both methods yield superior class-incremental learning performance on the Chinese ancient handwritten character dataset.
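To illustrate why prototype-based classification lends itself to class-incremental learning, the following is a minimal sketch of nearest-prototype classification in a fixed feature space. It is not the thesis's implementation: the class `PrototypeClassifier`, its methods, and the toy two-dimensional features are hypothetical, the feature extractor is assumed to be given and frozen, and each prototype is simply taken as the mean of its class's features rather than being learned with the regularization terms described above.

```python
from math import fsum


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


class PrototypeClassifier:
    """Hypothetical nearest-prototype classifier over an assumed fixed feature space."""

    def __init__(self):
        self.prototypes = {}  # class label -> prototype vector

    def add_class(self, label, features):
        """Register a new class; its prototype is the mean of its feature vectors.

        Existing prototypes are left untouched, so no output layer
        has to be resized or retrained when a class is added.
        """
        if label in self.prototypes:
            raise ValueError(f"class {label!r} already exists")
        n = len(features)
        dim = len(features[0])
        self.prototypes[label] = [
            fsum(f[d] for f in features) / n for d in range(dim)
        ]

    def predict(self, feature):
        """Return the label whose prototype is closest to the feature vector."""
        return min(
            self.prototypes,
            key=lambda c: squared_distance(feature, self.prototypes[c]),
        )


# Toy 2-D features standing in for the output of a frozen feature extractor.
clf = PrototypeClassifier()
clf.add_class("A", [[0.0, 0.0], [0.2, 0.0]])
clf.add_class("B", [[1.0, 1.0], [1.0, 0.8]])
old_prototypes = {k: list(v) for k, v in clf.prototypes.items()}

# A new class arrives later; only its own prototype is created.
clf.add_class("C", [[0.0, 1.0], [0.0, 1.2]])
```

In this sketch, adding class `"C"` leaves the prototypes of `"A"` and `"B"` unchanged, which is the property that makes prototype classifiers a natural fit for an invariable feature space. When the feature space itself is updated, old prototypes would drift relative to the new features, which is presumably where regularization of both prototypes and network parameters becomes necessary.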