With the increasing use of pen-based input devices and user interfaces, more research attention has been paid to online document analysis techniques, including text segmentation, recognition and retrieval. Despite great progress in handwritten text recognition, the remaining recognition errors can still prevent keywords from being located. Keyword spotting locates instances of a query word in a document without requiring accurate recognition of the whole document. The user can adjust the similarity threshold to balance recall and precision for different needs.

This thesis studies text-query-based keyword spotting techniques on a large database of multi-writer online handwritten Chinese documents. Based on handwriting recognition, candidate character confidences are computed on the candidate segmentation-recognition lattice and combined into word similarities. Owing to accurate character/word similarity computation and dynamic search, the query can be located efficiently on the lattice. The major contributions of this work are as follows:

(1) A keyword spotting method based on a one-vs-all (OVA) trained prototype classifier is proposed. Compared with the prototype classifier trained with the minimum classification error (MCE) criterion, the OVA classifier can better detect target words and reject imposters. Our experimental results demonstrate the effectiveness of keyword spotting using OVA classifiers.

(2) A spotting method based on character confidences computed from the N-best list on the candidate segmentation-recognition lattice is proposed. Each path is evaluated by a scoring function that combines multiple contexts, including the character classification score, the bi-gram linguistic score and geometric scores. The scores of the N-best paths are transformed to posterior probabilities using soft-max, with its parameter estimated from the character confusion network, which is generated from the N-best paths of a training data set of text lines. The experimental results demonstrate the superiority of this method.

(3) A keyword spotting method with edge probability computation on the pruned candidate segmentation-recognition lattice is proposed. Based on the semi-Markov conditional random field (semi-CRF) model, the candidate segmentation-recognition lattice is pruned by a forward-backward algorithm, and the edge probability is computed as the character confidence. We further propose to improve the recall of the keyword spotting using an error-correcting character...
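The N-best confidence computation in contribution (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the function name, the `(start, end, char)` edge representation, and the default value of the soft-max parameter `alpha` are all assumptions made for illustration (the abstract says only that the parameter is estimated from a character confusion network). The sketch turns N-best path scores into path posteriors with soft-max, then sums the posteriors of all paths that contain a given candidate character.

```python
import math
from collections import defaultdict


def nbest_character_confidences(paths, alpha=1.0):
    """Estimate character confidences from an N-best list.

    paths: list of (score, edges), where edges is a list of hypothetical
           (start, end, char) tuples describing one lattice path.
    alpha: soft-max scaling parameter (assumed here; the thesis estimates it
           from a character confusion network).
    Returns a dict mapping each (start, end, char) edge to its confidence.
    """
    scaled = [alpha * score for score, _ in paths]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)

    confidences = defaultdict(float)
    for weight, (_, edges) in zip(weights, paths):
        posterior = weight / total  # soft-max posterior of this path
        for edge in set(edges):     # count each edge once per path
            confidences[edge] += posterior
    return dict(confidences)


# Two toy paths that agree on the first character and differ on the second.
example_paths = [
    (2.0, [(0, 1, "中"), (1, 2, "国")]),
    (1.0, [(0, 1, "中"), (1, 2, "园")]),
]
example_conf = nbest_character_confidences(example_paths)
```

In this toy input the first character appears on both paths, so its confidence is 1, while the two competing second characters split the probability mass according to their path scores. A spotting system would then compare a combination of such confidences against the user-chosen similarity threshold; how they are combined into word similarities is not specified in the abstract and is left out here.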