English Abstract | As a widely used medium for transmitting information and communicating in human society, text plays an important role in the design of human-computer interaction systems. Common text input devices currently include keyboards, writing pads, and tablet computers, all of which have disadvantages: for example, the keyboard is limited by its size and number of keys, and the writing pad has a small writing area. Designing natural, convenient, and efficient text input methods has therefore attracted considerable interest. In recent years, by combining handwriting recognition and computer vision techniques, researchers have developed a class of vision-based gesture text input systems, which fall into two categories. In the first, handwriting is produced with an ordinary pen on paper, and a camera captures the ink trajectory, which is then recognized as text. Such systems are still limited by external conditions, such as the size of the writing area. In the second, characters are written by moving a finger on a desk or in the air, and the finger trajectory is recorded by a camera or another motion sensor, e.g., Kinect. Because the trajectory contains no pen-lift information, segmenting it into characters is difficult, and so these systems can recognize only isolated characters. This dissertation reports our first attempt at gesture character string recognition. In our system, a user writes freely in the air by moving a finger; the trajectory is captured by a camera or a motion sensor such as Kinect and is then recognized as a character string. The major contributions of this dissertation are as follows: (1) To conduct experiments, we build a Kinect-based fingertip trajectory capturing system and collect 1,000 gesture digit strings written by 30 persons. 
(2) To handle the extra strokes that remain after over-segmentation because visual gesture strings lack pen-lift information, we propose deletion geometric models for deleting stroke segments that are likely to be ligatures. (3) To improve the accuracy of character string segmentation and recognition, we combine the character classifier output, the character geometric model, and the deletion geometric model in an integrated segmentation-recognition framework. (4) In experiments on our collected gesture digit string dataset, our method achieves promising results: the string-level correct rate is over 80%. What’s more, our method... |