CASIA OpenIR > Graduates > Doctoral Dissertations





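1. Constructed a neuroimaging dataset collected under Chinese naturalistic language stimuli

2. Proposed a feature elimination method to analyze how the brain represents word-level syntactic information

3. Found that, when building hierarchical syntactic structures, the brain uses a computational strategy adapted to the syntactic structure of the language

4. Revealed the brain's semantic representation mechanism using a set of semantic features grounded in mental experience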



Language understanding in the brain is a complex cognitive process. After receiving language signals, the brain needs to build semantic and syntactic representations. Semantic representations associate linguistic symbols with real-world objects, actions, and previous experiences, while syntactic representations enable humans to combine small linguistic units into larger ones. Studying how the brain represents semantics and syntax is key to uncovering the mechanisms of human language understanding and to inspiring more effective brain-like natural language processing models, so the topic has both theoretical significance and practical value.

Because semantic and syntactic effects are intertwined during language understanding and are difficult to separate, most existing studies follow a controlled-experiment paradigm and try to disentangle the two effects with carefully controlled corpora. However, controlled corpora differ substantially from natural language, so it is unclear whether conclusions drawn under controlled conditions generalize beyond them.

This thesis studies how the brain represents semantics and syntax under naturalistic language stimuli. Two obstacles stand in the way: high-quality neuroimaging data collected under naturalistic language stimuli are scarce, and semantic and syntactic features are hard to extract and quantify. The thesis therefore first collects a neuroimaging dataset. It then represents syntactic and semantic features using distributed text representations from natural language processing and psychologically plausible semantic representations from cognitive science. Finally, with the collected data and the extracted features, it computationally models the mapping between these features and brain activity to study how the brain represents syntax and semantics.
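The feature-to-brain mapping described above is commonly implemented as a voxel-wise encoding model. The sketch below assumes ridge regression, a contiguous train/test split, and Pearson correlation as the score. These choices, the function names (`fit_ridge`, `voxelwise_pearson`, `encoding_scores`), and the synthetic data are illustrative assumptions, not the thesis's exact pipeline. Features and responses are assumed to be z-scored, so no intercept is fitted.

```python
import numpy as np


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge regression, W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_timepoints, n_features) stimulus features (assumed z-scored)
    Y: (n_timepoints, n_voxels) fMRI responses (assumed z-scored)
    Returns W with shape (n_features, n_voxels).
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def voxelwise_pearson(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Pearson r between observed and predicted time series, one value per voxel."""
    a = Y_true - Y_true.mean(axis=0)
    b = Y_pred - Y_pred.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


def encoding_scores(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0,
                    train_frac: float = 0.75) -> np.ndarray:
    """Fit on the first part of the time series and score on the held-out remainder.

    A contiguous split (no shuffling) keeps temporally autocorrelated fMRI
    samples from leaking between the training and test sets.
    """
    n_train = int(train_frac * X.shape[0])
    W = fit_ridge(X[:n_train], Y[:n_train], alpha)
    return voxelwise_pearson(Y[n_train:], X[n_train:] @ W)


# Synthetic example: 400 time points, 20 hypothetical features, 5 simulated voxels.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))
W_true = rng.standard_normal((20, 5))
Y = X @ W_true + 0.5 * rng.standard_normal((400, 5))
scores = encoding_scores(X, Y, alpha=1.0)
print(scores.round(2))
```

Comparing such scores across feature sets, or across brain regions, is what lets an analysis ask which features a region appears to encode.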

The main contributions of this thesis are summarized as follows.

1. The collection of a neuroimaging dataset under Chinese naturalistic language stimuli

Brain activity recorded during language understanding is fundamental to studying the neural mechanisms of language, and the quality of the data strongly constrains what can be learned from it. Research with naturalistic language stimuli relies on computational models to map language stimuli onto brain activation, which places high demands on the scale and quality of the neuroimaging data. This thesis therefore collects functional magnetic resonance imaging (fMRI) data from 12 native Chinese speakers while they listened to 60 Chinese stories totaling about 5 hours. The data are preprocessed and technically validated, and the validation results indicate that the data are of high quality. In addition, the thesis annotates the linguistic features of the stimulus story texts, laying a foundation for analyzing how the brain represents syntax and semantics.
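The abstract does not say which metrics the technical validation used. As an illustration only, one check often applied to naturalistic-listening fMRI data is leave-one-out inter-subject correlation (ISC): each participant's time series is correlated with the average of the remaining participants, and stimulus-driven voxels should show clearly positive ISC. The sketch assumes preprocessed responses to one story stored as an array of shape (participants, time points, voxels); the function name and the synthetic data are hypothetical.

```python
import numpy as np


def leave_one_out_isc(data: np.ndarray) -> np.ndarray:
    """Leave-one-out inter-subject correlation.

    data: (n_subjects, n_timepoints, n_voxels) responses to the same stimulus,
    assumed to be aligned in time across subjects.
    Returns (n_subjects, n_voxels): Pearson r between each subject and the mean of the others.
    """
    n_subjects = data.shape[0]
    isc = np.empty((n_subjects, data.shape[2]))
    for s in range(n_subjects):
        others = np.delete(data, s, axis=0).mean(axis=0)
        a = data[s] - data[s].mean(axis=0)
        b = others - others.mean(axis=0)
        isc[s] = (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return isc


# Synthetic check: voxel 0 shares a stimulus-locked signal across 12 listeners, voxel 1 is noise.
rng = np.random.default_rng(1)
shared = rng.standard_normal(200)
data = rng.standard_normal((12, 200, 2))
data[:, :, 0] += 2.0 * shared
isc = leave_one_out_isc(data)
print(isc.mean(axis=0).round(2))
```

In the synthetic example, the voxel carrying the shared component receives a high ISC while the pure-noise voxel stays near zero.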

2. The investigation of word-level syntactic representation in the brain by a feature elimination method

The word is the smallest linguistic unit that can be used independently, and it is the building block of complex phrases and sentences. To study how words form phrases and sentences, one must first categorize words and understand the relationships between them. This thesis therefore selects three syntactic features, namely part of speech, dependency relations, and predicate-argument structure, and analyzes how the brain represents these word-level syntactic features in both Chinese and English. The main difficulty is separating a specific syntactic feature from the other syntactic features and from semantics. To address it, this thesis proposes a feature elimination method that separates different syntactic features in the word embeddings computed by a distributed text representation model: the method removes a specific feature from the word embedding space while retaining the other features. Using both the original and the one-feature-removed word embeddings, we explore how the brain encodes syntactic features by relating these vectors to brain imaging data. The rationale is that if a feature is represented in the brain, removing it from the original word embeddings should substantially reduce how well the embeddings predict activity in the brain areas associated with that feature. The results suggest that several brain regions may contribute to a complex division of labor in syntactic processing, and that the patterns in Chinese and English show both overlap and dissociation.
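The abstract does not describe how the feature elimination works internally. As a stand-in, the sketch below uses a simple linear-erasure technique: it projects out the subspace spanned by the class-mean offsets of one labeled feature (for example part of speech), so that afterwards every class of that feature has the same mean embedding, while directions orthogonal to that subspace are left untouched. This is not necessarily the thesis's method. It removes only linearly encoded mean differences, so a nonlinear probe might still recover the feature. The function names, labels, and data are hypothetical.

```python
import numpy as np


def class_mean_subspace(E: np.ndarray, labels: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Orthonormal basis (d, r) for the span of the class-mean offsets of one feature.

    E: (n_words, d) word embeddings; labels: (n_words,) class of each word for that feature.
    """
    global_mean = E.mean(axis=0)
    offsets = np.stack([E[labels == k].mean(axis=0) - global_mean for k in np.unique(labels)])
    _, s, vt = np.linalg.svd(offsets, full_matrices=False)
    return vt[s > tol * s.max()].T  # keep only non-degenerate directions


def eliminate_feature(E: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Project E onto the orthogonal complement of the feature's class-mean subspace."""
    Q = class_mean_subspace(E, labels)
    return E - (E @ Q) @ Q.T


# Synthetic example with two hypothetical features planted in random 16-d embeddings.
rng = np.random.default_rng(0)
n, d = 300, 16
pos = rng.integers(0, 3, size=n)   # hypothetical 3-way part-of-speech label
dep = rng.integers(0, 2, size=n)   # hypothetical binary dependency label
E = rng.standard_normal((n, d)) + 3.0 * rng.standard_normal((3, d))[pos]
E[:, -1] += 3.0 * dep              # dependency label lives mostly in the last axis

E_removed = eliminate_feature(E, pos)
pos_means = np.stack([E_removed[pos == k].mean(axis=0) for k in range(3)])
dep_gap = E_removed[dep == 1].mean(axis=0) - E_removed[dep == 0].mean(axis=0)
print(np.abs(pos_means - pos_means[0]).max())  # ~0: part-of-speech class means now coincide
print(np.linalg.norm(dep_gap))                 # clearly > 0: dependency signal retained
```

In an analysis of this kind, the original and the feature-removed embeddings would both be passed through the same encoding model, and the per-region drop in prediction accuracy would be compared.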

3. The investigation of the structure-adaptive brain mechanism for the construction of hierarchical structures

The hierarchical nature of language allows a finite set of words to combine into an infinite number of sentences according to syntactic rules. Given this property, this thesis studies how the brain builds hierarchical syntactic structures in Chinese and English. To explore the general brain mechanism of syntactic structure building, it starts from the structural differences between Chinese and English and quantitatively analyzes how branching direction relates to the working memory load incurred by different parsing strategies. The results show clear structural differences between the two languages: English is predominantly right-branching, whereas Chinese contains both left- and right-branching structures. As a result, the same parsing strategy incurs different working memory loads in the two languages. For Chinese, the bottom-up strategy imposes a smaller memory load, whereas for English the top-down strategy does. An fMRI analysis then shows that the brain activity of both Chinese and English participants is more consistent with the parsing strategy that has the smaller memory load in their language. This suggests that the brain, constrained by limited cognitive resources, adapts its parsing strategy to the structure of the language so as to reduce memory load.
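One common way to make the memory argument concrete is to count how many items a parser must hold on its stack. The sketch below assumes binary constituency trees written as nested tuples and uses maximum stack depth as a rough proxy for working-memory load; the thesis's own load measure may differ, and the function names are hypothetical. A top-down parser predicts constituents before reading their words, so it must hold pending right siblings while it descends into left-nested material; a bottom-up (shift-reduce) parser must hold unattached words while it waits for right-nested material to close.

```python
from typing import Tuple, Union

# A binary constituency tree: a word (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def top_down_max_stack(tree: Tree) -> int:
    """Max stack size for a top-down (predictive) parser that expands constituents before reading words."""
    stack = [tree]
    max_depth = 1
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            stack.extend(reversed(node))  # push right child first so the left child is expanded next
            max_depth = max(max_depth, len(stack))
        # a word is matched against the input and simply popped
    return max_depth


def bottom_up_max_stack(tree: Tree) -> int:
    """Max stack size for a bottom-up shift-reduce parser (shift each word, reduce a completed pair)."""
    depth = 0
    max_depth = 0

    def visit(node: Tree) -> None:
        nonlocal depth, max_depth
        if isinstance(node, str):
            depth += 1  # shift the word onto the stack
            max_depth = max(max_depth, depth)
        else:
            visit(node[0])
            visit(node[1])
            depth -= 1  # reduce the two completed children into one constituent

    visit(tree)
    return max_depth


right_branching = ("a", ("b", ("c", "d")))
left_branching = ((("a", "b"), "c"), "d")
print(top_down_max_stack(right_branching), bottom_up_max_stack(right_branching))  # 2 4
print(top_down_max_stack(left_branching), bottom_up_max_stack(left_branching))    # 4 2
```

For the purely right-branching tree, top-down parsing needs a stack of depth 2 and bottom-up parsing a depth of 4; for the mirror-image left-branching tree the numbers swap, which mirrors the branching-direction argument above.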

4. The investigation of the brain basis of semantics with a set of mental-experience-based semantic features 

The brain basis of semantics is a common concern of multiple disciplines, such as cognitive neuroscience and psychology. Studies have shown that semantic acquisition in the brain draws on mental experience, including concrete sensory-motor experience and abstract emotional and social experience. However, most existing studies have focused on only one type of experience and overlooked the contribution of the other type to semantic representation. This thesis selects a representative set of sensory-motor and non-sensory-motor semantic features, namely visual, motor, emotion, socialness, space, and time, and manually annotates each word in the Chinese naturalistic language stimuli with a score on each dimension. It then maps this set of semantic features to brain activation with linear regression models and analyzes where the neural patterns of these semantic dimensions dissociate and where they overlap. The results identify multiple brain regions sensitive to semantic information and reveal a complex semantic atlas in the brain, providing new insights into how the brain's semantic system is organized.
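The sketch below shows one common way to turn word-level ratings into regressors for a linear model of fMRI data: sum the ratings of the words whose onsets fall in each scan (TR), convolve each dimension with a canonical double-gamma hemodynamic response function (HRF), and fit ordinary least squares with an intercept. The 2 s TR, the 1 to 7 rating scale, the HRF parameters (SPM-style gamma shapes 6 and 16 with a 1/6 undershoot ratio), the helper names, and the simulated voxel are all assumptions for illustration; the abstract does not specify these details.

```python
from math import gamma

import numpy as np

# The six dimensions named in the abstract; the column order is an assumption.
DIMENSIONS = ["visual", "motor", "emotion", "socialness", "space", "time"]


def canonical_hrf(tr: float, duration: float = 32.0) -> np.ndarray:
    """Double-gamma HRF sampled every `tr` seconds, normalized to sum to 1."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / gamma(6)          # gamma pdf, shape 6, scale 1 s
    undershoot = t ** 15 * np.exp(-t) / gamma(16)  # gamma pdf, shape 16, scale 1 s
    h = peak - undershoot / 6.0
    return h / h.sum()


def ratings_to_trs(onsets: np.ndarray, ratings: np.ndarray, n_trs: int, tr: float) -> np.ndarray:
    """Sum word ratings (n_words, n_dims) into TR bins according to word onset times (s)."""
    X = np.zeros((n_trs, ratings.shape[1]))
    idx = np.floor(onsets / tr).astype(int)
    keep = idx < n_trs
    np.add.at(X, idx[keep], ratings[keep])
    return X


def convolve_hrf(X: np.ndarray, hrf: np.ndarray) -> np.ndarray:
    """Causal convolution of every column with the HRF, truncated to len(X)."""
    return np.column_stack([np.convolve(X[:, j], hrf)[: len(X)] for j in range(X.shape[1])])


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares weight for each column of X (an intercept is fitted and dropped)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]


# Synthetic example: one hypothetical voxel driven only by the emotion dimension.
rng = np.random.default_rng(2)
tr, n_trs = 2.0, 300
onsets = np.sort(rng.uniform(0.0, tr * n_trs, size=1500))
ratings = rng.uniform(1.0, 7.0, size=(1500, len(DIMENSIONS)))
X = convolve_hrf(ratings_to_trs(onsets, ratings, n_trs, tr), canonical_hrf(tr))
y = 2.0 * X[:, DIMENSIONS.index("emotion")] + 0.1 * rng.standard_normal(n_trs)
weights = dict(zip(DIMENSIONS, fit_ols(X, y)))
print({k: round(float(v), 2) for k, v in weights.items()})
```

Fitting the same model in every voxel and comparing which dimensions receive reliable weights is one way to look for dissociated and overlapping regions. Because summed ratings co-vary with the number of words per TR, one might in practice center the ratings or add a word-rate regressor; that choice is also an assumption here.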

Keywords: naturalistic language stimuli; syntax; semantics; neural language representation; functional magnetic resonance imaging
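Citation (GB/T 7714):
张肖寒 (Zhang Xiaohan). 自然语言刺激下大脑的语义和句法表征机制研究 [Research on the semantic and syntactic representation mechanisms of the brain under naturalistic language stimuli][D]. 2023.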
Full text: 张肖寒_答辩后修改_自然语言刺激下大脑的 (15802 KB); document type: dissertation; access: restricted; license: CC BY-NC-SA
