Recently, brain-computer interface (BCI) technology has made impressive progress and has been developed for many applications. Among these, the BCI system based on rapid serial visual presentation (RSVP) is a promising information detection technology. However, RSVP performance depends strongly on the user's state, which is in turn influenced by vigilance level. It is therefore crucial to detect vigilance levels in RSVP-based BCI. In this paper, we conducted a long-term RSVP target detection experiment to collect electroencephalography (EEG) and electrooculogram (EOG) data at different vigilance levels. We further propose VigilanceNet, a multimodal method that uses EEG and EOG to estimate vigilance levels in RSVP-based BCI. First, we define multiplicative relationships among conventional EOG features, which better characterize how these features interact, and design an outer product embedding module to extract them. Second, we propose decoupling the learning of intra- and inter-modality relations to improve multimodal learning. For intra-modality, we introduce an intra-modality representation learning (intra-RL) method that obtains effective representations by having each modality independently predict vigilance levels during multimodal training. For inter-modality, we employ a cross-modal Transformer based on cross-attention, which attends only to inter-modality relations, to capture the complementary information between EEG and EOG. Extensive experiments and ablation studies are conducted on the RSVP and SEED-VIG public datasets. The results demonstrate the effectiveness of the method in terms of regression error and correlation.
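To make the two core ideas concrete, the following is a minimal, illustrative PyTorch sketch of (i) an outer product embedding over hand-crafted EOG features and (ii) cross-attention between EEG and EOG token sequences. All module names, dimensions, and layer choices here are assumptions for illustration only, not the authors' implementation.

```python
import torch
import torch.nn as nn


class OuterProductEmbedding(nn.Module):
    """Sketch: embed pairwise multiplicative relationships among EOG features.

    The outer product of the feature vector with itself yields all pairwise
    products, which are then projected to a compact embedding.
    (num_feats and dim are illustrative hyperparameters.)
    """

    def __init__(self, num_feats: int, dim: int):
        super().__init__()
        self.proj = nn.Linear(num_feats * num_feats, dim)

    def forward(self, eog_feats: torch.Tensor) -> torch.Tensor:
        # eog_feats: (batch, num_feats) conventional EOG features
        outer = torch.einsum("bi,bj->bij", eog_feats, eog_feats)  # pairwise products
        return self.proj(outer.flatten(1))  # (batch, dim)


class CrossModalAttention(nn.Module):
    """Sketch: cross-attention where one modality's tokens query the other's.

    Attending EEG queries over EOG keys/values (and vice versa when the roles
    are swapped) models only inter-modality relations.
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_tokens: torch.Tensor, context_tokens: torch.Tensor) -> torch.Tensor:
        # query_tokens: (batch, len_q, dim), context_tokens: (batch, len_kv, dim)
        out, _ = self.attn(query_tokens, context_tokens, context_tokens)
        return out
```

In a decoupled setup such as the one described above, each modality's encoder could additionally feed its own regression head (the intra-RL idea), while the cross-attention module handles inter-modality fusion.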