Deep Learning Methods and Applications for Multivariate Coupled Time Series Forecasting of Infectious Diseases


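	Deep Learning Methods and Applications for Multivariate Coupled Time Series Forecasting of Infectious Diseases
	Wang Yuejiao (王月娇)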
	2021-05-22
Pages	80
Degree type	Master's
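Chinese abstract (English translation)	Infectious diseases claim millions of lives worldwide every year and cause incalculable losses to economic development and social stability. Timely forecasting and assessment of infectious diseases help public health authorities gauge transmission risk during an outbreak, allocate prevention resources sensibly, and curb person-to-person transmission as early as possible. Commonly used forecasting algorithms are mostly improved autoregressive models or transmission dynamics models, which suffer from limited modeling capacity and dependence on prior knowledge. In recent years, deep learning models have begun to be applied to infectious disease forecasting and have outperformed traditional forecasting algorithms. However, this application is still at an early exploratory stage and faces several problems: model architectures are not closely integrated with domain knowledge about infectious diseases, infectious disease time series are sparse and non-stationary, and the error metrics of the models are inconsistent with public health management goals. These problems hinder the adoption of deep learning models in infectious disease forecasting and limit improvements in prediction accuracy. Research on deep learning methods for multivariate time series forecasting of infectious diseases should therefore incorporate epidemiological knowledge, attend to the heterogeneity contained in infectious disease time series, and integrate deep learning models with public health management decision-making. Against this background, this thesis combines domain knowledge (the immune heterogeneity of multiple age groups, population interaction patterns, and the needs of public health decision-making) with the design of deep learning models, making them better suited to multivariate time series forecasting of infectious diseases. The main work and contributions are as follows. (1) The thesis investigates the effect of immune heterogeneity across age groups on forecasting, verifies the robustness of deep learning models in long-term forecasting, and proposes error metrics that match public health decision-making needs. Hand, foot and mouth disease (HFMD) is the Category C infectious disease most harmful to young children aged 0 to 6. Because children of different ages differ in immunity to the virus, the HFMD incidence series of the age groups show different peak times and peak magnitudes. Using deep learning models, the thesis fuses the temporal features and immune characteristics of three HFMD age groups, adopts the prediction errors of peak time and peak magnitude as new metrics, and compares the results against multivariate autoregressive models. In short-term forecasting, the deep learning models generally outperform the autoregressive models on all metrics; in long-term forecasting, the error of the autoregressive models grows rapidly with the forecast horizon, whereas the deep learning models remain robust. (2) The thesis investigates the effect of interaction features among age groups on forecasting accuracy and proposes an adaptive graph convolutional deep learning forecasting model (Adaptively temporal graph convolution model, ATGCN). The coronavirus disease 2019 (COVID-19) pandemic is the most serious global public health crisis in a century, and the scientific community urgently needs accurate assessments of epidemic trends. Through a meta-analysis of COVID-19 epidemiological parameters and a survey of the interaction features of susceptible populations across age groups, the thesis finds that population interaction is a principal factor in epidemic forecasting, and therefore proposes ATGCN, which is based on multi-age-group interaction features. Taking new confirmed cases in nine age groups in Maryland, USA, as the study object, ATGCN represents the incidence series of the age groups as a fully connected directed graph, adaptively learns the interaction matrix among age groups, and applies it in the graph convolution forecasting module. In long-term forecasting, the prediction accuracy of ATGCN is 12.4% and 40.9% higher than that of the autoregressive models and the deep sequence learning models, respectively; compared with an interaction matrix based on expert experience, ATGCN with an adaptively learned interaction matrix improves prediction accuracy by 8.1%, indicating that interaction features among age groups effectively fuse the information in the multivariate incidence series and improve prediction accuracy. In summary, this thesis explores the value of deep learning models for multivariate time series forecasting of infectious diseases. Taking HFMD and COVID-19 as case studies, it combines domain knowledge such as the immune heterogeneity and interaction features of susceptible populations with deep learning models, which effectively improves prediction accuracy and helps deep learning models better serve public health management and decision-making.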
English abstract	Infectious diseases claim millions of lives worldwide every year and cause incalculable losses to economic development and social stability. Timely prediction of infectious diseases helps public health departments gauge transmission risk during an outbreak, allocate epidemic prevention resources sensibly, and contain the spread as early as possible. In recent years, deep learning models have begun to be applied to infectious disease prediction and have outperformed traditional algorithms. However, this research field is still at an early stage of exploration and faces the following problems: the error metrics of deep learning models are inconsistent with public health management goals, infectious disease time series are sparse and non-stationary, and deep learning models are not closely integrated with domain knowledge about infectious diseases. These problems hinder the adoption of deep learning models in infectious disease prediction and limit improvements in prediction accuracy. Therefore, research on deep learning methods for multivariate time series forecasting of infectious diseases should incorporate epidemiological knowledge, attend to the heterogeneity contained in infectious disease time series, and integrate deep learning models with public health management decision-making. Against this background, this thesis combines domain knowledge of immune heterogeneity, population interaction patterns, and public health decision-making needs with the design of deep learning models to make the models better suited to infectious disease prediction.
The main work and contributions of this thesis are summarized as follows. (1) We explored the impact of immune heterogeneity across age groups on prediction, verified the robustness of deep learning models in long-term prediction, and proposed error metrics that meet the needs of public health decision-making. Hand, foot and mouth disease (HFMD) is a Category C infectious disease that is especially harmful to young children aged 0 to 6. Children of different ages differ in immunity to the virus, so the numbers of confirmed cases in different age groups show different peak times and peak magnitudes. This thesis used deep learning models to fuse the temporal and immune characteristics of three HFMD age groups, and used the prediction errors of peak time and peak magnitude as new metrics for comparison with autoregressive models. Experiments showed that in short-term prediction tasks, the deep learning models generally performed better than the autoregressive models; in long-term prediction tasks, the prediction errors of the autoregressive models increased rapidly with the horizon, while the performance of the deep learning models remained robust. (2) We explored the impact of contact patterns among age groups on prediction accuracy, and proposed an adaptively temporal graph convolution model (ATGCN). Coronavirus disease 2019 (COVID-19) is the world's most serious public health crisis in a century, and the scientific community urgently needs to make accurate judgments about the trend of the epidemic. Based on a meta-analysis of the epidemiological parameters of COVID-19 and an investigation of the interaction characteristics of populations susceptible to COVID-19, we conclude that the population contact pattern is a main factor in epidemic prediction, and therefore propose ATGCN.
Taking daily confirmed cases of nine age groups in Maryland, USA, as the research object, ATGCN models the age groups as a fully connected directed graph, adaptively learns their contact matrix, and applies it in the graph convolution module. Experiments showed that in long-term prediction tasks, the prediction accuracy of ATGCN was 12.4% and 40.9% higher than that of the autoregressive models and the deep sequence learning models, respectively; compared with a contact matrix based on expert experience, the adaptively learned contact matrix increased the prediction accuracy of ATGCN by 8.1%, which indicates that the contact features of age groups effectively integrate the information in the multivariate time series and improve the prediction accuracy of the model. In summary, this study explored the value of deep learning models for multivariate time series prediction of infectious diseases. Taking HFMD and COVID-19 as case studies, we combined domain knowledge such as immune heterogeneity and contact patterns of susceptible populations with deep learning models, which effectively improved prediction accuracy and helped deep learning methods better serve public health management and decision-making.
Keywords	deep learning; infectious diseases; multivariate time series forecasting; hand, foot and mouth disease; COVID-19
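Language	Chinese
Seven major research directions: sub-direction category	Data mining
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/44776
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Internet Big Data and Information Security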
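Recommended citation (GB/T 7714)	Wang Yuejiao. Deep Learning Methods and Applications for Multivariate Coupled Time Series Forecasting of Infectious Diseases[D]. Institute of Automation, University of Chinese Academy of Sciences. Institute of Automation, University of Chinese Academy of Sciences, 2021.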
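
The abstract names the prediction errors of peak time and peak magnitude as its new evaluation metrics but does not give their formulas. The sketch below shows one plausible reading, assuming the peak is the maximum of a series, the time error is measured in time steps, and the magnitude error is taken relative to the observed peak. The function name `peak_errors` and the sample numbers are illustrative only and do not come from the thesis.

```python
from typing import Sequence, Tuple


def peak_errors(observed: Sequence[float],
                predicted: Sequence[float]) -> Tuple[int, float]:
    """Return (peak-time error in steps, relative peak-magnitude error).

    Assumed definitions (not taken from the thesis):
      time error      = index of predicted peak - index of observed peak
      magnitude error = |predicted peak - observed peak| / observed peak
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    # Index of the first maximum in each series.
    t_obs = max(range(len(observed)), key=observed.__getitem__)
    t_pred = max(range(len(predicted)), key=predicted.__getitem__)
    peak_obs, peak_pred = observed[t_obs], predicted[t_pred]
    if peak_obs == 0:
        raise ValueError("observed peak is zero; relative error is undefined")
    return t_pred - t_obs, abs(peak_pred - peak_obs) / peak_obs


# Made-up weekly case counts for a single age group.
observed = [10, 40, 90, 60, 20]
predicted = [12, 35, 70, 81, 25]
time_error, magnitude_error = peak_errors(observed, predicted)
print(time_error, magnitude_error)
```

A positive time error means the forecast places the peak too late. Unlike pointwise errors such as RMSE, these two numbers target the questions a health authority would ask: when the outbreak peaks and how large it gets.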

Files in this item
File name/size	Document type	Version type	Access type	License
手稿15.pdf (4642KB)	Thesis		Open access	CC BY-NC-SA
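
The abstract says that ATGCN treats the age groups as a fully connected directed graph, adaptively learns their contact matrix, and applies it in a graph convolution module, but it does not specify the architecture. The sketch below is therefore not the thesis's model. As a stand-in it uses a common way to parameterize a learnable adjacency matrix, a row-wise softmax over ReLU-activated products of two node-embedding tables, followed by a single propagation step. All names, sizes, and values are placeholders, and the training loop that would update the embeddings is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def adaptive_adjacency(src: Matrix, dst: Matrix) -> Matrix:
    """Row-wise softmax(ReLU(src @ dst^T)).

    src and dst are N x d embedding tables (trainable in a real model).
    The result is a dense N x N matrix whose rows sum to 1; it is generally
    asymmetric, so it describes a fully connected directed graph.
    """
    dst_t = [list(col) for col in zip(*dst)]
    adj = []
    for row in matmul(src, dst_t):
        exps = [math.exp(max(score, 0.0)) for score in row]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


def graph_conv(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One propagation step: adj @ features @ weight."""
    return matmul(matmul(adj, features), weight)


# Three hypothetical age groups with 2-dimensional embeddings.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dst = [[0.5, 0.0], [0.0, 0.5], [-1.0, -1.0]]
adj = adaptive_adjacency(src, dst)

# Features: the last two observed case counts per group (made-up numbers).
features = [[12.0, 15.0], [30.0, 28.0], [7.0, 9.0]]
weight = [[0.5], [0.5]]  # maps 2 input features to 1 output per group
print(graph_conv(adj, features, weight))
```

In a trained model the two embedding tables would be optimized together with the forecasting loss, so the adjacency matrix is learned from the case series rather than fixed from an expert-specified contact matrix, which is the contrast the abstract draws.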