Research on Accurate Precipitation Models and Algorithms Based on Multi-Source Meteorological Big Data


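	Research on Accurate Precipitation Models and Algorithms Based on Multi-Source Meteorological Big Data
	Tang Yongqiang (唐永强)
	2019-05-24
Pages	1-132
Degree type	Doctoral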
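Chinese abstract (English translation)	Accurate precipitation estimation and forecasting bear on the national economy and people's livelihood, and strongly affect the normal operation of every sector of society as well as people's work and daily life. With the rapid development of meteorological remote sensing and the steady improvement of the ground rain gauge network, China has built a multi-source precipitation observation system represented by rain gauges, weather radar, and satellite remote sensing. The multi-source meteorological data it accumulates are growing explosively and provide important "raw material" for accurate precipitation estimation and forecasting. However, because many factors influence precipitation and their interrelations are complex, traditional meteorological methods have limited ability to characterize complex relationships that involve uncertainty, and they struggle to use these multi-source observations effectively. As a powerful tool for big data analysis, machine learning has clear advantages in modeling complex data. Combined with multi-source meteorological big data, it is expected to yield more accurate precipitation estimates and forecasts, and it has drawn wide attention from governments, research institutions, and large enterprises. Even so, fully fusing and exploiting the rich information contained in multi-source meteorological observations still poses many difficulties and challenges. Quantitative precipitation estimation, rain/no-rain estimation, and post-processing of precipitation ensemble forecasts are three typical tasks in accurate precipitation with multi-source meteorological big data. Taking the perspective of machine learning modeling, this dissertation studies the multi-source meteorological data fusion problems that urgently need to be solved in these three tasks. The main contributions are as follows. (1) A continuous conditional random field precipitation estimation model based on geographic and temporal attention (GTA-CCRF) is proposed. For the multi-source data fusion problem in quantitative precipitation estimation, a fusion framework based on continuous conditional random fields is proposed that effectively captures the complementary information of the two data sources, radar and rain gauges. Within this framework, the spatiotemporal structure of precipitation is further taken into account, and a spatiotemporal weighting strategy based on geographic and temporal attention is proposed, which effectively improves precipitation estimation. Experiments on 11 real precipitation events show that the proposed method outperforms state-of-the-art precipitation estimation methods on five metrics: root mean square error, mean absolute error, correlation coefficient, mean relative error, and median absolute error. (2) A robust and diverse multi-view rain/no-rain clustering model based on self-paced learning (RD-MSPL) is proposed. Because rain/no-rain patterns cannot be fully mined when labeled samples are insufficient, the rain/no-rain estimation task is studied from the perspective of multi-view clustering in machine learning. First, a structured sparsity norm is introduced into the clustering objective to handle the noisy samples mixed into rain/no-rain estimation data. Second, an anti-structured sparsity norm is introduced into the self-paced regularization term to address view diversity during sample selection. Comparison with related methods on a real rain/no-rain estimation dataset confirms the superiority of the proposed method for multi-source rain/no-rain clustering, and experiments on four public datasets show that it generalizes well to the more general multi-view clustering problem. (3) An ensemble forecast member clustering model based on tensor multiple elastic kernel self-paced learning (T-MEK-SPL) is proposed. The ensemble member clustering task in post-processing of precipitation ensemble forecasts is recast as a more general time series clustering problem, and a multiple kernel clustering framework is used to address, in a unified way, three common problems in time series clustering: high dimensionality, drift, and the fusion of multiple elastic kernels. On this basis, a tensor low-rank constraint is imposed on the self-representation coefficient matrices of the multiple kernel spaces to obtain the high-order complementary information of the multiple elastic kernels, and the self-paced learning paradigm is further introduced to learn a more intrinsic subspace structure. Experiments on a real precipitation ensemble forecast dataset show that the proposed method partitions ensemble forecast members into clusters more effectively, and experiments on 58 public datasets show that it generalizes well to time series clustering.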
English abstract	Accurate precipitation estimation and forecasting are closely tied to the national economy and people's livelihood. With the rapid development of meteorological remote sensing technology and the continuous improvement of the ground rain gauge network, a multi-source precipitation observation system represented by rain gauge stations, weather radar, and satellite remote sensing has been constructed in China. As a result, a large volume of multi-source meteorological data has been collected, providing the basic "raw material" for accurate precipitation estimation and forecasting. Because many factors affect precipitation and their relationships are complex, traditional meteorological methods have limited ability to characterize such complicated, uncertain relationships, which makes it difficult to use these multi-source observations effectively. As a powerful tool for big data analysis, machine learning has clear advantages in modeling complex data. Combined with multi-source meteorological big data, it is expected to deliver more accurate precipitation estimation and forecasting, and it has received considerable attention from governments, research institutions, and large companies. However, adequately integrating and using the rich information in multi-source meteorological data still poses many difficulties and challenges. The typical tasks in accurate precipitation with multi-source meteorological big data include quantitative precipitation estimation, rain/no-rain estimation, and post-processing of precipitation ensemble forecasts. In this dissertation, the multi-source data fusion problems that need to be solved in these three tasks are investigated from the perspective of machine learning modeling. The main contributions of this dissertation can be summarized as follows: (1) A Geographic and Temporal Attention Continuous Conditional Random Field (GTA-CCRF) model is proposed for quantitative precipitation estimation. 
GTA-CCRF tackles the problem of multi-source data fusion in the quantitative precipitation estimation task with a continuous conditional random field framework, which effectively exploits the complementary information from radar and rain gauges. Within this framework, the spatiotemporal structural information of precipitation is further considered: spatiotemporal weighting strategies based on geographic and temporal attention are proposed, which effectively improve the performance of precipitation estimation. Experimental results on 11 real precipitation processes show that, compared with other competitors, GTA-CCRF achieves the best results for root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (CC), mean relative error (MRE), and median absolute error. (2) A Robust and Diverse Multiview clustering model based on Self-Paced Learning (RD-MSPL) is proposed for rain/no-rain estimation. The crucial difficulty in the rain/no-rain task is that rain/no-rain patterns cannot be fully mined when labeled samples are insufficient. RD-MSPL approaches this issue from the perspective of multi-view clustering in machine learning. First, a structured sparsity norm is introduced into the objective function to cope with the noisy samples in rain/no-rain data. Then, an anti-structured sparsity norm is embedded into the self-paced regularization term to increase the diversity of views in the sample selection process. Experiments against several competitors on a real rain/no-rain estimation dataset validate the superiority of RD-MSPL for the multi-source rain/no-rain clustering problem. Experiments on 4 public datasets show that the proposed method has excellent generalization performance on the more general multi-view clustering problem. (3) A Tensor Multiple Elastic Kernels Self-Paced Learning (T-MEK-SPL) model is proposed for the member clustering task in post-processing of precipitation ensemble forecasts. 
T-MEK-SPL recasts the member clustering task in post-processing of precipitation ensemble forecasts as a more general time series clustering problem and simultaneously addresses three difficulties in time series clustering, namely high dimensionality, drift, and the fusion of multiple elastic kernels. On this basis, a high-order tensor constraint is applied to adequately capture the complementary information among multiple elastic kernels, and the self-paced learning paradigm is introduced to learn a more discriminative subspace structure. Experiments on a real ensemble forecast dataset demonstrate that the proposed method clusters the ensemble forecast members better, and experiments on 58 public datasets show that it has excellent generalization performance on the time series clustering problem.
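For readers unfamiliar with the five scores named in contribution (1), the sketch below shows one common way to compute them for paired rain gauge observations and model estimates. It is an illustration only and is not taken from the dissertation: the function name `precipitation_metrics`, the sample values, and in particular the definition of mean relative error (absolute error divided by the observed value, averaged over samples with non-zero observations) are assumptions, and the thesis may define the metrics differently.

```python
import math
from statistics import mean, median


def precipitation_metrics(observed, estimated):
    """Score estimates against observations (hypothetical helper, not from the thesis)."""
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")

    errors = [e - o for o, e in zip(observed, estimated)]
    abs_errors = [abs(d) for d in errors]

    rmse = math.sqrt(mean([d * d for d in errors]))  # root mean square error
    mae = mean(abs_errors)                           # mean absolute error
    medae = median(abs_errors)                       # median absolute error

    # Pearson correlation coefficient between observations and estimates.
    mean_o, mean_e = mean(observed), mean(estimated)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    denom = math.sqrt(var_o * var_e)
    cc = cov / denom if denom > 0 else float("nan")

    # Assumed definition: relative error is only computed where observed > 0.
    relative = [a / o for a, o in zip(abs_errors, observed) if o > 0]
    mre = mean(relative) if relative else float("nan")

    return {"RMSE": rmse, "MAE": mae, "CC": cc, "MRE": mre, "MedAE": medae}


if __name__ == "__main__":
    # Made-up values in millimetres, purely for demonstration.
    scores = precipitation_metrics([1.0, 2.0, 4.0], [2.0, 2.0, 3.0])
    for name, value in scores.items():
        print(f"{name}: {value:.4f}")
```

Lower is better for RMSE, MAE, MRE, and median absolute error, while a correlation coefficient closer to 1 indicates closer agreement with the gauges.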
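Keywords	accurate precipitation, multi-source data, continuous conditional random field, multi-view clustering, time series clustering
Language	Chinese
Seven major research directions: sub-direction classification	Machine learning
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/23915
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Artificial Intelligence and Machine Learning (Yang Xuebing) Technical Team; Research Center for Precision Sensing and Control
Recommended citation (GB/T 7714)	Tang Yongqiang. Research on Accurate Precipitation Models and Algorithms Based on Multi-Source Meteorological Big Data [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2019.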

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis-tyq.pdf (6468KB)	Dissertation		Open access	CC BY-NC-SA