Research on Rainfall Estimation Models and Algorithms Based on Random Forests


	Kuang Qiuming (匡秋明)
	2017-05-27
Degree type	Doctor of Engineering
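Chinese abstract	Fine-grained short-term rainfall nowcasting (about 1 km × 1 km spatial resolution, 6-minute temporal resolution, 0-2 hour lead time) is important for large public events such as the Olympic Games and military parades, for scientific activities such as spacecraft launches, and for agricultural irrigation, hydropower generation, and flood warning. Rainfall estimation is the basis of rainfall forecasting: accurate rainfall estimates, combined with time-series extrapolation or numerical models, make rainfall forecasting possible. However, because of wind speed, wind direction, terrain, meteorological conditions, precipitation phase, rainfall type, and other factors, traditional rainfall estimation methods cannot meet the accuracy requirements at high resolution, and improving the accuracy of rainfall estimation remains a challenging research problem. To improve meteorological monitoring and forecasting, China has built nationwide radar, satellite, and ground-based observation networks and has collected large volumes of observation data. New models and algorithms are urgently needed to exploit these observations and improve rainfall estimation accuracy for practical applications. In recent years, the random forest algorithm has achieved notable success in data-mining competitions and many practical applications, because it is easy to parallelize, is not prone to overfitting when trained on big data, and needs fewer computing resources than deep learning.

This thesis builds on random forests to study the theory and methods of multi-source feature fusion, spatiotemporal modeling, and imbalanced regression. First, for rain/no rain classification, it proposes the Multi-View Weighted Random Forests (MVWRF) model and its inference method, which use radar, satellite, and ground observation data to improve classification accuracy. Second, for rainfall amount estimation, it proposes a spatiotemporal model that takes rain processes as input (Random Forest and Linear Chain Conditional Random Field based Spatiotemporal Model, RANLIST) together with a three-stage approximate optimization algorithm. Built on the temporal relations within a rain process and the spatial neighborhood correlations of radar reflectivity factors, the model mines spatiotemporal information in the data and improves rainfall amount estimation. Finally, for heavy rainfall estimation, it proposes an imbalanced-regression resampling method (Multiclass Over-sampling and Under-sampling, MOU) and an ensemble regression model (Resampling based Random Forest, RRF), which improve the accuracy of heavy rainfall estimation. The main contributions are as follows:

(1) A multi-view weighted random forest algorithm (MVWRF) for rain/no rain classification. (a) A view construction method builds the VisCAPPI, VisPPI, VisSat, and VisGround views, so that radar, satellite, and ground observation data with inconsistent spatial and temporal resolutions can be handled in a single multi-view framework. (b) A multi-view weighted random forest model and inference method fuse the multi-source data and combine the radar, satellite, and ground views for rain/no rain classification, exploiting the complementarity of the sources. Experiments on real data show that the new algorithm outperforms the other algorithms in accuracy, recall, miss rate, and false alarm rate.

(2) A new spatiotemporal model (RANLIST) for rainfall amount estimation. (a) A "structure + statistics" spatiotemporal model uses the spatial neighborhood structure of radar reflectivity factors and the time-series structure of rain processes (structure), and mines the information inside the data with random forest machine learning (statistics). (b) An approximate solution method based on random forests and linear-chain conditional random fields trains and tests RANLIST rain process by rain process, fully exploiting the contribution of the whole rain process to rainfall amount estimation. Experiments on real data show that the proposed algorithm significantly improves rainfall estimation accuracy over the second-best algorithm.

(3) A hybrid resampling ensemble regression algorithm (MOU_RRF) for heavy rainfall estimation. (a) The MOU hybrid resampling method for heavy rainfall regression is the first to jointly use multi-class imbalanced over-sampling and multi-class imbalanced under-sampling to handle imbalanced regression samples. (b) The RRF ensemble regression method extends over-sampling, under-sampling, and hybrid-sampling classification methods to regression problems. Heavy rainfall regression experiments on a real dataset show that the proposed hybrid resampling ensemble regression algorithm outperforms traditional meteorological methods, support vector machines, random forests, and typical over-sampling, under-sampling, and hybrid-sampling ensemble regression methods.

(4) A workflow and framework for training and testing the rainfall estimation models. The rain/no rain, rainfall amount, and heavy rainfall estimation algorithms have been applied nationwide to generate national fine-grained radar rainfall estimation products at about 1 km × 1 km spatial resolution; combined with optical-flow extrapolation, they provide 0-2 hour short-term rainfall nowcasts.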
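The abstract says RANLIST pairs a random-forest spatial submodel with a linear-chain conditional random field over the time steps of a rain process, but it does not give the potentials or the three-stage optimization. As a minimal sketch, assuming the forest's per-step class probabilities serve as emission scores and a hand-set transition matrix favours persistence (both hypothetical), the standard Viterbi MAP decoding of a linear-chain model shows how the temporal structure can override an isolated noisy per-step call:

```python
import math
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label sequence of a linear-chain model.

    emissions[t][k]   log-score of label k at step t (here: log RF class probability)
    transitions[i][j] log-score of moving from label i to label j between steps
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label
    backptrs: List[List[int]] = []
    for emit in emissions[1:]:
        prev = score
        # For each current label j, the best previous label i.
        ptrs = [max(range(n_labels), key=lambda i: prev[i] + transitions[i][j])
                for j in range(n_labels)]
        score = [prev[ptrs[j]] + transitions[ptrs[j]][j] + emit[j]
                 for j in range(n_labels)]
        backptrs.append(ptrs)
    # Trace back from the best final label.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


# Hypothetical RF probabilities for classes 0 = no rain, 1 = light, 2 = heavy
# over four consecutive 6-minute steps; step 2 is a noisy "heavy" call.
rf_probs = [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.2, 0.35, 0.45], [0.1, 0.8, 0.1]]
emissions = [[math.log(p) for p in row] for row in rf_probs]
# Hypothetical transition scores that favour staying in the same class.
sticky = [[math.log(0.8) if i == j else math.log(0.1) for j in range(3)]
          for i in range(3)]
flat = [[0.0] * 3 for _ in range(3)]

smoothed = viterbi(emissions, sticky)    # [1, 1, 1, 1]
independent = viterbi(emissions, flat)   # [1, 1, 2, 1], i.e. per-step argmax
```

With flat transitions the decoder reduces to a per-step argmax; the sticky matrix is what lets the sequence context smooth out step 2.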
English abstract	High-resolution (about 1 km × 1 km) short-term (0-2 hour) rainfall forecasting is crucial for outdoor events such as the Olympic Games and parades, for research activities such as spacecraft launches, and for agricultural irrigation, hydropower generation, and flood warning. Precise rainfall estimation is an important basis of rainfall forecasting. However, traditional rainfall estimation methods are inaccurate and unstable because of wind speed, wind direction, terrain, meteorological conditions, precipitation phase, rainfall type, ground clutter, and other factors, so they cannot meet the requirements of high-resolution rainfall estimation. Improving the accuracy of rainfall estimation is therefore a challenging research topic. Radar, satellite, and ground-based observation networks have been built and collect large amounts of meteorological data, yet traditional methods for rain/no rain classification and rainfall intensity estimation use only a small part of these data, which falls short of application requirements. In recent years, the random forest method has achieved remarkable results in data-mining competitions and various practical applications, because it parallelizes easily, is not prone to overfitting on big data, and uses fewer computing resources than deep learning. In this thesis, random forest is employed as the basic algorithm for rainfall estimation, and the problems of multi-source data fusion, spatiotemporal modeling, and imbalanced regression are studied. First, for rain/no rain classification, this thesis proposes a multi-view weighted random forests (MVWRF) model together with its inference method, which improves rain/no rain classification using data from radar, satellite, and ground observation stations.
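The abstract does not state how MVWRF weights and combines its views, only that it does so under a Bayesian framework. A minimal sketch, assuming each view (VisCAPPI, VisPPI, VisSat, VisGround) already has its own classifier returning P(rain) for a pixel and that the per-view weights are fixed numbers (hypothetical, e.g. tuned on validation data), is weighted log-odds pooling. This is a stand-in for illustration, not the thesis's inference method:

```python
import math
from typing import Dict


def logit(p: float) -> float:
    """Log-odds of a probability, clipped so that 0 and 1 stay finite."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def fuse_views(view_probs: Dict[str, float],
               view_weights: Dict[str, float],
               prior: float = 0.5) -> float:
    """Weighted log-odds fusion of per-view P(rain) into one posterior P(rain).

    Each view contributes its log-odds shift away from the prior, scaled by its
    weight; views with weight 0 (or absent from view_weights) are ignored.
    """
    base = logit(prior)
    z = base
    for view, p in view_probs.items():
        z += view_weights.get(view, 0.0) * (logit(p) - base)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical per-view outputs for one pixel and hypothetical view weights.
probs = {"VisCAPPI": 0.85, "VisPPI": 0.80, "VisSat": 0.60, "VisGround": 0.30}
weights = {"VisCAPPI": 0.4, "VisPPI": 0.3, "VisSat": 0.2, "VisGround": 0.1}
p_rain = fuse_views(probs, weights)
is_rain = p_rain >= 0.5  # rain/no rain decision at a 0.5 threshold
```

With every weight set to 1, this reduces to the naive-Bayes combination of per-view posteriors under a conditional-independence assumption; weights below 1 temper views that are less reliable.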
Second, to estimate rainfall amounts, random forests and linear-chain conditional random fields are employed to build a spatiotemporal model (RANLIST) that uses the spatial structure of radar reflectivity factors and the time-series information of rain processes. A three-stage method is also presented for optimizing the RANLIST model. In applying RANLIST to rainfall estimation, rain processes serve as the basic units for training and testing, and the spatial and temporal information is exploited to improve rainfall estimation. Finally, for heavy rainfall estimation, a multi-class over-sampling and under-sampling (MOU) resampling method and a resampling-based ensemble regression method (RRF) are proposed for imbalanced regression problems. Experimental results show that they improve heavy rainfall estimation. The main contributions of this dissertation are summarized as follows: (1) The MVWRF method is proposed for rain/no rain classification. Its innovations are: (a) A multi-view construction method. Views such as VisCAPPI, VisPPI, VisSat, and VisGround are constructed so that radar, satellite, and ground-based observation data, which have inconsistent spatial and temporal resolutions, can be handled in a unified multi-view framework. (b) An MVWRF model for multi-view data under a Bayesian framework, which combines multi-source data for rain/no rain classification and exploits their complementarity. Experiments show that this method effectively fuses multi-source data from radar, satellite, and ground observation stations and that, compared with meteorological methods, typical machine learning classification methods, and other multi-view methods, it improves the accuracy of rain/no rain classification.
(2) A new spatiotemporal model is proposed for radar-based rainfall estimation. Its innovations are: (a) A "structure & statistics" radar-based rainfall estimation model, RANLIST, which uses both the spatial structure of radar reflectivity factors and the time-series structure of rain processes (structure), and learns the internal structure of the data with statistical machine learning (statistics). (b) A random-forest-based spatial submodel, a linear-chain conditional random field time-series submodel, and a submodel fusion method for optimizing RANLIST. Rain processes are used as the basic units for training and testing, which helps exploit the information contained in whole rain processes. Experiments show that, compared with the second-best algorithm, the proposed method significantly improves the accuracy of rainfall estimation. (3) A hybrid resampling ensemble regression algorithm is proposed for heavy rainfall estimation. Its main innovations are: (a) An MOU hybrid resampling method, which is the first to combine multi-class over-sampling and multi-class under-sampling methods for regression. (b) An RRF ensemble regression method, which can apply existing typical over-sampling, under-sampling, and hybrid sampling methods to regression problems. Experiments compare these typical methods with MOU resampling, and the results show that the proposed method achieves the best heavy rainfall estimation, outperforming traditional meteorological methods, support vector machines and random forests, as well as ensemble regression methods based on typical over-sampling, under-sampling, and hybrid sampling.
(4) This thesis presents formulations of rain/no rain classification, rainfall intensity estimation, and heavy rainfall estimation, together with a training and testing workflow and framework. The proposed algorithms have been applied across China, producing rainfall estimates at about 1 km × 1 km spatial resolution; combined with optical-flow extrapolation, they also produce 0-2 hour rainfall nowcasts at 6-minute intervals.
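The abstract names the ingredients of MOU_RRF (binning a continuous rainfall target into intensity classes, multi-class over- and under-sampling, and an ensemble of regressors trained on resampled sets) without specifying which resampling methods MOU combines. A minimal sketch, assuming plain random duplication for over-sampling, random removal for under-sampling, a single hypothetical 10.0 threshold between "light" and "heavy", and a caller-supplied `fit` function standing in for the random-forest regressor, looks like this:

```python
import random
from bisect import bisect_right
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, rainfall amount)
Model = Callable[[Sequence[float]], float]


def intensity_class(y: float, edges: Sequence[float]) -> int:
    """Class index of a rainfall amount given ascending thresholds."""
    return bisect_right(edges, y)


def mou_resample(samples: Sequence[Sample], edges: Sequence[float],
                 per_class: int, rng: random.Random) -> List[Sample]:
    """Bring every intensity class to per_class samples: random under-sampling
    of large classes, random duplication (over-sampling) of small ones."""
    classes: Dict[int, List[Sample]] = {}
    for s in samples:
        classes.setdefault(intensity_class(s[1], edges), []).append(s)
    out: List[Sample] = []
    for members in classes.values():
        if len(members) >= per_class:
            out.extend(rng.sample(members, per_class))
        else:
            out.extend(members)
            out.extend(rng.choices(members, k=per_class - len(members)))
    return out


def rrf_fit(samples: Sequence[Sample], edges: Sequence[float], per_class: int,
            n_members: int, fit: Callable[[List[Sample]], Model],
            seed: int = 0) -> Model:
    """Train one regressor per resampled set and average their predictions."""
    rng = random.Random(seed)
    members = [fit(mou_resample(samples, edges, per_class, rng))
               for _ in range(n_members)]
    return lambda x: mean(m(x) for m in members)


def fit_mean(train: Sequence[Sample]) -> Model:
    """Hypothetical stand-in for a random-forest regressor: predicts the mean."""
    m = mean(y for _, y in train)
    return lambda x: m


data = [([0.0], 0.5)] * 90 + [([1.0], 30.0)] * 10  # 90 light, 10 heavy samples
model = rrf_fit(data, edges=[10.0], per_class=20, n_members=5, fit=fit_mean)
balanced = model([1.0])        # 15.25: both classes weigh equally
raw = fit_mean(data)([1.0])    # 3.45: dominated by the light-rain majority
```

Even with this trivial base learner, balancing the intensity classes moves predictions toward the heavy-rain tail that an unweighted fit underestimates; a real base learner would condition on the features.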
Keywords	Random forest; rainfall estimation; multi-source data fusion; spatiotemporal model; imbalanced data
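Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/14701
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences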
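Recommended citation (GB/T 7714)	Kuang Qiuming. Research on Rainfall Estimation Models and Algorithms Based on Random Forests[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2017.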

Files in this item
File name/size	Document type	Version type	Access type	License
匡秋明博士学位论文.pdf (9703 KB)	Dissertation		Restricted access	CC BY-NC-SA