|Keywords||random forest; rainfall estimation; multi-source data fusion; spatiotemporal model; imbalanced data|
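Building on the random forest algorithm, this thesis studies the theory and methods of feature-fusion representation of multi-source data, spatiotemporal modeling, and regression on imbalanced data. First, for rain/no-rain estimation, it proposes the Multi-View Weighted Random Forests (MVWRF) model and its inference method, which use multi-source radar, satellite, and ground-based observation data to improve the accuracy of rain/no-rain estimation. Second, for rainfall amount estimation, it proposes a spatiotemporal model that takes rainfall processes as input (Random Forest and Linear Chain Conditional Random Field based Spatiotemporal Model, RANLIST), together with a three-stage approximate optimization algorithm. Built on the temporal relationships within a rainfall process and the spatial-neighborhood correlations of radar reflectivity factors, the model mines the spatiotemporal information in the data and improves the accuracy of rainfall amount estimation. Finally, for heavy rainfall estimation, it proposes an imbalanced-regression resampling method (Multiclass Over-sampling and Under-Sampling, MOU) and an ensemble regression model (Resampling based Random Forest, RRF), which improve the accuracy of heavy rainfall estimation.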
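(a) A "structure + statistics" spatiotemporal model for rainfall amount estimation is proposed. It uses the spatial neighborhood structure of radar reflectivity factors and the time-series structure of rainfall processes (structure), together with random forest machine learning to mine information within the data (statistics). (b) An approximate solution method for the model based on random forests and linear-chain conditional random fields is proposed. RANLIST is trained and tested rainfall process by rainfall process, making full use of each rainfall process as a whole in estimating rainfall amounts. Experiments on real data show that, compared with the second-best algorithm, the proposed algorithm significantly improves the accuracy of rainfall amount estimation.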
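The temporal half of this idea can be illustrated with Viterbi decoding over a linear-chain conditional random field. This is a sketch under stated assumptions, not the thesis's three-stage algorithm: it assumes a per-time-step spatial model (for example, a random forest) has already produced probabilities for a few discretized rainfall levels, and that a hand-set transition score penalizes jumps between levels. The function name `viterbi`, the three levels, and the penalty of 1 per level jumped are all hypothetical.

```python
import math
from typing import List, Sequence


def viterbi(emission: Sequence[Sequence[float]],
            transition: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label sequence of a linear-chain CRF.

    emission[t][k]   -- score of rainfall level k at step t of a rain process
                        (here: log of a hypothetical per-step RF probability)
    transition[j][k] -- score for moving from level j to level k
    """
    n_steps, n_labels = len(emission), len(emission[0])
    score = list(emission[0])          # best path score ending in each level
    backptr: List[List[int]] = []      # best predecessor for each step/level
    for t in range(1, n_steps):
        new_score, ptr = [], []
        for k in range(n_labels):
            best_j = max(range(n_labels),
                         key=lambda j: score[j] + transition[j][k])
            ptr.append(best_j)
            new_score.append(score[best_j] + transition[best_j][k] + emission[t][k])
        score = new_score
        backptr.append(ptr)
    # Start from the best final level and follow the back-pointers.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical per-step probabilities for levels 0 (light), 1 (moderate), 2 (heavy).
probs = [[0.1, 0.8, 0.1],
         [0.3, 0.25, 0.45],
         [0.1, 0.8, 0.1]]
emission = [[math.log(p) for p in row] for row in probs]
transition = [[-abs(j - k) for k in range(3)] for j in range(3)]  # jump penalty

independent = [max(range(3), key=lambda k: row[k]) for row in probs]
smoothed = viterbi(emission, transition)
print(independent, smoothed)  # [1, 2, 1] [1, 1, 1]
```

The isolated "heavy" label at the middle step disappears because the two jump penalties outweigh its small probability margin. A trained CRF would learn the transition scores from rainfall processes instead of fixing them by hand.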
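(a) An MOU hybrid resampling method is proposed for the heavy rainfall regression problem; it is the first to combine multi-class imbalanced over-sampling and multi-class imbalanced under-sampling to handle imbalanced regression samples. (b) An RRF ensemble regression method is proposed, which extends over-sampling, under-sampling, and hybrid-sampling classification methods to regression problems. Heavy rainfall regression experiments on real datasets show that the proposed hybrid-resampling ensemble regression algorithm outperforms traditional meteorological methods, support vector machine and random forest machine learning methods, and typical over-sampling, under-sampling, and hybrid-sampling ensemble regression methods in heavy rainfall estimation.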
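One way to picture the MOU idea is to discretize the continuous rainfall target into intensity classes and then balance those classes, under-sampling the over-represented light-rain class and over-sampling the rare heavy-rain class. The sketch below uses plain random under-sampling and random duplication as stand-ins for the multi-class over- and under-sampling methods the thesis combines; the bin thresholds, the common target size, the rainfall units, and the names `rain_class` and `mou_resample` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, rainfall amount)


def rain_class(y: float, thresholds: Sequence[float]) -> int:
    """Index of the intensity bin containing y (thresholds in ascending order)."""
    return sum(y >= t for t in thresholds)


def mou_resample(samples: Sequence[Sample], thresholds: Sequence[float],
                 target: int, seed: int = 0) -> List[Sample]:
    """Bring every intensity bin to exactly `target` samples."""
    rng = random.Random(seed)
    bins: Dict[int, List[Sample]] = defaultdict(list)
    for x, y in samples:
        bins[rain_class(y, thresholds)].append((x, y))
    out: List[Sample] = []
    for c in sorted(bins):
        group = bins[c]
        if len(group) > target:
            # Under-sample a majority bin without replacement.
            out.extend(rng.sample(group, target))
        else:
            # Keep every original sample, then over-sample by random duplication.
            out.extend(group)
            out.extend(rng.choices(group, k=target - len(group)))
    return out


# Hypothetical data: 90 light, 9 moderate and 1 heavy sample (mm/h).
data = ([([float(i)], 1.0) for i in range(90)]
        + [([float(i)], 10.0) for i in range(9)]
        + [([0.0], 30.0)])
thresholds = [5.0, 20.0]
balanced = mou_resample(data, thresholds, target=30)
print(Counter(rain_class(y, thresholds) for _, y in balanced))
```

Because every bin ends up the same size, a regressor trained on `balanced` sees the heavy-rain regime as often as light rain; swapping the duplication step for a synthetic-sample generator would be a natural refinement.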
|English Abstract||High-resolution (about 1 km × 1 km) short-term (0-2 h) rainfall forecasting is crucial for outdoor events such as the Olympic Games and parades, and for activities such as spacecraft launches, agricultural irrigation, hydropower generation, and flood warning. Precise rainfall estimation is an important basis of rainfall forecasting. However, traditional rainfall estimation methods are inaccurate and unstable because they are affected by wind speed, wind direction, terrain, meteorological conditions, precipitation phase, rainfall type, ground clutter, and other factors, so they cannot meet the requirements of high-resolution rainfall estimation. Improving the accuracy of rainfall estimation is therefore a very challenging research topic. Radar, satellite, and ground-based observation networks have now been built and collect large volumes of meteorological data, yet traditional methods for rain/no-rain classification and rainfall intensity estimation use only a small part of these data and cannot meet application requirements. In recent years, random forests have achieved remarkable results in competitions and in many practical applications: they are easy to parallelize, resist over-fitting on large datasets, and use fewer computing resources than deep learning.|
In this thesis, random forest is employed as the base algorithm for rainfall estimation, and the problems of multi-source data fusion, spatiotemporal modeling, and imbalanced regression are studied. First, for rain/no-rain classification, this thesis proposes a multi-view weighted random forests (MVWRF) model together with its inference method, which improves rain/no-rain classification using data from radar, satellites, and ground observation stations. Second, for rainfall estimation, random forests and linear-chain conditional random fields are combined into a spatiotemporal model (RANLIST) that exploits the spatial structure of radar reflectivity factors and the time-series information of rain processes, and a three-stage method is presented for optimizing the model. RANLIST uses rain processes as the basic units for training and testing, exploring spatial and temporal information to improve rainfall estimation. Finally, for heavy rainfall estimation, a multi-class over-sampling and under-sampling (MOU) resampling method and a resampling-based ensemble regression method (RRF) are proposed for imbalanced regression problems. Experimental results show that they improve the performance of heavy rainfall estimation.
The main contributions of this dissertation can be summarized as follows:
(1) The MVWRF method is proposed for rain/no-rain classification. The innovations are as follows: (a) A method for constructing multiple views is presented. Views such as VisPPI, VisSat, and VisGround are constructed, so that multi-source radar, satellite, and ground-based observation data with inconsistent spatial and temporal resolutions can be handled in a unified multi-view framework. (b) An MVWRF model based on multi-view data is proposed under a Bayesian framework; it combines the multi-source data for rain/no-rain classification and exploits their complementarity. Experiments show that this method effectively merges multi-source data from radar, satellites, and ground observation stations and, compared with meteorological methods, typical machine learning classification methods, and other multi-view methods, improves the accuracy of rain/no-rain classification.
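The Bayesian view-combination step can be sketched for the two-class case. If the views are assumed conditionally independent given the rain label, Bayes' rule gives posterior odds = (prior odds)^(1 - V) × (product of the V per-view odds); giving view v a weight w_v turns the exponents into w_v and 1 - Σ w_v. This weighted log-linear pooling is an illustrative assumption, not necessarily the weighting or inference MVWRF actually uses, and `fuse_views`, `logit`, and the weight values are hypothetical. Leaving out a missing view is one simple way to cope with sources whose spatial and temporal resolutions do not line up.

```python
import math
from typing import Dict, Optional


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fuse_views(view_probs: Dict[str, Optional[float]],
               weights: Dict[str, float],
               prior_rain: float) -> float:
    """Combine per-view P(rain | view) into a single P(rain | all views).

    Weighted log-linear pooling under an assumed conditional independence
    of the views given the label:
        log-odds = sum_v w_v * logit(p_v) + (1 - sum_v w_v) * logit(prior)
    A view whose probability is None (e.g. no satellite scan covering this
    pixel and time) is left out of both sums.
    """
    log_odds, total_w = 0.0, 0.0
    for view, p in view_probs.items():
        if p is None:
            continue
        w = weights.get(view, 0.0)
        log_odds += w * logit(p)
        total_w += w
    log_odds += (1.0 - total_w) * logit(prior_rain)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical per-view random-forest outputs for one pixel and time step.
probs = {"VisPPI": 0.7, "VisSat": 0.6, "VisGround": None}
weights = {"VisPPI": 1.0, "VisSat": 0.8, "VisGround": 1.0}
print(round(fuse_views(probs, weights, prior_rain=0.2), 3))
```

With all weights equal to 1 this reduces to the unweighted conditional-independence combination; a single available view with weight 1 returns that view's probability unchanged.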
(2) A new spatiotemporal model is proposed for radar-based rainfall estimation. The innovations are as follows: (a) A "structure & statistics" radar-based rainfall estimation model, RANLIST, is proposed. It uses the spatial structure of radar reflectivity factors and the time-series structure of rainfall processes (structure), and learns the internal information in the data with statistical machine learning (statistics). (b) A random-forest spatial submodel, a linear-chain conditional random field time-series submodel, and a submodel fusion method are presented for optimizing RANLIST. Rainfall processes are used as the basic units for training and testing, which helps exploit the information contained in whole rainfall processes. Experiments show that, compared with the second-best algorithm, the proposed method clearly improves the accuracy of rainfall estimation.
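The spatial "structure" input can be illustrated by turning the radar reflectivity factors around a target pixel into a fixed-length feature vector, one vector per radar scan of a rain process, ready for a per-step regressor such as a random forest. The window radius, the edge-padding rule, and the names `neighborhood_features` and `process_features` are assumptions for illustration; the thesis may define its neighborhood differently.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # reflectivity factor (dBZ) on a 2-D grid


def neighborhood_features(grid: Grid, i: int, j: int, radius: int = 1) -> List[float]:
    """Flatten the (2r+1) x (2r+1) window centered on pixel (i, j), row by row.

    Out-of-grid neighbors reuse the nearest edge cell (edge padding), so every
    pixel yields a vector of the same length.
    """
    rows, cols = len(grid), len(grid[0])
    feats: List[float] = []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            r = min(max(i + di, 0), rows - 1)
            c = min(max(j + dj, 0), cols - 1)
            feats.append(grid[r][c])
    return feats


def process_features(frames: Sequence[Grid], i: int, j: int,
                     radius: int = 1) -> List[List[float]]:
    """One neighborhood vector per radar scan, in time order, for one pixel."""
    return [neighborhood_features(f, i, j, radius) for f in frames]


scan = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]
print(neighborhood_features(scan, 0, 0))
# [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 3.0, 3.0, 4.0]
```

The time-ordered list returned by `process_features` is the kind of per-process sequence a linear-chain model can then smooth over time.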
(3) A hybrid resampling ensemble regression algorithm is proposed for heavy rainfall estimation. The main innovations are as follows: (a) An MOU hybrid resampling method is proposed, which for the first time applies both multi-class over-sampling and multi-class under-sampling to regression. (b) An RRF ensemble regression method is presented, which can employ existing typical over-sampling, under-sampling, and hybrid-sampling methods to handle regression problems. Experiments compare these typical over-sampling, under-sampling, and hybrid-sampling methods with the MOU resampling method. The results show that the proposed method achieves the best performance in heavy rainfall estimation, outperforming traditional meteorological methods, support vector machines and random forests, and typical over-sampling, under-sampling, and hybrid-sampling ensemble regression methods.
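The RRF pattern, resampling the training set separately for each ensemble member and averaging the members' predictions, can be sketched without any library. In this sketch a one-split regression stump stands in for a random-forest regression tree, and a two-bin balanced bootstrap stands in for MOU or any other over-, under-, or hybrid-sampling scheme; the heavy-rain threshold, the single reflectivity feature, and the names `fit_stump`, `balanced_bootstrap`, and `fit_rrf` are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]          # (reflectivity in dBZ, rainfall amount)
Model = Callable[[float], float]


def fit_stump(data: Sequence[Sample]) -> Model:
    """One-split regression stump minimizing the squared error."""
    overall = mean(y for _, y in data)
    best_sse, best = float("inf"), None
    for thr in sorted({x for x, _ in data})[1:]:
        left = [y for x, y in data if x < thr]
        right = [y for x, y in data if x >= thr]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best = sse, (thr, lm, rm)
    if best is None:                   # all x identical: predict the mean
        return lambda x: overall
    thr, lm, rm = best
    return lambda x: lm if x < thr else rm


def balanced_bootstrap(data: Sequence[Sample], heavy: float,
                       rng: random.Random) -> List[Sample]:
    """Draw half of the resample from light and half from heavy rainfall."""
    light_part = [s for s in data if s[1] < heavy]
    heavy_part = [s for s in data if s[1] >= heavy]
    if not light_part or not heavy_part:
        return rng.choices(list(data), k=len(data))
    half = len(data) // 2
    return (rng.choices(light_part, k=half)
            + rng.choices(heavy_part, k=len(data) - half))


def fit_rrf(data: Sequence[Sample], heavy: float,
            n_members: int = 25, seed: int = 0) -> Model:
    """Average of stumps, each fitted on its own resampled training set."""
    rng = random.Random(seed)
    members = [fit_stump(balanced_bootstrap(data, heavy, rng))
               for _ in range(n_members)]
    return lambda x: mean(m(x) for m in members)


# Hypothetical imbalanced data: 95 light-rain and 5 heavy-rain samples.
data = [(20.0, 1.0)] * 95 + [(50.0, 40.0)] * 5
model = fit_rrf(data, heavy=20.0)
print(model(20.0), model(50.0))  # 1.0 40.0
```

Because `balanced_bootstrap` is just a function argument to the ensemble loop in spirit, any other resampler with the same signature could be substituted, which mirrors how the text describes RRF reusing existing over-, under-, and hybrid-sampling methods.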
(4) This thesis presents formulations of rain/no-rain classification, rainfall intensity estimation, and heavy rainfall estimation, together with a training and testing workflow and framework. The proposed algorithms are applied over China, achieving rainfall estimation at a spatial resolution of 1 km × 1 km. Combined with an optical flow method, they can also forecast rainfall every six minutes for the next 0-2 hours.
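The forecasting step pairs the 1 km × 1 km estimates with motion extrapolation: a 0-2 hour horizon at six-minute steps means 120 / 6 = 20 extrapolated frames. The sketch below replaces a real optical-flow field with a single hypothetical integer motion vector (u, v), in grid cells per step, and advects the field by backward lookup; at 1 km per cell, u = 1 corresponds to 10 km/h. The names `advect` and `nowcast` are hypothetical, and a real optical-flow method would estimate a per-pixel motion field from consecutive radar scans.

```python
from typing import List, Sequence

Grid = List[List[float]]


def advect(field: Sequence[Sequence[float]], u: int, v: int) -> Grid:
    """Move the field v rows down and u columns right; inflow cells are 0."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            src_r, src_c = r - v, c - u      # backward lookup of the source cell
            if 0 <= src_r < rows and 0 <= src_c < cols:
                out[r][c] = field[src_r][src_c]
    return out


def nowcast(field: Sequence[Sequence[float]], u: int, v: int,
            horizon_min: int = 120, step_min: int = 6) -> List[Grid]:
    """Extrapolate the latest rainfall estimate every step_min minutes."""
    frames: List[Grid] = []
    current = [list(row) for row in field]
    for _ in range(horizon_min // step_min):
        current = advect(current, u, v)
        frames.append(current)
    return frames


# A single 5 mm/h cell on a 1 x 25 strip moving one cell (1 km) east per step.
strip = [[5.0] + [0.0] * 24]
frames = nowcast(strip, u=1, v=0)
print(len(frames), frames[4][0].index(5.0))  # 20 5
```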
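|Kuang Qiuming (匡秋明). Research on Rainfall Estimation Models and Algorithms Based on Random Forests [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2017.|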