|Place of Conferral||Beijing|
|Keyword||Precipitation estimation; precipitation forecast; random forests; imbalanced data; label distribution learning|
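This dissertation aims to address three key problems in accurate precipitation estimation and forecast: quantitative precipitation estimation, precipitation phase recognition, and ensemble forecast of precipitation. Starting from the meteorological problems, it studies how random forests, sampling for imbalanced data, and label distribution learning can be applied to them. For quantitative precipitation estimation, the Terrain-based Weighted Random Forests (TWRF) method is proposed. Using high-resolution radar observations, TWRF accounts for the feature importance, at different height levels, of the vertical profile of reflectivity, which effectively characterizes precipitation; it builds a regression model and incorporates the influence of terrain on precipitation during modeling, improving the accuracy of quantitative precipitation estimation. For precipitation phase recognition, the Adaptive Mahalanobis Distance-based Over-sampling (AMDO) method is proposed. By effectively synthesizing new samples for classifiers, AMDO addresses the multi-class, mixed-type attribute, overlapping, and imbalance problems in precipitation phase recognition, improving its accuracy. For ensemble forecast of precipitation, the Label Distribution Learning with Climate Probability (LDLCP) method is proposed. Using multi-source numerical model forecasts, LDLCP considers the uncertainty of the weather system, the probabilistic similarity of climate, and the correlation between forecast variables; it builds an optimization model through label distribution learning and gives a corresponding forecast method, improving the accuracy of ensemble precipitation forecasts. On real and public data sets, the three methods achieve significant performance gains over current mainstream meteorological methods and related machine learning methods, and they have been deployed in operation to provide meteorological services to the public.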
|Other Abstract||Accurate precipitation estimation and forecast with high spatial-temporal resolution and quantitative accuracy are crucial for hydrological monitoring, flood mitigation, environmental governance, research activities, industrial and agricultural production, electric power systems, etc. Due to the high uncertainty within the nonlinear dynamic weather system, conventional meteorological methods based on physical models and statistical analysis perform unsatisfactorily in precipitation estimation and forecast. Improving the accuracy of high-resolution precipitation estimation and forecast is challenging for both research and operational applications. With the development of meteorological technologies, national radar, satellite, and ground-based observation networks have been constructed and various numerical weather forecasts have been developed. As a result, a large volume of meteorological data has been collected, opening opportunities to approach meteorological problems from a machine learning perspective. Therefore, new machine learning methods that mine meteorological big data by combining data-driven approaches with meteorological expertise are required to advance precipitation estimation and forecast and to meet application requirements.|
This dissertation addresses three key issues in precipitation estimation and forecast, namely quantitative precipitation estimation, precipitation phase recognition, and ensemble forecast of precipitation. Based on an analysis of the meteorological problems, it studies the methods and applications of random forests, resampling for imbalanced data, and label distribution learning. For quantitative precipitation estimation, the Terrain-based Weighted Random Forests (TWRF) method is proposed. Using radar observations with high spatial-temporal resolution, TWRF takes the vertical profile of reflectivity as features that characterize precipitation, assigns weights to reflectivity at different heights to reflect feature importance when constructing the regression model, and considers the influence of terrain. As a result, TWRF improves the performance of quantitative precipitation estimation. For precipitation phase recognition, the Adaptive Mahalanobis Distance-based Over-sampling (AMDO) method is proposed. By nature, this is a multi-class imbalanced classification problem with mixed-type attributes and class overlap. AMDO effectively synthesizes appropriate new samples for classifiers and improves the performance of precipitation phase recognition. For ensemble forecast of precipitation, the Label Distribution Learning with Climate Probability (LDLCP) method is proposed. Using various numerical weather forecasts, LDLCP takes into account the uncertainty of the weather system, the probabilistic similarity of climate, and the relevance between forecast variables, and develops an optimization model using the label distribution learning paradigm. The forecast obtained by LDLCP shows promising performance for ensemble forecast of precipitation. The above methods are evaluated on benchmark data sets and real data sets and compared with conventional meteorological methods and representative machine learning methods.
The experimental results validate the superiority of the proposed methods. Furthermore, the proposed methods have been deployed in operation to provide meteorological services to the public.
The main contributions of this dissertation can be summarized as follows:
(1) A Terrain-based Weighted Random Forests (TWRF) method is proposed for radar quantitative precipitation estimation. This method extends the radar features to the entire vertical profile of reflectivity. To measure how the ideal condition is relaxed at different heights, a weighted random forests model is proposed that improves on basic random forests by assigning weights to the features selected at each node of the individual trees. Because of the orographic enhancement of precipitation, terrain differences are considered in the model implementation to improve performance in areas of complex terrain. Experimental results on real precipitation processes show that, compared with its competitors, TWRF achieves the best results for BIAS, RMSE, MAE, MB and CC. In addition, the proposed data preprocessing and the workflow of model training and testing have been implemented in operation and used to generate meteorological products.
(2) An Adaptive Mahalanobis Distance-based Over-sampling (AMDO) method is proposed for precipitation phase recognition. This method inherits the core idea of Mahalanobis distance-based over-sampling and extends it to precipitation phase recognition, which by nature is a multi-class imbalanced classification problem with mixed-type attributes and class overlap. Moreover, AMDO develops partially balanced resampling and optimizes the sample synthesis in principal component space to further improve the effectiveness and efficiency of over-sampling. The approximate computational complexity of AMDO is O(N log N). Extensive experiments are performed on 15 benchmark data sets and 2 real data sets against several competitors. The results validate the superiority of AMDO for multi-class imbalanced problems. For precipitation phase recognition, AMDO yields improvements of 43.74% in minority precision and 10.84% in average precision. In addition, the proposed workflow of sampling and multi-class classification has been implemented in operation and used to generate meteorological products.
(3) A Label Distribution Learning with Climate Probability (LDLCP) method is proposed for ensemble forecast of precipitation. LDLCP formulates this issue as a label distribution learning problem for the first time, aiming to optimize the forecast distribution to be consistent with the local climate without any distribution assumption. A specialized target function for the regression task is proposed for ensemble forecast. Moreover, LDLCP jointly utilizes the forecast variable and relevant variables for modeling to improve interpretability and generalization performance. Experimental results on real data sets show that the proposed method performs best among the compared methods. For ensemble forecast of precipitation, LDLCP reduces RMSE by 43.9% and average CRPS by 37.4%. Currently, the proposed data organization and processing and the workflow of model optimization and forecasting have been implemented for early access.
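The RMSE and average CRPS figures quoted in contribution (3) are standard verification scores for ensemble forecasts. The abstract does not say how they were computed, so the following is only a minimal sketch under stated assumptions: it uses the common empirical CRPS estimator for a finite ensemble and the RMSE of the ensemble mean. The function names and the toy numbers are hypothetical and are not taken from the dissertation.

```python
from math import sqrt
from typing import Sequence


def ensemble_crps(members: Sequence[float], observation: float) -> float:
    """Empirical CRPS of one ensemble forecast against one observation.

    Assumed estimator (not confirmed by the dissertation):
    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    # Mean absolute error of the members against the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference over all member pairs (ensemble spread).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread


def average_crps(forecasts: Sequence[Sequence[float]],
                 observations: Sequence[float]) -> float:
    """Average the per-case CRPS over all forecast cases."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need one observation per forecast case")
    scores = [ensemble_crps(f, o) for f, o in zip(forecasts, observations)]
    return sum(scores) / len(scores)


def ensemble_mean_rmse(forecasts: Sequence[Sequence[float]],
                       observations: Sequence[float]) -> float:
    """RMSE of the ensemble-mean forecast against the observations."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need one observation per forecast case")
    squared = [(sum(f) / len(f) - o) ** 2
               for f, o in zip(forecasts, observations)]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical toy values (e.g. precipitation in mm), for illustration only.
    toy_forecasts = [[0.0, 2.0, 4.0], [3.0]]
    toy_observations = [2.0, 1.0]
    print(average_crps(toy_forecasts, toy_observations))
    print(ensemble_mean_rmse(toy_forecasts, toy_observations))
```

With a single-member ensemble the spread term vanishes and the CRPS reduces to the absolute error, which is why CRPS is often read as a probabilistic generalization of MAE. Lower values are better for both scores.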
|First Author Affiliation||Institute of Automation, Chinese Academy of Sciences|
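|Yang Xuebing (杨雪冰). Research on Machine Learning Methods for Accurate Precipitation Estimation and Forecast [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2018.|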
|Files in This Item:|
|杨雪冰博士论文-CASIA-final.（7999KB）||Dissertation||Not yet open access||CC BY-NC-SA||Application Full Text|