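Research on Machine Learning Methods for Accurate Precipitation Estimation and Forecasting (精准降水估计与预报的机器学习方法研究)
Yang Xuebing (杨雪冰) 1,2
Degree type: Doctor of Engineering
Supervisor: Zhang Wensheng (张文生)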
2018-05-15
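Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Beijing
Keywords: precipitation estimation; precipitation forecasting; random forests; imbalanced data; label distribution learning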
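Abstract    Accurate precipitation estimation and forecasting demand high spatiotemporal resolution and high quantitative accuracy, and they matter for hydrological monitoring, flood mitigation, environmental governance, scientific research, industrial and agricultural production, and power systems. Because nonlinear, dynamic weather systems are highly uncertain, traditional meteorological methods based on physical models and statistical analysis hit a performance bottleneck in precipitation estimation and forecasting and struggle to meet accuracy requirements at high resolution. Improving accuracy therefore remains challenging in both research and application. With advances in meteorological technology, China has built nationwide radar, satellite, and surface observation networks and has developed a variety of numerical model forecasts. The accumulated meteorological data are growing explosively, which creates an opportunity to apply machine learning to meteorological problems. New machine learning methods are urgently needed that combine a data-driven approach with meteorological prior knowledge to mine meteorological big data effectively and raise the accuracy of precipitation estimation and forecasting to the level applications require.
     This dissertation addresses three key problems in accurate precipitation estimation and forecasting: quantitative precipitation estimation, precipitation phase recognition, and ensemble precipitation forecasting. Starting from the meteorological problems, it studies how random forests, sampling for imbalanced data, and label distribution learning can be applied to them. For quantitative precipitation estimation, it proposes the Terrain-based Weighted Random Forests (TWRF) method. TWRF uses high-resolution radar observations, accounts for the feature importance at different height levels of the vertical profile of radar reflectivity (which effectively characterizes precipitation), builds a regression model, and considers the influence of terrain on precipitation during modeling, which improves estimation accuracy. For precipitation phase recognition, it proposes the Adaptive Mahalanobis Distance-based Over-sampling (AMDO) method. AMDO synthesizes new samples for the classifier to cope with the multi-class, mixed-type feature, class overlap, and imbalance problems present in phase recognition, which improves recognition accuracy. For ensemble precipitation forecasting, it proposes the Label Distribution Learning with Climate Probability (LDLCP) method. LDLCP uses multi-source numerical model forecasts, accounts for the uncertainty of the weather system, the probabilistic similarity of climate, and the correlation among forecast variables, builds an optimization model through label distribution learning, and gives a corresponding forecasting method, which improves forecast accuracy. Compared with current mainstream meteorological methods and related machine learning methods on real and public data sets, the three methods achieve significant performance gains. They have been deployed in operations to provide meteorological services to the public.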
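    The main work and contributions are as follows:
  (1) A terrain-based weighted random forest method for precipitation estimation (TWRF) is proposed. To address the poor performance of radar quantitative precipitation estimation, the method first extends the radar features to the entire vertical profile of the reflectivity factor. It then proposes a weighted random forest regression model that improves on the traditional random forest: the probability that a feature is selected when a node is split in a randomized decision tree reflects how well the ideal conditions are satisfied at different heights. It also proposes a terrain-based modeling strategy that accounts for orographic enhancement and effectively improves precipitation estimation in regions of complex terrain. Experiments on real precipitation events show that the method outperforms the compared methods on bias ratio, root mean square error, mean absolute deviation, mean bias, and correlation coefficient. The workflow of data processing, model training, and testing proposed in this method has been engineered and turned into a meteorological product.
  (2) An adaptive Mahalanobis distance-based over-sampling method for phase recognition (AMDO) is proposed. To address the low minority-class precision in precipitation phase recognition, the method generalizes the existing Mahalanobis distance-based over-sampling approach to phase recognition, a multi-class imbalanced problem with mixed-type features and class overlap. It further proposes partially balanced resampling and optimizes sampling adapted to principal component space, which improves both the sampling strategy and the sampling efficiency. The computational complexity is approximately O(NlogN). Comparison with mainstream methods on 15 public data sets and 2 real data sets confirms the advantage of the method on multi-class imbalanced problems. For phase recognition, it improves minority-class precision by 43.74% and average precision by 10.84%. The sampling and multi-class classification workflow proposed in this method has been engineered and turned into a meteorological product.
  (3) A label distribution learning method with climate probability for ensemble forecasting (LDLCP) is proposed. To address the poor performance of ensemble precipitation forecasting, the method formulates the ensemble forecasting problem as a label distribution learning problem for the first time. It makes no distributional assumption and optimizes the forecast distribution to be consistent with the climate probability. It then proposes a new loss function for regression problems for use in ensemble forecasting, and models the forecast variable jointly with related variables, which improves the model's interpretability and forecasting ability. Comparative experiments with mainstream methods on real ensemble precipitation forecast data show that the method achieves the best results, reducing root mean square error by 43.9% and average CRPS by 37.4% for precipitation forecasts. The workflow of ensemble forecast data organization, processing, model parameter optimization, and forecast output proposed in this method has been engineered and is in pilot application.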
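Contribution (1) ties the probability that a feature is picked at a node split to per-height weights. The record does not give the weighting scheme, so the following is only a minimal sketch of weighted candidate-feature sampling, not the TWRF implementation. The function name `sample_split_features` and the weight values are hypothetical.

```python
import numpy as np

def sample_split_features(weights, n_candidates, rng):
    """Draw candidate feature indices for one node split.

    Features are drawn without replacement, with probability
    proportional to their weight (hypothetical per-height weights).
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    probs = weights / weights.sum()
    return rng.choice(len(weights), size=n_candidates, replace=False, p=probs)

rng = np.random.default_rng(0)
# Assumed weights for five reflectivity height levels. The last level has
# weight 0, so it is never offered as a split candidate.
height_weights = [0.4, 0.3, 0.2, 0.1, 0.0]
candidates = sample_split_features(height_weights, n_candidates=2, rng=rng)
print(candidates)
```

A standard random forest draws split candidates uniformly. Replacing that draw with a weighted one is the only change this sketch illustrates.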
Alternative abstract    Accurate precipitation estimation and forecasting, which require high spatial-temporal resolution and quantitative accuracy, are crucial for hydrological monitoring, flood mitigation, environmental governance, research activities, industrial and agricultural production, power systems, etc. Due to the high uncertainty of the nonlinear dynamic weather system, conventional meteorological methods based on physical models and statistical analysis perform unsatisfactorily in precipitation estimation and forecasting. Improving the accuracy of high-resolution precipitation estimation and forecasting is challenging for both research and operational applications. With the development of meteorological technologies, national radar, satellite, and ground-based observation networks have been constructed and various numerical weather forecasts have been developed. As a result, a large volume of meteorological data has been collected, opening opportunities to approach meteorological problems from a machine learning perspective. Therefore, new machine learning methods that mine meteorological big data by combining a data-driven approach with meteorological expertise are required to advance precipitation estimation and forecasting and to meet application requirements.
    This dissertation addresses three key issues in precipitation estimation and forecasting: quantitative precipitation estimation, precipitation phase recognition, and ensemble forecast of precipitation. Based on an analysis of the meteorological problems, it studies the methods and applications of random forests, resampling for imbalanced data, and label distribution learning. For quantitative precipitation estimation, the Terrain-based Weighted Random Forests (TWRF) method is proposed. Using radar observations of high spatial-temporal resolution, TWRF takes the vertical profile of reflectivity as features that characterize precipitation, assigns weights to reflectivity at different heights to express feature importance when constructing the regression model, and considers the influence of terrain. As a result, TWRF improves the performance of quantitative precipitation estimation. For precipitation phase recognition, the Adaptive Mahalanobis Distance-based Over-sampling (AMDO) method is proposed. This issue is by nature a multi-class imbalanced classification problem with mixed-type attributes and class overlap. AMDO synthesizes appropriate new samples for classifiers and improves the performance of precipitation phase recognition. For ensemble forecast of precipitation, the Label Distribution Learning with Climate Probability (LDLCP) method is proposed. Using various numerical weather forecasts, LDLCP takes into account the uncertainty of the weather system, the probabilistic similarity of climate, and the relevance between forecast variables, and develops an optimization model under the label distribution learning paradigm. The forecast obtained by LDLCP shows promising performance for ensemble forecast of precipitation. The above methods are evaluated on benchmark data sets and real data sets and compared with conventional meteorological methods and representative machine learning methods. The experimental results validate the superiority of the proposed methods. Furthermore, the proposed methods have been put into operation to provide meteorological services for the public.
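To make the idea of Mahalanobis distance-based sample synthesis concrete, here is a minimal sketch of one generic way to create a synthetic minority sample that lies at the same Mahalanobis distance from the class mean as a seed sample. It is not the AMDO algorithm. It assumes purely numeric features and a non-singular class covariance, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of x from mean."""
    d = x - mean
    return float(d @ cov_inv @ d)

def synthesize_same_distance(minority, seed_index, rng):
    """Return one synthetic sample on the same Mahalanobis contour as the seed.

    Assumes numeric features and a full-rank covariance matrix.
    """
    mean = minority.mean(axis=0)
    cov = np.cov(minority, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # principal axes of the class
    z = (minority[seed_index] - mean) @ eigvecs   # seed in principal component space
    alpha = np.sum(z**2 / eigvals)                # squared Mahalanobis distance of seed
    u = rng.normal(size=z.shape)                  # random direction on the unit sphere
    u /= np.linalg.norm(u)
    z_new = np.sqrt(alpha * eigvals) * u          # point on the same ellipsoid
    return mean + eigvecs @ z_new                 # back to the original feature space

rng = np.random.default_rng(0)
minority = rng.normal(size=(30, 2)) * np.array([3.0, 0.5])  # toy minority class
new_sample = synthesize_same_distance(minority, seed_index=0, rng=rng)
print(new_sample)
```

Working in principal component space turns the contour constraint into a simple per-axis rescaling, which is why the synthetic point can be drawn without any iterative search.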
    The main contributions of this dissertation can be summarized as follows: 
    (1) A Terrain-based Weighted Random Forests (TWRF) method is proposed for radar quantitative precipitation estimation. This method extends the radar features to the entire vertical profile of reflectivity. To reflect how far the ideal condition is relaxed at different heights, a weighted random forests model is proposed that improves on basic random forests by assigning weights to the features selected at each node of an individual tree. Because of the orographic enhancement of precipitation, terrain differences are considered in model implementation to improve performance in areas of complex terrain. Experimental results on real precipitation processes show that, compared with its competitors, TWRF achieves the best results for BIAS, RMSE, MAE, MB and CC. In addition, the proposed data preprocessing and the workflow of model training and testing have been implemented in operation and used in a meteorological product.
    (2) An Adaptive Mahalanobis Distance-based Over-sampling (AMDO) method is proposed for precipitation phase recognition. This method inherits the core idea of Mahalanobis distance-based over-sampling and extends it to precipitation phase recognition, which is by nature a multi-class imbalanced classification problem with mixed-type attributes and class overlap. Moreover, AMDO develops partially balanced resampling and optimizes sample synthesis in principal component space to further improve the effectiveness and efficiency of over-sampling. The approximate computational complexity of AMDO is O(NlogN). Extensive experiments are performed on 15 benchmark data sets and 2 real data sets against several competitors. The results validate the superiority of AMDO on multi-class imbalanced problems. For precipitation phase recognition, AMDO improves minority precision by 43.74% and average precision by 10.84%. In addition, the proposed workflow of sampling and multi-class classification has been implemented in operation and used in a meteorological product.
    (3) A Label Distribution Learning with Climate Probability (LDLCP) method is proposed for ensemble forecast of precipitation. LDLCP formulates this issue as a label distribution learning problem for the first time, aiming to optimize the forecast distribution to be consistent with the local climate without any distribution assumption. A specialized target function for the regression task is proposed for ensemble forecast. Moreover, LDLCP jointly utilizes the forecast variable and relevant variables for modeling to improve interpretability and generalization performance. Experimental results on real data sets show that the proposed method performs best among the compared methods. For ensemble forecast of precipitation, LDLCP reduces RMSE by 43.9% and average CRPS by 37.4%. Currently, the proposed data organization and processing and the workflow of model optimization and forecasting have been implemented for early access.
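The ensemble results above are reported as RMSE and average CRPS. As a reference point for those two metrics, the sketch below computes them for a finite ensemble with the standard sample-based CRPS estimator, E|X - y| - 0.5 E|X - X'|. The record does not say how the dissertation computes CRPS, so this choice and the function names are assumptions.

```python
import numpy as np

def ensemble_crps(members, observation):
    """Sample-based CRPS of one ensemble forecast against one observation."""
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the observation.
    term_obs = np.mean(np.abs(members - observation))
    # Half the mean absolute difference between all pairs of members.
    term_spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - term_spread

def rmse(forecasts, observations):
    """Root mean square error of point forecasts (e.g. ensemble means)."""
    forecasts = np.asarray(forecasts, dtype=float)
    observations = np.asarray(observations, dtype=float)
    return float(np.sqrt(np.mean((forecasts - observations) ** 2)))

# Toy ensemble of precipitation forecasts (mm) for one site and time.
ensemble = [1.0, 3.0]
observed = 2.0
print(ensemble_crps(ensemble, observed))      # 0.5
print(rmse([np.mean(ensemble)], [observed]))  # 0.0
```

For a single-member ensemble the spread term vanishes and CRPS reduces to the absolute error, so CRPS can be compared directly with deterministic forecast errors in the same units.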
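Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/20941
Collection: Graduates_Doctoral Dissertations
Author affiliations: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences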
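Recommended citation
GB/T 7714
Yang Xuebing. Research on Machine Learning Methods for Accurate Precipitation Estimation and Forecasting [D]. Beijing: Graduate School of the Chinese Academy of Sciences, 2018.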
Files in this item
File name/size: 杨雪冰博士论文-CASIA-final.(7999KB)
Document type: Dissertation
Access type: Not yet open
License: CC BY-NC-SA