Research on Urban Taxi Demand Prediction Based on Order Data Mining

	Zhang Chizhan (张驰展)
	2021-05-26
Pages	72
Degree type	Master's
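Chinese abstract (translated)	Because they are flexible and convenient, taxis have become an increasingly popular way for urban residents to travel. With the rise of online ride-hailing platforms, the traditional "car seeks passenger" mode of taxi operation has gradually shifted to an "on-demand mobility" mode in which passengers request rides and drivers accept orders, giving rise to the online car-hailing and ride-sharing industries. However, because supply and demand between drivers and passengers vary dynamically in space and time, it is sometimes hard to get a taxi. Mining useful information from historical ride-hailing order data to accurately predict the future distribution of taxi demand makes it possible to allocate taxi (and online car-hailing) resources in advance and ease the supply-demand mismatch, which is important for improving residents' travel efficiency. Taxi demand fluctuates strongly over time and is shaped by the morning and evening peaks. It also varies widely across space, regions depend on one another in complex ways, and demand is further affected by weather, holidays, and other factors. Accurate taxi demand prediction is therefore highly challenging. Using the New York City taxi order records dataset, this thesis analyzes and models the correlation between taxi pick-up and drop-off demand and the heterogeneity of demand across regions, and develops two novel taxi demand prediction models based on deep learning. The main contributions are as follows:
1. Processing and analysis of taxi order records. Taxi demand is defined by discretizing time and space, and the order records are processed and aggregated accordingly to build a standard dataset for taxi demand prediction. The spatio-temporal distribution of taxi demand is visualized to give an intuitive picture of residents' travel characteristics, and the patterns in how demand changes are briefly analyzed.
2. Taxi demand prediction based on pick-up/drop-off correlation and multi-task learning. Taxi demand follows clearly different trends at different times of day, so a time feature encoder is designed to extract time-related feature representations from taxi demand data. Building on this, and taking into account the spatio-temporal correlation between pick-up and drop-off demand, a parallel multi-task deep learning model is proposed to predict pick-up and drop-off demand jointly. Experimental results show that the model effectively exploits the correlation between the two kinds of demand and improves prediction accuracy.
3. Taxi demand prediction based on regional heterogeneity and multi-level learning. To address the regional heterogeneity of taxi demand, a taxi zone clustering algorithm based on the Spearman correlation coefficient and pairwise clustering theory is developed, which groups zones with highly correlated taxi demand into the same cluster. A multi-level recurrent neural network prediction model is proposed, consisting of a cluster-level prediction network and a global-level prediction network that extract intra-cluster correlation features and globally shared features, respectively. A weighted mean squared error is used as the loss function. Experimental results show that the model effectively captures regional heterogeneity and improves prediction performance.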
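The first research item can be sketched as follows. It defines taxi demand by discretizing time and space, then aggregates the order records. This is a minimal illustration, not the thesis's code. It assumes each order record has already been reduced to a hypothetical tuple `(pickup_time, dropoff_time, pickup_zone, dropoff_zone)`. It also assumes a 30-minute slot length. Neither assumption is stated in this record. Pick-ups and drop-offs are counted separately so they can serve as the two related prediction targets described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed slot length (hypothetical; the record does not state the thesis's choice).
SLOT = timedelta(minutes=30)


def slot_index(t: datetime, start: datetime, slot: timedelta = SLOT) -> int:
    """Index of the time slot containing t, counting from start."""
    return (t - start) // slot  # timedelta // timedelta yields an int


def aggregate_demand(records, start: datetime, num_slots: int, slot: timedelta = SLOT):
    """Count pick-ups and drop-offs per (time slot, zone).

    records: iterable of hypothetical tuples
        (pickup_time, dropoff_time, pickup_zone, dropoff_zone).
    Events falling outside [start, start + num_slots * slot) are ignored.
    Returns two dicts mapping (slot, zone) -> count.
    """
    pickups = defaultdict(int)
    dropoffs = defaultdict(int)
    for pu_time, do_time, pu_zone, do_zone in records:
        s = slot_index(pu_time, start, slot)
        if 0 <= s < num_slots:
            pickups[(s, pu_zone)] += 1
        s = slot_index(do_time, start, slot)
        if 0 <= s < num_slots:
            dropoffs[(s, do_zone)] += 1
    return dict(pickups), dict(dropoffs)


if __name__ == "__main__":
    start = datetime(2016, 1, 1)
    # Example zone IDs only; any hashable zone label works.
    records = [
        (datetime(2016, 1, 1, 0, 10), datetime(2016, 1, 1, 0, 25), 161, 236),
        (datetime(2016, 1, 1, 0, 40), datetime(2016, 1, 1, 1, 5), 161, 161),
    ]
    pickups, dropoffs = aggregate_demand(records, start, num_slots=48)
    print(pickups)
    print(dropoffs)
```

Take `start = datetime(2016, 1, 1)` as an example. A trip picked up at 00:10 in zone 161 and dropped off at 00:25 in zone 236 adds one to `pickups[(0, 161)]` and one to `dropoffs[(0, 236)]`. A sparse dictionary keeps the sketch dependency-free. A real pipeline would more likely fill a dense (time slots × zones) array for each demand type.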
English abstract	Taxis have gradually become a popular means of travel for urban residents thanks to their flexibility and convenience. With the emergence of online ride-hailing platforms, the traditional "car-seeking" form of taxi operation has gradually given way to a "Mobility-on-Demand" mode in which passengers request rides and drivers accept orders, giving birth to the online car-hailing and ride-sharing industries. However, because the supply-demand relationship between drivers and passengers varies dynamically in space and time, it is sometimes difficult to get a taxi. Mining useful information from historical taxi-hailing order data and accurately predicting the future distribution of taxi demand allows taxi (online car-hailing) resources to be allocated in advance and the mismatch between supply and demand to be alleviated. This is of great significance for improving residents' travel efficiency. Taxi demand fluctuates strongly over time and is affected by the morning and evening peaks. It also varies widely across space, with complex inter-regional dependencies, and is further affected by factors such as weather and holidays. Accurate prediction of taxi demand is therefore highly challenging. In this thesis, the correlation between taxi pick-up and drop-off demand is analyzed and modeled based on the New York City taxi order records dataset. The heterogeneity of taxi demand across regions is also explored. Furthermore, two novel taxi demand prediction methods based on deep learning are proposed. The main research work of this thesis is as follows:
1. Taxi order records data processing and analysis. Taxi demand is defined through time and space discretization, and a standard dataset for taxi demand forecasting is constructed by processing and aggregating the order records accordingly. The temporal and spatial distribution of taxi demand is visualized to give an intuitive understanding of residents' travel characteristics, and the patterns of change in taxi demand are analyzed qualitatively.
2. Taxi demand prediction based on pick-up/drop-off correlation and multi-task learning. Taxi demand follows different trends in different time slots of the day, so a time feature encoder is designed to extract time-related feature representations from taxi demand data. Building on this, and considering the spatio-temporal correlation between taxi pick-up and drop-off demand, a parallel multi-task recurrent neural network prediction model is proposed to co-predict the two kinds of taxi demand. Experimental results show that the model effectively uses the correlation between the two demands and improves prediction accuracy.
3. Taxi demand forecasting based on regional heterogeneity and multi-level learning. To address the regional heterogeneity of taxi demand, a taxi zone clustering algorithm is developed based on Spearman's correlation coefficient and pairwise clustering theory. It groups city regions with highly correlated taxi demand into the same cluster. A multi-level recurrent neural network prediction model is proposed. It includes a cluster-level prediction module, which extracts intra-cluster correlation features, and a global-level prediction module, which extracts globally shared features. In addition, a mixed weighted mean squared error is used as the loss function. Experimental results show that the model effectively perceives regional heterogeneity information and improves prediction performance.
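The zone-clustering step in the third research item groups zones whose demand series are strongly Spearman-correlated. The sketch below is a stand-in built on assumptions, not the thesis's algorithm. It computes Spearman's rho as the Pearson correlation of ranks, with ties given their average rank. It then merges any two zones whose rho is at or above a hypothetical `threshold`, using union-find (i.e. single-linkage grouping). This record does not describe the thesis's pairwise clustering method, which may differ substantially. The function names and the 0.8 default threshold are illustrative only.

```python
from itertools import combinations


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined for a constant series
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(rank(x), rank(y))


def cluster_zones(series, threshold=0.8):
    """Single-linkage grouping: link zones whose pairwise rho >= threshold.

    series: dict mapping zone -> demand time series (equal lengths).
    threshold: hypothetical cut-off, not taken from the thesis.
    """
    parent = {z: z for z in series}

    def find(z):
        while parent[z] != z:
            parent[z] = parent[parent[z]]  # path halving
            z = parent[z]
        return z

    for a, b in combinations(series, 2):
        if spearman(series[a], series[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for z in series:
        clusters.setdefault(find(z), []).append(z)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    demand = {
        "A": [1, 2, 3, 4, 5],
        "B": [2, 4, 6, 8, 11],
        "C": [5, 4, 3, 2, 1],
        "D": [10, 8, 6, 4, 3],
    }
    print(cluster_zones(demand))
```

In the example, A is grouped with B and C with D, because the zones in each pair rise and fall in the same order. Single-linkage is the simplest choice here. However, it can chain weakly related zones together through intermediates, which is one reason a dedicated pairwise clustering method might be preferred.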
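Keywords	Taxi demand prediction, data mining, multi-task learning, long short-term memory network, deep learning
Language	Chinese
Research direction (seven major directions: sub-direction classification)	Artificial intelligence + transportation
Document type	Thesis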
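Item identifier	http://ir.ia.ac.cn/handle/173211/44850
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems / Parallel Intelligence Technology and Systems Team
Recommended citation (GB/T 7714)	张驰展. 基于订单数据挖掘的城市出租车需求预测研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2021. (English: Zhang Chizhan. Research on Urban Taxi Demand Prediction Based on Order Data Mining [D]. Institute of Automation, Chinese Academy of Sciences, 2021.)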

Files in this item
File name/size	Document type	Version type	Access type	License
张驰展毕业论文-最终版.pdf (18350 KB)	Thesis		Open access	CC BY-NC-SA