In the past few years, Deep Neural Networks (DNNs) have been widely used in large vocabulary continuous speech recognition (LVCSR). DNN-based acoustic models achieve significant improvements over traditional GMM-based models, and have pushed speech recognition systems toward meeting the requirements of practical applications. With the development of the Internet, the amount of speech training data has grown explosively, from dozens of hours to thousands of hours. Hence, efficiently exploiting such large-scale speech data to train a high-performance recognition system has become an urgent problem. In this thesis, we study the issues of large-scale DNN-based acoustic model training, together with several specific application problems of speech recognition technology. The main work and contributions include:

1. For the DNN pre-training problem, we propose to apply Deep Boltzmann Machine (DBM) pre-training to the DNN training procedure in LVCSR (the energy function that distinguishes the DBM is recalled after this list). On the TIMIT phone recognition task, the DBM-DNN achieves a 3.8% relative phone error rate (PER) reduction on the core test set compared with the Deep Belief Network based DNN (DBN-DNN).

2. To train DNNs on multiple GPUs within a single server, we propose to apply a one-pass learning algorithm based on averaged stochastic gradient descent (ASGD) to the DNN training procedure (a sketch of the iterate averaging follows this list). By further combining it with an asynchronous parallel mode, the one-pass learning algorithm runs on multiple GPUs of a single server. The asynchronous ASGD algorithm accelerates DNN training by a factor of 5.3 compared with the asynchronous stochastic gradient descent algorithm.

3. For the distributed DNN training problem, we propose a novel GPU-cluster training scheme based on the Stochastic Hessian-Free (SHF) algorithm (its conjugate-gradient inner loop is sketched below), which effectively avoids the high inter-machine communication bandwidth demanded by asynchronous parallel algorithms. In particular, the SHF algorithm speeds up DNN training on the GPU cluster markedly compared with the asynchronous parallel algorithm.

4. For the mixed-bandwidth training problem, we propose a DNN adaptation approach for training DNNs on mixed-bandwidth speech data, which achieves better performance than mixed-bandwidth training methods based on feature zero-padding. Moreover, by exploiting the singular value decomposition (SVD) algorithm (a minimal factorization sketch follows), we manage to train a DNN on a GPU cluster of 24 GPUs with 7,500 hours of mixed-bandwidth speech data within seven days.
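For context on item 1: unlike a DBN, a two-hidden-layer DBM defines a single joint energy over the visible vector and both hidden layers, so inference for the first hidden layer combines bottom-up and top-down input. In the standard formulation (generic notation, which may differ from the thesis's),

\[
E(\mathbf{v},\mathbf{h}^{(1)},\mathbf{h}^{(2)}) = -\mathbf{v}^{\top}\mathbf{W}^{(1)}\mathbf{h}^{(1)} - \mathbf{h}^{(1)\top}\mathbf{W}^{(2)}\mathbf{h}^{(2)} - \mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{(1)\top}\mathbf{h}^{(1)} - \mathbf{c}^{(2)\top}\mathbf{h}^{(2)},
\]

and pre-training fits this energy (typically with mean-field inference and persistent sampling) before the weights initialize the DNN.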
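A minimal sketch of the iterate averaging behind ASGD in item 2 (the function names and toy objective are illustrative assumptions, not the thesis's implementation): alongside the ordinary SGD iterate, a running average of the iterates is maintained during a single pass over the data and returned as the final model.

```python
import numpy as np

def asgd(grad_fn, w0, data, lr=0.01, avg_start=0):
    """One-pass averaged SGD: sweep the data once, keeping a running
    average of the SGD iterates; the average is returned as the model.
    grad_fn(w, x) must return the stochastic gradient at sample x."""
    w = w0.copy()          # ordinary SGD iterate
    w_avg = w0.copy()      # running average of iterates
    n_avg = 0
    for t, x in enumerate(data):
        w -= lr * grad_fn(w, x)           # plain SGD step
        if t >= avg_start:                # optionally skip a burn-in period
            n_avg += 1
            w_avg += (w - w_avg) / n_avg  # incremental mean update
    return w_avg

# Toy usage: one-pass least-squares fit on synthetic data (illustrative only).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
samples = [(a, a @ true_w + 0.1 * rng.standard_normal())
           for a in rng.standard_normal((1000, 2))]
grad = lambda w, s: 2 * (s[0] @ w - s[1]) * s[0]
print(asgd(grad, np.zeros(2), samples, lr=0.05))  # approaches [2, -1]
```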
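Hessian-free methods such as the SHF algorithm in item 3 never form the curvature matrix explicitly: each update solves G d = -g by conjugate gradient (CG), where only curvature-vector products Gv are needed, and the stochastic variant estimates g and Gv on mini-batches. Below is a hedged Python sketch of the generic CG inner loop; the explicit SPD matrix stands in for a mini-batch Gauss-Newton product and is an assumption for illustration, not the thesis's code.

```python
import numpy as np

def conjugate_gradient(Gv, g, max_iters=50, tol=1e-6):
    """Solve G d = -g by CG, where Gv(v) returns the curvature-vector
    product G v (e.g. a mini-batch Gauss-Newton product in SHF)."""
    d = np.zeros_like(g)
    r = -g - Gv(d)          # residual of G d = -g at the initial point
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Gp = Gv(p)
        alpha = rs / (p @ Gp)
        d += alpha * p      # move along the conjugate direction
        r -= alpha * Gp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Toy check on an explicit SPD matrix standing in for the
# mini-batch Gauss-Newton matrix (illustrative only).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
G = A @ A.T + 5 * np.eye(5)        # SPD curvature surrogate
g = rng.standard_normal(5)         # mini-batch gradient surrogate
d = conjugate_gradient(lambda v: G @ v, g)
print(np.allclose(G @ d, -g, atol=1e-5))  # the update direction solves G d = -g
```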
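The SVD restructuring mentioned in item 4 replaces a large weight matrix W (m x n) with two smaller layers of shapes m x k and k x n, cutting parameters and per-step computation when k is much smaller than min(m, n). A minimal NumPy sketch follows; splitting the singular values evenly between the two factors is one common choice, not necessarily the thesis's.

```python
import numpy as np

def svd_factorize(W, k):
    """Approximate W (m x n) by A @ B with A (m x k) and B (k x n),
    keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * np.sqrt(s[:k])            # m x k factor
    B = np.sqrt(s[:k])[:, None] * Vt[:k]     # k x n factor
    return A, B

# Usage: a rank-256 factorization of a 2048 x 2048 layer.
W = np.random.default_rng(2).standard_normal((2048, 2048))
A, B = svd_factorize(W, 256)
print(W.size, A.size + B.size)  # 4194304 vs 1048576: 4x fewer parameters
```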