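Research on Accelerating the Computation of Recurrent Neural Networks (循环神经网络计算加速研究)
Li Zhe (李哲)
Subtype: Master's thesis
Thesis Advisor: Cheng Jian (程健)
Date: 2019-05-30
Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keywords: recurrent neural networks; binarization; low-rank factorization; acceleration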
Abstract

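In recent years, recurrent neural networks (RNNs) have achieved good results in many machine learning tasks with long sequential inputs, including machine translation, text classification, sentiment analysis, image generation, and language modeling. However, as a kind of deep network, RNNs also suffer from high computational complexity, which leads to long inference time and high energy consumption and makes them hard to deploy on mobile devices. These problems seriously hinder the wide application of RNNs and degrade the user experience. To overcome them, this thesis studies the optimization of RNN inference, aiming to speed up inference while keeping model accuracy unchanged or only slightly reduced.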

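Starting from the structure of RNNs, this thesis analyzes the structural characteristics and problems of commonly used RNNs and studies methods for accelerating them. The main work and contributions of the thesis are summarized as follows: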

1. A binary-gated network model based on the classic long short-term memory (LSTM) network

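LSTM is currently the most widely used RNN. It introduced the gating-unit structure, which alleviates the exploding and vanishing gradient problems, and it has achieved state-of-the-art results on many tasks. However, the gating units greatly increase the network's computational complexity, which makes acceleration research all the more necessary. This thesis first analyzes the LSTM architecture and finds that gates based on continuous activation functions cannot control the flow of information well. It therefore proposes an LSTM with a binary input gate and a binary forget gate. On the one hand, binary gates make the network's information flow more transparent, so we can see more clearly how information moves inside the network. On the other hand, binary gates are cheaper to compute, which speeds up the network, and they are more robust: small changes to the gate weights are unlikely to change the results. The weight matrices of the binary gates can therefore be accelerated further with methods such as low-rank factorization, improving inference speed.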
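A minimal inference-time sketch of one step of such a binary-gated LSTM, in plain Python, is given below. It assumes that each binary gate is obtained by thresholding the usual sigmoid gate at 0.5, which is the same as testing whether the pre-activation is positive. The abstract does not say how the binary gates are trained or parameterized, so the parameter layout and the names `binary_gate` and `binary_lstm_step` are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_gate(v):
    # Assumed binarization: sigmoid(v) > 0.5 exactly when v > 0,
    # so the sigmoid itself never has to be evaluated at inference time.
    return 1.0 if v > 0.0 else 0.0


def affine(W, U, b, x, h):
    # Computes W x + U h + b; matrices are lists of rows.
    return [
        sum(w * xj for w, xj in zip(w_row, x))
        + sum(u * hj for u, hj in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def binary_lstm_step(params, x, h, c):
    """One LSTM step with binary input (i) and forget (f) gates.

    params maps 'i', 'f', 'o', 'g' to (W, U, b) triples (hypothetical layout).
    The output gate o and the candidate g stay continuous in this sketch.
    """
    i = [binary_gate(v) for v in affine(*params["i"], x, h)]
    f = [binary_gate(v) for v in affine(*params["f"], x, h)]
    o = [sigmoid(v) for v in affine(*params["o"], x, h)]
    g = [math.tanh(v) for v in affine(*params["g"], x, h)]
    # With 0/1 gates, each cell unit either keeps or drops its old value
    # and either accepts or ignores the new candidate.
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new
```

Because every input and forget gate value is exactly 0 or 1, the products `f * c` and `i * g` reduce to keeping or zeroing individual cell units, which is one way to read the abstract's claim that binary gates make the information flow easier to inspect.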

2. A lightweight network based on the gated recurrent unit (GRU)

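GRU has one fewer gate than LSTM and strikes a better balance between speed and accuracy, but its computational complexity is still high. Starting from the flow of information through GRU's gates, this thesis finds that the two gates of GRU overlap in function. It therefore replaces GRU's original reset gate with a binary input gate and retains the update gate for prediction. This makes the roles of the two gates more distinct and the information flow clearer. The binary input gate itself requires little computation and is more robust, so it can be accelerated with low-rank factorization, further improving inference speed. Experiments show that the model clearly outperforms GRU, significantly improving results on text classification and language modeling while reducing inference time.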
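The abstract does not say exactly where the binary input gate acts inside the GRU. The plain-Python sketch below is therefore only one possible reading, labeled as an assumption. It keeps the continuous update gate `z` and replaces the reset gate with a binary gate that switches each unit's input contribution `W_h x` on or off in the candidate state. The parameter names (`W_g`, `U_g`, `b_g`, and so on) and the function `binary_input_gru_step` are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M, v):
    # Matrix (list of rows) times vector.
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def binary_input_gru_step(p, x, h):
    """Hypothetical GRU variant: binary input gate instead of the reset gate.

    Returns the new hidden state and the binary gate, for inspection.
    """
    # Binary input gate: 1 where the pre-activation is positive, else 0 (assumed).
    gate = [
        1.0 if a + b + c > 0.0 else 0.0
        for a, b, c in zip(matvec(p["W_g"], x), matvec(p["U_g"], h), p["b_g"])
    ]
    # Continuous update gate, as in a standard GRU.
    z = [
        sigmoid(a + b + c)
        for a, b, c in zip(matvec(p["W_z"], x), matvec(p["U_z"], h), p["b_z"])
    ]
    # Candidate state: the gate decides, per unit, whether the input is read.
    cand = [
        math.tanh(gk * a + b + c)
        for gk, a, b, c in zip(
            gate, matvec(p["W_h"], x), matvec(p["U_h"], h), p["b_h"]
        )
    ]
    # Standard GRU interpolation between the old state and the candidate.
    h_new = [(1.0 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h, cand)]
    return h_new, gate
```

In this reading the two gates no longer overlap: the binary gate only decides whether the current input is read, while the update gate only decides how far the state moves toward the candidate.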

Other Abstract

Recently, Recurrent Neural Networks (RNNs) have shown great promise in machine learning tasks involving sequential data, such as machine translation, document classification, sentiment analysis, image generation, and language modeling. However, like other deep networks, RNNs have high computational complexity, which leads to long inference time and high energy consumption and makes them difficult to deploy on mobile devices. These problems seriously limit the widespread use of RNNs and degrade the user experience. To address them, this thesis studies the optimization of RNN inference and improves inference speed while keeping model performance unchanged or only slightly reduced.

Starting from the structure of RNNs, this thesis analyzes the structural characteristics and existing problems of common RNNs and studies methods for accelerating them. The contributions can be summarized as follows:

1. We propose a binary-gated structure based on the vanilla long short-term memory (LSTM) network.

LSTM is now the most popular RNN. It introduced gate functions, which alleviate the exploding/vanishing gradient problem, and it has achieved state-of-the-art performance on many tasks. However, the gate functions greatly increase the computational complexity of the network, which makes acceleration research all the more necessary. We first analyze the structure of LSTM and find that gates based on continuous activation functions cannot control the flow of information well. We therefore propose a binary-valued-gate LSTM with a binary input gate and a binary forget gate. On the one hand, binary gates make the network's information flow more transparent, so we can see more clearly how information flows. On the other hand, binary gates are cheaper to compute, and the network is more robust, because small changes to the weights are unlikely to change the results. We can therefore further accelerate the weight matrices of the binary gates with low-rank factorization to improve the inference speed of the network.
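The sketch below illustrates, in plain Python, why a low-rank factorization speeds up a gate: if an m × n gate matrix W is replaced by A B with A of size m × r and B of size r × n, then computing A (B x) costs r (m + n) multiplies instead of m n. It assumes the factors A and B are already available (for example from a truncated SVD of a trained gate matrix); how the thesis obtains them is not stated here, and `factored_gate` and `multiply_counts` are hypothetical names.

```python
def matvec(M, v):
    # Matrix (list of rows) times vector.
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def matmul(A, B):
    # (m x r) @ (r x n) -> (m x n); used only to check the factored path.
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def factored_gate(A, B, x, b):
    """Binary gate computed as A (B x) + b without ever forming A @ B."""
    t = matvec(B, x)    # r outputs, r * n multiplies
    pre = matvec(A, t)  # m outputs, m * r multiplies
    # Only the sign of each pre-activation matters for a binary gate, so a
    # small factorization error changes the output only near the threshold.
    return [1.0 if p + bi > 0.0 else 0.0 for p, bi in zip(pre, b)]


def multiply_counts(m, n, r):
    """(dense, factored) multiply counts for one matrix-vector product.

    The factored form is cheaper only when r < m * n / (m + n).
    """
    return m * n, r * (m + n)
```

For example, `multiply_counts(512, 512, 64)` gives `(262144, 65536)`, a 4× reduction for that single gate. The sizes are illustrative, not figures from the thesis.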

2. We propose a lightweight network based on Gated Recurrent Units (GRU).

GRU has one fewer gate than LSTM, which gives a better balance between speed and performance, but its computational complexity is still significant. Examining the information flow through GRU's gates, we find that its two gates duplicate each other's function. We therefore replace the reset gate in GRU with a binary input gate and retain the update gate for prediction. This makes the roles of the two gates more distinct and the information flow clearer. The binary input gate has lower computational complexity and is more robust, so it can be accelerated by low-rank factorization to further improve inference speed. Our experiments show that the model significantly outperforms GRU on document classification and language modeling tasks while reducing inference time.
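The speed argument can be made concrete with a rough count of multiply-adds per time step. This is only a back-of-the-envelope estimate under stated assumptions: input size $n_x$, hidden size $n_h$, only matrix-vector products counted (biases and elementwise operations ignored), and the binary input gate's input and recurrent weights concatenated into one $n_h \times (n_x + n_h)$ matrix that is replaced by a rank-$r$ factorization. The joint factorization and the symbols are illustrative choices, not taken from the thesis.

```latex
\begin{aligned}
C_{\text{LSTM}} &= 4\,n_h\,(n_x + n_h), \qquad
C_{\text{GRU}} = 3\,n_h\,(n_x + n_h), \\
C_{\text{variant}} &= \underbrace{2\,n_h\,(n_x + n_h)}_{\text{update gate, candidate}}
  + \underbrace{r\,(n_x + 2\,n_h)}_{\text{rank-}r\text{ binary input gate}}.
\end{aligned}
```

With illustrative sizes $n_x = n_h = 256$ and $r = 32$, this gives $C_{\text{LSTM}} = 524288$, $C_{\text{GRU}} = 393216$ and $C_{\text{variant}} = 262144 + 24576 = 286720$ multiply-adds per step, before counting any savings from the binary gate itself.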

Pages: 82
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23771
Collection: State Key Laboratory of Pattern Recognition: Image and Video Analysis
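Recommended Citation (GB/T 7714):
Li Zhe. Research on Accelerating the Computation of Recurrent Neural Networks [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2019.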
Files in This Item:
循环神经网络计算加速研究.pdf (2181 KB); Document Type: Thesis; Access: Open access; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.