软硬件协同的高效DNN加速器研究 (Research on Efficient DNN Accelerators via Hardware/Software Co-Design)
李繁荣
2022-05-23
Pages: 120
Degree Type: Doctoral
Chinese Abstract

In recent years, deep neural networks (DNNs) have shone in a wide range of industrial applications, but because DNNs are compute- and memory-intensive they are hard to process efficiently on traditional computing platforms; designing DNN accelerators for efficient computation therefore has major research significance and application value for bringing DNNs into every aspect of real life. Many studies have found considerable redundancy in DNNs and have proposed compression and optimization methods, including compact model design, DNN quantization techniques, and model sparsification, but combining these methods with DNN accelerators to improve execution performance still faces many problems. To this end, this thesis adopts a hardware/software co-design approach and studies how to design DNN accelerators that better exploit model redundancy:

To address the low efficiency of DNN accelerators when executing compact network models, an efficient DNN accelerator design for compact models is proposed. First, a hardware-friendly power-of-two quantization method is proposed, replacing the multipliers in the accelerator with shift logic and reducing hardware design complexity. In addition, a fused-layer dataflow that supports efficient computation of depthwise separable convolutions and a hybrid dataflow that handles model computation more flexibly and efficiently are proposed, enabling the accelerator to execute compact networks efficiently. Finally, a low-power object detection system is built on this accelerator, and experiments verify the efficiency of the design.

To address the difficulty of achieving high efficiency on irregular sparsity, FSA, an irregular sparse accelerator supporting computation with both sparse weights and sparse activations, is proposed. FSA adopts a fine-grained systolic dataflow that effectively alleviates the memory access conflicts caused by irregular sparsity. It also introduces a hybrid neural network tiling method which, combined with an efficient and flexible network-on-chip, effectively mitigates the impact of computation fragmentation and load imbalance on performance. In addition, an automatic schedule search method is proposed to quickly find the optimal compute schedule, further improving the energy efficiency of FSA. Experiments show that FSA substantially improves the execution performance and energy efficiency of sparse model computation.

To address the difficulty sparse accelerators have in balancing model accuracy against accelerator performance and area, dynamic structured sparsity methods are studied. First, DGNet, a dynamic dual-gating method, is proposed, which exploits sparsity in both the spatial and channel dimensions of convolution to trade off model accuracy against computation. Then, DynSpar, an accelerator supporting dynamic structured sparse computation, is proposed; through efficient designs of the gating function and the gather operation, it processes dynamic structured sparse models efficiently. Experiments show that combining DGNet with DynSpar achieves a better tradeoff between model accuracy and accelerator performance at a small hardware cost.

English Abstract

In recent years, deep neural networks (DNNs) have been widely used across industry. However, because DNNs are compute- and memory-intensive, they are difficult to process efficiently on traditional computing platforms, so designing DNN accelerators for efficient processing has great research significance and application value for bringing DNNs into every aspect of real life. Many studies have found substantial redundancy in DNNs and have proposed model compression methods, including compact model design, DNN quantization techniques, and pruning methods. However, many problems remain when combining these methods with DNN accelerators for better performance. To this end, this thesis adopts hardware/software co-design to study DNN accelerators that make better use of model redundancy:

To address the low efficiency of DNN accelerators on compact models, a dedicated DNN accelerator is proposed. First, a hardware-friendly power-of-two quantization method is proposed, which replaces the multipliers in the accelerator with shift logic to reduce hardware design complexity. We further propose a fused-layer pipeline dataflow that supports efficient computation of depthwise separable convolutions, and a hybrid dataflow that is more flexible for model processing, enabling the accelerator to execute compact models efficiently. Finally, based on this accelerator, we build a low-power object detection system and experimentally verify the efficiency of the design.
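As an illustration of why power-of-two quantization removes multipliers, the sketch below rounds weights to signed powers of two so that each multiply becomes an arithmetic shift. It is a minimal NumPy model of the idea only; the function names, exponent range, and fixed-point format are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

def quantize_pow2(w, e_min=-7, e_max=0):
    """Round each weight to the nearest signed power of two.
    Returns (sign, exponent); exponents are clipped to [e_min, e_max]
    so they fit a small field in hardware (the range is an assumption)."""
    sign = np.sign(w).astype(np.int8)
    mag = np.maximum(np.abs(w), 2.0 ** e_min)      # avoid log2(0)
    exp = np.clip(np.round(np.log2(mag)), e_min, e_max).astype(np.int8)
    return sign, exp

def shift_multiply(x_int, sign, exp):
    """Multiply an integer fixed-point activation by sign * 2**exp
    (exp <= 0) using only an arithmetic right shift, no multiplier."""
    return int(sign) * (x_int >> -int(exp))

w = np.array([0.24, -0.6, 0.05, -0.011], dtype=np.float32)
sign, exp = quantize_pow2(w)
x = 200                                             # e.g. a Q9.7 activation
print([shift_multiply(x, s, e) for s, e in zip(sign, exp)])
print(np.round(x * w).astype(int))                  # float reference
```

In hardware, only the (sign, exponent) pair is stored per weight, so the multiplier array in each processing element can be replaced by shifters, which is where the area and power savings come from.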

        In response to the problem that irregular sparse DNN accelerators are challenging to process efficiently, we propose an irregular sparse accelerator FSA, which supports computation with sparse weights and sparse activations. FSA uses fine-grained systolic dataflow to reduce contentions caused by irregular sparse patterns. At the same time, FSA employs a hybrid network partitioning, combined with the flexible network-on-chip, and it can alleviate the fragmentation problem and load imbalance. In addition, we propose an automatic scheduling search strategy, which can quickly search for the optimal computing scheduling strategy to further improve the energy efficiency of FSA. Experiments show that FSA can greatly improve the performance and energy efficiency of processing sparse neural networks.
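The abstract does not detail the automatic schedule search, so the following is only a generic sketch of the idea: enumerate candidate schedules (here, tile sizes for one sparse layer) and keep the one that minimizes a cost model. The layer shape, buffer limit, density, and the cost function itself are invented for illustration and are not FSA's real model.

```python
import itertools

# Toy schedule search for one sparse layer: enumerate tile sizes and
# keep the schedule with the lowest modeled cost. All constants below
# are illustrative stand-ins.
ROWS, COLS, PES, DENSITY, BUF = 1024, 1024, 16, 0.01, 32768

def schedule_cost(tile_r, tile_c):
    if tile_r * tile_c > BUF:                    # must fit the on-chip buffer
        return float("inf")
    tiles = (ROWS // tile_r) * (COLS // tile_c)
    traffic = tiles * tile_c                     # activation re-loads per tile
    nnz_per_tile = tile_r * tile_c * DENSITY
    idle = tiles * max(0.0, PES - nnz_per_tile)  # crude fragmentation proxy
    return traffic + 4.0 * idle

def search_schedule(tile_opts=(32, 64, 128, 256)):
    best = min(itertools.product(tile_opts, tile_opts),
               key=lambda t: schedule_cost(*t))
    return best, schedule_cost(*best)

print(search_schedule())   # -> ((256, 32), 4096.0) under this toy model
```

FSA's actual search operates over its real dataflow and hardware parameters; the point of the sketch is only that an explicit cost model turns schedule selection into a fast offline search.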

To address the difficulty sparse accelerators face in trading off model accuracy against accelerator performance and area, this thesis studies dynamic structured sparsity methods. First, DGNet, a dynamic dual-gating method, is proposed; it leverages both spatial and channel sparsity in convolutional layers to trade off accuracy against computational complexity. Then, DynSpar, a sparse accelerator, is proposed to support dynamic structured sparse computation; it provides efficient hardware designs for the gating function and the gather operation, enabling efficient processing of dynamic structured sparse models. Experiments show that combining DynSpar with DGNet achieves a better tradeoff between model accuracy and accelerator performance at a small hardware overhead.
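Since the abstract does not describe DGNet's gating architecture, the PyTorch sketch below shows generic dynamic dual gating: lightweight predictors emit a per-channel mask and a per-pixel spatial mask, and both multiply the convolution output. The module layout, the 0.5 threshold, and the straight-through estimator are assumptions for illustration, not DGNet's exact design.

```python
import torch
import torch.nn as nn

class DualGatedConv(nn.Module):
    """Minimal dynamic dual-gating sketch (not DGNet's exact design):
    a per-channel gate from pooled features and a per-pixel spatial
    gate both multiply the conv output. On an accelerator, the zeroed
    channels/pixels would be skipped rather than multiplied."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.channel_gate = nn.Linear(c_in, c_out)       # from pooled features
        self.spatial_gate = nn.Conv2d(c_in, 1, 3, padding=1)

    @staticmethod
    def hard_mask(logits):
        soft = torch.sigmoid(logits)
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()   # straight-through gradient

    def forward(self, x):
        ch = self.hard_mask(self.channel_gate(x.mean(dim=(2, 3))))  # (N, C_out)
        sp = self.hard_mask(self.spatial_gate(x))                   # (N, 1, H, W)
        y = self.conv(x)
        # Whole channels and whole pixels are gated, so the sparsity
        # is structured and maps to skippable work in hardware.
        return y * ch[:, :, None, None] * sp

x = torch.randn(1, 16, 32, 32)
layer = DualGatedConv(16, 32)
y = layer(x)
print(y.shape, (y == 0).float().mean().item())   # output shape, sparsity fraction
```

Because entire channels and spatial positions are zeroed, an accelerator like DynSpar can gather only the surviving positions and skip the gated work outright, which is what keeps the hardware cost of supporting this sparsity low compared with irregular sparsity.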

Keywords: DNN accelerator; hardware/software co-design; architecture; deep neural network
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48570
Collection: Graduates - Doctoral Theses
Recommended Citation (GB/T 7714):
李繁荣. 软硬件协同的高效DNN加速器研究[D]. 中国科学院自动化研究所, 2022.
Files in This Item:
File Name/Size: 软硬件协同的高效DNN加速器研究_sig (4190KB) | Document Type: Thesis | Access: Restricted | License: CC BY-NC-SA