Research and Design of Dedicated Coprocessors for General-Purpose Communication Baseband Processors


	Research and Design of Dedicated Coprocessors for General-Purpose Communication Baseband Processors
	赵旭莹
	2017-11-30
Degree type	Doctor of Engineering
Abstract	Most mainstream communication baseband processors use coprocessors to accelerate complex algorithms that have strict real-time requirements but are poorly suited to vector processing. As communication technology develops and data-rate demands grow, the types and number of acceleration engines in these coprocessors keep increasing. The interconnection of the acceleration engines and the coprocessor scheduling scheme directly affect overall processor performance and have become an active research topic. Researching and designing high-performance, low-power, highly reliable chips with independent intellectual property rights remains a major challenge. This thesis studies channel decoding algorithms in wireless communication systems that are not suited to vector processing. First, building on the existing architecture of a general-purpose communication baseband processor, it proposes a novel two-level reconfigurable coprocessor (CSCP) architecture. It then designs and optimizes the turbo, viterbi, and polar decoders in the coprocessor: a second-order-difference-aided CRC check stopping criterion that cuts wasted turbo iterations under poor signal quality or burst errors, a high-performance reconfigurable viterbi decoder that supports multiple standards, and an optimized path expansion method with a novel path-pruning strategy that reduces polar decoding latency. The main work and contributions are as follows.

1. A novel two-level reconfigurable coprocessor architecture that reduces the power consumption of the bus interconnection network and the bus bandwidth utilization ratio. Mainstream coprocessor architectures suffer from high interconnection-network power consumption and frequent coprocessor scheduling. To address these problems, a two-level reconfigurable coprocessor architecture for communication processors is proposed. The acceleration engines are grouped into clusters, and the internal connections among the engines are reprogrammed for each specific work mode, which balances flexibility against reliability. The first-level configuration covers the work mode and the parameters common to the coprocessor; it is initiated by the main processor, and the coprocessor responds in real time. The second-level configuration covers the private parameters of each acceleration engine and is completed offline by the main processor. Under a power evaluation model, the power consumption of the bus interconnection network is only one third of that of a mainstream communication processor architecture. For processing a standard wireless data frame, the bus bandwidth utilization ratio falls from 6.88% to 2.05%. The architecture offers a useful exploration of low-power, low-complexity communication processor design.

2. A second-order-difference-aided CRC check stopping criterion that reduces wasted iterations in turbo decoders. When the transmission environment is poor or burst errors occur, a turbo decoder may iterate many times and still fail to decode the received data correctly. The proposed early-exit method computes the second-order difference of the exchanged soft information and/or hard-bit information, detects such bad channel conditions before the CRC stopping condition is satisfied, and terminates the iterations. Simulation results show that, compared with the conventional CRC stopping criterion, the average number of turbo decoder iterations falls by about 20% under poor channel conditions.

3. A high-performance, multi-standard reconfigurable viterbi decoder for convolutional codes. Existing multi-standard viterbi decoders have relatively low throughput. The proposed decoder supports polynomial reconfiguration, constraint lengths of 5 to 9, code rates of 1/2, 1/3, and 1/4, and both zero-terminated and tail-biting trellises. Its peak throughput is 1.15 Gbps @ 6144 bit, 600 MHz. The mainstream commercial viterbi decoder VCP2 delivers 9.5 Mbps @ 40 bit, 333 MHz, whereas the proposed decoder delivers 32.173 Mbps @ 40 bit, 333 MHz, an improvement of about 3.3 times that can meet growing data-processing demands.

4. An optimized path expansion method and a novel path-pruning strategy that reduce polar decoding latency. Successive cancellation list (SCL) decoding has relatively high decoding latency. The optimized path expansion method avoids redundant path splitting, and it is proved theoretically that the optimization causes no loss in decoding performance. In addition, a novel pruning strategy based on confidence intervals reduces SCL decoding complexity. Simulations show that the optimized path expansion method reduces the number of split paths for each bit estimation by up to 49% relative to the conventional SCL algorithm, and that, with negligible performance loss, the pruning strategy reduces the average number of search paths by almost 60% in the moderate-SNR region and 80% in the high-SNR region.
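The early-exit idea behind the second-order-difference-aided stopping criterion can be illustrated with a minimal sketch. The abstract does not state the exact rule, so everything below is an assumption made for illustration only: the per-iteration metric (for example, a mean absolute LLR), the function names `second_order_difference` and `iterate_with_early_exit`, the `stall_threshold` value, and the stall test itself. It should not be read as the thesis's algorithm.

```python
from typing import List, Sequence, Tuple


def second_order_difference(values: Sequence[float]) -> float:
    """Discrete second difference of the last three values: v[k] - 2*v[k-1] + v[k-2]."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    return values[-1] - 2.0 * values[-2] + values[-3]


def iterate_with_early_exit(
    metrics: Sequence[float],
    crc_ok: Sequence[bool],
    max_iterations: int = 8,
    stall_threshold: float = 0.05,  # hypothetical value, not from the thesis
) -> Tuple[int, str]:
    """Replay per-iteration traces and report (iterations used, stop reason).

    metrics[k] is an assumed reliability measure after iteration k+1
    (e.g. mean |LLR|); crc_ok[k] says whether the CRC passed at that point.
    """
    history: List[float] = []
    limit = min(max_iterations, len(metrics), len(crc_ok))
    for k in range(limit):
        history.append(metrics[k])
        # Conventional criterion: stop as soon as the CRC check passes.
        if crc_ok[k]:
            return k + 1, "crc_pass"
        # Assumed stall test: the metric is neither growing (first difference)
        # nor changing its growth rate (second difference), so further
        # iterations are unlikely to help.
        if len(history) >= 3:
            first = history[-1] - history[-2]
            second = second_order_difference(history)
            if first <= stall_threshold and abs(second) <= stall_threshold:
                return k + 1, "stalled"
    return limit, "max_iterations"


if __name__ == "__main__":
    good = iterate_with_early_exit([1.0, 2.0, 3.5, 5.0], [False, False, True, True])
    bad = iterate_with_early_exit([1.0, 1.02, 1.03] + [1.03] * 5, [False] * 8)
    print(good, bad)
```

In this toy replay, a frame whose reliability metric keeps growing runs until the CRC passes, while a frame whose metric has flattened is abandoned after three iterations instead of running to the iteration limit. How the thesis actually combines soft and hard-bit information, and which thresholds it uses, is not stated in the abstract.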
Keywords	coprocessor architecture; hardware acceleration engine; iteration stopping; multi-standard viterbi decoder; polar decoder
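Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/15518
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	赵旭莹. Research and Design of Dedicated Coprocessors for General-Purpose Communication Baseband Processors[D]. Beijing: University of Chinese Academy of Sciences, 2017.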

Files in this item
File name/size	Document type	Version type	Access type	License
面向通用通信基带处理器的专用协处理器研究（4431KB）	Thesis		Restricted access	CC BY-NC-SA