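Research and Application of Key Technologies for a 5G-Oriented Universal Communication Processor (面向 5G 的通用通信处理器关键技术研究与应用)
Author: Li Huan (李桓)
Degree type: Doctor of Engineering
Supervisor: Wang Donglin (王东琳)
2018-05-24
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Beijing
Keywords: 5G mobile communication; universal communication processor; instruction set; microarchitecture; algebraic instruction development method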
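Abstract: The 5G mobile communication system is a new generation of mobile communication system proposed to meet the mobile communication demands of 2020. Compared with existing systems, 5G will significantly improve wireless coverage, transmission latency, system security, and user experience. 5G will also be tightly integrated with other mobile communication technologies to form a new generation of ubiquitous mobile information network, meeting a projected 100-fold growth in mobile Internet traffic over the next 10 years. According to the IMT2020 white paper, 5G peak throughput reaches 10 Gbps on the downlink and 2 Gbps on the uplink. Processing this volume of data quickly and efficiently is therefore the primary problem facing future 5G mobile communication systems.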
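Communication algorithms share five characteristics: they are well established, data-intensive, computationally complex, highly parallel, and subject to strict real-time constraints. Processors for mobile communications have therefore evolved steadily toward parallel acceleration. For reasons of power and cost, current commercial communication processors mostly combine a DSP with a coprocessor composed of multiple ASICs. The DSP performs general signal processing and schedules the ASICs, each with a different function, to carry out operations that a conventional DSP cannot implement efficiently, such as fixed algorithms and computations with strong data dependencies. Because the coprocessor is custom hardware, it is inflexible and cannot satisfy multiple protocols at once. Moreover, the 5G standard has not yet been frozen and the related algorithm research is still immature, which makes it difficult to design a high-performance coprocessor for 5G.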

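MaPU1.0 is a general-purpose digital signal processor developed independently by the National ASIC Design Engineering Technology Research Center at the Institute of Automation, Chinese Academy of Sciences. It uses the proprietary AppAISArcTM instruction set architecture, introduces a "concentric circles" storage architecture, and provides a flexible multi-granularity parallel memory structure. These features give MaPU1.0 a leading performance-to-power ratio on compute-intensive operators such as FFT and FIR. However, MaPU1.0 is a general-purpose verification processor targeting supercomputing, communications, multimedia, and other domains, so it can be optimized further for communications, keeping these strengths while adapting better to communication signal processing.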
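To meet the massive computational demands of 5G, this work overcomes many of the limitations of ASIC implementations and, drawing on the features and strengths of MaPU1.0, designs UCP, a universal communication processor for 5G that processes the entire baseband link purely through software programming. The thesis studies several key technologies in the design and application of UCP. Its main work and innovations are summarized as follows: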

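1. Implementation bottlenecks in the 5G baseband link are investigated, and efficient low-complexity algorithms designed around parallel hardware are proposed to make the 5G baseband link easier to implement.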
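1) A flexible, parallelizable, low-complexity Massive MIMO detection method
Conventional Massive MIMO detection algorithms have high complexity and low parallelism. This thesis proposes a class of flexible, parallelizable, low-complexity Massive MIMO detection algorithms. They simplify the complex matrix operations involved in detection and construct operators whose degree of parallelism can be matched to the hardware structure, so that all hardware resources are used during detection. The algorithms reduce detection complexity from O(K^3) for the conventional method to O(K^2). Compared with other simplified algorithms at similar performance, they need half as many divisions and fewer multiplications, and they double the parallelism. Compared with the conventional method, the equivalent multiplication count falls by 2/3 or more.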
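2) A low-complexity LDPC decoding method
Conventional LDPC decoding algorithms require too many multiplications and comparisons. This thesis proposes a low-complexity LDPC decoding algorithm that simplifies the horizontal update step of decoding: it builds an adjustment factor from the scaling factor and iteration count of the conventional algorithm, and converts the multiplications and comparisons in the second-minimum computation into shift operations. The algorithm halves the multiplications and comparisons of the conventional decoder and adds only as many shift operations as there are comparisons. Compared with other simplified algorithms, it halves the multiplications and converts an equal number of additions into shifts, which are easier to implement in hardware. It loses 0.1 dB relative to the conventional method and is more stable than other simplified algorithms.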
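3) A low-latency Polar decoding method
The sorting step in conventional Polar decoding introduces excessive latency. This thesis proposes three path-selection algorithms based on pairwise sorting. The full-sorting algorithm FS-PMS exploits the relationships among path metrics to remove unnecessary comparisons; compared with existing sorting algorithms used for Polar decoding it needs fewer comparisons, and it achieves lower latency when the hardware parallelism is below the number of comparisons per stage. The half-sorting algorithm HS-PMS builds on this by simplifying the final merge; compared with existing sorting algorithms for Polar decoding it needs fewer comparison stages, and it achieves lower latency when the hardware parallelism is at least the number of comparisons per stage. The multi-bit algorithm M-PMS targets the sorting step of multi-bit parallel decoding; it needs fewer comparison stages and achieves lower latency in that setting.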

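2. A UCP micro-operation instruction set based on the AppAISArcTM instruction set architecture
The MaPU1.0 instruction set is not specifically optimized for communications, and commercial communication processors rely too heavily on coprocessors for hardware acceleration. This thesis proposes a 5G-oriented micro-operation instruction set based on the AppAISArcTM instruction set architecture. The instruction set is deeply optimized for communications: besides the common basic data-processing instructions, it provides dedicated instructions for compute-intensive communication operators and for bit-level and soft-bit-level operators that are hard to implement on a DSP, and it reuses some common instructions across different components. The instruction set effectively supports parallel acceleration of compute-intensive operators and, through software programming, performs the hardware-accelerated processing that commercial communication processors delegate to a coprocessor, which greatly increases implementation flexibility.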

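3. A UCP microarchitecture suitable for zero-time reconfiguration of communication algorithms
The MaPU1.0 microarchitecture is not specifically optimized for communications, and commercial processors support flexible implementation of only a few operator types. This thesis proposes a UCP microarchitecture suitable for zero-time reconfiguration of communication algorithms. Its computing components adapt closely to different communication algorithms; the interconnect between components is flexible yet meets the needs of all kinds of computation; and the storage structure follows the "concentric circles" idea, with the storage space at each level matched effectively to different instructions and algorithms. The microarchitecture integrates closely with communication algorithms while further exploiting the performance-to-power advantage of the MaPU processor series.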
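4. An efficient UCP algebraic instruction development method
Communication algorithms are highly parallel and have strict real-time requirements, so their implementations must make full use of the hardware. This thesis proposes an efficient method for developing UCP algebraic instructions. The method guides programmers in fully exploiting UCP's processing capability and in going efficiently from a communication algorithm to algebraic instructions. It first performs an optimal-performance analysis to obtain the best achievable performance of the algorithm implementation and the conditions under which it is reached. Starting from those conditions, it builds the DAG of the parallel implementation, writes the microcode pipeline using the critical-path-first method, and then optimizes the pipeline according to the overall characteristics of the data. Following the method, programmers can derive algebraic instructions that reach or approach the theoretical optimum.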

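5. A UCP-based 5G algorithm algebraic instruction library
Using the proposed development method, a 5G algorithm algebraic instruction library is built on the designed UCP, implementing the great majority of core symbol-level and bit-level algorithms as algebraic instructions. The symbol-level algebraic instructions include FFT, complex FIR filtering, real FIR filtering, matrix multiplication, and IDFT; the bit-level algebraic instructions include 64QAM modulation, CRC check, LDPC encoding, and LDPC decoding. Across these algebraic instructions, the average micro-operation instruction coverage and the instruction slot utilization are both close to 80%, which demonstrates the efficiency of the designed micro-operation instruction set and microarchitecture. In processing performance, symbol-level performance improves nearly fourfold over MaPU1.0, and bit-level performance reaches the Gbps level throughout, meeting the throughput requirements of 5G mobile communication systems.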
English abstract: The 5th Generation (5G) mobile communication system is a new generation of mobile communication system proposed to meet the demands of 2020. Compared with existing mobile communication systems, 5G systems will be significantly improved in terms of wireless coverage capability, transmission delay, system security, and user experience. At the same time, the 5G mobile communication system will be closely integrated with other mobile communication technologies to form a new generation of ubiquitous mobile information networks, which will meet the need for a 100-fold increase in mobile Internet traffic over the next 10 years. According to the IMT2020 white paper, the peak downlink throughput of 5G systems will reach 10 Gbps, and the peak uplink throughput will reach 2 Gbps. How to process massive data rapidly and efficiently has become the top issue for future 5G mobile communication systems.
Communication algorithms have five characteristics: they are well established, involve large amounts of data, are highly complex, are highly parallel, and have strict real-time requirements. Processors for mobile communications have therefore continued to develop toward parallel acceleration. Taking into account power consumption, cost, and other factors, commercial communication processors commonly combine a DSP with a coprocessor for data processing. The coprocessor is composed of multiple ASICs. The DSP performs general signal processing and schedules ASICs with different functions to perform operations that a conventional DSP cannot implement efficiently, such as fixed algorithms and computations with strong data dependencies. Because they are customized hardware circuits, coprocessors are inflexible and cannot meet the requirements of multiple protocols at the same time. In addition, the 5G standard has not yet been frozen and the relevant algorithm research is not yet mature, so it is difficult to design high-performance coprocessors for 5G systems at present.
The MaPU1.0 processor is a general-purpose digital signal processor independently developed by the National Application Specific Integrated Circuit Design Engineering Technology Research Center of the Institute of Automation, Chinese Academy of Sciences. MaPU1.0 has the AppAISArcTM instruction set architecture, with independent intellectual property rights. It introduces an innovative "concentric circles" architecture for storage and has a flexible multi-granularity parallel storage structure. These features give MaPU1.0 a leading performance/power ratio on high-density operators such as FFT and FIR. However, MaPU1.0 is a general-purpose verification processor for supercomputing, communications, multimedia, and other fields. It can be further optimized for the communications field, becoming better suited to communication signal processing while retaining the above characteristics and advantages.
Faced with the massive computing needs of 5G systems, this work overcomes many limitations of ASIC implementation and, drawing on the characteristics and advantages of the MaPU1.0 processor, designs a universal communication processor (UCP) for 5G systems that implements the entire baseband link data processing through software programming. This thesis studies several key technologies of UCP design and application. The main work and innovations of the thesis are summarized as follows:

1. This thesis studies the bottlenecks in implementing the 5G baseband link and, taking parallel hardware implementation into account, proposes efficient low-complexity algorithms that reduce the difficulty of implementing the 5G baseband link.

1) A flexible and parallel low-complexity Massive MIMO detection method is proposed

To address the high complexity and low parallelism of traditional Massive MIMO detection algorithms, this thesis presents a class of flexible, parallelizable, low-complexity Massive MIMO detection algorithms. The algorithms simplify the complex matrix operations in Massive MIMO detection and, according to the hardware structure, flexibly construct operators with different degrees of parallelism, thus using all hardware resources to carry out detection. They reduce the detection complexity from O(K^3) for the traditional method to O(K^2). Compared with other simplified algorithms at similar performance, the proposed algorithms halve the number of divisions, eliminate some multiplications, and double the parallelism. Compared with the traditional method, the number of equivalent multiplications is reduced by 2/3 or more.
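
The abstract does not say how the reduction from O(K^3) to O(K^2) is obtained. As one common way such a reduction can arise (not necessarily the thesis's method), the sketch below replaces a direct K x K matrix inverse with Jacobi iteration: each sweep costs O(K^2), and every element of the update is independent, so the work can be spread over as many parallel units as the hardware offers. It assumes K is the dimension of the system being solved, and the matrix values are made up for illustration.

```python
def jacobi_solve(A, b, iterations=20):
    """Approximate the solution of A x = b by Jacobi iteration.

    Each sweep costs O(K^2) for a K x K matrix, versus O(K^3) for a
    direct inverse. Converges when A is strictly diagonally dominant.
    """
    k = len(b)
    x = [b[i] / A[i][i] for i in range(k)]  # diagonal-only initial estimate
    for _ in range(iterations):
        # Every x[i] depends only on the previous sweep, so the K updates
        # are independent and could run in parallel.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(k) if j != i)) / A[i][i]
            for i in range(k)
        ]
    return x


# Hypothetical, strictly diagonally dominant matrix standing in for the
# Gram-like matrix that appears in linear MIMO detection.
A = [[10.0, 1.0, 0.5],
     [1.0, 12.0, 1.5],
     [0.5, 1.5, 9.0]]
x_true = [1.0, -1.0, 1.0]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]
x_hat = jacobi_solve(A, b)
```

With a fixed, small number of sweeps the total cost stays quadratic in K, which is the trade that iterative detectors of this kind make against exact inversion.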

2) A low-complexity LDPC decoding method is proposed
To address the excessive multiplications and comparisons in traditional LDPC decoding algorithms, this thesis proposes a low-complexity LDPC decoding algorithm. The algorithm simplifies the horizontal update process during decoding. An adjustment factor is constructed from the scaling factor and the iteration count of the traditional algorithm, and the multiplications and comparisons in the second-minimum calculation are converted into shift operations. The algorithm halves the multiplication and comparison operations of the traditional decoding algorithm and adds only as many shift operations as there are comparisons. Compared with other simplified algorithms, it halves the multiplications and converts an equal number of additions into shift operations, which are easier to implement in hardware. It loses 0.1 dB in performance relative to the traditional method and is more stable than other simplified algorithms.
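
The abstract does not give the adjustment factor, so the following is only a sketch of the kind of baseline it starts from, assuming the traditional algorithm is normalized min-sum (which the mention of a scaling factor and a second minimum suggests). It shows the horizontal update for one check node on integer LLRs, with the scaling factor assumed to be 0.75 so that it can be realized as a shift and a subtraction. The function name and sample values are illustrative, not taken from the thesis.

```python
def check_node_update(llrs):
    """Normalized min-sum update for one check node on integer LLRs.

    For each edge, the outgoing magnitude is the minimum magnitude over
    the other edges, scaled by 0.75 using a shift: m - (m >> 2).
    Requires at least two inputs.
    """
    mags = [abs(v) for v in llrs]
    min1 = min(mags)                      # smallest magnitude
    idx1 = mags.index(min1)               # edge that holds it
    min2 = min(m for i, m in enumerate(mags) if i != idx1)  # second minimum

    sign_product = 1
    for v in llrs:
        if v < 0:
            sign_product = -sign_product

    out = []
    for i, v in enumerate(llrs):
        mag = min2 if i == idx1 else min1  # exclude the edge's own input
        mag -= mag >> 2                    # multiply by 0.75 without a multiplier
        own_sign = -1 if v < 0 else 1
        out.append(sign_product * own_sign * mag)  # sign of all other edges
    return out


messages = check_node_update([8, -12, 4, 16])
```

The second minimum is needed only for the single edge that holds the minimum, which is why simplifying its computation, as the thesis describes, removes comparisons without touching the other edges.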

3) A low-delay Polar decoding method is proposed
To address the large delay of the sorting process in traditional Polar decoding algorithms, this thesis proposes three path-selection algorithms based on pairwise sorting. The full-sorting algorithm FS-PMS makes full use of the relationships between path metrics to eliminate unnecessary comparisons. Compared with existing sorting algorithms for Polar decoding, it needs fewer comparisons and achieves a shorter delay when the hardware parallelism is lower than the number of comparisons per stage. The half-sorting algorithm HS-PMS additionally simplifies the final merging process. Compared with existing sorting algorithms for Polar decoding, it has fewer comparison stages and achieves a shorter delay when the hardware parallelism is not lower than the number of comparisons per stage. The multi-bit algorithm M-PMS is suited to the sorting process in multi-bit parallel decoding; it has fewer comparison stages and achieves a shorter delay in multi-bit parallel decoding.
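
To make the sorting problem concrete, the sketch below shows the path-pruning step of a list decoder, assuming successive-cancellation list decoding with an LLR-based path metric where smaller is better (the abstract names neither). Each of the L surviving paths spawns two children, one keeping the parent's metric and one penalized, and the L best of the 2L candidates are kept. The selection here uses Python's generic `heapq.nsmallest`, not the FS-PMS, HS-PMS, or M-PMS algorithms; the metric values are made up.

```python
import heapq


def prune_paths(parent_metrics, penalties, list_size):
    """Keep the list_size best children out of 2 * len(parent_metrics).

    Returns tuples (metric, parent_index, penalized), smallest metric first.
    """
    candidates = []
    for i, (metric, penalty) in enumerate(zip(parent_metrics, penalties)):
        candidates.append((metric, i, 0))            # child that keeps the metric
        candidates.append((metric + penalty, i, 1))  # child that pays the penalty
    return heapq.nsmallest(list_size, candidates)


# Hypothetical metrics for L = 4 parents, already in ascending order.
parents = [0.0, 1.5, 2.0, 4.0]
penalties = [3.0, 0.5, 2.5, 1.0]
survivors = prune_paths(parents, penalties, list_size=4)
```

Two orderings are known before any comparison is made: the parents are already sorted, and each unpenalized child is no worse than its penalized sibling. Relationships of this kind are what the abstract says FS-PMS exploits to skip unnecessary comparisons.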

2. A UCP micro-operation instruction set based on the AppAISArcTM instruction set architecture is proposed
The MaPU1.0 instruction set is not specifically optimized for the communications field, and commercial communication processors rely too heavily on coprocessors for hardware acceleration. To address these problems, this thesis proposes a 5G-oriented micro-operation instruction set based on the AppAISArcTM instruction set architecture. The instruction set is deeply optimized for the communications field. It contains not only common basic data processing instructions but also tailor-made instructions for compute-intensive operators in the communications field and for bit-level and soft-bit-level operators that are not easily implemented on a DSP. Some common instructions are reused across different components. The micro-operation instruction set effectively supports the parallel acceleration of compute-intensive operators and, through software programming, performs the hardware-accelerated processing handled by the coprocessor in commercial communication processors, which greatly improves implementation flexibility.

3. A UCP microarchitecture suitable for zero-time reconfiguration of communication algorithms is proposed

The MaPU1.0 microarchitecture is not specifically optimized for the communications field, and commercial processors support flexible implementation of only a few operator types. To address these problems, this thesis proposes a UCP microarchitecture suitable for zero-time reconfiguration of communication algorithms. The computing components in the structure adapt closely to different communication algorithms. The interconnection structure between components is highly flexible and can also satisfy various computing requirements. The storage structure follows the "concentric circles" idea, and the storage space at each level is matched effectively to different instructions and algorithms. The microarchitecture integrates closely with communication algorithms while further exploiting the performance/power advantage of the MaPU series processors.

4. An efficient UCP algebraic instruction development method is proposed

Communication algorithms are highly parallel and have strict real-time requirements, so their implementation must fully exploit the hardware performance. This thesis presents an efficient UCP algebraic instruction development method. The method guides the programmer in fully exploiting UCP's processing capabilities and in efficiently completing the development process from communication algorithm to algebraic instructions. It uses optimal performance analysis to obtain the optimal performance of the algorithm implementation and the conditions for achieving it. Starting from these optimal conditions, it constructs the DAG of the parallel implementation, writes the microcode pipeline using the critical-path-first method, and then optimizes the pipeline according to the overall data characteristics. With this method, programmers can derive algebraic instructions that achieve or approach the theoretical optimal performance.
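
The DAG and critical-path-first steps can be illustrated with ordinary list scheduling, where ready operations are issued in order of the longest remaining path to the end of the graph. This is a generic sketch, not the thesis's procedure: the operation names, latencies, and slot count are hypothetical, and it assumes every issue slot is fully pipelined and can accept one new operation per cycle.

```python
def critical_path_schedule(latency, deps, slots):
    """Assign an issue cycle to each op, longest remaining path first.

    latency: op -> cycles until its result is available (at least 1)
    deps:    op -> list of ops whose results it needs
    slots:   number of ops that can be issued per cycle
    """
    succs = {op: [] for op in latency}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    rank = {}

    def remaining(op):
        # Length of the longest path from op to the end of the DAG.
        if op not in rank:
            rank[op] = latency[op] + max(
                (remaining(s) for s in succs[op]), default=0)
        return rank[op]

    for op in latency:
        remaining(op)

    start, finish, cycle = {}, {}, 0
    while len(start) < len(latency):
        ready = [op for op in latency
                 if op not in start
                 and all(p in finish and finish[p] <= cycle
                         for p in deps.get(op, []))]
        ready.sort(key=lambda op: rank[op], reverse=True)
        for op in ready[:slots]:
            start[op] = cycle
            finish[op] = cycle + latency[op]
        cycle += 1
    return start


# Hypothetical multiply-accumulate fragment on a 2-slot machine.
latency = {"load_a": 2, "load_b": 2, "load_c": 2, "mul": 3, "add": 1, "store": 1}
deps = {"mul": ["load_a", "load_b"], "add": ["mul", "load_c"], "store": ["add"]}
schedule = critical_path_schedule(latency, deps, slots=2)
```

In this example `load_c` is deferred by one cycle because it is off the critical path, and the final `store` still issues at cycle 6, the same as with unlimited slots.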

5. A UCP-based 5G algorithm algebraic instruction library is built
Using the proposed algebraic instruction development method, a 5G algorithm algebraic instruction library is built on the designed UCP, and most of the core algorithms in symbol-level and bit-level operations are implemented in the form of algebraic instructions. The symbol-level algebraic instructions include FFT, complex FIR filtering, real FIR filtering, matrix multiplication, and IDFT; the bit-level algebraic instructions include 64QAM modulation, CRC check, LDPC encoding, and LDPC decoding. The average micro-operation instruction coverage and instruction slot utilization of these algebraic instructions are both close to 80%, which demonstrates the efficiency of the designed micro-operation instruction set and microarchitecture. In terms of processing performance, symbol-level performance is improved nearly fourfold over the MaPU1.0 processor, and bit-level performance reaches the Gbps level, which can meet the throughput requirements of 5G systems.

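Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/21048
Collection: Graduates_Doctoral Dissertations
Author affiliation: University of Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Li Huan (李桓). Research and Application of Key Technologies for a 5G-Oriented Universal Communication Processor [D]. Beijing. University of Chinese Academy of Sciences, 2018.
Files in this item:
File: 毕业论文电子版-李桓.pdf (18218KB); document type: dissertation; access: not yet open; license: CC BY-NC-SA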
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.