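|Place of Conferral||Institute of Automation, Chinese Academy of Sciences|
|Keyword||Multi-core shared memory; interconnect protocol; interconnect architecture; task scheduling; design exploration|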
With advances in semiconductor technology and integrated circuits, the performance of processors, an important branch of the integrated-circuit field, has improved greatly. However, with the explosive growth of information processing, the demands placed on processor performance are becoming higher and more diverse, and traditional single-core processors face enormous challenges. Problems such as the performance gap between processor and memory, the difficulty of exploiting instruction-level parallelism, and high power consumption constrain the development and application of traditional processors. As an effective design solution to the "frequency wall" and the "power wall", multi-core architecture has gradually been adopted in modern processor design. However, as more resources are integrated on chip, the scale of the processor architecture and the difficulty of designing the on-chip multi-core architecture keep increasing, and the "bandwidth wall" problem is gradually being exposed. Because the on-chip memory architecture is at the heart of current multi-core processor design, its design determines the overall performance of the cores and of the chip system.
At present, memory system design faces two main difficulties. The first is how to design a high-performance memory system that accelerates data supply and increases the parallelism of data transmission, so that it meets the data-supply requirements of the cores and keeps them operating close to their peak performance. The second is how to design an energy-efficient memory system and use it efficiently, so that the chip system can, through reasonable scheduling, fully exploit the throughput and buffering capability of the memory system while the memory system meets the data requirements of the cores with minimum hardware overhead. In view of these two difficulties, and considering the impact of the on-chip interconnect system on the bandwidth of the memory banks, this thesis conducts in-depth research from the perspectives of interconnect system design and shared-memory system design. The proposed interconnect protocol and interconnect architecture support high-performance memory system design, and the proposed memory design exploration method supports efficient memory system design. The main research contents and contributions of this thesis are summarized as follows:
1. To meet the requirements of transmission parallelism and continuity in the memory system, this thesis proposes an interconnect protocol oriented to memory banks, the AEC protocol, so that the throughput of the memory system is not limited at the transmission-protocol level. Specifically, the AEC protocol combines the write data and the write address into one transmission channel and adopts an interaction mode in which address signals and data signals correspond one-to-one. This makes reasonable use of the transmission signals and reduces the hardware overhead of protocol conversion at the memory banks. Drawing on the AXI protocol, the AEC protocol introduces independent handshake signals in each transmission channel to support parallel transmission across multiple channels and multiple ports. It also supports issuing further transactions while earlier ones are still outstanding, as well as out-of-order transmission, to keep transmission continuous. In addition, so that interconnects based on the AEC protocol can be extended to multi-level interconnection, multi-level cascading, and a variety of design architectures, the AEC protocol defines the function and usage of the ID signals in each channel and introduces a point-to-point streaming transmission mode. Hardware design experiments show that the crossbar interconnect based on the AEC protocol has lower hardware overhead than the AXI crossbar interconnect; in a 28 nm process, a 16-channel shared-memory system based on the AEC crossbar interconnect can reach a peak throughput of 1.6 Tbps.
2. To meet the highly parallel transmission requirements of distributed memory systems, this thesis proposes a fully connected interconnect architecture for multi-channel memory systems, FMN, which allows the parallelism of the memory system to be properly exploited. Specifically, FMN adopts a mesh architecture and a distributed node layout so that the interconnect system is easy to implement in physical design. Multiple transmission channels are introduced between adjacent nodes to ensure the maximum bisection bandwidth and highly parallel transmission. Based on the definition of multi-channel, the structure of the FMN nodes is optimized so that each node achieves the minimum hardware overhead. Simulation results show that the FMN interconnect architecture achieves throughput consistent with that of the crossbar architecture. Hardware design experiments show that, in a 28 nm process, the peak throughput of a distributed shared-memory system based on a 64-node FMN can reach 11.2 Tbps.
3. For efficient memory system design, this thesis proposes a memory system design exploration method based on task scheduling, which gives design guidance on the number, capacity, and bandwidth of the memory banks. Specifically, for multi-core shared-memory architectures with scratchpad memory (SPM), this thesis proposes a homogeneous multi-core task scheduling algorithm, the HoEFT algorithm. The algorithm includes a memory system model and a data-handling sub-algorithm to simulate multi-core data transmission. It partitions tasks and data according to the capacity of the memory to maximize local computing efficiency. Building on list scheduling, it schedules tasks according to the rank of each task and the earliest finish time. Experimental results show that, with DGEMM as the target application, the HoEFT algorithm achieves higher core utilization than manual scheduling and than a cache-based memory. To give reasonable guidance for memory system design, this thesis further proposes a design exploration method based on task scheduling. With DGEMM as the target application, the design of a multi-core shared-memory system is explored, the data transmission modes and task execution modes are summarized, and the constraint relationships between the memory system design parameters and the core design parameters are derived, yielding design guidance for the whole system.
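The rank-based list scheduling mentioned in contribution 3 can be illustrated with a minimal Python sketch. It is a generic HEFT-style scheduler for identical cores, not the HoEFT algorithm itself: the abstract does not give HoEFT's details, and the memory system model, the data-handling sub-algorithm, and the capacity-based partitioning are all left out here. The function names, the task names, and the task costs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def upward_rank(costs, succs):
    """Rank of a task = its cost plus the largest rank among its successors."""
    ranks = {}

    def rank(task):
        if task not in ranks:
            ranks[task] = costs[task] + max(
                (rank(s) for s in succs.get(task, ())), default=0
            )
        return ranks[task]

    for task in costs:
        rank(task)
    return ranks


def list_schedule(costs, succs, num_cores):
    """Schedule a task DAG on identical cores by rank, then earliest finish time.

    Assumes every cost is positive. Returns {task: (core, start, finish)}.
    """
    preds = defaultdict(list)
    for task, children in succs.items():
        for child in children:
            preds[child].append(task)

    ranks = upward_rank(costs, succs)
    # With positive costs, decreasing rank is a valid topological order.
    order = sorted(costs, key=lambda t: (-ranks[t], t))

    core_free = [0] * num_cores  # time at which each core becomes idle
    schedule = {}
    for task in order:
        # A task may start only after all of its predecessors have finished.
        ready = max((schedule[p][2] for p in preds[task]), default=0)
        # On identical cores, earliest finish equals earliest possible start.
        core = min(range(num_cores), key=lambda c: max(core_free[c], ready))
        start = max(core_free[core], ready)
        finish = start + costs[task]
        core_free[core] = finish
        schedule[task] = (core, start, finish)
    return schedule


# Hypothetical tiled workload: two loads feed two multiplies, then one store.
costs = {"load_a": 2, "load_b": 2, "mul_0": 4, "mul_1": 4, "store": 1}
succs = {
    "load_a": ["mul_0", "mul_1"],
    "load_b": ["mul_0", "mul_1"],
    "mul_0": ["store"],
    "mul_1": ["store"],
}
schedule = list_schedule(costs, succs, num_cores=2)
makespan = max(finish for _, _, finish in schedule.values())
```

In this toy example the two loads run in parallel, then the two multiplies, then the store, so both cores stay busy until the final task. A scheduler in the spirit of the thesis would additionally account for data movement between cores and memory banks when computing when a task is ready.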
|孟洪宇 (Meng Hongyu). Research and Design of On-Chip Multi-Core Shared-Memory Architecture [D]. Institute of Automation, Chinese Academy of Sciences, 2019.|
|Files in This Item:|
|孟洪宇201618014629105.p (6898 KB)||Thesis||Not yet open access||CC BY-NC-SA||Application Full Text|