Research on Key Technologies of Microprocessor Clock Network Design
刘檬
2018
Pages: 155
Degree type: Doctoral
Chinese Abstract

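Clock network design is one of the most critical research problems in the physical design of microprocessors, because the clock network usually directly determines the performance of a system-on-a-chip (SOC), including its frequency and power consumption. As integrated circuit semiconductor processes continue to advance, a clock network with a single topology and low design precision can no longer serve as a complete solution to the SOC clocking problem, and customized tools for automatic clock network synthesis are a design approach that both academia and industry are actively pursuing. This dissertation studies the problem systematically and proposes several clock network design and optimization methods that cover the vast majority of physical design scenarios and design metric requirements for integrated circuit chips. The main work and contributions are as follows: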

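        1. A design and optimization method is proposed for a symmetric clock tree that can be synthesized automatically and configured with multiple fan-outs, aiming to remove the need for the manual, full-custom design that traditional symmetric trees require. The design solves the topology planning problem for multi-fan-out symmetric clock trees and can exploit the multiple drive strengths of clock buffers when choosing fan-outs, thereby reducing the number of clock tree levels. To meet the design requirements of a multi-fan-out clock tree, the matching and clustering of node clusters is solved with a greedy algorithm and a plane partitioning strategy. For generating the parent node of a multi-node cluster, a multi-Manhattan-region merging method is established, which builds the node structure of the symmetric tree topology bottom-up. A low-power buffer insertion strategy is designed based on dynamic programming; it fully accounts for the many standard-cell placement situations in the layout, and also considers routing planning for buffers when complex blockages are present in the layout. In experiments on benchmark circuits, compared with the related literature, clock skew is reduced by 17.2% on average and clock power consumption is reduced by 24.5% on average. In addition, layout experiments were carried out to verify the practical value of this type of clock tree: it can serve as a top-level driving clock tree, and also as the clock network solution for local circuits with stringent timing requirements.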

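        2. A design and optimization method is proposed for a multi-source clock tree based on weighted k-means and multiple calibration mechanisms, aiming to solve the complex clock network distribution problem in microprocessors. The multi-source clock tree has a multi-level structure consisting of a driving clock tree, middle-layer tap points, and bottom-level sub-clock trees. The driving clock tree can reuse the symmetric clock tree design already proposed. By analyzing the static timing information under the initial circuit placement, partition weights between register groups are constructed, and a tap-point position generation algorithm based on weighted k-means is established. This guides the clustering of registers and provides the topological basis for sub-clock tree synthesis. During sub-clock tree synthesis, clock tree generation under boundary skew constraints is considered, and methods for useful skew borrowing and wire length calibration are integrated. Finally, layout experiments demonstrate the effectiveness of the whole multi-source clock tree design and optimization method. It was applied to the APE, PCIe, and RapidIO modules of the MaPU processor; among the key metrics, the total negative slack (the sum of slack over violating paths) improves by up to 240 ps, while power consumption is reduced by 61%, 53%, and 44%, respectively. This type of clock tree can serve as an overall solution for clock network design in large-scale integrated circuits, covering complex layout features and meeting high-performance, low-power design requirements.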

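        3. A design and optimization method is proposed for a mesh clock network based on metal-layer wire width adjustment with integrated clustering control of clock gating cells, aiming to solve the clock distribution problem under high-frequency clock requirements. The work comprises three parts: research on the clock mesh structure, research on the integration of clock gating cells, and research on fast and accurate simulation methods. For the clock mesh structure, a method for determining key mesh parameters is proposed based on an analytical delay model parameterized by wire width. It explores the relationship among mesh size, clock wire width, and the timing model, and establishes a complete clock mesh design method that optimizes the choice of metal wire width while satisfying the target clock skew constraint. The availability of multiple wire widths on the metal layers can be used to make the clock skew converge quickly. Compared with classic experimental methods from the literature, clock power consumption is reduced by 35.8% on average at the same level of clock skew. For the integration of clock gating cells, a clustering method that optimizes the load capacitance of gating cells and a gating cell splitting method that optimizes timing constraints are proposed. The design is divided into a standard mode, a low-power mode, and an incremental low-power mode, and the low-power and incremental low-power modes are shown to reduce power overhead effectively. Because a mesh clock structure is difficult to analyze and measure, fast and accurate simulation methods are established, including a fast simulation method based on the topology of the preceding-stage clock tree and a design and optimization scheme for the mesh clock structure. Fast simulation reduces run time and is integrated into the mesh clock design flow as an early verification method, whereas accurate simulation can be used to confirm the results of the clock mesh design after full routing. The mesh clock design flow reuses the proposed symmetric clock tree design method, and the mesh design likewise remains balanced; meanwhile, clustering of gating cells ensures balanced loads and low power at the circuit level. The overall scheme greatly improves the ability to meet high-frequency clock requirements and to withstand process variation.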

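        4. A hybrid clock network topology design and decision method based on a machine learning model is proposed, aiming to solve the convergence iteration problem caused by uncertainty in the clock network topology. Specifically, to address the divergent structure of the symmetric clock tree topology, a topology compression algorithm based on searching for connected components is proposed; it clusters and merges local nodes to obtain an approximately balanced clock tree structure, so that clock wire length can in theory be reduced effectively. For the U-, H-, and Y-type topology patterns that appear in the approximately balanced clock tree, geometric topology generation methods are established for each, so that every individual path of the clock network has the same wire length. To address the clock skew risk that may arise when different topology patterns are connected in parallel, a clock topology decision framework based on a machine learning predictor is established. Local circuit information extracted for fixed topology patterns is used as training samples to build an artificial neural network predictor, which then decides whether a clustering feature can be fixed. After the decision, the tree structure of the topology is built, and the topology design is completed through a bottom-up merging process. Experiments show that, compared with the classic symmetric clock tree structure, the approximately balanced clock tree reduces clock network wire length by up to 54.8%, while clock skew can be optimized to within a reasonable range according to the timing constraints, achieving fast topology design and decision-making. This design method offers important guidance for related research on using artificial intelligence learning methods to improve the design of clock network circuits.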
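
As an illustration of the weighted k-means idea behind the tap-point generation in contribution 2, the following is a minimal Python sketch, not the dissertation's implementation. It assumes each register has a 2-D placement coordinate and a single scalar weight; the abstract only states that weights are derived from static timing relationships between register groups, so the per-register weight used here, as well as the names `weighted_kmeans`, `points`, and `weights`, are hypothetical.

```python
import random


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Cluster 2-D points into k groups with weighted k-means.

    points  : list of (x, y) register locations (hypothetical input format)
    weights : one non-negative weight per point (assumed, e.g. timing criticality)
    Returns (centers, assign): each center is the weight-averaged position of
    its cluster, which here plays the role of a candidate tap point.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k distinct input points
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to the nearest center
        # (squared Euclidean distance, as in standard k-means).
        assign = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                            + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the weighted mean of its members.
        new_centers = []
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            wsum = sum(weights[i] for i in members)
            if wsum == 0:
                new_centers.append(centers[c])  # keep an empty cluster in place
                continue
            x = sum(weights[i] * points[i][0] for i in members) / wsum
            y = sum(weights[i] * points[i][1] for i in members) / wsum
            new_centers.append((x, y))
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, assign


# Example: two register groups; the heavier register pulls its tap point toward it.
registers = [(0, 0), (0, 2), (10, 10), (10, 12)]
criticality = [1, 3, 1, 1]
taps, groups = weighted_kmeans(registers, criticality, k=2)
```

In this sketch the weight only affects where a centroid lands inside its cluster, so a heavily weighted register draws the tap point closer to itself; a flow could equally use the weights in the assignment step, which the abstract does not specify.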

English Abstract

Clock network design is one of the most critical research issues in the physical design of microprocessors, since the clock network typically determines the performance of the SOC (System-On-a-Chip) directly, including its frequency and power consumption. With the development of integrated circuit semiconductor technology, a simple clock network structure with low design precision can hardly handle the entire SOC clocking problem. Thus, both academia and industry aim to develop customized tools for automatic clock network synthesis. This dissertation carries out systematic research on this problem and proposes a variety of clock network design and optimization methods, covering most physical design scenarios and design metrics. The main work and contributions include the following aspects:

    1. We propose a symmetric clock tree design with multiple fan-out configurations that can be synthesized automatically, which avoids the manual custom effort required by traditional symmetric tree design. Our design solves the tree architecture planning problem with multiple fan-out choices, exploiting the driving strength of clock buffers and thereby reducing the number of clock tree levels. To meet the design requirements of a multiple fan-out clock tree, the matching and clustering of node clusters is solved with a greedy algorithm and a plane partitioning strategy, so the method can flexibly handle the different numbers of node clusters required at different clock tree levels. Multiple Manhattan distance regions are built to generate the parent nodes level by level in bottom-up order. A low-power buffer insertion strategy is developed based on dynamic programming, and it fully considers the possible placement situations in the layout. Additionally, with a length-matching topology design, a routing algorithm is developed that accounts for complex layout situations. Compared with the typical approach, we obtain a 17.2% reduction in skew while using 24.5% less power on average.

    2. A design optimization method for a multi-source clock tree, based on weighted k-means and a multi-calibration mechanism, is proposed to solve the complex clock network distribution problem in microprocessors. The multi-source clock tree has a multi-level structure comprising the driving clock tree, the middle-layer tap points, and the sub-clock trees. The driving clock tree can reuse the proposed symmetric clock tree design. By analyzing the static timing information from the initial circuit layout, a weight table of register relationships is built to help construct register groups. A tap-point position generation algorithm based on weighted k-means is then established to guide the clustering of registers, which provides the basis for synthesizing the sub-clock trees. During sub-clock tree synthesis, boundary-skew clock tree generation is considered, and methods for useful skew borrowing and wire length calibration are integrated. The APE, PCIe, and RapidIO modules of the MaPU chip are used for comparison. Compared with the traditional method, the improvement in WNS (worst negative slack of setup) is up to 240 ps. Our proposed flow reduces power consumption by 61%, 53%, and 44%, respectively, because fewer buffers are used. This type of clock tree can be used as an overall solution for large-scale integrated circuit clock network design, covering complex layout features and meeting high-performance and low-power design requirements.

    3. A clock mesh design methodology based on wire width adjustment and clock gating clusters is proposed to address clock distribution under high-frequency clock requirements. The main contents are research on the clock grid structure, research on the integration of clock gating cells, and research on fast and accurate simulation methods. For the clock grid structure, this dissertation proposes a method for determining the key grid parameters and then optimizes the wire width parameter using a timing delay model. The method explores the relationship among grid size, clock wire width, and the timing delay model. A complete clock grid design framework is established that optimizes the metal wire width selection while satisfying the target clock skew constraint. The multiple wire widths available on the metal layers can be used to make the clock skew converge quickly. With skew kept at the same level as the original flow, power consumption is reduced by 35.8% on average because fewer wire resources are used. To solve the clock gating problem, we propose a clustering method for load balancing. Additionally, a method of splitting gating units to optimize timing constraints is developed. The clock mesh design is then divided into three modes: the standard mode, the low-power mode, and the incremental low-power mode. We also demonstrate that the low-power and incremental low-power modes can effectively reduce power overhead. Fast and accurate simulation methods are established, including a fast simulation method based on the pre-clock tree topology and an accurate simulation framework for confirming the final results. The entire methodology greatly improves the ability to meet high-frequency clock requirements and to resist on-chip variations.

    4. A hybrid clock network topology design and decision-making method based on a machine learning model is proposed to solve the convergence iteration problem caused by clock network topology uncertainty. Specifically, we use a strongly connected components (SCCs) algorithm to cluster nodes so that an arbitrary number of sink nodes can be handled. Local nodes are clustered and an approximately balanced clock tree structure is obtained, which effectively reduces the clock wire length. By introducing different routing topologies, we optimize the wire length cost subject to a skew bound. We consider H, Y, and U topologies at the bottom level of a clock tree. Additionally, we apply a machine learning framework to evaluate the effects of the routing topologies, which helps explore the trade-off between the clock skew bound and the wire length cost. Wire length is reduced by up to 54.8% compared with the baseline results, and the skew values are all within the bound. This design method offers important guidance for related research on using artificial intelligence learning methods to improve the design of clock network circuits.
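
The node clustering step in contribution 4 can be illustrated with a small Python sketch under stated assumptions. The abstract does not say how the graph over sink nodes is built, so the rule used here (link two sinks when their Manhattan distance is at most `max_dist`) and the names `cluster_sinks` and `max_dist` are hypothetical. Because the resulting graph is undirected, its strongly connected components coincide with ordinary connected components, which the sketch finds with a union-find structure.

```python
def cluster_sinks(sinks, max_dist):
    """Group sink nodes into connected components.

    sinks    : list of (x, y) sink locations (hypothetical input format)
    max_dist : assumed linking rule; two sinks are adjacent when their
               Manhattan distance is <= max_dist
    Returns a sorted list of clusters, each a list of sink indices.
    """
    parent = list(range(len(sinks)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of sinks that satisfies the linking rule.
    for i in range(len(sinks)):
        for j in range(i + 1, len(sinks)):
            dist = abs(sinks[i][0] - sinks[j][0]) + abs(sinks[i][1] - sinks[j][1])
            if dist <= max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Collect the members of each component by root.
    clusters = {}
    for i in range(len(sinks)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Example: a chain of three nearby sinks and a separate pair.
groups = cluster_sinks([(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)], max_dist=1)
```

Note that sinks 0 and 2 end up in the same cluster even though they are not directly linked, because connectivity is transitive; each resulting cluster could then be treated as one local group when the tree is merged bottom-up.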

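Keywords: computer architecture; integrated circuit design; clock network
Subject area: Computer science and technology
Discipline category: Engineering
Language: Chinese
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/22777
Collection: 国家专用集成电路设计工程技术研究中心 (National ASIC Design Engineering Technology Research Center)
Recommended citation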
GB/T 7714
刘檬. 微处理器时钟网络设计的关键技术研究[D]. 北京. 中国科学院大学,2018.
Files in this item
File name/size  Document type  Version type  Access type  License
论文最终版20181123.pdf (20426KB)  Dissertation  Open access  CC BY-NC-SA