Modern dynamically scheduled superscalar processors achieve high performance by aggressively exploiting the instruction-level parallelism (ILP) available in applications. As instruction window sizes and issue widths keep growing, so does the demand for a larger physical register file. As a result, the increasing physical register file access time has become a critical delay and can easily become a performance bottleneck. However, after analyzing and gathering statistics on physical register usage in current high-performance processors, we found that current physical register management is highly wasteful. After discussing possible solutions, we propose a novel dynamic register renaming scheme implemented through a two-level hierarchical register file organization, named the LAER (Late Allocation and Early Release) algorithm. In the LAER algorithm, physical register allocation is delayed until the instruction is ready to be executed, and physical registers in the first level are released once they become non-active. Register pressure, and hence access time, is therefore reduced by shortening the lifetime of physical registers. We modeled processors adopting the conventional register renaming scheme and the LAER algorithm, and evaluated their performance with the SPEC95 benchmarks. We show that the LAER algorithm can reduce register pressure by 46% and 60% for integer and FP programs respectively, with minimal hardware overhead. This means that either the same amount of ILP can be exploited with a smaller physical register file, giving a shorter register file access time and a higher clock speed, or a physical register file of the same size can support a larger instruction window, giving higher performance. Finally, based on the shortcomings of the algorithm and on new observations and statistics of program behavior, several suggestions for further improving the LAER algorithm are proposed.
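The lifetime argument above can be illustrated with a small sketch. It is not the paper's simulator: the `ValueTimeline` type, its field names, the cycle numbers, and the reading of "non-active" as "after the last consumer has read the value" are all assumptions made for illustration, and the printed figure is unrelated to the reported 46% and 60% results.

```python
from dataclasses import dataclass


@dataclass
class ValueTimeline:
    """Hypothetical event times (in cycles) for one produced value."""
    rename: int           # producer's destination register is renamed
    issue: int            # producer is ready and begins execution
    last_read: int        # last consumer reads the value
    redefine_commit: int  # next writer of the same logical register commits


def conventional_lifetime(v: ValueTimeline) -> int:
    # Conventional renaming: the register is allocated at rename and held
    # until the instruction that redefines the logical register commits.
    return v.redefine_commit - v.rename


def laer_lifetime(v: ValueTimeline) -> int:
    # Late allocation: allocate only when the producer is ready to execute.
    # Early release (assumed): free the first-level register after the
    # last read, which is our stand-in for "non-active".
    return v.last_read - v.issue


def pressure_reduction(values: list[ValueTimeline]) -> float:
    # Fraction of register-cycles saved, using total occupied cycles
    # as a rough proxy for register pressure.
    conventional = sum(conventional_lifetime(v) for v in values)
    laer = sum(laer_lifetime(v) for v in values)
    return 1.0 - laer / conventional


# Made-up timelines, for illustration only.
example = [
    ValueTimeline(rename=0, issue=4, last_read=7, redefine_commit=20),
    ValueTimeline(rename=1, issue=2, last_read=5, redefine_commit=12),
]
print(f"register-cycles saved: {pressure_reduction(example):.1%}")
```

Summing occupied cycles is only a coarse proxy; a faithful model would track the number of simultaneously live registers per cycle and the second-level register file as well.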