IEICE Trans - FXA: Executing Instructions in Front-End for Energy Efficiency

Ryota SHIOYA
Ryo TAKAMI
Masahiro GOSHIMA
Hideki ANDO

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E99-D No.4 pp.1092-1107
Publication Date: 2016/04/01
Publicized: 2016/01/06
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015EDP7316
Type of Manuscript: PAPER
Category: Computer System
Keyword:
superscalar processor, hybrid in-order/out-of-order core, energy efficiency

Summary:
Out-of-order superscalar processors achieve high performance but consume a large amount of energy for dynamic instruction scheduling. We propose a front-end execution architecture (FXA) that improves the energy efficiency of out-of-order superscalar processors. FXA has two execution units: an out-of-order execution unit (OXU) and an in-order execution unit (IXU). The OXU is the execution core of a common out-of-order superscalar processor. In contrast, the IXU consists only of functional units and a bypass network. The IXU is placed at the processor front end and executes instructions in order, functioning as a filter for the OXU. Fetched instructions are first fed to the IXU and are executed there, in order, if they are ready to execute. Instructions executed in the IXU are removed from the instruction pipeline and are not executed in the OXU. Because the IXU does not include dynamic scheduling logic, its energy consumption is low. Evaluation results show that FXA can execute more than 50% of the instructions in the IXU, which makes it possible to shrink the energy-consuming OXU without incurring performance degradation. As a result, FXA achieves both high performance and low energy consumption. We evaluated FXA and compared it with conventional out-of-order and in-order superscalar processors modeled after the ARM big.LITTLE architecture. The results show that FXA improves performance by 7.4% in geometric mean on the SPEC CPU INT 2006 benchmark suite relative to a conventional superscalar processor (big), while reducing the energy consumption of the entire processor by 17%. The performance/energy ratio (the inverse of the energy-delay product) of FXA is 25% higher than that of a conventional superscalar processor (big) and 27% higher than that of a conventional in-order superscalar processor (LITTLE).
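
The filtering role of the IXU described in the summary can be illustrated with a toy model. The sketch below is not the authors' simulator; it is a minimal illustration under strong assumptions: instructions are reduced to register names, the `ixu_ok` flag is a hypothetical marker for operations the toy IXU cannot complete (for example a load), and a result deferred to the OXU is treated as unavailable until its register is overwritten. The names `Instr` and `filter_through_ixu` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str             # destination register name
    srcs: tuple = ()      # source register names
    ixu_ok: bool = True   # hypothetical flag: False if the toy IXU cannot execute it


def filter_through_ixu(program):
    """Split a program into (executed_in_ixu, sent_to_oxu), in program order."""
    pending = set()       # registers whose producer was deferred to the OXU
    ixu, oxu = [], []
    for ins in program:
        # "Ready" here means: every source value is already available in the front end.
        ready = ins.ixu_ok and not (set(ins.srcs) & pending)
        if ready:
            ixu.append(ins)             # executed in order, removed from the pipeline
            pending.discard(ins.dest)   # the new value is available immediately
        else:
            oxu.append(ins)             # left for the out-of-order core
            pending.add(ins.dest)       # assumption: the value is not seen again by the IXU
    return ixu, oxu


program = [
    Instr("r1", ixu_ok=False),     # e.g. a load: deferred to the OXU
    Instr("r2", ("r0",)),          # r0 is available: IXU
    Instr("r3", ("r1",)),          # waits on r1: OXU
    Instr("r4", ("r2",)),          # r2 was produced in the IXU: IXU
    Instr("r5", ("r3", "r4")),     # waits on r3: OXU
    Instr("r1", ("r2",)),          # overwrites r1 with a ready value: IXU
    Instr("r6", ("r1",)),          # r1 is available again: IXU
]

ixu, oxu = filter_through_ixu(program)
print(len(ixu), len(oxu))  # prints "4 3"
```

In this toy trace four of the seven instructions never reach the OXU. Because the model never lets an OXU result become visible to later front-end instructions, it says nothing about the fraction reported in the paper; it only shows the filtering rule itself.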
