Register-Based Process Virtual Machine Acceleration Using Hardware Extension with Hybrid Execution

Surachai THONGKAEW  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E98-A   No.12   pp.2505-2518
Publication Date: 2015/12/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E98.A.2505
Type of Manuscript: Special Section PAPER (Special Section on VLSI Design and CAD Algorithms)
Category: High-Level Synthesis and System-Level Design
Keyword: 
register-base VM acceleration,  dalvik VM acceleration,  HW/SW co-designed VM,  dynamic languages,  

Full Text: PDF(4.1MB)
>>Buy this Article


Summary: 
The Process Virtual Machine (VM) is typical software that runs applications inside operating systems. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware, operating system and allows bytecodes (portable code) to be executed in the same way on any other platforms. The Process VMs are implemented using an interpreter to interpret bytecode instead of direct execution of host machine codes. Thus, the bytecode execution is slower than those of the compiled programming language execution. Several techniques including our previous paper, the “Fetch/Decode Hardware Extension”, have been proposed to speed up the interpretation of Process VMs. In this paper, we propose an additional methodology, the “Hardware Extension with Hybrid Execution” to further enhance the performance of Process VMs interpretation and focus on Register-based model. This new technique provides an additional decoder which can classify bytecodes into either simple or complex instructions. With “Hybrid Execution”, the simple instruction will be directly executed on hardware of native processor. The complex instruction will be emulated by the “extra optimized bytecode software handler” of native processor. In order to eliminate the overheads of retrieving and storing operand on memory, we utilize the physical registers instead of (low address) virtual registers. Moreover, the combination of 3 techniques: Delay scheduling, Mode predictor HW and Branch/goto controller can eliminate all of the switching mode overheads between native mode and bytecode mode. The experimental results show the improvements of execution speed on the Arithmetic instructions, loop & conditional instructions and method invocation & return instructions can be achieved up to 16.9x, 16.1x and 3.1x respectively. The approximate size of the proposed hardware extension is 0.04mm2 (or equivalent to 14.81k gates) and consumes an additional power of only 0.24mW. The stated results are obtained from logic synthesis using the TSMC 90nm technology @ 200MHz.