Improving Per-Node Computing Efficiency by an Adaptive Lock-Free Scheduling Model

Zhishuo ZHENG  Deyu QI  Naqin ZHOU  Xinyang WANG  Mincong YU  

IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.10   pp.2423-2435
Publication Date: 2018/10/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018EDP7038
Type of Manuscript: PAPER
Category: Fundamentals of Information Systems
Keywords: job scheduling, adaptive lock-free scheduling, optimistic concurrency control, high performance computing, many-core


Job scheduling on many-core computers with tens or even hundreds of processing cores is one of the key technologies in High Performance Computing (HPC) systems. Although many scheduling algorithms have been proposed, it remains a challenge to execute jobs efficiently when they are assigned to a single computing node with diverse scheduling objectives. Moreover, existing scheduling models for an HPC node struggle to cope with increasing scale and the need for rapid response to changing requirements. To address these issues, we propose a novel adaptive scheduling model for a single node with a many-core processor; the model addresses scheduling efficiency and scalability through an adaptive optimistic control mechanism. This mechanism exposes information so that all the cores are provided with jobs and with the tools necessary to take advantage of that information, and can thus compete for resources in an uncoordinated manner. At the same time, the mechanism is equipped with adaptive control, which allows it to adjust the number of running tools dynamically when conflicts become frequent. We justify this scheduling model and present simulation results for synthetic and real-world HPC workloads, in which we compare the proposed model with two widely used scheduling models, namely multi-path monolithic and two-level scheduling. The proposed approach outperforms the other models in scheduling efficiency and scalability. Our results demonstrate that adaptive optimistic control affords significant improvements for HPC workloads in both the parallelism and the performance of the node-level scheduling model.
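
The general idea of optimistic, uncoordinated competition with adaptive throttling can be illustrated with a small simulation. The sketch below is not the model from the paper: the `Node` class, the `run` function, the per-core version counters, the conflict thresholds `high` and `low`, and the halve-or-increment policy are all assumptions chosen for illustration. It also reads the "running tools" as concurrent schedulers, which is an interpretation. Each round, every active scheduler picks a free core from the same shared snapshot and tries to commit its claim; a claim fails if another scheduler took that core first, and the number of active schedulers is reduced when the conflict rate is high.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Shared node state: one version counter per core, no lock (hypothetical)."""
    versions: list
    owner: dict = field(default_factory=dict)  # core index -> job id

    def snapshot(self):
        # Every scheduler sees the same view: free cores and their versions.
        return [(c, v) for c, v in enumerate(self.versions) if c not in self.owner]

    def try_commit(self, core, seen_version, job):
        # Stands in for an atomic compare-and-set: the claim succeeds only if
        # the core is unchanged since the snapshot was taken.
        if core in self.owner or self.versions[core] != seen_version:
            return False
        self.versions[core] += 1
        self.owner[core] = job
        return True


def run(num_cores=64, num_jobs=64, max_schedulers=16, high=0.3, low=0.1, seed=0):
    """Place one-core jobs in rounds; thresholds and policy are assumptions."""
    if num_jobs > num_cores:
        raise ValueError("this sketch never frees cores, so jobs must fit")
    rng = random.Random(seed)
    node = Node(versions=[0] * num_cores)
    pending = list(range(num_jobs))
    active = max_schedulers
    history = []  # (claims attempted, conflicts, schedulers for next round)
    while pending:
        free = node.snapshot()
        # Each active scheduler takes a job and picks a core from the snapshot,
        # without coordinating with the others.
        attempts = min(active, len(pending))
        batch = [(pending.pop(), rng.choice(free)) for _ in range(attempts)]
        conflicts = 0
        for job, (core, seen) in batch:
            if not node.try_commit(core, seen, job):
                conflicts += 1
                pending.append(job)  # retry in a later round
        rate = conflicts / len(batch)
        # Adaptive control: back off under frequent conflicts, ramp up otherwise.
        if rate > high:
            active = max(1, active // 2)
        elif rate < low:
            active = min(max_schedulers, active + 1)
        history.append((len(batch), conflicts, active))
    return node, history


if __name__ == "__main__":
    node, history = run()
    for attempted, conflicts, active in history:
        print(f"attempted={attempted} conflicts={conflicts} next_active={active}")
```

The simulation is single-threaded, so `try_commit` is trivially atomic; a real lock-free implementation would need a hardware compare-and-swap or an equivalent primitive at that point. The sketch only shows how conflict feedback can drive the degree of concurrency.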