Enhancing Job Scheduling on Inter-Rackscale Datacenters with Free-Space Optical Links

Yao HU, Michihiro KOIBUCHI

Publication: IEICE TRANSACTIONS on Information and Systems, Vol.E101-D, No.12, pp.2922-2932
Publication Date: 2018/12/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018PAP0010
Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)
Category: Information networks
Keywords: rackscale architecture, datacenter network, free-space optics, job scheduling



Summary: 
Growth in datacenter traffic and scale is driving innovation in tightly coupled facilities that provide low-latency communication for specific applications. A well-known custom design is rackscale (RS) computing, which gathers key server resource components into separate resource pools. Such a resource-pooling implementation requires a new software stack to manage resource discovery, resource allocation and data communication, and the interconnection network among the pooled components may also need to be reconfigured to support these demands in RS. In this context, the inter-rackscale (IRS) architecture has been proposed as an evolution of the original RS architecture; it disaggregates hardware components into different racks according to their resource types. The core idea of IRS is to use a limited number of free-space optics (FSO) channels as wireless connections between resource racks, so that selected pairs of racks can communicate directly and resource-pooling requirements are met without additional software management. In this study, we evaluate the influence of FSO links on IRS networks. Evaluation results show that FSO links reduce the average communication hop count of user jobs to a value close to the best possible value of 2 hops, and thus provide benchmark performance comparable to that of the counterpart RS architecture. In addition, if four FSO terminals per rack are allowed, the CPU/SSD (GPU) interconnection latency is reduced by 25.99% compared with Fat-tree and by 67.14% compared with 2-D Torus. We also show the advantage of an FSO-equipped IRS system in the average turnaround time of dispatched jobs for given sets of benchmark workloads.
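To illustrate why a few direct FSO links between selected rack pairs lower the average hop count, here is a minimal sketch under loudly stated assumptions. The 8-rack layout, the ring-shaped wired baseline, the job traffic pairs, and the helper names (`build_ring`, `add_fso_links`, `hop_count`, `average_hops`) are all hypothetical; the ring is not the Fat-tree or 2-D Torus used in the evaluation, and "hops" here simply counts rack-to-rack links, which is not necessarily the hop metric used in the paper. Each FSO link consumes one terminal on both of its end racks, mirroring the per-rack terminal limit mentioned above.

```python
from collections import deque


def hop_count(adj, src, dst):
    """Breadth-first search: fewest links on any path from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"{dst} is unreachable from {src}")


def build_ring(racks):
    """Hypothetical wired baseline: each rack is cabled only to its two ring neighbors."""
    adj = {r: set() for r in racks}
    for i, rack in enumerate(racks):
        nxt = racks[(i + 1) % len(racks)]
        adj[rack].add(nxt)
        adj[nxt].add(rack)
    return adj


def add_fso_links(adj, fso_pairs, max_terminals):
    """Return a copy of adj with one bidirectional FSO link per rack pair.

    Each link uses one FSO terminal on both end racks; a link that would
    exceed max_terminals on either rack is rejected with ValueError.
    """
    new_adj = {r: set(nbrs) for r, nbrs in adj.items()}
    used = {r: 0 for r in adj}
    for a, b in fso_pairs:
        if used[a] >= max_terminals or used[b] >= max_terminals:
            raise ValueError(f"not enough FSO terminals for link {a}-{b}")
        used[a] += 1
        used[b] += 1
        new_adj[a].add(b)
        new_adj[b].add(a)
    return new_adj


def average_hops(adj, traffic):
    """Mean hop count over the (src, dst) rack pairs that jobs communicate over."""
    return sum(hop_count(adj, s, d) for s, d in traffic) / len(traffic)


# Hypothetical IRS layout: one resource type per rack, wired as a ring.
RACKS = ["CPU0", "MEM0", "SSD0", "CPU1", "MEM1", "SSD1", "GPU0", "GPU1"]
# Hypothetical rack pairs that user jobs communicate over.
TRAFFIC = [("CPU0", "GPU0"), ("CPU1", "GPU1"), ("CPU0", "SSD1"), ("CPU1", "SSD0")]

wired = build_ring(RACKS)
with_fso = add_fso_links(wired, [("CPU0", "GPU0"), ("CPU1", "GPU1")], max_terminals=4)

print(average_hops(wired, TRAFFIC))     # 2.5
print(average_hops(with_fso, TRAFFIC))  # 1.25
```

In this toy setting only two FSO links are added, yet the pair CPU0-SSD1, which has no direct FSO link, also drops from 3 hops to 2 because it can reuse the CPU0-GPU0 shortcut. Choosing which rack pairs receive the scarce FSO terminals is therefore the interesting design decision; the sketch takes that choice as given rather than modeling any particular selection policy.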