
For FullText PDF, please login, if you are a member of IEICE,
or go to Pay Per View on menu list, if you are a nonmember of IEICE.

A CloudFriendly CommunicationOptimal Implementation for Strassen's Matrix Multiplication Algorithm
Jie ZHOU Feng YU
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E98D
No.11
pp.18961905 Publication Date: 2015/11/01 Publicized: 2015/07/27 Online ISSN: 17451361
DOI: 10.1587/transinf.2015EDP7065 Type of Manuscript: PAPER Category: Fundamentals of Information Systems Keyword: parallel algorithms, communicationoptimal, Strassen's matrix multiplication, cloud computing, MapReduce,
Full Text: PDF(1.1MB)>>
Summary:
Due to its ondemand and payasyougo properties, cloud computing has become an attractive alternative for HPC applications. However, communicationintensive applications with complex communication patterns still cannot be performed efficiently on cloud platforms, which are equipped with MapReduce technologies, such as Hadoop and Spark. In particular, one major obstacle is that MapReduce's simple programming model cannot explicitly manipulate data transfers between compute nodes. Another obstacle is cloud's relatively poor network performance compared with traditional HPC platforms. The traditional Strassen's algorithm of square matrix multiplication has a recursive and complex pattern on the HPC platform. Therefore, it cannot be directly applied to the cloud platform. In this paper, we demonstrate how to make Strassen's algorithm with complex communication patterns “cloudfriendly”. By reorganizing Strassen's algorithm in an iterative pattern, we completely separate its computations and communications, making it fit to MapReduce programming model. By adopting a novel data/task parallel strategy, we solve Strassen's data dependency problems, making it well balanced. This is the first instance of Strassen's algorithm in MapReducestyle systems, which also matches Strassen's communication lower bound. Further experimental results show that it achieves a speedup ranging from 1.03× to 2.50× over the classical Θ(n^{3}) algorithm. We believe the principle can be applied to many other complex scientific applications.

