A Parallel BBD Matrix Solution for MIMD Parallel Circuit Simulation

Tetsuro KAGE  Junichi NIITSUMA  

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E78-A   No.1   pp.88-93
Publication Date: 1995/01/25
Online ISSN: 
DOI: 
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: Computer Aided Design (CAD)
Keyword: 
parallel circuit simulation,  MIMD parallel computer,  circuit partitioning,  bordered-block-diagonal (BBD) matrix,  LU-decomposition,  

Full Text: PDF(523.2KB)>>
Buy this Article




Summary: 
We developed a parallel bordered-block-diagonal (BBD) matrix solution for parallel circuit simulation. In parallel circuit sumulation on a MIMD parallel computer, a circuit is partitioned into as many subcircuits as the processors of a parallel computer. Circuit partition produce a BBD matrix. In parallel BBD matrix solution, diagonal blocks are easily solved separately in each processor. It is difficult, however, to solve the interconnection (IC) submatrix of a BBD matrix effectively in parallel. To make matters worse, the more a circuit is partitioned into subcircuits for highly parallel circuit simulation, the larger the size of an IC submatrix becomes. From an examination, we found that an IC submatrix is more dense (about 30% of all entries are non-zeros) than a normal circuit matrix, and the non-zeros per row in an IC submatrix are almost constant with the number of subcircuits. To attain high-speed circuit simulation, we devised a data structure for BBD matrix processing and an approach to parallel BBD matrix solution. Our approach solves the IC submatrix in a BBD matrix as well as the diagonal blocks in parallel using all processors. In this approach, we allocate an IC submatrix in block-wise order rather than in dot-wise order onto all processors. Thus, we balance the processor perfomance with the communication capacity of a parallel computer system. When we changed the block size of IC submatrix allocation from dot-wise order to 88 block-wise order, the 88 block-wise order allocation almost halved the matrix solution time. The parallel simulation of a sample circuit with 3277 transistors was 16.6 times faster than a single processor when we used 49 processors.