Counting Algorithms for Recognizable and Algebraic Series

Bao Trung CHU  Kenji HASHIMOTO  Hiroyuki SEKI  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.6   pp.1479-1490
Publication Date: 2018/06/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017FOP0003
Type of Manuscript: Special Section PAPER (Special Section on Formal Approaches)
Category: Formal Approaches
Keyword: 
string counting,  recognizable series,  algebraic series,  context-free grammar,  

Full Text: PDF(435.1KB)
>>Buy this Article


Summary: 
Formal series are a natural extension of formal languages by associating each word with a value called a coefficient or a weight. Among them, recognizable series and algebraic series can be regarded as extensions of regular languages and context-free languages, respectively. The coefficient of a word w can represent quantities such as the cost taken by an operation on w, the probability that w is emitted. One of the possible applications of formal series is the string counting in quantitative analysis of software. In this paper, we define the counting problems for formal series and propose algorithms for the problems. The membership problem for an automaton or a grammar corresponds to the problem of computing the coefficient of a given word in a given series. Accordingly, we define the counting problem for formal series in the following two ways. For a formal series S and a natural number d, we define CC(S,d) to be the sum of the coefficients of all the words of length d in S and SC(S,d) to be the number of words of length d that have non-zero coefficients in S. We show that for a given recognizable series S and a natural number d, CC(S,d) can be computed in O(η log d) time where η is an upper-bound of time needed for a single state-transition matrix operation, and if the state-transition matrices of S are commutative for multiplication, SC(S,d) can be computed in polynomial time of d. We extend the notions to tree series and discuss how to compute them efficiently. Also, we propose an algorithm that computes CC(S,d) in square time of d for an algebraic series S. We show the CPU time of the proposed algorithm for computing CC(S,d) for some context-free grammars as S, one of which represents the syntax of C language. To examine the applicability of the proposed algorithms to string counting for the vulnerability analysis, we also present results on string counting for Kaluza Benchmark.