
For FullText PDF, please login, if you are a member of IEICE,
or go to Pay Per View on menu list, if you are a nonmember of IEICE.

Counting Algorithms for Recognizable and Algebraic Series
Bao Trung CHU Kenji HASHIMOTO Hiroyuki SEKI
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E101D
No.6
pp.14791490 Publication Date: 2018/06/01 Publicized: 2018/03/16 Online ISSN: 17451361
DOI: 10.1587/transinf.2017FOP0003 Type of Manuscript: Special Section PAPER (Special Section on Formal Approaches) Category: Formal Approaches Keyword: string counting, recognizable series, algebraic series, contextfree grammar,
Full Text: PDF(435.1KB)>>
Summary:
Formal series are a natural extension of formal languages by associating each word with a value called a coefficient or a weight. Among them, recognizable series and algebraic series can be regarded as extensions of regular languages and contextfree languages, respectively. The coefficient of a word w can represent quantities such as the cost taken by an operation on w, the probability that w is emitted. One of the possible applications of formal series is the string counting in quantitative analysis of software. In this paper, we define the counting problems for formal series and propose algorithms for the problems. The membership problem for an automaton or a grammar corresponds to the problem of computing the coefficient of a given word in a given series. Accordingly, we define the counting problem for formal series in the following two ways. For a formal series S and a natural number d, we define CC(S,d) to be the sum of the coefficients of all the words of length d in S and SC(S,d) to be the number of words of length d that have nonzero coefficients in S. We show that for a given recognizable series S and a natural number d, CC(S,d) can be computed in O(η log d) time where η is an upperbound of time needed for a single statetransition matrix operation, and if the statetransition matrices of S are commutative for multiplication, SC(S,d) can be computed in polynomial time of d. We extend the notions to tree series and discuss how to compute them efficiently. Also, we propose an algorithm that computes CC(S,d) in square time of d for an algebraic series S. We show the CPU time of the proposed algorithm for computing CC(S,d) for some contextfree grammars as S, one of which represents the syntax of C language. To examine the applicability of the proposed algorithms to string counting for the vulnerability analysis, we also present results on string counting for Kaluza Benchmark.

