Formal Verification of a Decision-Tree Ensemble Model and Detection of Its Violation Ranges

Naoto SATO  Hironobu KURUMA  Yuichiroh NAKAGAWA  Hideto OGAWA  

IEICE TRANSACTIONS on Information and Systems   Vol.E103-D   No.2   pp.363-378
Publication Date: 2020/02/01
Publicized: 2019/11/20
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDP7120
Type of Manuscript: PAPER
Category: Dependable Computing
Keyword: machine learning, formal verification, decision-tree ensemble model

A “decision-tree ensemble model” (DTEM) is a machine-learning model represented by a set of decision trees. DTEMs are known to work well mainly on structured data; however, like other machine-learning models, a DTEM is difficult to train so that it returns the correct output value (called the “prediction value”) for every input value (called the “attribute value”). Accordingly, when a DTEM is used in a system that requires reliability, it is important to comprehensively detect, during development, the attribute values that lead to malfunctions of the system (failures) and to take appropriate countermeasures. One conceivable solution is to install an input filter that controls the input to the DTEM and to use separate software to process attribute values that may lead to failures. To develop the input filter, it is necessary to specify the filtering condition for the attribute values that lead to a failure. We therefore propose a method that formally verifies a DTEM and, if the verification finds an attribute value leading to a failure, extracts the range in which such attribute values exist. Because the proposed method comprehensively extracts the ranges in which attribute values leading to a failure exist, an input filter based on those ranges can prevent the failure. To demonstrate the feasibility of the proposed method, we performed a case study using a dataset of house prices. Through the case study, we also evaluated the scalability of the method and showed that the number and depth of the decision trees are important factors that determine its applicability.
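
The input-filter idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, purely for illustration, that each extracted violation range is an axis-aligned box over named attributes, and the names `Box`, `in_box`, `make_input_filter`, `toy_model`, and `toy_fallback`, as well as the attribute names and numbers, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a violation range is an axis-aligned box that maps an
# attribute name to a half-open interval [low, high).
Box = Dict[str, Tuple[float, float]]
Predictor = Callable[[Dict[str, float]], float]


def in_box(x: Dict[str, float], box: Box) -> bool:
    """Return True if every attribute constrained by the box lies inside it."""
    return all(low <= x[name] < high for name, (low, high) in box.items())


def make_input_filter(
    model: Predictor, fallback: Predictor, violation_ranges: List[Box]
) -> Predictor:
    """Wrap `model` so that inputs inside any violation range go to `fallback`."""

    def guarded(x: Dict[str, float]) -> float:
        if any(in_box(x, box) for box in violation_ranges):
            return fallback(x)  # separate software handles risky inputs
        return model(x)  # all other inputs reach the model unchanged

    return guarded


# Toy stand-ins (hypothetical, not taken from the paper).
def toy_model(x: Dict[str, float]) -> float:
    return 1000.0 * x["area"]


def toy_fallback(x: Dict[str, float]) -> float:
    return 500000.0


# One hypothetical violation range: small area combined with many rooms.
ranges: List[Box] = [{"area": (0.0, 10.0), "rooms": (8.0, float("inf"))}]
predict = make_input_filter(toy_model, toy_fallback, ranges)
```

In this sketch, `predict({"area": 5.0, "rooms": 9.0})` is routed to the fallback, while `predict({"area": 50.0, "rooms": 9.0})` is passed to the model. Representing ranges as boxes keeps the filtering condition cheap to evaluate at run time; whether the ranges extracted by the proposed method take exactly this form is an assumption of the sketch.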