Bridging between Soft and Hard Thresholding by Scaling

Katsuyuki HAGIWARA

IEICE TRANSACTIONS on Information and Systems   Vol.E105-D    No.9    pp.1529-1536
Publication Date: 2022/09/01
Publicized: 2022/06/09
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2021EDP7223
Type of Manuscript: PAPER
Category: Artificial Intelligence, Data Mining
Keywords: sparse modeling, bridge thresholding method, soft thresholding, hard thresholding, SURE

This study considered an extension of sparse regularization with scaling, focusing on thresholding methods, which are simple and typical examples of sparse modeling. In the setting of a non-parametric orthogonal regression problem, we developed and analyzed a thresholding method in which soft thresholding estimators are independently expanded by empirical scaling values. The scaling values share a common hyper-parameter, which is the order of expansion of the ideal scaling value that achieves hard thresholding. We refer to this estimator simply as the scaled soft thresholding estimator. The scaled soft thresholding method is a bridge between the soft and hard thresholding methods. The new estimator coincides with the adaptive LASSO estimator in the orthogonal case; i.e., it provides another derivation of the adaptive LASSO estimator. It is a general method that includes soft thresholding and the non-negative garrote as special cases. We subsequently derived the degrees of freedom of scaled soft thresholding for calculating Stein's unbiased risk estimate (SURE). We found that it decomposes into the degrees of freedom of soft thresholding and a remainder term connected to hard thresholding. Since the degrees of freedom reflect the degree of over-fitting, this implies that scaled soft thresholding has another source of over-fitting in addition to the number of un-removed components. The theoretical result was verified by a simple numerical example. In this process, we also focused on the non-monotonicity of the remainder term in the degrees of freedom and found that, in a sparse and large-sample setting, it is mainly caused by useless components that are not related to the target function.
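
The construction described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper; it rests on one reading of the abstract, namely that the ideal scaling value is the ratio of the hard to the soft thresholding estimate, 1 / (1 - lam/|z|) for |z| > lam, and that the hyper-parameter is the order at which its geometric series is truncated. Under that assumption the scaled estimate is sign(z) (|z| - lam^(K+1) / |z|^K), which reduces to soft thresholding at order 0, to a non-negative garrote form at order 1, and approaches hard thresholding as the order grows. The function names and the symbols `lam` and `order` are hypothetical, and the paper's own parameterization may differ.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft thresholding: shrink each coefficient toward zero by lam."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def hard_threshold(z, lam):
    """Hard thresholding: keep coefficients with |z| > lam unchanged."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) > lam, z, 0.0)


def scaled_soft_threshold(z, lam, order):
    """Soft thresholding multiplied by a truncated geometric series.

    Assumption (not confirmed by the abstract): the scaling value is
    sum_{k=0}^{order} (lam / |z|)^k, the order-`order` expansion of
    1 / (1 - lam / |z|), applied independently to each coefficient.
    """
    z = np.asarray(z, dtype=float)
    keep = np.abs(z) > lam
    ratio = np.zeros_like(z)
    ratio[keep] = lam / np.abs(z[keep])
    # Removed components get scale 1, which is harmless because their
    # soft thresholding estimate is already zero.
    scale = sum(ratio ** k for k in range(order + 1))
    return scale * soft_threshold(z, lam)


def divergence_terms(z, lam, order):
    """Split the divergence sum_i d(estimate_i)/d(z_i) into two parts.

    Under the assumption above, the derivative for a kept component is
    1 + order * (lam / |z|)^(order + 1). The first part counts the
    un-removed components (the soft thresholding term); the second is
    the extra term introduced by the scaling.
    """
    z = np.asarray(z, dtype=float)
    keep = np.abs(z) > lam
    n_kept = int(np.count_nonzero(keep))
    remainder = float(order * np.sum((lam / np.abs(z[keep])) ** (order + 1)))
    return n_kept, remainder


if __name__ == "__main__":
    z = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
    for order in (0, 1, 5, 50):
        print(order, scaled_soft_threshold(z, 1.0, order))
    print(divergence_terms(z, 1.0, 2))
```

In this sketch the divergence is the quantity whose expectation gives the degrees of freedom used in SURE. The split returned by `divergence_terms` mirrors the decomposition stated in the abstract, but the closed form here follows from the assumed scaling and should be checked against the paper before being relied on.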
