A First-Order Optimization Algorithm for Statistical Learning with Hierarchical Sparsity Structure
Dewei Zhang,
Yin Liu and
Sam Davanloo Tajbakhsh
Additional contact information
Dewei Zhang: The Ohio State University, Columbus, Ohio 43210
Yin Liu: The Ohio State University, Columbus, Ohio 43210
Sam Davanloo Tajbakhsh: The Ohio State University, Columbus, Ohio 43210
INFORMS Journal on Computing, 2022, vol. 34, issue 2, 1126-1140
Abstract:
In many statistical learning problems, it is desired that the optimal solution conform to an a priori known sparsity structure represented by a directed acyclic graph. Inducing such structures by means of convex regularizers requires nonsmooth penalty functions that exploit group overlapping. Our study focuses on evaluating the proximal operator of the latent overlapping group lasso developed by Jacob et al. (2009). We implement an alternating direction method of multipliers (ADMM) with a sharing scheme to solve large-scale instances of the underlying optimization problem efficiently. In the absence of strong convexity, global linear convergence of the algorithm is established using the error bound theory. More specifically, the paper contributes to establishing primal and dual error bounds when the nonsmooth component of the objective function does not have a polyhedral epigraph. We also investigate the effect of the graph structure on the speed of convergence of the algorithm. Detailed numerical simulation studies over different graph structures that support the proposed algorithm, as well as two applications in learning, are provided.
Summary of Contribution: The paper proposes a computationally efficient optimization algorithm to evaluate the proximal operator of a nonsmooth, hierarchical sparsity-inducing regularizer and establishes its convergence properties. The computationally intensive subproblem of the proposed algorithm can be fully parallelized, which allows solving large-scale instances of the underlying problem. Comprehensive numerical simulation studies benchmarking the proposed algorithm against five other methods on the speed of convergence to optimality are provided. Furthermore, the performance of the algorithm is demonstrated on two statistical learning applications related to topic modeling and breast cancer classification. The code, along with the simulation studies and benchmarks, is available on the corresponding author's GitHub website for evaluation and future use.
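Illustrative code (not from the paper): the abstract describes evaluating the proximal operator of the latent overlapping group lasso via ADMM with a sharing scheme. Below is a minimal Python sketch in the style of the standard ADMM sharing scheme (Boyd et al. 2011, Sec. 7.3), not the authors' implementation; the function name prox_log_admm, the square-root-of-group-size default weights, the fixed penalty parameter rho, and the fixed iteration count (in place of a proper stopping criterion) are all illustrative assumptions. Each latent group variable is updated by block soft-thresholding, independently across groups, which is what makes the intensive subproblem fully parallelizable as noted in the abstract.

import numpy as np

def prox_log_admm(u, groups, lam, weights=None, rho=1.0, n_iter=200):
    # Sketch: evaluate the proximal operator of the latent overlapping
    # group lasso at u via ADMM with a sharing scheme:
    #   min over {v_g}:  0.5 * || sum_g v_g - u ||^2
    #                    + lam * sum_g w_g * ||v_g||_2,
    # where each latent variable v_g is supported only on group g.
    p = u.shape[0]
    N = len(groups)
    if weights is None:
        # Common heuristic (an assumption here): weight by sqrt of group size.
        weights = [np.sqrt(len(g)) for g in groups]

    V = np.zeros((N, p))   # latent group variables v_g, zero-padded to length p
    zbar = np.zeros(p)     # averaged consensus variable of the sharing scheme
    dual = np.zeros(p)     # scaled dual variable

    for _ in range(n_iter):
        vbar = V.mean(axis=0)
        # v_g-updates: block soft-thresholding on each group's support.
        # These are independent across groups and fully parallelizable.
        for i, g in enumerate(groups):
            idx = np.asarray(g)
            t = V[i, idx] - vbar[idx] + zbar[idx] - dual[idx]
            V[i] = 0.0
            nrm = np.linalg.norm(t)
            kappa = lam * weights[i] / rho
            if nrm > kappa:
                V[i, idx] = (1.0 - kappa / nrm) * t
        vbar = V.mean(axis=0)
        # zbar-update: closed form, since the coupling term 0.5||s - u||^2
        # is quadratic in s = N * zbar.
        zbar = (u + rho * (vbar + dual)) / (N + rho)
        # Scaled dual ascent step.
        dual = dual + vbar - zbar

    return V.sum(axis=0)   # prox value: superposition of the latent variables

# Usage on a small, hypothetical overlapping-group structure:
x = np.array([3.0, -1.0, 0.5, 2.0])
groups = [[0, 1], [1, 2], [2, 3]]
print(prox_log_admm(x, groups, lam=0.5))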
Keywords: proximal methods; error bound theory; alternating direction method of multipliers; hierarchical sparsity structure; latent overlapping group lasso
Date: 2022
Downloads: http://dx.doi.org/10.1287/ijoc.2021.1069 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:34:y:2022:i:2:p:1126-1140
More articles in INFORMS Journal on Computing from INFORMS.
Bibliographic data for series maintained by Chris Asher.