Major Update for HUGE and SAM
In high-dimensional sparse learning problems, some recent studies have demonstrated the computational superiority of second-order methods over first-order methods.
In this project, we propose to apply efficient second-order proximal Newton solvers to the high-dimensional sparse learning domain. Specifically, we will focus on high-dimensional undirected graphical model estimation and nonparametric regression models with sparsity-inducing regularization.
Currently, existing undirected graphical model packages also lack an inference module, i.e., tools for assessing model significance (p-values) and constructing confidence intervals. We propose to develop an efficient unbiased inference module for our solver. In particular, we consider the problems of testing the presence of a single edge and constructing a uniform confidence subgraph.
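As a concrete illustration of the kind of edge-wise test such a module would expose, the sketch below builds a p-value for a single edge from a de-biased graphical lasso estimate. This is only a minimal sketch using a standard de-biasing construction, not the exact framework of [4]; it uses huge.generator() for synthetic data and the glasso() estimator from [1] as the initial estimate, and the plug-in variance formula is the usual asymptotic one for the de-biased entry.

```r
## Sketch of an edge-wise test: de-bias an initial graphical lasso estimate and
## form a p-value for H0: Theta[i, j] = 0 (i.e., no edge between i and j).
## Illustration only; not the exact inference framework of [4].
library(huge)    # synthetic data generator
library(glasso)  # initial sparse precision estimate [1]

set.seed(1)
sim <- huge.generator(n = 200, d = 50, graph = "random")  # Gaussian data + true graph
x   <- sim$data
n   <- nrow(x)
S   <- cov(x)                                             # sample covariance
Theta <- glasso(S, rho = 0.2)$wi                          # graphical lasso precision estimate

## de-biased estimate of the precision matrix
T_hat <- 2 * Theta - Theta %*% S %*% Theta

edge_pvalue <- function(i, j) {
  ## plug-in asymptotic standard error of the de-biased entry
  se <- sqrt((Theta[i, i] * Theta[j, j] + Theta[i, j]^2) / n)
  2 * pnorm(-abs(T_hat[i, j] / se))
}

edge_pvalue(1, 2)   # p-value for the presence of edge (1, 2)
```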
For nonparametric regression models, the current solver is based on the proximal gradient method, which is not efficient. Our preliminary simulations show that the proximal Newton method with blockwise coordinate descent can accelerate the algorithm compared with its first-order counterparts. Furthermore, we aim to extend the current package with sparsity-inducing regularization, which enjoys strong statistical properties.
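To make the algorithmic idea concrete, here is a small, self-contained sketch of a proximal Newton iteration with a coordinate-descent inner solver. It is shown on l1-penalized logistic regression rather than the group-penalized additive model used in SAM, omits line search and convergence checks for brevity, and all function names (cd_quad_lasso, prox_newton_logistic) are illustrative.

```r
## Sketch of the proximal Newton idea: at each iteration, build a quadratic
## model of the smooth loss and minimize "quadratic + l1 penalty" by
## coordinate descent. The SAM/HUGE solvers would use their own losses and
## (group) penalties; this is only a minimal illustration.

soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)    # soft-thresholding

## Coordinate descent for: min_x 0.5 * x' H x - b' x + lambda * ||x||_1
cd_quad_lasso <- function(H, b, lambda, x = rep(0, length(b)),
                          n_sweeps = 200, tol = 1e-8) {
  for (s in seq_len(n_sweeps)) {
    x_old <- x
    for (j in seq_along(x)) {
      r_j  <- b[j] - sum(H[j, ] * x) + H[j, j] * x[j]   # exclude coordinate j
      x[j] <- soft(r_j, lambda) / H[j, j]
    }
    if (max(abs(x - x_old)) < tol) break
  }
  x
}

## Proximal Newton for: min_w mean(log(1 + exp(-y * X w))) + lambda * ||w||_1
prox_newton_logistic <- function(X, y, lambda, n_iter = 15) {
  n <- nrow(X); p <- ncol(X)
  w <- rep(0, p)
  for (it in seq_len(n_iter)) {
    m    <- drop(X %*% w)
    prob <- 1 / (1 + exp(-y * m))                       # P(observed label)
    grad <- -crossprod(X, y * (1 - prob)) / n
    H    <- crossprod(X * (prob * (1 - prob)), X) / n + 1e-6 * diag(p)
    ## minimizing the quadratic model in x = w + d is the problem above
    ## with b = H w - grad (full step, no line search for brevity)
    w <- cd_quad_lasso(H, drop(H %*% w) - drop(grad), lambda, x = w)
  }
  w
}

set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
w_true <- c(2, -2, 1.5, rep(0, p - 3))
y <- ifelse(runif(n) < 1 / (1 + exp(-drop(X %*% w_true))), 1, -1)
w_hat <- prox_newton_logistic(X, y, lambda = 0.05)
which(w_hat != 0)   # support should concentrate on the first three features
```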
The new solvers and new modules will be integrated into two existing popular packages: HUGE (High-dimensional Undirected Graph Estimation) and SAM (Sparse Additive Modeling).
For sparse graphical model estimation, first-order methods are usually computationally inefficient. For example, the glasso package [1] is based on blockwise coordinate descent, which is not scalable to high-dimensional, large-scale data. The state-of-the-art package QUIC [2] uses a second-order algorithm, which performs better empirically. However, it does not exploit the sparse structure and thus is not scalable to high-dimensional data. Inspired by the PICASSO package [3] for estimating sparse generalized linear models, we aim to design a more efficient active-set-based Newton-type algorithm for solving this problem. In terms of inference for high-dimensional graphical models, [4] proposes a unified framework to quantify local and global inferential uncertainty for high-dimensional nonparanormal graphical models. We aim to develop an unbiased inference module by incorporating a projection operator into the framework of [4].
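The active-set idea borrowed from PICASSO can be sketched in a few lines: solve the penalized subproblem over a small set of coordinates, then check the KKT conditions over all coordinates and grow the set with any violators. The following is an illustration only, on the l1-penalized quadratic subproblem that a proximal Newton solver has to solve at each iteration; it is not PICASSO's or QUIC's actual implementation, and the function name active_set_cd is made up for this sketch.

```r
## Sketch of an active-set strategy for the l1-penalized quadratic subproblem
##   min_x 0.5 * x' H x - b' x + lambda * ||x||_1

soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

active_set_cd <- function(H, b, lambda, n_outer = 50, n_sweeps = 200, tol = 1e-8) {
  p <- length(b)
  x <- rep(0, p)
  active <- which(abs(b) > lambda)          # coordinates violating KKT at x = 0
  for (outer in seq_len(n_outer)) {
    ## coordinate descent restricted to the current active set
    for (s in seq_len(n_sweeps)) {
      x_old <- x
      for (j in active) {
        r_j  <- b[j] - sum(H[j, ] * x) + H[j, j] * x[j]
        x[j] <- soft(r_j, lambda) / H[j, j]
      }
      if (max(abs(x - x_old)) < tol) break
    }
    ## KKT check over all coordinates; gradient of the smooth part is H x - b
    g <- drop(H %*% x) - b
    violators <- which(x == 0 & abs(g) > lambda + 1e-6)
    if (length(violators) == 0) break       # optimal for the full problem
    active <- union(active, violators)
  }
  x
}

set.seed(1)
p <- 100
A <- matrix(rnorm(10 * p), 10, p)
H <- crossprod(A) / 10 + diag(p)            # positive definite toy Hessian
b <- rnorm(p)
x_hat <- active_set_cd(H, b, lambda = 1.5)
sum(x_hat != 0)                             # only a small fraction of coordinates enter
```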
Similarly, for nonparametric regression and classification models, the current package SAM [5] is based only on the proximal gradient method. We expect that a Newton-type algorithm can also accelerate the computation. Furthermore, we aim to add functional penalties to the current package [6] to enrich its functionality.
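For reference, the current SAM interface that the new solver would plug into looks roughly as follows. This is a sketch assuming the samQL() function of the released SAM package (sparse additive regression with quadratic loss); the fitted object here comes from the existing proximal gradient solver that the project plans to replace behind the same front end.

```r
## Quick look at the existing SAM interface; the proposed Newton-type solver
## would sit behind the same front end.
library(SAM)
set.seed(1)
n <- 200; d <- 20
X <- matrix(runif(n * d), n, d)
y <- sin(2 * pi * X[, 1]) + (X[, 2] - 0.5)^2 + rnorm(n, sd = 0.1)
fit <- samQL(X, y)   # solution path over a grid of regularization parameters
print(fit)
```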
The student developer will first refactor the code. The core code should be implemented in C/C++ with the support of the Eigen library. At the same time, extensive benchmarking code should be written to ensure correctness throughout the development. Next, the student will implement an efficient proximal Newton solver with novel active-set updating strategies in R for a class of sparse learning problems, including sparse undirected graphical model estimation (HUGE) and nonparametric regression and classification models with functional penalties (SAM). We aim to improve the performance of the existing solvers in terms of CPU time and estimation accuracy on real and synthetic datasets. Then, an unbiased statistical inference module will be implemented for HUGE. Detailed documentation describing the algorithms and their theoretical properties will be provided in the vignettes.
HUGE (>161k downloads) and SAM (>13k downloads) are popular R packages for sparse learning. With the delivered solvers and function modules, these two packages will be able to solve a wide class of sparse learning problems under the same algorithmic framework and achieve state-of-the-art performance on each individual problem in terms of CPU time and estimation error. The updated HUGE will also be the first R package that provides an inference module for sparse graphical models.
Students, please contact mentors below after completing at least one of the tests below.
- Primary mentor: Xingguo Li, fifth-year Ph.D. candidate, Department of Electrical and Computer Engineering, University of Minnesota.
  Email: [email protected]
  Website: http://people.ece.umn.edu/~lixx1661/
  GSOC experience: 2014-2016 as a student; 2017 as a mentor
- Secondary mentor: Jason Ge, fourth-year Ph.D. candidate, Department of Operations Research and Financial Engineering, Princeton University.
  Email: [email protected]
  Github: https://github.com/jasonge27
  GSOC experience: 2017 as a student developer
- Tertiary mentor: Tuo Zhao, Assistant Professor, School of Industrial and Systems Engineering, Georgia Institute of Technology.
  Email: [email protected]
  Website: www2.isye.gatech.edu/~tzhao80/
  GSOC experience: 2011-2013 as a student; 2014-2017 as a mentor

Xingguo Li and Tuo Zhao are the original developers of SAM and HUGE.
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: Download HUGE and SAM, and test each of them on a synthetic data set.
- Medium: For HUGE, test the performance under different sparsity levels. For SAM, add pairwise correlation between columns of the data set and check how the different solvers perform with highly correlated feature columns.
- Hard: Since the core code should be implemented in C/C++, write a simple R package implementing matrix multiplication, with the main code in C/C++ (a minimal sketch of such a setup is shown below).
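For the hard test, a minimal sketch of the expected structure is given below: an R front end calling a C++ kernel, here via Rcpp::cppFunction for brevity. A real submission would be a proper package with the C++ code under src/, and the production code for this project would use the Eigen library for the kernel; the function name matmul_cpp is illustrative.

```r
## Minimal sketch for the "hard" test: matrix multiplication with the main
## computation in C++, exposed to R.
library(Rcpp)

cppFunction('
NumericMatrix matmul_cpp(NumericMatrix A, NumericMatrix B) {
  int n = A.nrow(), k = A.ncol(), m = B.ncol();
  if (B.nrow() != k) stop("non-conformable matrices");
  NumericMatrix C(n, m);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j) {
      double s = 0.0;
      for (int l = 0; l < k; ++l) s += A(i, l) * B(l, j);
      C(i, j) = s;
    }
  return C;
}')

## check against base R
A <- matrix(rnorm(20), 4, 5)
B <- matrix(rnorm(15), 5, 3)
stopifnot(all.equal(matmul_cpp(A, B), A %*% B))
```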
Students, please post a link to your test results here.
Haoming Jiang: Github Link
[1] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. "glasso: Graphical Lasso Estimation of Gaussian Graphical Models." R package version 1.8 (2014).
[2] Cho-Jui Hsieh, et al. "BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables." Advances in Neural Information Processing Systems, 2013.
[3] Jason Ge, Xingguo Li, et al. "Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python." R package version 1.20 (2017).
[4] Quanquan Gu, Yuan Cao, Yang Ning, and Han Liu. "Local and Global Inference for High Dimensional Nonparanormal Graphical Models." arXiv:1502.02347, 2015.
[5] Tuo Zhao and Han Liu. "Sparse Additive Machine." International Conference on Artificial Intelligence and Statistics, 2012.
[6] Tuo Zhao, Xingguo Li, et al. "SAM: Sparse Additive Machine." R package version 1.05 (2015).