New Algorithms and Software for Significance Controlled Variable Selection

  • Adriano Zambom, Department of Mathematics, California State University Northridge
  • Jongwook Kim, Department of Statistics, Indiana University Bloomington
Keywords: multiple testing, forward selection, backward elimination, stepwise selection, p-value correction

Abstract

Stepwise regression algorithms are widely used in a variety of applications and remain a fundamental tool in variable selection. Most functions available in statistical software packages can return models that contain insignificant predictors because of the criterion optimized at each step. Here we introduce an R package that provides the user with several measures of the prospective model at each step of the algorithm. These prospective models are checked with multiple-testing p-value corrections, such as Bonferroni and the False Discovery Rate, so the algorithm's final model includes only predictors whose significance is controlled by the chosen correction type and alpha level. Moreover, the entry or drop criterion for the forward or backward steps can be a combination of the p-values of the prospective models. We illustrate the functionality of the package with examples and simulations.
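The significance check described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the interface of the R package. The function names (`bonferroni`, `benjamini_hochberg`, `significant_predictors`) and the coefficient p-values are hypothetical. The sketch assumes a prospective model's coefficient p-values are already available and shows only how a Bonferroni or Benjamini-Hochberg (False Discovery Rate) correction decides which predictors pass at a given alpha level.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_predictors(pvalues_by_name, alpha=0.05, correction=bonferroni):
    """Return the predictors whose corrected p-value is at most alpha."""
    names = list(pvalues_by_name)
    adjusted = correction([pvalues_by_name[name] for name in names])
    return [name for name, p in zip(names, adjusted) if p <= alpha]


# Hypothetical coefficient p-values of one prospective model (made-up numbers).
candidate = {"x1": 0.001, "x2": 0.012, "x3": 0.030, "x4": 0.200}

print(significant_predictors(candidate, 0.05, bonferroni))          # ['x1', 'x2']
print(significant_predictors(candidate, 0.05, benjamini_hochberg))  # ['x1', 'x2', 'x3']
```

With these made-up values, the Bonferroni correction keeps two predictors and the less conservative Benjamini-Hochberg correction keeps three, which shows how the choice of correction type affects the final model. In R, the built-in `p.adjust` function provides the same two adjustments through its `"bonferroni"` and `"BH"` methods.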

Published
2022-05-29
How to Cite
Zambom, A., & Kim, J. (2022). New Algorithms and Software for Significance Controlled Variable Selection. Statistics, Optimization & Information Computing, 10(3), 949-967. https://doi.org/10.19139/soic-2310-5070-1520
Section
Research Articles