TY  - JOUR
AU  - Christina Parpoula
AU  - Christos Koukouvinos
AU  - Dimitrios Simos
AU  - Stella Stylianou
PY  - 2014/06/01
Y2  - 2024/03/29
TI  - Supersaturated plans for variable selection in large databases
JF  - Statistics, Optimization & Information Computing
JA  - Stat., optim. inf. comput.
VL  - 2
IS  - 2
SE  - Research Articles
DO  - 10.19139/soic.v2i2.75
UR  - http://iapress.org/index.php/soic/article/view/20140607
AB  - Over the last decades, the collection and storage of data has become massive with the advance of technology and variable selection has become a fundamental tool to large dimensional statistical modelling problems. In this study we implement data mining techniques, metaheuristics and use experimental designs in databases in order to determine the most relevant variables for classification in regression problems in cases where observations and labels of a large database are available. We propose a database-driven scheme for the encryption of specific fields of a database in order to select an optimal supersaturated design consisting of the variables of a large database which have been found to influence significantly the response outcome. The proposed design selection approach is quite promising, since we are able to retrieve an optimal supersaturated plan using a very small percentage of the available runs, a fact that makes the statistical analysis of a large database computationally feasible and affordable.
ER  -