Hybridized Support Vector Machine and Recursive Feature Elimination with Information Complexity
Abstract
In statistical data mining research, datasets are often both nonlinear and high-dimensional, which makes them difficult to analyze comprehensively with traditional statistical methodologies. In this paper, a novel wrapper method called SVM-ICOMP-RFE is introduced and developed. It hybridizes the support vector machine (SVM) and recursive feature elimination (RFE) with the information-theoretic measure of complexity (ICOMP) in order to classify high-dimensional data sets and to perform subset selection in the original feature space, finding the best subset of features for discriminating between the groups. RFE ranks the features according to the ICOMP criterion. ICOMP plays an important role not only in choosing an optimal kernel function from a portfolio of candidate kernel functions, but also in selecting the important subset(s) of features. The potential and flexibility of our approach are illustrated on two real benchmark data sets: the ionosphere data, which contain radar returns from the ionosphere, and the aorta data, which are used for the early detection of atheroma, a condition that most commonly results in heart attack. The proposed method is also compared with other RFE-based methods that rank features using different measures (i.e., weight and gradient).
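As background for the ICOMP criterion, the form most often quoted from Bozdogan's work (ICOMP based on the inverse Fisher information matrix, IFIM) is shown below. The abstract does not state which ICOMP variant, likelihood, or covariance estimate SVM-ICOMP-RFE uses, so treat this as a general reference form rather than the paper's exact objective. Smaller values are preferred. Here L is the likelihood evaluated at the maximum likelihood estimate, \hat{\mathcal{F}}^{-1} is the estimated inverse Fisher information matrix, and s = rank(Σ).

```latex
\mathrm{ICOMP(IFIM)} = -2\log L(\hat{\theta}) + 2\,C_1\!\left(\hat{\mathcal{F}}^{-1}(\hat{\theta})\right),
\qquad
C_1(\Sigma) = \frac{s}{2}\log\!\left(\frac{\operatorname{tr}(\Sigma)}{s}\right) - \frac{1}{2}\log\lvert\Sigma\rvert
```

The first term measures lack of fit and the second penalizes complexity through the interaction among the parameter estimates. For a full-rank Σ, the arithmetic-geometric mean inequality on its eigenvalues gives C_1(Σ) ≥ 0, with equality exactly when all eigenvalues are equal. For example, Σ = diag(1, 4) gives C_1 = log 2.5 − log 2 = log 1.25 ≈ 0.223.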
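To make the recursive elimination loop concrete, the sketch below implements the weight-based ranking (drop the feature with the smallest w_j²) in the style of Guyon et al.'s SVM-RFE. This is one of the comparison measures the abstract names, not the ICOMP-based ranking itself. The sketch assumes a linear kernel and two classes labelled ±1. The helper names `train_linear_svm` and `svm_rfe`, the plain subgradient-descent solver, its hyperparameters, and the toy data are all illustrative assumptions rather than the paper's implementation. An ICOMP-driven variant would replace the w_j² criterion with a complexity-penalized score, which the abstract does not specify.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Soft-margin linear SVM fitted by full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)).
    X is a list of feature rows, y a list of +1/-1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss active: add its subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, n_keep=1):
    """Weight-based RFE: refit the SVM on the surviving features and drop the
    one with the smallest w_j^2 until n_keep remain. Returns feature indices
    ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []  # filled in order of removal (least important first)
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]


# Toy data (made up for illustration): feature 0 separates the classes,
# features 1 and 2 are small noise.
X_demo = [
    [2.0, 0.05, -0.02],
    [1.5, -0.03, 0.04],
    [1.0, 0.02, 0.01],
    [-1.0, 0.04, -0.03],
    [-1.5, -0.02, 0.02],
    [-2.0, 0.01, -0.04],
]
y_demo = [1, 1, 1, -1, -1, -1]

ranking = svm_rfe(X_demo, y_demo)
print("features, most to least important:", ranking)
```

Refitting after every single removal is the simplest schedule. Removing a fraction of the remaining features per round is a common speed-up for high-dimensional data. For nonlinear kernels, w has no explicit primal form, and Guyon et al. instead rank features by the change in the dual objective when a feature is removed.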
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.