A Basis Approach to Surface Clustering

Adriano Zanin Zambom; Qing Wang; Ronaldo Dias

doi:10.19139/soic-2310-5070-1486

Adriano Zanin Zambom Department of Mathematics, California State University Northridge, USA
Qing Wang Department of Mathematics, Wellesley College, USA
Ronaldo Dias Department of Statistics, State University of Campinas, Brazil

Keywords: natural splines, k-means, spectral clustering, surface clustering

Abstract

This paper presents a novel method for clustering surfaces. The proposal first smooths the data using natural spline basis functions in a tensor product, thereby reducing the dimension to a finite number of coefficients, and then clusters the surfaces by applying k-means or spectral clustering to the estimated coefficients. An extension of the algorithm to clustering higher-dimensional tensors is also discussed. We show that the proposed algorithm is strongly consistent, with or without measurement errors, in correctly clustering the data as the sample size increases. Simulation studies suggest that the proposed method outperforms the benchmark k-means and spectral clustering algorithms applied to the original data. In addition, a real EEG data example is considered to illustrate the practical application of the proposal.
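The two-stage idea described in the abstract (smooth each surface on a tensor-product natural spline basis, then cluster the coefficient vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names natural_cubic_basis, surface_coefficients, and kmeans are hypothetical, the basis uses the standard truncated-power form of natural cubic splines, and the grid size, knot placement, noise level, and simulated surfaces are invented for the demonstration. The one-dimensional bases are also orthonormalized by a QR decomposition, which is an added assumption made so that Euclidean distance between coefficient vectors matches distance between fitted surfaces on the grid.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis in truncated-power form (one column per knot)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        # Scaled difference of truncated cubics relative to the last knot.
        return (np.maximum(x - knots[k], 0.0) ** 3
                - np.maximum(x - knots[-1], 0.0) ** 3) / (knots[-1] - knots[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)  # shape (len(x), K)

def surface_coefficients(surfaces, Bs, Bt):
    """Least-squares tensor-product coefficients, one flattened vector per surface."""
    Ps, Pt = np.linalg.pinv(Bs), np.linalg.pinv(Bt)
    # For each surface Z, the least-squares fit of Z ~ Bs C Bt^T is C = Ps Z Pt^T.
    C = np.einsum("ai,nij,bj->nab", Ps, surfaces, Pt)
    return C.reshape(len(surfaces), -1)

def kmeans(X, k, n_iter=100, n_init=10, seed=0):
    """Plain Lloyd's k-means with random restarts; returns the best labelling."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Recompute assignments against the final centers before scoring.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        cost = dist[np.arange(len(X)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

# Simulated data (assumption): two groups of 20 noisy surfaces on a 15 x 15 grid.
rng = np.random.default_rng(1)
s = np.linspace(0.0, 1.0, 15)
t = np.linspace(0.0, 1.0, 15)
S, T = np.meshgrid(s, t, indexing="ij")
truth = [np.sin(2 * np.pi * S) * np.cos(2 * np.pi * T), S + T]
surfaces = np.stack([truth[g] + 0.1 * rng.standard_normal(S.shape)
                     for g in (0, 1) for _ in range(20)])

# Stage 1: smooth each surface and keep the tensor-product coefficients.
knots = np.linspace(0.0, 1.0, 5)
Qs, _ = np.linalg.qr(natural_cubic_basis(s, knots))  # orthonormalized basis in s
Qt, _ = np.linalg.qr(natural_cubic_basis(t, knots))  # orthonormalized basis in t
coefs = surface_coefficients(surfaces, Qs, Qt)       # shape (40, 25)

# Stage 2: cluster the coefficient vectors instead of the raw grids.
labels = kmeans(coefs, k=2)
print(labels)
```

In this sketch each surface of 225 grid values is reduced to 25 coefficients before clustering. Spectral clustering could be substituted for the k-means step by building a similarity matrix from the same coefficient vectors.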
