Permutation tests to estimate significances on Principal Components Analysis
Vasco M. N. C. S. Vieira
Computational Ecology and Software
2220-721X
2012
2
103-123
06/2012
International Academy of Ecology and Environmental Sciences
multivariate
permutation tests
principal components analysis
randomization
significance
stopping rules
Principal Component Analysis is the most widely used multivariate technique for summarizing the information in a
data collection with many variables. However, for it to be valid and useful, the meaningful information must be
retained and the noise must be discarded. To achieve this, an index is estimated from the original data set, after
which three classes of methodologies may be used: (i) the analytical solution for the distribution of the index,
under the assumption that the data follow a multivariate normal distribution; (ii) the numerical solution for the
distribution of the index by means of permutation tests, without any assumption about the data distribution; and
(iii) the bootstrap numerical solution for the percentiles of the index, followed by comparison with the value
assumed for it under the null hypothesis, also without any assumption about the data distribution. New indices are
proposed for use with permutation tests and are compared with previous ones by applying them to several data sets.
Their advantages and drawbacks are discussed, together with the adequacy of permutation tests and the inadequacy of
both bootstrap techniques and methods that rely on the assumption of multivariate normal distributions.
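To make approach (ii) concrete, here is a minimal sketch of a permutation test for PCA. It assumes that the index is the eigenvalue of each principal axis of the correlation matrix. That is one common choice and not necessarily one of the new indices proposed in this paper. It also assumes a null hypothesis of mutually independent variables. Each variable is shuffled independently, which keeps its marginal distribution but destroys the correlations between variables. The PCA is then recomputed, and each observed eigenvalue is compared with the eigenvalue of the same rank in the permuted data. The function names `correlation_eigenvalues` and `pca_permutation_test` are hypothetical, and the sketch relies on NumPy.

```python
import numpy as np


def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix of `data` (rows = observations), largest first."""
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh handles symmetric matrices and returns ascending order, so reverse it.
    return np.linalg.eigvalsh(corr)[::-1]


def pca_permutation_test(data, n_perm=999, seed=None):
    """Permutation p-value for each PCA eigenvalue (hypothetical helper, a sketch only).

    Null hypothesis (assumed): the variables are mutually independent.
    Each column is permuted independently, which preserves every variable's
    marginal distribution but removes the correlation structure.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    observed = correlation_eigenvalues(data)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        permuted = np.column_stack([rng.permutation(col) for col in data.T])
        # Compare eigenvalues rank by rank: axis k of the permuted data vs axis k observed.
        exceed += correlation_eigenvalues(permuted) >= observed
    # The observed data count as one of the possible permutations (+1 in both terms).
    p_values = (exceed + 1) / (n_perm + 1)
    return observed, p_values


if __name__ == "__main__":
    # Synthetic example: three variables sharing one latent factor plus two pure-noise variables.
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(100, 1))
    signal = [latent + 0.3 * rng.normal(size=(100, 1)) for _ in range(3)]
    x = np.hstack(signal + [rng.normal(size=(100, 2))])
    eigenvalues, p = pca_permutation_test(x, n_perm=199, seed=2)
    for k, (lam, pk) in enumerate(zip(eigenvalues, p), start=1):
        print(f"axis {k}: eigenvalue = {lam:.3f}, p = {pk:.3f}")
```

In this synthetic example, only the first axis, which carries the shared latent factor, should stand out against the permutation distribution. With 199 permutations, the smallest attainable p-value is 1/200. Because the comparison is rank by rank, the test asks whether the k-th axis explains more variance than the k-th axis of data with no correlation structure.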
DOI 10.0000/issn-2220-721x-compuecol-2012-v2-0009
http://www.iaees.org/publications/journals/ces/articles/2012-2(2)/permutation-tests-to-estimate-significances.pdf