Modern analytical chemistry has produced numerous methods, and comparing them fairly is a key problem in chemometrics. If the data can be arranged in a matrix without empty cells, the methods (or models) can be ranked. Features (variables, columns) and samples (cases, rows) characterizing the methods form the input matrix. The novel ranking procedure is based on the sum of ranking differences (SRD) [1]. The features should be expressed on the same scale. The absolute differences between the ideal and the actual ranks of the samples are summed, and the procedure is repeated for each (actual) feature. The SRD values obtained in this way order and group the features in a simple manner. If the ideal ranking is not known, it can be replaced by the average (consensus modeling).
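The ranking step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function names `average_ranks` and `srd`, the use of the row mean as the consensus reference, and the averaging of tied ranks are all assumptions made for the example.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks (assumed tie rule)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def srd(X, reference=None):
    """Sum of ranking differences for each column (feature) of X.

    X has samples in rows and features in columns. If no reference
    (ideal) column is given, the row mean is used as a consensus
    reference; this choice is an assumption of the sketch.
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = X.mean(axis=1)
    ref_rank = average_ranks(reference)
    col_ranks = np.column_stack(
        [average_ranks(X[:, j]) for j in range(X.shape[1])]
    )
    # Sum the absolute rank differences over the samples, column by column.
    return np.abs(col_ranks - ref_rank[:, None]).sum(axis=0)

# Toy data: 5 samples, 3 features, all on the same scale.
X = np.array([
    [1, 1, 5],
    [2, 3, 4],
    [3, 2, 3],
    [4, 4, 2],
    [5, 5, 1],
])
ideal = [1, 2, 3, 4, 5]
print(srd(X, reference=ideal))  # SRD values 0, 2 and 12 for the three columns
```

In this toy example the first column reproduces the ideal ranking exactly (SRD = 0), the second swaps two neighbouring samples (SRD = 2), and the third is fully reversed (SRD = 12, the largest possible value for five samples).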
The underlying idea is the same as in collaborative tests: if the systematic errors of a given method are not known, it is expedient to measure the same quantity with various methods, assuming that the errors from different sources follow a normal distribution. The noise is random, whereas the signal is systematic; i.e., the noise cancels out, but the signal accumulates.
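The cancellation argument can be made explicit with a short worked equation. The notation is an assumption introduced here for illustration: s is the true signal, e_j is the error of method j, m is the number of methods, and the errors are taken to be independent with zero mean and a common variance sigma^2.

```latex
x_j = s + e_j, \qquad e_j \sim \mathcal{N}(0, \sigma^2), \qquad j = 1, \dots, m
\bar{x} = \frac{1}{m} \sum_{j=1}^{m} x_j = s + \frac{1}{m} \sum_{j=1}^{m} e_j
\operatorname{E}[\bar{x}] = s, \qquad \operatorname{Var}(\bar{x}) = \frac{\sigma^2}{m}
```

Under these assumptions the average keeps the signal unchanged, while the variance of its random part shrinks as 1/m, which is why the average can stand in for an unknown ideal ranking.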
The ranking can be validated with a kind of permutation test. Random features are simulated and ordered by the SRD procedure. Comparing the random and the real SRD values shows unambiguously how reliable the ranking is.
Random features (>100 000) were generated for each number of samples 13 < n < 61, and for every fifth number of samples (65, 70, etc.) if n > 60. A Gaussian approximation has been used for these larger numbers of samples; the error term was less than 10^-5. For small numbers of samples (n < 13), the true discrete distribution has been used.
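The validation idea can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure actually used to build the reference distributions: random features are modelled as uniformly random permutations of the ranks 1..n, and the names `random_srd_distribution`, `fraction_at_or_below` and `gaussian_fraction` are hypothetical helpers.

```python
from math import erf, sqrt

import numpy as np

def random_srd_distribution(n_samples, n_random=100_000, seed=0):
    """SRD values of random rankings against the ideal ranking 1..n_samples."""
    rng = np.random.default_rng(seed)
    ideal = np.arange(1, n_samples + 1)
    # Each row is shuffled independently, giving one random ranking per row.
    perms = rng.permuted(np.tile(ideal, (n_random, 1)), axis=1)
    return np.abs(perms - ideal).sum(axis=1)

def fraction_at_or_below(srd_value, random_srds):
    """Empirical share of random rankings with an SRD this small or smaller."""
    return float(np.mean(random_srds <= srd_value))

def gaussian_fraction(srd_value, random_srds):
    """Same share from a normal curve fitted by mean and standard deviation."""
    mu, sigma = random_srds.mean(), random_srds.std()
    return 0.5 * (1.0 + erf((srd_value - mu) / (sigma * sqrt(2.0))))

random_srds = random_srd_distribution(n_samples=20)
observed = 40  # hypothetical SRD value of a real feature with 20 samples
print(fraction_at_or_below(observed, random_srds))
print(gaussian_fraction(observed, random_srds))
```

A real feature whose SRD lies far below the bulk of the random values ranks the samples in a way that is unlikely to arise by chance; a feature whose SRD falls inside the random distribution cannot be distinguished from a random ranking.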
The novel ranking method can be applied in many different settings, from the determination of panel consistency [2] through column selection in chromatography to the selection of the best models [1]. Several examples (determination of the number of latent variables [3], selection of the best method for measuring antioxidant capacity in beers [4], and comparison of chemometric methods for classification [5]) show the usefulness of the procedure unambiguously in diverse fields.
References:
1. K. Heberger, Sum of ranking differences compares methods or models fairly, Trends Anal. Chem. 28 (2009), doi:10.1016/j.trac.2009.09.009
2. K. Kollar-Hunek, J. Heszberger, Z. Kokai, M. Lang-Lazi, E. Papp, Testing panel consistency with GCAP method in food profile analysis, J. Chemometr. 22 (2008) 218-226.
3. R. Todeschini, Data correlation, number of significant principal components and shape of molecules. The K correlation index, Anal. Chim. Acta 348 (1997) 419-430.
4. H. Zhao, W. Chen, J. Lu, M. Zhao, Phenolic profiles and antioxidant activities of commercial beers, Food Chem. 119 (2010) 1150-1158.
5. R. Todeschini, D. Ballabio, V. Consonni, A. Mauri, M. Pavan, CAIMAN (Classification And Influence Matrix Analysis): A new approach to the classification based on leverage-scaled functions, Chemometr. Intell. Lab. Syst. 87 (2007) 3-17.