P13. Application of chemometric tools for coal calibration by DRIFT spectroscopy

Yu. M. Possokhov, V. K. Popov, V. I. Butakova

Eastern Research & Development Institute of Coal Chemistry, Yekaterinburg, Russia

The traditional determination of bituminous coal properties is time-consuming and requires special equipment and sample preparation routines. The innovation consists in the use of diffuse reflectance mid-infrared Fourier-transform (DRIFT) spectroscopy, which provides rapid molecular analysis of coals and coal blends in the 350–7500 cm⁻¹ region from laboratory-comminuted coal powders and thus reveals the most valuable information about the constitution of their organic matter. It also involves the elaboration of a methodology able to determine coal properties with high reliability from DRIFT spectral data alone. Coal calibration is thus a problem of QSPR/QSAR modeling.

Summarizing our long-term experience in coal calibration, we distinguish the following stages: (a) single-spectrum preprocessing, (b) preprocessing of the resulting spectrum population, (c) regression model building, and (d) regression model tuning.

Single-spectrum preprocessing consists of choosing intensity units, specular and scatter correction, and background estimation. None of these can be applied independently of the coal property being modeled and of the model accuracy. The choice of intensity units among the theoretically valid log-scaled, Kubelka-Munk, and reflectance units is treated as a non-linear transformation of the predictor variables in terms of the Generalized Additive Model technique.
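
The three intensity units are related by simple point-wise transformations of the measured reflectance R. The following minimal sketch shows the standard conversions to log-scaled units, log10(1/R), and Kubelka-Munk units, (1 - R)^2 / (2R). The function names and the reflectance values are illustrative assumptions, not part of the original work.

```python
import math

def to_log_inverse(reflectance):
    """Convert reflectance R (0 < R <= 1) to log-scaled units, log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]

def to_kubelka_munk(reflectance):
    """Convert reflectance R (0 < R <= 1) to Kubelka-Munk units, (1 - R)^2 / (2R)."""
    return [(1.0 - r) ** 2 / (2.0 * r) for r in reflectance]

# Hypothetical reflectance values at three wavenumbers (illustrative only).
reflectance = [0.8, 0.5, 0.1]
log_units = to_log_inverse(reflectance)
km_units = to_kubelka_munk(reflectance)
```

Both transformations are monotonic in R but strongly non-linear at low reflectance, which is why the choice of units can change the shape of the predictor-response relationship.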

Several data treatments may be applied to eliminate the specular and scatter artifacts that adversely affect DRIFT spectra of coal powders. However, neither traditional Multiplicative Scatter Correction (MSC) nor the modern Extended Multiplicative and Extended Inverse Scatter Corrections are suitable for single-spectrum preprocessing, because they are part of population-level preprocessing routines. Moreover, these treatments assume that the mean spectrum represents a reference (ideal) one, and we have shown that this assumption disregards the peculiarities of the FTIR spectrometer: its light source energy distribution and detector sensitivity. In place of the above-mentioned corrections we have proposed algorithms for the hallmarked selection [1]. Standard Normal Variate (SNV) or sub-spectrum area normalization may then be applied.
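
Unlike MSC, both SNV and area normalization use only the spectrum being corrected, so they fit the single-spectrum stage. A minimal sketch of the two operations follows. The function names are assumptions, the SNV variant shown uses the sample standard deviation, and the area is estimated with the trapezoidal rule.

```python
from statistics import mean, stdev

def snv(intensities):
    """Standard Normal Variate: center one spectrum and scale it by its own
    sample standard deviation."""
    m = mean(intensities)
    s = stdev(intensities)
    return [(v - m) / s for v in intensities]

def area_normalize(wavenumbers, intensities):
    """Scale a (sub-)spectrum so that its trapezoidal area equals 1."""
    area = sum(
        0.5 * (intensities[i] + intensities[i + 1])
        * abs(wavenumbers[i + 1] - wavenumbers[i])
        for i in range(len(intensities) - 1)
    )
    return [v / area for v in intensities]
```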

Background estimation is a very difficult problem for coal spectra because they have broad bands. For a long time we have been using an expert-dependent technique of linear baselines built upon our earlier findings on bituminous coal structure [2]. Automated as far as possible, this technique brings high selectivity in the peak identification of structural groups. Such extraction of the analytical signal from raw spectra may also be implemented by the difference spectra approach [3].
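
A linear baseline amounts to drawing a straight line between two anchor points that bound a band and subtracting it. The sketch below illustrates only this arithmetic. The anchor indices, which in practice come from expert knowledge, and all numeric values are hypothetical.

```python
def subtract_linear_baseline(wavenumbers, intensities, left, right):
    """Subtract the straight line through two anchor points (indices left and
    right) from the band between them, anchors included."""
    x0, x1 = wavenumbers[left], wavenumbers[right]
    y0, y1 = intensities[left], intensities[right]
    slope = (y1 - y0) / (x1 - x0)
    corrected = []
    for i in range(left, right + 1):
        baseline = y0 + slope * (wavenumbers[i] - x0)
        corrected.append(intensities[i] - baseline)
    return corrected

# Hypothetical band sampled at five wavenumbers (cm^-1), illustrative values.
wavenumbers = [2800.0, 2850.0, 2900.0, 2950.0, 3000.0]
intensities = [0.10, 0.30, 0.60, 0.35, 0.20]
band = subtract_linear_baseline(wavenumbers, intensities, 0, 4)
```

After subtraction the band is zero at both anchors, so its height or area can be attributed to the structural group assigned to that band.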

There are two main tasks in preprocessing the spectrum population. The first is to reach a near-uniform distribution of the modeled property when forming a training subset. The second is to identify multivariate outliers. Working with populations of thousands of spectra, we can largely succeed in making the distributions of the modeled properties uniform. The second task remains an outstanding problem. Traditionally, observations with studentized residuals beyond ±3 are considered outliers. In practice, however, we usually observe the outlier masking and swamping effects well described in [4]. Therefore we apply precalibration identification methods based on the Mahalanobis distance and robust regression for outlier unmasking.
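
As an illustration of distance-based screening before calibration, the sketch below flags rows of a feature matrix whose squared Mahalanobis distance exceeds a chi-square cutoff. It is a simplified assumption-laden example: it uses the classical mean and covariance, which are themselves distorted by outliers (the masking effect mentioned above), whereas a robust estimate would be preferable in practice. The data are synthetic, and for full spectra the rows would typically be a few latent scores rather than raw intensities.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean,
    using the classical (non-robust) sample covariance."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

def flag_outliers(X, threshold):
    """Indices of rows whose squared distance exceeds the threshold."""
    return np.flatnonzero(mahalanobis_sq(X) > threshold)

# Synthetic 2-D data: 20 points on a unit circle plus one distant point.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
X = np.vstack([np.column_stack([np.cos(angles), np.sin(angles)]), [8.0, 8.0]])

# 97.5% chi-square quantile for 2 degrees of freedom (closed form).
threshold = -2.0 * np.log(1.0 - 0.975)
outliers = flag_outliers(X, threshold)
```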

Regression model building is the culminating procedure in coal calibration. We have concluded that the competitive approach is the best for coal calibration, as it combines several regression approaches into a unified predictive algorithm. We also distinguish two types of regression techniques: the first based on spectrum-like data (functional data) and the second based on expert grouping of predictors. PLS-like approaches with non-linear effects, such as Orthogonal Signal Correction (OSC), are powerful in the first case, and in the second case several machine learning methods (usually referred to as stochastic gradient boosting trees) are preferred.
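
For the spectrum-like case, the basic building block is a PLS regression for a single property. The following is a minimal sketch of plain PLS1 fitted by the NIPALS algorithm. It does not include OSC or any non-linear extension, and the function names and synthetic data are assumptions made for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return (coefficients, intercept) for raw X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximum covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    q = np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def pls1_predict(X, coef, intercept):
    """Predict the property for new rows of X."""
    return X @ coef + intercept

# Synthetic example: 12 "spectra" with 3 variables and an exactly linear property.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 5.0
coef, intercept = pls1_fit(X, y, n_components=3)
```

In a competitive scheme, a model of this kind would be one candidate among several, with the choice between candidates made on validation error.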

Regression model tuning is a stage that we are the first to propose. The idea comes from the fact that the reference determination of a coal property is itself subject to evident errors. We can iteratively vary these errors within a valid tolerance range to reach a minimum of the model's root-mean-square error of calibration (RMSEC). Some kind of genetic algorithm may suit this purpose.
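
The tuning idea can be sketched as a bounded search over adjustments to the reference values. The example below is only an illustration under stated assumptions: a simple random-mutation hill climber stands in for the genetic algorithm, an ordinary least-squares model stands in for the calibration model, and the tolerance, step size, and data are hypothetical.

```python
import numpy as np

def rmsec(X, y):
    """RMSEC of an ordinary least-squares model (with intercept) fitted to (X, y)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sqrt(np.mean((y - A @ beta) ** 2)))

def tune_reference_values(X, y, tolerance, n_iter=500, step=0.1, seed=0):
    """Search for adjustments delta with |delta_i| <= tolerance that lower
    the RMSEC of the model refitted to y + delta."""
    rng = np.random.default_rng(seed)
    delta = np.zeros(len(y), dtype=float)
    best = rmsec(X, y)
    for _ in range(n_iter):
        trial = delta.copy()
        i = rng.integers(len(y))                      # mutate one reference value
        mutated = trial[i] + rng.normal(scale=step * tolerance)
        trial[i] = np.clip(mutated, -tolerance, tolerance)
        score = rmsec(X, y + trial)
        if score < best:                              # keep only improvements
            delta, best = trial, score
    return delta, best

# Synthetic data: a linear property with noisy reference determinations.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + 10.0 + rng.normal(scale=0.5, size=30)
rmsec_before = rmsec(X, y)
delta, rmsec_after = tune_reference_values(X, y, tolerance=0.3)
```

Because every adjustment is clipped to the tolerance, the tuned reference values never leave the range that the reference method's own error allows.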

Building on the above, we have developed an on-line coal analyzer called SPEKTROTEST, employed in laboratories of coke and by-product producers, cleaning plants, and mines in Russia and Kazakhstan [5]. SPEKTROTEST has been introduced into the National Standard Grading System of these countries. This commercial analyzer can predict up to 25 coal properties characterizing coal quality. Ash, for example, is the hardest coal property to predict. Applying the above-mentioned chemometric tools in the development of SPEKTROTEST, we have reached a predictive accuracy for Ekibastuz coals essentially close to the required one over the ash range 10–70 abs. % [3].

Further improvement in the accuracy of coal calibration lies in the conjunction of chemometric tools and our findings on coal structure. Provided the latter are not neglected, the prediction of coal properties by DRIFT spectroscopy will be extremely important in the near future.

References:
1. V. K. Popov, Yu. M. Posokhov, Coke and Chemistry 52 (11) (2009).
2. N. D. Rus'yanova, Coal Chemistry [in Russian], Nauka, Moscow, 2000, ISBN 5-02-004404-0.
3. V. K. Popov, Yu. M. Posokhov, Coke and Chemistry 52 (12) (2009).
4. I. Ben-Gal, Outlier detection, in: O. Maimon and L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, 2005, ISBN 0-387-24435-2.
5. V. K. Popov, Coke and Chemistry 3 (2006) 13.