Database searches based on EI-MS, in which the spectrum of an unknown is compared with spectra of known compounds in a database, are well established. EI spectra are relatively reproducible, so a matching spectrum is often a good indication of the compound structure. The NIST and Wiley mass-spectral databases are widely used for these searches, but they can identify only the substances they contain. They cover about 220 000 (NIST) and 310 000 (Wiley) substances, which leaves limited opportunity for identifying unknowns in real samples.
Structure databases (e.g. PubChem, Merck Index, ChemIndex) contain many more compounds than mass-spectral databases (for example, the PubChem database contains more than 6 000 000 structures). Using the chemical formula as input for a compound database search is becoming a viable way to obtain indications or tentative identifications of unknown compounds. In this approach, the chemical formula predicted from GC-AED or HR-MS data is used to select possible structure candidates, which can then be ranked using spectral classifiers and spectral data, so that the unknown compounds can be identified.
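The formula-based selection step can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the record layout, the candidate names, and the helper functions `parse_formula` and `candidates_with_formula` are hypothetical, and a real search would query the structure database itself. Formulas are compared as element counts so that the order in which elements are written does not matter.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Turn a formula such as 'C15H21N3O2S' into element counts."""
    counts = Counter()
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def candidates_with_formula(records, target_formula):
    """Keep only the records whose elemental composition equals the target."""
    target = parse_formula(target_formula)
    return [r for r in records if parse_formula(r["formula"]) == target]


# Hypothetical records standing in for structure-database entries.
records = [
    {"name": "candidate A", "formula": "C15H21N3O2S"},
    {"name": "candidate B", "formula": "C15H21O2N3S"},  # same composition, different order
    {"name": "candidate C", "formula": "C13H17N3O2S"},
]

hits = candidates_with_formula(records, "C15H21N3O2S")
for hit in hits:
    print(hit["name"])
```

Only the candidates that pass this filter need to go on to the spectral comparison.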
In this research, three compounds (afobazol and its metabolites M-3 and M-11) were used to simulate unknown substances to be identified by a structure database search. Their chemical formulas (C15H21N3O2S, C13H17N3O2S, C15H19N3O3S) were assumed to have been determined by GC-AED. All structures with these formulas were then retrieved from the ChemExper database (on average 138 possible structure candidates per unknown compound). Next, spectra of the structure candidates were predicted and compared with the real EI-MS spectrum of the unknown compound to choose the most appropriate structure for each unknown. The computer program Mass Frontier 3.0 was used to predict the EI mass spectra of the structure candidates from their structural formulas and general fragmentation rules common to all candidates. The generated spectra were then compared with the real mass spectrum of the unknown compound, a match value (MV) was calculated for every structure candidate, and the candidates were ranked. NIST MS Search 2.0 software was used for the mass-spectral comparison and the calculation of MV. It is based on a dot-product calculation and is normally used for mass-spectral database searches. The candidates with the highest MV are considered the most probable structures of the unknown compounds.
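The ranking step can be illustrated with a simplified dot-product score. The sketch below uses a plain cosine similarity between peak lists. It is an assumption made for illustration and does not reproduce the weighting or scaling applied by NIST MS Search. The function names and all peak values are hypothetical, and the predicted spectra would in practice come from a fragmentation-prediction program.

```python
import math


def match_value(predicted, measured):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts (0 to 1)."""
    shared = set(predicted) & set(measured)
    dot = sum(predicted[mz] * measured[mz] for mz in shared)
    norm_p = math.sqrt(sum(v * v for v in predicted.values()))
    norm_m = math.sqrt(sum(v * v for v in measured.values()))
    if norm_p == 0 or norm_m == 0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_p * norm_m)


def rank_candidates(predicted_spectra, measured):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, match_value(spec, measured))
              for name, spec in predicted_spectra.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical peak lists, for illustration only.
measured = {100: 999.0, 72: 400.0, 56: 150.0}
predicted_spectra = {
    "candidate A": {100: 999.0, 72: 350.0, 56: 100.0},
    "candidate B": {86: 999.0, 72: 200.0},
}

ranking = rank_candidates(predicted_spectra, measured)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A candidate whose predicted peaks coincide with the measured ones scores close to 1, while a candidate sharing only a minor peak scores close to 0, so sorting by this score puts the most plausible structure first.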
It was found that the correct candidate structure had the highest MV in all three cases, so the correct structures were distinguished from the incorrect candidates, and the structures of the unknown compounds were determined unequivocally.
The proposed approach uses only GC-AED and GC-MS-EI, relatively inexpensive techniques available in most laboratories. Furthermore, these techniques can be applied to identify impurities in complex mixtures without isolating and concentrating them (usually the most time-consuming steps) and without other techniques such as IR and NMR spectroscopy. The approach therefore provides fast and effective tentative identification.