Thomasimov's bookmarks matching tag spectrometry
 
 
A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection.
Yutaka Yasui et al.
Biostatistics (Oxford, England) 4 (3), 449-63 (Jul 2003)
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of "signature" protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate.
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
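
As a rough illustration of the peak-indicator step described in the abstract above, the sketch below marks a point as a peak when its intensity is the strict maximum within a window of neighboring points. The function name `peak_indicators`, the window half-width `k`, and the toy intensities are assumptions made for illustration; the abstract does not give the paper's exact neighborhood rule, and the x-axis shift correction and boosting steps are not shown.

```python
def peak_indicators(intensities, k=2):
    """Return a 0/1 list: 1 where the intensity is the strict maximum among
    up to k neighbors on each side (a hypothetical neighborhood rule)."""
    n = len(intensities)
    flags = []
    for i, y in enumerate(intensities):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        # Neighbors on both sides, excluding the point itself.
        neighbors = intensities[lo:i] + intensities[i + 1:hi]
        flags.append(1 if neighbors and all(y > v for v in neighbors) else 0)
    return flags


# Toy intensities (made up), ordered by increasing mass per charge.
toy = [1.0, 3.0, 2.0, 2.5, 6.0, 4.0, 1.0, 1.5, 1.2]
print(peak_indicators(toy, k=2))
```

Reducing intensities to binary indicators in this way discards the magnitude of y, which is the point of the step when intensity measurements have high coefficients of variation.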
 
The use of plasma surface-enhanced laser desorption/ionization time-of-flight mass spectrometry proteomic patterns for detection of head and neck squamous cell cancers.
Clinical cancer research : an official journal of the American Association for Cancer Research 10 (14), 4806-12 (15 Jul 2004)
PURPOSE: Our study was undertaken to determine the utility of plasma proteomic profiling using surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry for the detection of head and neck squamous cell carcinomas (HNSCCs). EXPERIMENTAL DESIGN: Pretreatment plasma samples from HNSCC patients or controls without known neoplastic disease were analyzed on the Protein Biology System IIc SELDI-TOF mass spectrometer (Ciphergen Biosystems, Fremont, CA). Proteomic spectra of mass:charge ratio (m/z) were generated by the application of plasma to immobilized metal-affinity-capture (IMAC) ProteinChip arrays activated with copper. A total of 37356 data points were generated for each sample. A training set of spectra from 56 cancer patients and 52 controls were applied to the "Lasso" technique to identify protein profiles that can distinguish cancer from noncancer, and cross-validation was used to determine test errors in this training set. The discovery pattern was then used to classify a separate masked test set of 57 cancer and 52 controls. In total, we analyzed the proteomic spectra of 113 cancer patients and 104 controls. RESULTS: The Lasso approach identified 65 significant data points for the discrimination of normal from cancer profiles. The discriminatory pattern correctly identified 39 of 57 HNSCC patients and 40 of 52 noncancer controls in the masked test set. These results yielded a sensitivity of 68% and specificity of 73%. Subgroup analyses in the test set of four different demographic factors (age, gender, and cigarette and alcohol use) that can potentially confound the interpretation of the results suggest that this model tended to overpredict cancer in control smokers. CONCLUSIONS: Plasma proteomic profiling with SELDI-TOF mass spectrometry provides moderate sensitivity and specificity in discriminating HNSCC. Further improvement and validation of this approach is needed to determine its usefulness in screening for this disease.
 
Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum.
Bioinformatics 21 (9), 1764 (2005)
MOTIVATION: Mass spectrometry yields complex functional data for which the features of scientific interest are peaks. A common two-step approach to analyzing these data involves first extracting and quantifying the peaks, then analyzing the resulting matrix of peak quantifications. Feature extraction and quantification involves a number of interrelated steps. It is important to perform these steps well, since subsequent analyses condition on these determinations. Also, it is difficult to compare the performance of competing methods for analyzing mass spectrometry data since the true expression levels of the proteins in the population are generally not known. RESULTS: In this paper, we introduce a new method for feature extraction in mass spectrometry data that uses translation-invariant wavelet transforms and performs peak detection using the mean spectrum. We examine the method's performance through examples and simulation, and demonstrate the advantages of using the mean spectrum to detect peaks. We also describe a new physics-based computer model of mass spectrometry and demonstrate how one may design simulation studies based on this tool to systematically compare competing methods. AVAILABILITY: MATLAB scripts to implement the methods described in this paper and R code for the virtual mass spectrometer are available at http://bioinformatics.mdanderson.org/software.html SUPPLEMENTARY INFORMATION: http://bioinformatics.mdanderson.org/supplements.html.
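
The idea of detecting peaks on the mean spectrum can be sketched as follows, assuming all spectra are already aligned on a common m/z grid. The helper names `mean_spectrum` and `local_maxima` and the toy data are hypothetical, and the translation-invariant wavelet denoising used in the paper is deliberately left out, so this is only the averaging-then-peak-finding skeleton.

```python
def mean_spectrum(spectra):
    """Point-wise average of spectra that share the same m/z grid."""
    return [sum(column) / len(column) for column in zip(*spectra)]


def local_maxima(y):
    """Indices of interior points strictly higher than both neighbors."""
    return [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] > y[i + 1]]


# Three toy spectra (made up) on a shared grid of seven points.
spectra = [
    [0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.5],
    [0.2, 0.8, 3.5, 1.2, 0.1, 0.1, 0.4],
    [0.1, 1.2, 4.2, 0.9, 0.2, 0.3, 0.6],
]
avg = mean_spectrum(spectra)
peaks = local_maxima(avg)
print(peaks)
```

Because peaks are located once on the average, every spectrum is quantified at the same positions, which avoids matching peaks across samples afterwards.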
 
Feature selection and nearest centroid classification for protein mass spectrometry.
BMC Bioinformatics 6 (1), 68 (2005)
BACKGROUND: The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry. RESULTS: This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. Student-t test, Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting based feature selection we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms, does not hold across all data sets. 
CONCLUSION: This study tested a number of popular feature selection methods using the nearest centroid classifier and found that several reportedly state-of-the-art algorithms in fact perform rather poorly when tested via stratified cross-validation. The revealed inconsistencies provide clear evidence that algorithm evaluation should be performed on several data sets using a consistent (i.e., non-randomized, stratified) cross-validation procedure in order for the conclusions to be statistically sound.
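
For readers unfamiliar with the base classifier used in the study above, here is a minimal nearest centroid sketch in plain Python. It assumes features have already been selected; the function names `fit_centroids` and `predict`, the two-feature toy data, and the class labels are illustrative assumptions, and none of the feature selection methods compared in the paper are implemented.

```python
def fit_centroids(samples, labels):
    """Compute the per-class mean vector (centroid) of the training samples."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], x))
    return min(centroids, key=sq_dist)


# Toy training data (made up): two selected features per specimen.
train_x = [[1.0, 0.2], [0.8, 0.0], [0.1, 1.1], [0.3, 0.9]]
train_y = ["cancer", "cancer", "normal", "normal"]
centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [0.9, 0.1]))
```

The classifier has no tuning parameters of its own, which makes it a convenient fixed baseline when the quantity being compared is the feature selection method.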
 
Quality control and quality assessment of data from surface-enhanced laser desorption/ionization (SELDI) time-of flight (TOF) mass spectrometry (MS).
BMC Bioinformatics 6 (Suppl 2), S5 (15 Jul 2005)
BACKGROUND: Proteomic profiling of complex biological mixtures by the ProteinChip technology of surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS) is one of the most promising approaches in toxicological, biological, and clinic research. The reliable identification of protein expression patterns and associated protein biomarkers that differentiate disease from health or that distinguish different stages of a disease depends on developing methods for assessing the quality of SELDI-TOF mass spectra. The use of SELDI data for biomarker identification requires application of rigorous procedures to detect and discard low quality spectra prior to data analysis. RESULTS: The systematic variability from plates, chips, and spot positions in SELDI experiments was evaluated using biological and technical replicates. Systematic biases on plates, chips, and spots were not found. The reproducibility of SELDI experiments was demonstrated by examining the resulting low coefficient of variances of five peaks presented in all 144 spectra from quality control samples that were loaded randomly on different spots in the chips of six bioprocessor plates. We developed a method to detect and discard low quality spectra prior to proteomic profiling data analysis, which uses a correlation matrix to measure the similarities among SELDI mass spectra obtained from similar biological samples. Application of the correlation matrix to our SELDI data for liver cancer and liver toxicity study and myeloma-associated lytic bone disease study confirmed this approach as an efficient and reliable method for detecting low quality spectra. CONCLUSION: This report provides evidence that systematic variability between plates, chips, and spots on which the samples were assayed using SELDI based proteomic procedures did not exist. 
The reproducibility of experiments in our studies was demonstrated to be acceptable and the profiling data for subsequent data analysis are reliable. Correlation matrix was developed as a quality control tool to detect and discard low quality spectra prior to data analysis. It proved to be a reliable method to measure the similarities among SELDI mass spectra and can be used for quality control to decrease noise in proteomic profiling data prior to data analysis.
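
A correlation-based quality check of the kind described above might look like the following sketch. The decision rule (flag a spectrum when its median Pearson correlation with the other spectra falls below a threshold), the threshold value 0.8, the function names, and the toy spectra are all assumptions; the abstract says only that a correlation matrix is used to measure similarity among spectra from similar samples.

```python
from math import sqrt
from statistics import median


def pearson(a, b):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def low_quality(spectra, threshold=0.8):
    """Indices of spectra whose median correlation with the rest is below
    the threshold (a hypothetical rule, not the paper's stated criterion)."""
    flagged = []
    for i, s in enumerate(spectra):
        others = [pearson(s, t) for j, t in enumerate(spectra) if j != i]
        if median(others) < threshold:
            flagged.append(i)
    return flagged


# Toy spectra (made up) from nominally similar samples.
spectra = [
    [1.0, 5.0, 2.0, 8.0, 1.0],
    [1.1, 4.8, 2.2, 7.9, 0.9],
    [0.9, 5.2, 1.9, 8.1, 1.2],
    [6.0, 1.0, 7.0, 1.5, 5.0],  # deliberately dissimilar spectrum
]
print(low_quality(spectra))
```

The median is used here instead of the mean so that a single bad spectrum does not drag down the scores of the good ones.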
 
Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform.
PROTEOMICS 5 (16), 4107 (2005)
Mass spectrometry is being used to find disease-related patterns in mixtures of proteins derived from biological fluids. Questions have been raised about the reproducibility and reliability of peak quantifications using this technology. We collected nipple aspirate fluid from breast cancer patients and healthy women, pooled them into a quality control sample, and produced 24 replicate SELDI spectra. We developed a novel algorithm to process the spectra, denoising with the undecimated discrete wavelet transform (UDWT), and evaluated it for consistency and reproducibility. UDWT efficiently decomposes spectra into noise and signal. The noise is consistent and uncorrelated. Baseline correction produces isolated peak clusters separated by flat regions. Our method reproducibly detects more peaks than the method implemented in Ciphergen software. After normalization and log transformation, the mean coefficient of variation of peak heights is 10.6%. Our method to process spectra provides improvements over existing methods. Denoising using the UDWT appears to be an important step toward obtaining results that are more accurate. It improves the reproducibility of quantifications and supplies tools for investigation of the variations in the technology more carefully. Further study will be required, because we do not have a gold standard providing an objective assessment of which peaks are present in the samples.
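
The reproducibility figure quoted above is a mean coefficient of variation (CV) of peak heights across replicate spectra. The sketch below shows one way such a number could be computed, assuming the CV is taken per peak on natural-log heights and then averaged over peaks. The function name `mean_cv`, the use of the sample standard deviation, and the toy heights are assumptions; the abstract does not spell out the normalization or the exact CV definition.

```python
from math import log
from statistics import mean, stdev


def mean_cv(replicates):
    """replicates: one list of peak heights per replicate spectrum, with the
    same peaks in the same order. Returns the CV (sample standard deviation
    divided by mean) of the log heights, averaged over peaks."""
    cvs = []
    for values in zip(*replicates):  # regroup: replicate values per peak
        logs = [log(v) for v in values]
        cvs.append(stdev(logs) / mean(logs))
    return mean(cvs)


# Toy normalized peak heights (made up): three replicates, two peaks.
toy = [
    [100.0, 50.0],
    [110.0, 45.0],
    [95.0, 55.0],
]
print(f"mean CV: {mean_cv(toy):.1%}")
```

Heights must be greater than 1 for this to behave sensibly, since the log of smaller values is zero or negative and the ratio loses its meaning.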
