Thomasimov's tags:


EXPORT LIST RSS ?
Thomasimov's bookmarks matching tag "Prostatic Neoplasms"
 
Number of articles per page:
10 | 25 | 50 | 100
 
A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection.
Yutaka Yasui et al.
Biostatistics (Oxford, England) 4 (3), 449-63 (Jul 2003)
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of ?signature? protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
 
Feature selection and nearest centroid classification for protein mass spectrometry.
BMC Bioinformatics 6 (1), 68 (2005)
BACKGROUND: The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry. RESULTS: This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. Student-t test, Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting based feature selection we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms, does not hold across all data sets. CONCLUSION: This study tested a number of popular feature selection methods using the nearest centroid classifier and found that several reportedly state-of-the-art algorithms in fact perform rather poorly when tested via stratified cross-validation. The revealed inconsistencies provide clear evidence that algorithm evaluation should be performed on several data sets using a consistent (i.e., non-randomized, stratified) cross-validation procedure in order for the conclusions to be statistically sound.

<< Prev 0      Showing entries 1 to 2 of 2 total      Next 0 >>