Wednesday, September 16, 2009
3:00 - 4:00
Room 457 Blocker
Department of Statistics
Texas A&M University
Statistical Approaches to Protein Quantitation in Bottom-Up MS-based Proteomics
In bottom-up, mass spectrometry (MS)-based proteomics, protein abundances must be inferred from the MS peak intensities of their constituent peptides. However, this translation is complicated by several factors. MS intensities are derived from peak heights or areas, but they do not represent absolute abundance levels. Intensities can vary greatly across peptides from the same protein, for example because of differing ionization efficiencies or other chemical characteristics. In “shotgun” proteomics measurements, many peptides that are observed in some samples are not observed in others, resulting in widespread missing values. Furthermore, a peptide’s peak is often missing because that peptide is present at a lower abundance than the instrument can detect. Because of this informative missingness, missing values must be handled carefully to avoid biasing abundance estimates.
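To see why informative missingness biases naive estimates, consider a toy simulation (an illustration under simplifying assumptions, not the model presented in the talk): log-intensities of a single peptide are drawn from a normal distribution, and any value below a fixed detection limit is recorded as missing. Averaging only the observed values overstates abundance, whereas a left-censored (Tobit-style) normal likelihood, in which each missing value contributes the probability of falling below the limit, recovers the true mean. The function names, the detection limit, and the grid-search fit are hypothetical choices made for this sketch.

```python
import math
import numpy as np

def simulate_censored(n, mu, sigma, limit, seed=0):
    """Draw n log-intensities; values below the detection limit go unobserved."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, size=n)
    observed = y[y >= limit]
    n_missing = int(np.sum(y < limit))
    return observed, n_missing

def censored_loglik(mu, sigma, observed, n_missing, limit):
    """Log-likelihood of a normal model left-censored at `limit`."""
    z = (observed - mu) / sigma
    # Observed values contribute the normal log-density.
    ll_obs = np.sum(-0.5 * z**2) - observed.size * (
        0.5 * math.log(2 * math.pi) + math.log(sigma)
    )
    # Each missing value contributes P(Y < limit) = Phi((limit - mu) / sigma).
    p_below = 0.5 * math.erfc(-(limit - mu) / (sigma * math.sqrt(2)))
    return ll_obs + n_missing * math.log(max(p_below, 1e-300))

def censored_mle(observed, n_missing, limit, mu_grid, sigma_grid):
    """Crude grid-search MLE (hypothetical helper; a real fit would use an optimizer)."""
    best = (-math.inf, None, None)
    for mu in mu_grid:
        for sigma in sigma_grid:
            ll = censored_loglik(mu, sigma, observed, n_missing, limit)
            if ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]

# Hypothetical setting: true log2 abundance 20, spread 2, detection limit 19.
LIMIT = 19.0
observed, n_missing = simulate_censored(2000, mu=20.0, sigma=2.0, limit=LIMIT)
naive_mean = observed.mean()  # ignores the missing values entirely
mu_hat, sigma_hat = censored_mle(
    observed, n_missing, LIMIT, np.linspace(18, 22, 201), np.linspace(1, 3, 101)
)
print(f"missing: {n_missing} of 2000")
print(f"naive mean of observed values: {naive_mean:.2f}")
print(f"censored-model estimate of mu: {mu_hat:.2f}")
```

With a true mean of 20 and a limit of 19, roughly 30% of the values go missing, and the naive mean should land about one log-unit too high, while the censored estimate should sit close to 20.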
Further analysis is complicated by the need to normalize MS data extensively, as systematic biases are usually present. Normalization models need to be flexible enough to capture biases of arbitrary complexity while avoiding the overfitting that would invalidate downstream statistical inference. Careful normalization of MS peak intensities would enable greater accuracy and precision in quantitative comparisons of protein abundance levels.
I will present (i) a statistical model that carefully accounts for informative missingness in peak intensities and allows unbiased, model-based, protein-level estimation and inference; and (ii) a normalization algorithm, EigenMS, that removes biases of arbitrary complexity while preventing overfitting.
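The idea of removing systematic biases without erasing treatment effects can be illustrated with a simplified SVD-based sketch. It is not EigenMS itself: it assumes a complete matrix with no missing values and removes a hand-picked number k of bias trends rather than selecting them statistically. The sketch fits treatment-group means for each peptide, takes the singular value decomposition of the residuals, and subtracts the k leading components. Because the residuals are orthogonal to the group-indicator design, subtracting these components leaves the group means unchanged. The function `remove_bias_trends`, the simulated batch effect, and all parameter values are hypothetical.

```python
import numpy as np

def remove_bias_trends(Y, groups, k):
    """Subtract k dominant residual trends from a peptides x samples matrix Y.

    Hypothetical simplified sketch, not EigenMS: Y must have no missing
    values, and k is fixed by hand instead of being chosen by a test.
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # Design matrix: one indicator column per treatment group (samples x groups).
    X = (groups[:, None] == labels[None, :]).astype(float)
    # Least-squares group means for every peptide at once.
    B, *_ = np.linalg.lstsq(X, Y.T, rcond=None)   # groups x peptides
    R = Y - (X @ B).T                             # residuals: bias + noise
    # Leading singular vectors of the residuals are candidate bias trends.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    bias = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return Y - bias

def group_means(M, groups):
    return np.column_stack([M[:, groups == g].mean(axis=1) for g in np.unique(groups)])

def batch_effect(M, batch):
    return M[:, batch == 1].mean(axis=1) - M[:, batch == 0].mean(axis=1)

# Simulated data: 200 peptides, 12 samples, two groups crossed with two batches.
rng = np.random.default_rng(1)
n_pep, n_samp = 200, 12
groups = np.array(["A"] * 6 + ["B"] * 6)
batch = np.tile([0.0, 1.0], 6)                    # alternating processing batches
loadings = rng.normal(0.0, 2.0, size=n_pep)       # peptide-specific batch sensitivity
Y = 20.0 + rng.normal(0.0, 0.3, size=(n_pep, n_samp)) + np.outer(loadings, batch - 0.5)
Y[:20, 6:] += 1.0                                 # true group effect for 20 peptides

Y_norm = remove_bias_trends(Y, groups, k=1)
print("batch-effect spread before:", round(float(np.std(batch_effect(Y, batch))), 2))
print("batch-effect spread after: ", round(float(np.std(batch_effect(Y_norm, batch))), 2))
```

Choosing k is the crux: too small leaves bias in the data, while too large starts removing genuine residual variation, shrinking the apparent noise and undermining downstream inference, which is the overfitting concern raised above.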