Bioinformatics Seminar
Wednesday, September 16, 2009
3:00 - 4:00
Room 457 Blocker
Yuliya Karpievitch
Department of Statistics
Texas A&M University
Statistical Approaches to Protein Quantitation in Bottom-Up MS-based Proteomics
In mass spectrometry (MS)-based, bottom-up proteomics, protein
abundance must be inferred from the MS peak heights of a protein's constituent
peptides. This inference is complicated by several factors. MS
intensities are derived from peak heights or areas but do not represent
absolute abundance levels. Intensities can vary greatly across peptides from
the same protein, due to, for example, differing ionization efficiencies or
other chemical characteristics. For “shotgun” proteomics measurements, many
peptides that are observed in some samples are not observed in others,
resulting in widespread missing values. Furthermore, a peptide's peak often
goes unobserved because the peptide is present at an abundance below what the
instrument can detect. Because of this
informative missingness, care must be taken when handling the missing values to
avoid biasing abundance estimates.
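
The effect of informative missingness can be illustrated with a small simulation. The sketch below is not the model presented in the talk; it assumes, purely for illustration, that log-intensities are normally distributed, that the instrument has a hard detection limit, and that the spread (sigma) is known. The names `censored_loglik` and `fit_censored_mean` are hypothetical. It compares the naive mean of the observed values with an estimate that treats each missing value as "below the limit" in the likelihood.

```python
import math
import random
from statistics import NormalDist, mean


def censored_loglik(mu, sigma, observed, n_missing, limit):
    """Log-likelihood of a normal sample that is left-censored at `limit`."""
    dist = NormalDist(mu, sigma)
    ll = sum(math.log(dist.pdf(x)) for x in observed)
    # Each missing value contributes P(X < limit) instead of being dropped.
    ll += n_missing * math.log(dist.cdf(limit))
    return ll


def fit_censored_mean(observed, n_missing, limit, sigma):
    """Grid-search estimate of mu; sigma is assumed known (a simplification)."""
    grid = [limit - 3.0 + 0.01 * i for i in range(601)]
    return max(
        grid,
        key=lambda mu: censored_loglik(mu, sigma, observed, n_missing, limit),
    )


# Simulated log-intensities for one peptide (all values are illustrative).
random.seed(0)
true_mu, sigma, limit = 10.0, 1.0, 10.0
values = [random.gauss(true_mu, sigma) for _ in range(500)]

# Values below the detection limit are never observed.
observed = [v for v in values if v >= limit]
n_missing = len(values) - len(observed)

naive = mean(observed)  # ignores the missing values entirely
adjusted = fit_censored_mean(observed, n_missing, limit, sigma)

print(f"true mean {true_mu:.2f}, naive {naive:.2f}, censoring-aware {adjusted:.2f}")
```

Under these assumptions the naive mean overestimates the true abundance, because only the upper part of the distribution is ever observed, while the censoring-aware estimate stays close to the true value.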
Further analysis is complicated by the need to normalize MS data extensively,
because systematic biases are usually present. Normalization models need to be
flexible enough to capture biases of
arbitrary complexity, while avoiding overfitting that would invalidate
downstream statistical inference. Careful normalization of MS peak intensities
would enable greater accuracy and precision in quantitative comparisons of
protein abundance levels.
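
As a rough illustration of the trade-off between flexibility and overfitting, the sketch below removes one systematic trend from simulated data using a singular value decomposition of the residuals left after subtracting group means. This is a generic technique chosen for illustration and is not taken from the talk; the function `remove_top_trends`, the simulated data, and the choice of a single trend are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two treatment groups of four samples each (illustrative design).
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
n_peptides = 200

# True signal: a per-peptide baseline plus a group effect in group 1.
baseline = rng.normal(20.0, 2.0, size=(n_peptides, 1))
signal = baseline + np.where(groups == 1, 1.0, 0.0)

# Systematic bias: one trend across samples, with a peptide-specific loading.
trend = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
loading = rng.normal(0.0, 1.0, size=(n_peptides, 1))
noise = rng.normal(0.0, 0.1, size=signal.shape)
data = signal + loading * trend + noise


def remove_top_trends(data, groups, n_trends=1):
    """Subtract the leading singular components of the within-group residuals."""
    residuals = data.astype(float).copy()
    for g in np.unique(groups):
        cols = groups == g
        # Remove each peptide's group mean so group differences are preserved.
        residuals[:, cols] -= residuals[:, cols].mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(residuals, full_matrices=False)
    bias = (u[:, :n_trends] * s[:n_trends]) @ vt[:n_trends]
    return data - bias


normalized = remove_top_trends(data, groups, n_trends=1)

rmse_before = np.sqrt(np.mean((data - signal) ** 2))
rmse_after = np.sqrt(np.mean((normalized - signal) ** 2))
print(f"RMSE before {rmse_before:.3f}, after {rmse_after:.3f}")
```

The number of trends removed is the overfitting lever in this sketch: removing too few leaves bias in the data, while removing too many starts to strip out noise and real variation that downstream inference relies on. Here it is fixed at one by assumption rather than chosen from the data.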
I will present (i) a statistical model that carefully accounts for
informative missingness in peak intensities and allows unbiased, model-based
protein-level estimation and inference; and (ii) a normalization algorithm,
EigenMS, which removes biases of arbitrary complexity while preventing
overfitting.