Department of Statistics and Data Science, College of Natural Sciences
University of Texas, Austin
Bayesian Feature Allocation Models for Tumor Heterogeneity
We characterize tumor variability by hypothetical latent cell types that are defined by the presence of some subset of recorded SNVs (single nucleotide variants, that is, point mutations). Assuming that each sample is composed of sample-specific proportions of these cell types, we can then fit the observed proportions of SNVs for each sample. In other words, by fitting the observed proportions of SNVs in each sample we impute the latent underlying cell types, essentially by deconvolving the observed proportions into a weighted average of binary indicators that define cell types by the presence or absence of different SNVs. In a first approach we use the generic feature allocation model of the Indian buffet process (IBP) as a prior for the latent cell subpopulations. In a second version of the proposed approach we make use of pairs of SNVs that are jointly recorded on the same reads, thereby contributing valuable haplotype information. Inference then requires feature allocation models beyond the binary IBP, and we introduce a categorical extension of the IBP. Finally, in a third approach we replace the IBP by a prior based on a stylized model of a phylogenetic tree of cell subpopulations.
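
As a rough illustration of the first approach, the sketch below draws a binary SNV-by-cell-type matrix from the standard IBP and then computes, for a small hand-picked example, the SNV proportions implied by a weighted average of the binary indicators. The function names (sample_ibp, expected_snv_proportions), the mass parameter alpha, and the toy matrices Z and W are assumptions made for illustration only. The sketch also ignores ploidy, sequencing error, and any background cell type, so it should not be read as the model used in the talk.

```python
import numpy as np


def sample_ibp(num_snvs, alpha, rng):
    """Draw a binary (SNVs x cell types) matrix from the IBP.

    Rows play the role of IBP "customers" (SNVs) and columns the role of
    "dishes" (latent cell types). alpha is the IBP mass parameter.
    """
    Z = np.zeros((num_snvs, 0), dtype=int)
    for i in range(num_snvs):
        if Z.shape[1] > 0:
            # Join an existing cell type k with probability m_k / (i + 1),
            # where m_k counts the earlier SNVs already present in type k.
            m = Z[:i].sum(axis=0)
            Z[i] = rng.random(Z.shape[1]) < m / (i + 1)
        # Open Poisson(alpha / (i + 1)) new cell types that carry SNV i.
        k_new = rng.poisson(alpha / (i + 1))
        if k_new > 0:
            new_cols = np.zeros((num_snvs, k_new), dtype=int)
            new_cols[i] = 1
            Z = np.hstack([Z, new_cols])
    return Z


def expected_snv_proportions(Z, W):
    """Weighted average of binary indicators.

    Z: (SNVs x cell types) binary matrix.
    W: (samples x cell types) weights, each row summing to 1.
    Returns an (SNVs x samples) matrix of implied SNV proportions.
    """
    return Z @ W.T


# A draw from the prior: 8 SNVs, random number of latent cell types.
rng = np.random.default_rng(0)
Z_prior = sample_ibp(num_snvs=8, alpha=2.0, rng=rng)

# Toy forward model (hypothetical values): 3 SNVs, 2 cell types, 2 samples.
Z = np.array([[1, 0],
              [1, 1],
              [0, 1]])
W = np.array([[0.7, 0.3],
              [0.2, 0.8]])
p = expected_snv_proportions(Z, W)
print(p)
```

Posterior inference would run in the opposite direction: given observed proportions close to p, recover both Z and W, with the IBP acting as the prior on Z.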