Wednesday, May 5, 2010
3:00 - 4:00
Room 457 Blocker
Department of Statistics
Discriminative learning and background modeling for motif discovery
One of the challenging problems in computational molecular biology and bioinformatics is decoding gene regulatory circuits. We developed two new methods for the computational identification of transcription factor binding motifs, i.e., the sequence patterns bound by transcription factors (TFs). The first method, the Contrast Motif Finder (CMF), uses the contrast between two sets of sequences to find motifs that discriminate between them. Applying CMF to a collection of genome-wide ChIP-seq and ChIP-chip data sets in mice, we achieved higher motif-finding accuracy than several popular methods and discovered that the same TF may recognize different motifs depending on its co-regulators. The second method builds a generative model that accounts for background heterogeneity in nucleotide composition and for evolutionary conservation across multiple species. Simulation studies and empirical evidence from biological data sets reveal the dramatic effect of background modeling on motif finding and demonstrate that the proposed approach achieves substantial improvements over commonly used background models.