Drosophila Curated Transcription Factor Motifs
 
Background

This page provides motif models reported in 51 primary references in the form of position weight matrices (PWMs) for 56 Drosophila melanogaster transcription factors. These curated data should be of use for verifying the results of computational motif inference projects in the genus Drosophila (e.g. the Tiffin database), and are derived from two different sources: 1) in vitro binding site selection experiments (e.g. SELEX-like methods), and 2) consensus sequences derived from compiled genomic binding site sequences. PWMs are reported as frequencies (not raw counts) to standardize across publications and are not rescaled relative to a background model.
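As a rough illustration of the distinction drawn above between raw counts, frequencies, and background-rescaled scores, the following Python sketch converts per-position counts to frequencies and then, optionally, to log-odds scores. The function names, the uniform background, the pseudocount, and the example counts are all assumptions made for illustration; they are not part of the curated dataset or of any tool mentioned on this page.

```python
import math

BASES = "ACGT"

def counts_to_frequencies(counts):
    """Turn per-position raw base counts into frequencies summing to 1."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES)
        matrix.append({b: column[b] / total for b in BASES})
    return matrix

def frequencies_to_log_odds(freqs, background=None, pseudocount=0.01):
    """Rescale frequencies to log2 odds against a background model.

    The uniform background and the pseudocount are illustrative
    assumptions; the matrices on this page are NOT rescaled this way.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    norm = 1.0 + pseudocount * len(BASES)  # keeps each column summing to 1
    scored = []
    for column in freqs:
        scored.append({
            b: math.log2(((column[b] + pseudocount) / norm) / background[b])
            for b in BASES
        })
    return scored

# Two made-up columns of raw counts, for illustration only.
example_counts = [
    {"A": 8, "C": 0, "G": 2, "T": 0},
    {"A": 5, "C": 5, "G": 5, "T": 5},
]
example_freqs = counts_to_frequencies(example_counts)
example_scores = frequencies_to_log_odds(example_freqs)
```

Because the matrices here are stored as plain frequencies, a step like `frequencies_to_log_odds` would be applied by the user with whatever background model suits the genome region being scanned.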

Matrices are presented in .xms format, an XML-like format for motif models developed by Thomas Down at the Sanger Institute, which can be viewed using his MotifExplorer tool distributed in the NestedMICA package. Tags in these files provide information such as the source, primary reference and (where available) IDs for similar matrices in the Transfac and Jaspar databases. Since the data presented here were curated independently, some motifs show small discrepancies with those in Jaspar/Transfac; these were resolved here to reflect how the motifs are reported in the primary literature. Of the 62 PWMs reported here, 13 can be found in Jaspar (core) and 24 can be found in Transfac 7.0. Since the Jaspar motifs are a subset of the Transfac motifs, 38 of the motifs reported here are found in neither of these other online resources.

General notes: Two of the 56 factors have PWMs for different DNA binding domains in the same protein (prd-HD/prd-PD and shn-ZFP1/shn-ZFP2), four PWMs are from different isoforms of the same gene (br-Z1, br-Z2, br-Z3 & br-Z4), two PWMs are for heterodimeric factors (EcR-usp and dif-Rel), and two PWMs are reported for the same factor (dl-A and dl-B) from different experiments, giving a total of 62 PWMs. Site selection experiments in which a segment of the oligo was held constant as a partial recognition sequence (e.g. Dfd and ftz) were excluded from this dataset. Frequencies for compiled data are derived from the reported IUPAC consensus string, not from the frequencies of the alignment block, since many of these alignments are derived from small samples.
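To make the last point concrete, here is a minimal Python sketch of how an IUPAC consensus string could be turned into a frequency matrix. It assumes that each ambiguity code splits the frequency equally among the bases it allows; this page does not state the exact convention used for the "consensus" matrices, so the equal split, the function name, and the example string are illustrative assumptions only.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def consensus_to_frequencies(consensus):
    """Build one frequency column per consensus symbol.

    Assumption: bases allowed by an ambiguity code share the
    frequency equally, and disallowed bases get 0.
    """
    matrix = []
    for symbol in consensus.upper():
        allowed = IUPAC[symbol]  # raises KeyError on a non-IUPAC symbol
        share = 1.0 / len(allowed)
        matrix.append(
            {base: (share if base in allowed else 0.0) for base in "ACGT"}
        )
    return matrix

# Hypothetical consensus, not taken from the dataset.
example_matrix = consensus_to_frequencies("TRN")
```

Under this assumption, `R` yields 0.5 each for A and G, and `N` yields 0.25 for every base, so every column sums to 1 like the frequency matrices described above.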

A graphical representation of the data in these motif models can be found here.

 
Download (v1.1)

Download Motif Models: A flatfile in multi-xms format for 62 curated motif models (38 "selex" and 24 "consensus"). Note: this file includes 12 PWMs corresponding to 13 JASPAR core PWMs not included in Supplemental File 3 of Down et al. (2007) Large scale discovery of promoter motifs in Drosophila melanogaster. PLoS Computational Biology 3:e7.

Download Fasta sequences: A .tar archive of available sequence data from in vitro site selection experiments used to generate 26 of the selex PWMs above.

 
Credits

Sequence data were kindly provided by Yasuko Akiyama-Oda, Olivier Bardot, Mark van Doren, Naoyuki Fuse, Mark Garfinkel, Shigeo Hiyashi, Jim Posakony, Kate Senger, David Wilson and Riqiang Yan. Please email casey.bergman@manchester.ac.uk with questions or comments.

Please consider these data open access, but please cite this URL if you use or redistribute these data.

This page was last updated 07-Feb-2007