BindGene

A DNA Binding Site Searcher

This website helps biologists search for transcription factor binding sites, especially those of the transcription factor HNF1A. (Non-scientists who do not know what a transcription factor is can read this explanation.)

The main facilities at this website

Search for binding sites near a known gene - you tell this page which gene you are interested in; it then retrieves the upstream DNA sequence and searches it. This works for human, mouse and rat genes.

Search for binding sites in DNA sequence - here you can paste in your own DNA sequence and have it searched. (This page can also be used to search aligned DNA.)

Score a list of sites - here you can enter a list of known binding site sequences to see what score each one receives under our scoring system.
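BindGene's own scoring system is not described on this page. As a rough illustration of how a list of candidate sites can be scored, here is a minimal Python sketch of one common approach, a position weight matrix (PWM) of log-odds scores. The count matrix, the uniform background frequencies and the pseudocount are all hypothetical values chosen for the example; they are not BindGene's HNF1A model.

```python
import math

# Hypothetical count matrix for a 6-base motif (illustration only, NOT
# BindGene's HNF1A model). Each entry counts how often A, C, G or T was
# seen at that position across a set of known sites.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 7, "C": 1, "G": 1, "T": 1},
    {"A": 0, "C": 8, "G": 1, "T": 1},
]

# Assumed uniform background base frequencies and an assumed pseudocount.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
PSEUDOCOUNT = 0.5


def log_odds_matrix(counts, background, pseudocount):
    """Convert per-position counts into log2 odds against the background."""
    matrix = []
    for row in counts:
        total = sum(row.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((row[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return matrix


def score_site(site, matrix):
    """Sum the per-position log-odds scores for one site (A/C/G/T only)."""
    site = site.upper()
    if len(site) != len(matrix):
        raise ValueError("site length must equal motif length")
    return sum(column[base] for column, base in zip(matrix, site))


if __name__ == "__main__":
    matrix = log_odds_matrix(COUNTS, BACKGROUND, PSEUDOCOUNT)
    for site in ["ATTAAC", "attaac", "GCAGTA"]:
        print(site, round(score_site(site, matrix), 2))
```

A site resembling the consensus of the counts (here ATTAAC) gets a positive total, while a poor match gets a negative one. The pseudocount keeps bases that were never observed at a position from producing a score of minus infinity.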

How does BindGene differ from other binding-site searchers?

Several websites already search for transcription factor binding sites, such as TFBind, PROMO, ConSite (especially suited to searching for conserved sites) and several programs associated with the TRANSFAC database (such as Match and MatInspector). So what distinguishes BindGene?

p-values - from BindGene results, you can estimate whether a "match" is the kind that would occur by chance anyway, or whether it is so unusually good that it is likely to be a genuine binding site.

Automatic sequence retrieval - you don't have to paste in actual DNA sequence; tell this website which gene to search, and it will try to obtain the upstream DNA sequence automatically.
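The exact method BindGene uses to compute its p-values is not given here. One generic way to ask whether a match "would occur by chance anyway" is an empirical p-value: find the best-scoring window in the real sequence, then count how often shuffled copies of the same sequence score at least as well. The Python sketch below assumes a simple mononucleotide shuffle and a toy +1/-1 matching matrix; both, along with the function names, are hypothetical illustrations rather than BindGene's own procedure.

```python
import random


def best_window_score(seq, matrix):
    """Highest score of any motif-length window in seq (A/C/G/T only)."""
    seq = seq.upper()
    width = len(matrix)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(column[base] for column, base in zip(matrix, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    )


def empirical_p_value(observed, seq, matrix, n_shuffles=1000, seed=0):
    """Fraction of shuffled sequences whose best window scores >= observed.

    Shuffling single bases keeps the base composition but usually breaks
    up any genuine site; the +1 terms keep the estimate above zero.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    letters = list(seq.upper())
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if best_window_score("".join(letters), matrix) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical toy matrix: +1 for matching the consensus ATTAAC, -1 otherwise.
TOY_MATRIX = [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in "ATTAAC"]

if __name__ == "__main__":
    seq = "GCGCGATTAACGCGCGTACGTGCCGGCATCGATCGCGGC"
    observed = best_window_score(seq, TOY_MATRIX)
    print(observed, empirical_p_value(observed, seq, TOY_MATRIX))
```

Because this p-value refers to the best match anywhere in the sequence, a longer sequence needs a higher score to reach the same significance. A mononucleotide shuffle does not preserve dinucleotide structure (such as CpG depletion), so a more careful background model may be needed for real genomic DNA.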

Research done using this website

Research done using this website has recently been published in the journal "In Silico Biology": "Combining genome and mouse knockout expression data to highlight binding sites for the transcription factor HNF1alpha". This project found good evidence for eight HNF1 binding sites, three of which were already known and five of which are novel (near the genes F13B, PRODH2, HSD17B2, SLC7A9 and SLC16A7).

A second article, in "Mol Genet Metab", describes further results. Particularly interesting was a predicted HNF1 site found 42202 bases upstream of the gene for HNF1B, at a significance level below 1%. In the early phases of the project, I did not expect to detect sites so far upstream of a gene with such a good significance level. If you have read this paper and want to try "Method 2" yourself, do read the instructions about using aligned DNA.

Citing this site

If you publish a result obtained using this website, please include our web address (www.BindGene.org) in the publication, and cite the "Mol Genet Metab" paper given above (or the "In Silico Biology" paper if that is more appropriate). And please send me an e-mail when it is published!

Unless you pasted in your own DNA, this website will have contacted other websites to obtain publicly available genome sequence, so those sources should be acknowledged or cited as well. (Visit the websites listed under Acknowledgements below to find out who provided the data for whichever genome you used.)

Resources behind this website

This website was written by C R Lockwood, and involved two departments of the University of Exeter: the BioInformatics Centre, and the Molecular Genetics group (who are located at the Royal Devon & Exeter Hospital and are interested in diabetes research). I have recently moved to the Manchester BioInformatics Group, where this website is now hosted.

Please send any comments to Chris Lockwood (lockwood@bioinf.man.ac.uk).

This site is still under development (as of March 2003), so please be aware that some of the features are experimental!

Acknowledgements

When processing a query, this website uses the search system provided by the US National Library of Medicine (NLM) to retrieve some of the data from their GenBank databases. You may wish to read their Disclaimer and Copyright notice. The NCBI BLAST searcher is used for some searches (BLAST reference: Altschul et al, Nucleic Acids Research, 25:3389-3402). Genome data is normally obtained from the Santa Cruz web database. If you request it, rodent genome reads are also obtained using the SSAHA search system on the Ensembl Trace Server (their Conditions of Data Release can be found here). These organisations have not endorsed this website.

Funding for this work has been provided by the EPSRC and Diabetes UK.