Crescendo - Program Information

Background

Crescendo is a program that identifies functional sites in proteins. The conservation of amino acid residues has been shown to depend strongly on the environment in which they occur in the folded protein, and amino acid substitution tables have been derived that give the likely substitutions of amino acids in particular local environments.

Crescendo uses these substitution tables to distinguish the restraints placed on protein structure from additional restraints due to particular functions mediated by interactions with other molecules. For more information see Chelliah et al. (2004).

Running Crescendo

There are two ways in which Crescendo can be run. You can either upload your own PDB files and multiple sequence alignment, or simply give the name (and chain identifier, if applicable) of your protein of interest and have an alignment created automatically. The first option is recommended, as an alignment generated automatically from top-scoring BLAST hits may not offer enough sequence variation for Crescendo to identify functionally constrained sites accurately.

Scoring Systems

When you upload your own files, two optional scoring systems are available. Both are sequence-based and rely on the environment-specific substitution tables.

The conservation score
This score identifies sequence conservation within the alignment that exceeds the conservation predicted from the environment-specific substitution tables.
The divergence score
This scoring system identifies where environment-specific substitution tables make poor predictions of the overall sequence distribution at each alignment position. It identifies atypical substitution patterns.

If more than one structure is included in the sequence alignment, these sequence-based scores can be combined with a structure-based score which describes the degree of structural conservation of each residue. The structure-based score should be approached with caution, as it is sensitive to conformational changes in the proteins. Conformation will depend on the state of the protein, i.e. holo, apo or ternary complex for an enzyme, or the state of complexation in a non-obligate multiprotein complex.

The overall score is calculated for each amino acid.

Alignment File Format

The alignment file can contain sequence information for proteins with known structure and homologous sequences with no known structure.

The aligned sequences should be in PIR/NBRF format.

The first line should consist of:
a ">" sign,
followed by a two-letter code indicating the sequence type (e.g. P1),
followed by a semicolon,
followed by the sequence identification code (e.g. the PDB code).
The next line should begin with the word "structure" if the corresponding PDB file has been uploaded, or "sequence" if only sequence information is available. A description of the sequence can follow. Please note that a PDB file must be uploaded for every protein marked as "structure".
The actual sequence begins on the following line, with its end indicated by a "*" (asterisk).

Example of PIR/NBRF format alignment:

>P1;2MM1
structure:2MM1:   1 : : 153 : :myoglobin:Homo sapiens: 2.80:15.8
----------GLSDGEWQLVLNVWGKVEA--DIPGHGQEVLIRLFKGHPE
TLEKFDRFK-HLKSEDEMKASEDLKKHGATVLTALGGILKKKG-----HH
EAEIKPLAQSHATKH--KIPVKYLEFISEAIIQVLQSKHPG-DFGADAQG
AMNKALELFRKDMASNYKELGFQG*
>P1;1PMB
structure:1PMB:   1 :A: 153 :A:myoglobin:Sus scrofa:2.50:18.5
----------GLSDGEWQLVLNVWGKVEA--DVAGHGQEVLIRLFKGHPE
TLEKFDKFK-HLKSEDEMKASEDLKKHGNTVLTALGGILKKKG-----HH
EAELTPLAQSHATKH--KIPVKYLEFISEAIIQVLQSKHPG-DFGADAQG
AMSKALELFRNDMAAKYKELGFQG*
>P1;1YMB
structure:1YMB:   1 : : 153 : :myoglobin:Equus caballus: 1.90:15.5
----------GLSDGEWQQVLNVWGKVEA--DIAGHGQEVLIRLFTGHPE
TLEKFDKFK-HLKTEAEMKASEDLKKHGTVVLTALGGILKKKG-----HH
EAELKPLAQSHATKH--KIPIKYLEFISDAIIHVLHSKHPG-DFGADAQG
AMTKALELFRNDIAAKYKELGFQG*
>P1;gi|127663|sp|P02181|MYG_INIGE
sequence
----------GLSDGEWQLVLNIWGKVEA--DLAGHGQDVLIRLFKGHPE
TLEKFDKFK-HLKTEAEMKASEDLKKHGNTVLTALGGILKKKG-----HH
EAELKPLAQSHATKH--KIPIKYLEFISEAIIHVLHSRHPG-DFGADAQA
AMNKALELFRKDIAAKYKELGFHG*

The names of your PDB files must correspond exactly with the names of the structures within the alignment.
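Before uploading, it can help to check an alignment file against the format rules above. The following is a minimal sketch in Python, not part of Crescendo itself: the names `PirEntry`, `parse_pir`, `structure_codes` and `check_equal_lengths` are hypothetical, and the equal-length check is an assumed sanity test for any alignment rather than a documented Crescendo requirement.

```python
from typing import NamedTuple


class PirEntry(NamedTuple):
    code: str         # identification code after the ";" on the header line
    kind: str         # "structure" or "sequence"
    description: str  # the full second line of the entry
    sequence: str     # aligned sequence with gaps kept and "*" removed


def parse_pir(text: str) -> list[PirEntry]:
    """Parse PIR/NBRF alignment text into a list of entries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    i = 0
    while i < len(lines):
        header = lines[i]
        if not header.startswith(">") or ";" not in header:
            raise ValueError(f"expected a '>XX;code' header, got {header!r}")
        code = header.split(";", 1)[1].strip()
        if i + 1 >= len(lines):
            raise ValueError(f"entry {code!r} has no description line")
        description = lines[i + 1]
        lowered = description.lower()
        if lowered.startswith("structure"):
            kind = "structure"
        elif lowered.startswith("sequence"):
            kind = "sequence"
        else:
            raise ValueError(
                f"entry {code!r}: second line must begin with "
                "'structure' or 'sequence'"
            )
        i += 2
        parts = []
        # Collect sequence lines until the one that ends with "*".
        while i < len(lines):
            part = lines[i]
            if part.startswith(">"):
                raise ValueError(f"entry {code!r} has no terminating '*'")
            i += 1
            if part.endswith("*"):
                parts.append(part[:-1])
                break
            parts.append(part)
        else:
            raise ValueError(f"entry {code!r} has no terminating '*'")
        entries.append(PirEntry(code, kind, description, "".join(parts)))
    return entries


def structure_codes(entries: list[PirEntry]) -> list[str]:
    """Codes marked "structure"; each needs a PDB file with a matching name."""
    return [e.code for e in entries if e.kind == "structure"]


def check_equal_lengths(entries: list[PirEntry]) -> None:
    """Assumed sanity check: every aligned sequence has the same length."""
    lengths = {len(e.sequence) for e in entries}
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")
```

Calling `structure_codes(parse_pir(text))` on the example alignment above would list 2MM1, 1PMB and 1YMB, which are the structures whose PDB files must be uploaded.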

Program Output

Once Crescendo has run successfully you can view the alignment file (important if you have opted to have Crescendo run automatically). You also have the option of downloading two output files: crescendo.out and crescendo.kin.

crescendo.out
This file contains the scores for each amino acid in the PDB file. The first column contains the Crescendo score for that residue. Information about each residue then follows, including the x, y and z coordinates and the amino acid type and number. The final column gives the local environment of the residue in the form of a five-letter code.

The first letter describes main-chain conformation and secondary structure:
H is helix
E is sheet
C is coil
P is the unusual positive phi main-chain angle

The second letter describes the solvent accessibility:
A is solvent accessible
a is solvent inaccessible (residues with side-chains of relative accessibility less than 7%)

The final three letters describe the side-chain interactions, for example hydrogen bonding.

The third letter shows interactions between two side-chains:
S - Interactions present
s - No interactions

The fourth letter shows interactions between a side-chain and a main-chain carbonyl:
O - Interactions present
o - No interactions

The fifth letter shows interactions between a side-chain and a main-chain amide hydrogen:
N - Interactions present
n - No interactions

For example, a residue with the environment HASon would be in a helix conformation, be solvent accessible and form side-chain to side-chain interactions, but no interactions between its side-chain and a main-chain carbonyl or amide hydrogen.
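The five-letter code described above can be decoded mechanically. The sketch below is illustrative only: the function name `decode_environment` and the dictionary keys are hypothetical, while the letter meanings are taken directly from the lists above.

```python
# Letter meanings as listed in this help page.
MAIN_CHAIN = {
    "H": "helix",
    "E": "sheet",
    "C": "coil",
    "P": "positive phi main-chain angle",
}
ACCESSIBILITY = {
    "A": "solvent accessible",
    "a": "solvent inaccessible",
}


def _flag(letter: str, present: str) -> bool:
    """Upper case means the interaction is present, lower case means absent."""
    if letter == present:
        return True
    if letter == present.lower():
        return False
    raise ValueError(f"expected {present!r} or {present.lower()!r}, got {letter!r}")


def decode_environment(code: str) -> dict:
    """Decode a five-letter local environment code such as "HASon"."""
    if len(code) != 5:
        raise ValueError(f"environment code must have five letters: {code!r}")
    main, access, side, carbonyl, amide = code
    if main not in MAIN_CHAIN:
        raise ValueError(f"unknown main-chain letter {main!r}")
    if access not in ACCESSIBILITY:
        raise ValueError(f"unknown accessibility letter {access!r}")
    return {
        "main_chain": MAIN_CHAIN[main],
        "accessibility": ACCESSIBILITY[access],
        "side_chain_to_side_chain": _flag(side, "S"),
        "side_chain_to_main_chain_carbonyl": _flag(carbonyl, "O"),
        "side_chain_to_main_chain_amide": _flag(amide, "N"),
    }
```

For the code HASon this returns a helix, solvent accessible residue with only the side-chain to side-chain flag set.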

crescendo.kin
This file contains the 3D kinemage of the target protein. To view this file locally as a kinemage you will need to download Mage.

Another output option is also available. You can download a PDB file of your protein of interest with the Crescendo scores in the B-factor column.
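Scores stored in the B-factor column can be read back with a few lines of Python. This is a minimal sketch under stated assumptions: it relies on the standard fixed-column PDB ATOM record layout (B-factor in columns 61-66), assumes every atom of a residue carries the same score so that the first atom seen is enough, and ignores insertion codes. The names `residue_scores` and `residues_above`, and any cutoff passed to the latter, are hypothetical and are not related to the contour cutoffs used on the server.

```python
def residue_scores(pdb_text: str) -> dict[tuple[str, int, str], float]:
    """Map (chain, residue number, residue name) to the B-factor column value.

    Only ATOM records are read; the value of the first atom of each
    residue is kept (assumption: all atoms of a residue share one score).
    """
    scores: dict[tuple[str, int, str], float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        chain = line[21]              # column 22
        number = int(line[22:26])     # columns 23-26
        name = line[17:20].strip()    # columns 18-20
        value = float(line[60:66])    # columns 61-66 (B-factor)
        scores.setdefault((chain, number, name), value)
    return scores


def residues_above(scores: dict, cutoff: float) -> list:
    """Residues whose stored score is at least `cutoff`, in file order."""
    return [key for key, value in scores.items() if value >= cutoff]
```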

Automatic Sequence Partitioning

If you have chosen to run Crescendo automatically, the alignment will have been generated from top-scoring BLAST hits. As these may not represent the sequence diversity required for Crescendo to predict the functional sites optimally, we have chosen to split these sequences up based upon their evolutionary relationships. A phylogenetic tree will be generated (using the Neighbor-Joining method) and then partitioned at different intervals based on branch length, splitting the tree into groups of varying divergence. Crescendo will be run on each of these groups and the results for each will be shown separately on the kinemage applet detailed below. As well as running on each of the subgroups, Crescendo will also run on the whole set of sequences for comparison. A PDF of the phylogenetic tree, with its associated partitions, can be viewed by clicking on the link.

The alignment files containing the sequences in each subgroup are also available to be viewed or downloaded.
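The exact partitioning rule used by the server is not described here, so the following Python sketch shows only one possible reading of "partitioned at different intervals based on branch length". It assumes a rooted tree and cuts every branch that crosses a chosen distance from the root, so that each subtree below a cut becomes one group. The `Node` class, the `partition` function and the cutoff values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> list[str]:
    """Names of all leaves below (or at) this node."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def partition(root: Node, cutoff: float) -> list[list[str]]:
    """Group leaves by cutting the tree at `cutoff` distance from the root."""
    groups = []

    def walk(node: Node, depth: float) -> None:
        end = depth + node.length
        # A branch reaching the cutoff, or a leaf above it, closes a group.
        if end >= cutoff or not node.children:
            groups.append(sorted(leaves(node)))
            return
        for child in node.children:
            walk(child, end)

    walk(root, 0.0)
    return groups
```

Calling `partition` with several increasing cutoffs gives progressively smaller groups, which mirrors the idea of groups of varying divergence; a cutoff of 0 returns the whole set of sequences as a single group.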

Predicted Functional Residues

The scores of individual amino acids are mapped onto the protein structure, smoothed and converted into a grid of conservation density, and this grid is then contoured. The kinemage of your protein structure is shown, and its proposed functional site can be seen by clicking on the "contours" box (if you have uploaded your own files) or the "ALL_SEQS" box (if you are running Crescendo automatically). The different contour cutoffs can be selected.

To see a list of residues which Crescendo predicts to be functional at these various cutoffs, select a value from the drop-down box. These residues are then displayed on the screen. The higher-scoring regions in the contouring represent residues within the alignment with greater functional restraints on their evolution.

Results will be present on the server for at least 30 minutes.

References

CRESCENDO:
V. Chelliah, L. Chen, T.L. Blundell and S.C. Lovell
Distinguishing Structural and Functional Restraints in Evolution in Order to Identify Interaction Sites
Journal of Molecular Biology, 342(5):1487-1504, 2004
JOY:
K. Mizuguchi, C.M. Deane, T.L. Blundell, M.S. Johnson and J.P. Overington
JOY: Protein Sequence-Structure Representation and Analysis
Bioinformatics, 14:617-623, 1998
ENVIRONMENT-SPECIFIC SUBSTITUTION TABLES:
J. Overington, D. Donnelly, M.S. Johnson, A. Sali and T.L. Blundell
Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds
Protein Science, 1:216-226, 1992