Information

Background

Crescendo is a program that identifies functional sites in proteins. The conservation of amino acid residues has been shown to depend strongly on the environment in which they occur in the folded protein. Amino acid substitution tables that give the likely substitutions of amino acids in particular local environments have been derived.

This method uses these substitution tables to distinguish structural restraints from the additional restraints due to particular functions mediated by interactions with other molecules. For more information see Chelliah et al. (2004).


Running Crescendo

There are two ways to run Crescendo. You can either upload your own PDB files and multiple sequence alignment, or simply give the name (and chain identifier, if applicable) of your protein of interest and have an alignment created automatically. The first option is recommended, because an alignment generated automatically from top-scoring BLAST hits may not offer enough sequence variation for Crescendo to identify functionally constrained sites accurately.


Scoring Systems

When you upload your own files, two optional sequence-based scoring systems are available. Both rely on the environment-specific substitution tables.
  1. The conservation score
    This score identifies positions where sequence conservation within the alignment exceeds that predicted from the environment-specific substitution tables.

  2. The divergence score
    This scoring system identifies where environment-specific substitution tables make poor predictions of the overall sequence distribution at each alignment position. It identifies atypical substitution patterns.

If more than one structure is included in the sequence alignment, these sequence-based scores can be combined with a structure-based score that describes the degree of structural conservation of each residue. The structure-based score should be approached with caution, as it is sensitive to conformational changes in the proteins. Conformation will depend on the state of the protein, i.e. holo, apo or ternary complex for an enzyme, or the state of complexation in a non-obligate multiprotein complex.

The overall score is calculated for each amino acid.


Alignment File Format

  • The alignment file can contain sequence information for proteins with known structure and homologous sequences with no known structure.
  • The aligned sequences should be in PIR/NBRF format
    1. The first line should consist of:
      a ">" sign,
      followed by a two-letter code indicating the sequence type (e.g. P1),
      followed by a semicolon,
      followed by the sequence identification code (e.g. the PDB code).
    2. The next line should begin with the word "structure" if the corresponding PDB file has been uploaded, or "sequence" if only sequence information is available. A description of the sequence can follow. Please note that a PDB file must be uploaded for every protein marked as "structure".
    3. The actual sequence begins on the following line, with its end indicated by a "*" (asterisk).
  • Example of PIR/NBRF format alignment:
    >P1;2MM1
    structure:2MM1:   1 : : 153 : :myoglobin:Homo sapiens: 2.80:15.8
    ----------GLSDGEWQLVLNVWGKVEA--DIPGHGQEVLIRLFKGHPE
    TLEKFDRFK-HLKSEDEMKASEDLKKHGATVLTALGGILKKKG-----HH
    EAEIKPLAQSHATKH--KIPVKYLEFISEAIIQVLQSKHPG-DFGADAQG
    AMNKALELFRKDMASNYKELGFQG*
    >P1;1PMB
    structure:1PMB:   1 :A: 153 :A:myoglobin:Sus scrofa:2.50:18.5
    ----------GLSDGEWQLVLNVWGKVEA--DVAGHGQEVLIRLFKGHPE
    TLEKFDKFK-HLKSEDEMKASEDLKKHGNTVLTALGGILKKKG-----HH
    EAELTPLAQSHATKH--KIPVKYLEFISEAIIQVLQSKHPG-DFGADAQG
    AMSKALELFRNDMAAKYKELGFQG*
    >P1;1YMB
    structure:1YMB:   1 : : 153 : :myoglobin:Equus caballus: 1.90:15.5
    ----------GLSDGEWQQVLNVWGKVEA--DIAGHGQEVLIRLFTGHPE
    TLEKFDKFK-HLKTEAEMKASEDLKKHGTVVLTALGGILKKKG-----HH
    EAELKPLAQSHATKH--KIPIKYLEFISDAIIHVLHSKHPG-DFGADAQG
    AMTKALELFRNDIAAKYKELGFQG*
    >P1;gi|127663|sp|P02181|MYG_INIGE
    sequence
    ----------GLSDGEWQLVLNIWGKVEA--DLAGHGQDVLIRLFKGHPE
    TLEKFDKFK-HLKTEAEMKASEDLKKHGNTVLTALGGILKKKG-----HH
    EAELKPLAQSHATKH--KIPIKYLEFISEAIIHVLHSRHPG-DFGADAQA
    AMNKALELFRKDIAAKYKELGFHG*
    
  • The names of your PDB files must correspond exactly with the names of the structures within the alignment
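
The format rules above can be checked before uploading with a short script. The following is a minimal sketch in Python, not part of Crescendo itself: the names PirEntry and parse_pir are hypothetical, and the parser only enforces the rules listed in this section (header line, "structure"/"sequence" line, sequence terminated by "*").

```python
from dataclasses import dataclass


@dataclass
class PirEntry:
    seq_type: str     # two-letter code before the semicolon, e.g. "P1"
    identifier: str   # sequence identification code, e.g. a PDB code
    kind: str         # "structure" or "sequence"
    description: str  # full text of the second line
    sequence: str     # aligned residues and gaps, without the final "*"


def parse_pir(text):
    """Parse PIR/NBRF alignment text into a list of PirEntry records.

    Hypothetical helper: it checks only the rules described in this help page.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    entries = []
    i = 0
    while i < len(lines):
        header = lines[i]
        if not header.startswith(">") or ";" not in header:
            raise ValueError(f"expected a '>XX;ID' header, got {header!r}")
        seq_type, identifier = header[1:].split(";", 1)
        if i + 1 >= len(lines):
            raise ValueError(f"{identifier}: missing second line")
        description = lines[i + 1]
        lowered = description.lower()
        if lowered.startswith("structure"):
            kind = "structure"
        elif lowered.startswith("sequence"):
            kind = "sequence"
        else:
            raise ValueError(
                f"{identifier}: second line must begin with "
                "'structure' or 'sequence'"
            )
        i += 2
        chunks = []
        terminated = False
        # Collect sequence lines until the one that ends with "*".
        while i < len(lines) and not lines[i].startswith(">"):
            chunk = lines[i]
            i += 1
            if chunk.endswith("*"):
                chunks.append(chunk[:-1])
                terminated = True
                break
            chunks.append(chunk)
        if not terminated:
            raise ValueError(f"{identifier}: sequence is not terminated by '*'")
        entries.append(
            PirEntry(seq_type, identifier, kind, description, "".join(chunks))
        )
    return entries
```

Entries whose kind is "structure" are the ones that need a matching PDB file, so a list such as [e.identifier for e in parse_pir(text) if e.kind == "structure"] gives the structure names to compare against your PDB file names.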


    Program Output

    Once Crescendo has run successfully you can view the alignment file (important if you have opted to have Crescendo run automatically). You also have the option of downloading two output files: crescendo.out and crescendo.kin.
    • crescendo.out
      This file contains the scores for each amino acid in the PDB file. The first column contains the Crescendo score for that residue. Information about each residue then follows, including the x, y and z coordinates and the amino acid type and number. The final column gives the local environment of the residue in the form of a five-letter code.

      The first letter describes main-chain conformation and secondary structure:
      H is helix
      E is sheet
      C is coil
      P is the unusual positive phi main-chain angle

      The second letter describes the solvent accessibility:
      A is solvent accessible
      a is solvent inaccessible (residues with side-chains of relative accessibility less than 7%)

      The final three letters describe the side-chain interactions, for example hydrogen bonding.

      The third letter shows interactions between two side-chains:
      S - Interactions present
      s - No interactions

      The fourth letter shows interactions between a side-chain and a main-chain carbonyl:
      O - Interactions present
      o - No interactions

      The fifth letter shows interactions between a side-chain and a main-chain amide hydrogen:
      N - Interactions present
      n - No interactions

      For example, a residue with the environment HASon would be in a helix conformation, be solvent accessible and form side-chain to side-chain interactions, but no interactions between its side-chain and a main-chain carbonyl or amide hydrogen.
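
The five-letter code can be decoded mechanically from the tables above. The sketch below is illustrative only: the function name decode_environment and the dictionary keys are hypothetical, while the letter meanings are taken directly from this section.

```python
MAIN_CHAIN = {
    "H": "helix",
    "E": "sheet",
    "C": "coil",
    "P": "positive phi main-chain angle",
}


def decode_environment(code):
    """Translate a five-letter environment code such as 'HASon' into a dict."""
    if len(code) != 5:
        raise ValueError(f"expected a five-letter code, got {code!r}")
    main, access, side_side, side_carbonyl, side_amide = code
    if main not in MAIN_CHAIN:
        raise ValueError(f"unknown main-chain letter {main!r}")
    # Each remaining position is upper case when true, lower case when false.
    checks = (
        (access, "A", "a"),
        (side_side, "S", "s"),
        (side_carbonyl, "O", "o"),
        (side_amide, "N", "n"),
    )
    for letter, upper, lower in checks:
        if letter not in (upper, lower):
            raise ValueError(f"unexpected letter {letter!r} in {code!r}")
    return {
        "main_chain": MAIN_CHAIN[main],
        "solvent_accessible": access == "A",
        "side_chain_to_side_chain": side_side == "S",
        "side_chain_to_main_chain_carbonyl": side_carbonyl == "O",
        "side_chain_to_main_chain_amide": side_amide == "N",
    }
```

Calling decode_environment("HASon") reports a helix, solvent accessible, with side-chain to side-chain interactions and no side-chain to main-chain interactions.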

    • crescendo.kin
      This file contains the 3D kinemage of the target protein. To view this file locally as a kinemage you will need to download Mage.

    Another output option is also available. You can download a PDB file (of your protein of interest) with the Crescendo scores in the B-factor column.
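
If you want to work with the scores in such a PDB file outside a viewer, they can be read back from the B-factor field, which occupies columns 61-66 of ATOM records in the standard PDB format. The sketch below is one possible way to do this. The function name scores_from_bfactors is hypothetical, and the code assumes that every atom of a residue carries the same score, which this page does not state explicitly.

```python
def scores_from_bfactors(pdb_text):
    """Return {(chain, residue_number, insertion_code): (residue_name, score)}.

    Assumption: all atoms of a residue share one score, so only the first
    ATOM record seen for each residue is used.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Only protein ATOM records are considered; HETATM lines are skipped.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        chain = line[21]                    # column 22
        residue_number = int(line[22:26])   # columns 23-26
        insertion_code = line[26].strip()   # column 27
        key = (chain, residue_number, insertion_code)
        if key in scores:
            continue
        residue_name = line[17:20].strip()  # columns 18-20
        score = float(line[60:66])          # columns 61-66 (B-factor field)
        scores[key] = (residue_name, score)
    return scores
```

The result can then be sorted by score, for example sorted(scores.items(), key=lambda item: item[1][1], reverse=True), to list the highest-scoring residues first.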

    Automatic Sequence Partitioning

    If you have chosen to run Crescendo automatically, the alignment will have been generated from top-scoring BLAST hits. As these may not represent the sequence diversity required for Crescendo to predict the functional sites optimally, we have chosen to split these sequences up based on their evolutionary relationships. A phylogenetic tree is generated (using the Neighbor-Joining method) and then partitioned at different intervals based on branch length, splitting the tree into groups of varying divergence. Crescendo is run on each of these groups and the results for each are shown separately on the kinemage applet detailed below. For comparison, Crescendo is also run on the whole set of sequences. A PDF of the phylogenetic tree, with its associated partitions, can be viewed by clicking on the link.

    The alignment files containing the sequences in each subgroup are also available to be viewed or downloaded.

    Predicted Functional Residues

    The scores of individual amino acids are mapped onto the protein structure, smoothed and converted into a grid of conservation density, and this grid is then contoured. The kinemage of your protein structure is shown, and its proposed functional site can be seen by clicking on the "contours" box (if you have uploaded your own files) or the "ALL_SEQS" box (if running Crescendo automatically). The different contour cutoffs can be selected.

    To see a list of the residues that Crescendo predicts to be functional at these various cutoffs, select a value from the drop-down box. These residues are then displayed on the screen. The higher-scoring regions in the contouring represent residues within the alignment with greater functional restraints on their evolution.
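
To give a feel for how per-residue scores can be turned into a smoothed density and then thresholded, the sketch below sums a Gaussian contribution from each scored position at every grid point and keeps the points above a cutoff. This is an illustration only: the smoothing function, grid spacing and cutoff values used by the server are not described on this page, so the Gaussian kernel, the sigma parameter and the function names conservation_density and points_above_cutoff are all assumptions.

```python
import math


def conservation_density(scored_positions, grid_points, sigma=3.0):
    """Smooth per-residue scores onto grid points.

    scored_positions: list of ((x, y, z), score) pairs.
    grid_points: list of (x, y, z) tuples.
    sigma: width of the assumed Gaussian kernel, in the coordinate units.
    Returns one density value per grid point, in the same order.
    """
    densities = []
    for gx, gy, gz in grid_points:
        total = 0.0
        for (x, y, z), score in scored_positions:
            d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
            # Nearby high-scoring residues contribute most to the density.
            total += score * math.exp(-d2 / (2.0 * sigma ** 2))
        densities.append(total)
    return densities


def points_above_cutoff(grid_points, densities, cutoff):
    """Return the grid points whose density is at or above the cutoff."""
    return [p for p, d in zip(grid_points, densities) if d >= cutoff]
```

Raising the cutoff shrinks the selected region towards the highest-density points, which mirrors the way a stricter contour level encloses fewer residues.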

    Results will be present on the server for at least 30 minutes.


    References

    • CRESCENDO:
      V. Chelliah, L. Chen, T.L. Blundell and S.C. Lovell
      Distinguishing Structural and Functional Restraints in Evolution in Order to Identify Interaction Sites
      Journal of Molecular Biology, 342(5):1487-1504, 2004

    • JOY:
      K. Mizuguchi, C.M. Deane, T.L. Blundell, M.S. Johnson and J.P. Overington
      JOY: Protein Sequence-Structure Representation and Analysis
      Bioinformatics, 14:617-623, 1998

    • ENVIRONMENT-SPECIFIC SUBSTITUTION TABLES:
      J. Overington, D. Donnelly, M.S. Johnson, A. Sali and T.L. Blundell
      Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds
      Protein Science, 1:216-226, 1992