THE DELPHOS USER GUIDE
An interactive query language for the
OWL composite sequence database.
A.J.Bleasby
EBI, Hinxton Genome Campus, UK.
ajb@ebi.ac.uk
1 Introduction
Information retrieval from molecular biological databases raises a
number of interesting problems which have not previously been approached.
These include:-
(a) The problem of molecular biology linguistics. Difficulties of
text retrieval arise because of the terminology of molecular
biology and its usage. Complex queries with many parameters,
perhaps including both text and sequence terms, are commonly
needed.
(b) Integration of the results of sequence similarity searches and
text retrieval. Similarity searches based on sequence alignment
algorithms, for example those using the Lipman-Pearson method
(Pearson and Lipman, 1988), frequently retrieve only a subset
of the proteins of interest. It is useful if further retrieval
and removal of unwanted entries can use textual and other
parameters. Similarly, the results of textual searches can
usefully be strengthened by sequence similarity searches.
(c) The creation of mini-databases for further research. Specialised
subsets of the main database may be required as inputs to other
programs, for example for multiple sequence alignment prior to
modelling, searches by sequence alignment or pattern recognition.
These and other possible applications demand considerable flexibility in
both the query language and the means of controlling information flow.
So far none of the standard available systems possesses the combination
of properties required in molecular biology. Relational databases have
the necessary query flexibility through languages such as SQL but lack
facilities for character string searching of long sequences and strings
of text and for efficient integration with the scientific search
methods. Other information retrieval packages offer relational text
queries but fall down on integration and use inappropriate text
indexing. Integration could be achieved using the list processing
facilities of high level languages such as LISP but the speed of
retrieval would be slow. When DELPHOS was written the only available
retrieval software designed specifically for sequence databases was
the NBRF PSQ system (George et al., 1986), which is suitable only
for simple queries and uses a very slow sequential scan method for
text retrieval.
Because of the inadequacies of existing software systems an integrated
query and retrieval system has been designed and implemented de novo.
The DELPHOS system combines some of the characteristics of relational
database query methods, information retrieval systems and list
processing languages, in a way which is appropriate for the
manipulation of protein sequence data and related information for
research in protein engineering and molecular biology.
DELPHOS is a robust system providing rapid retrieval and manipulation
of data from protein sequence databases. It has a flexible query language
for retrieval from both textual and sequence areas of a database.
Queries of any degree of complexity can be constructed and the results
integrated by list manipulations. Results of such queries can also be
integrated with the hit lists derived from sequence similarity searches
using SWEEP. Entries retrieved by these search methods can readily be
saved as disc files of specialised mini-databases to facilitate further
research.
The index files, used for both sequence and text retrieval, permit
simple, flexible parameter specification and ensure very rapid data
retrieval. Simple queries take only a few seconds of cpu time on a VAX
minicomputer. Highly complex queries involving relational operations
and list integrations rarely take more than a minute. Only fuzzy
sequence searches are `comparatively' slow, as these have to be
performed by a serial scan of the database.
This combination of properties makes DELPHOS a useful tool for
increasing the accessibility of data and knowledge relating to protein
sequences. It can be employed in a wide range of applications, from
browsing of the database in an exploratory fashion using simple text
and sequence strings, to creation of specialised subsets of the
database with very high precision and completeness using integration of
sequence similarity searches and text retrieval. It can also be used
to assist analysis of terminology in molecular biology and thesaurus
construction.
2 Using DELPHOS.
The DELPHOS language is easy to understand but formal descriptions of
query languages tend to be rather difficult to absorb.
For this reason the formal description is deferred to later in this
document and a tutorial-style introduction is given first to show how
simple the query language is to use. A practical example of the use of
DELPHOS is given later.
2.1 How to enter DELPHOS
This depends on how your computer manager has set the system up. There
will generally be site-specific commands you will need to type to run
DELPHOS; ask your system manager what these are or refer to local
documentation.
When DELPHOS is invoked it will display a summary of the current version
of the OWL database and then present you with the
DELPHOS>
prompt. In the rest of this documentation all queries will be shown
with this prompt; it is reproduced solely for clarity and you do not
need to type it. The examples in this documentation are based on
a test sequence database.
2.2 A sample sequence search
Let's take the search for a directly matching peptide as our first
example. The SEQ function is used for this. Type
DELPHOS> display seq "sgksir" (fig 1)
Figure No.1
WORKLIST ENTRIES (14):
DEHUAA Alcohol dehydrogenase (EC 1.1.1.1) alpha chain - Human
DEHUAB Alcohol dehydrogenase (EC 1.1.1.1) beta-1 chain - Human
DEHUAG Alcohol dehydrogenase (EC 1.1.1.1) gamma-1 chain - Human
DEMSAA Alcohol dehydrogenase (EC 1.1.1.1) A chain - Mouse
ADH_PAPHA ALCOHOL DEHYDROGENASE (EC 1.1.1.1) (ADH). - BABOON (PAPIO
HAMADRYAS).
ADHG_HUMAN ALCOHOL DEHYDROGENASE GAMMA CHAIN (EC 1.1.1.1) (GENE NAME: ADH3).
- HUMAN (HOMO SAPIENS).
ADHS_HORSE ALCOHOL DEHYDROGENASE S CHAIN (EC 1.1.1.1). - HORSE (EQUUS
CABALLUS).
ADHX_HUMAN ALCOHOL DEHYDROGENASE CLASS III CHI CHAIN (EC 1.1.1.1) (GENE
NAME: ADH5). - HUMAN (HOMO SAPIENS).
HUMADH21C HUMADH21C Human class I alcohol dehydrogenase beta-1 subunit,
allele 1 mRNA, complete cds. - Homo sapiens Eukaryota
HUMADH2BA HUMADH2BA Human class I alcohol dehydrogenase (ADH2) beta-1
subunit mRNA, complete cds. - Homo sapiens Eukaryota
HUMADH2C2 HUMADH2C2 Human class I alcohol dehydrogenase (ADH2) beta-1
subunit mRNA, complete cds. - Homo sapiens Eukaryota
HUMADH3G2 HUMADH3G2 Human class I alcohol dehydrogenase (ADH3) gamma
subunit, allele 2 mRNA, complete cds. - Homo sapiens Eukaryota
HUMADH5C3 HUMADH5C3 Human alcohol dehydrogenase class III (ADH5) mRNA,
complete cds. - Homo sapiens Eukaryota
ADHB_HUMAN ALCOHOL DEHYDROGENASE BETA CHAIN (EC 1.1.1.1). - HOMO SAPIENS (
HUMAN).
This query finds and gives brief information about all sequences in the
database which contain the hexapeptide ser-gly-lys-ser-ile-arg. The first
word `display' is the `command' used in DELPHOS to say "show on the
screen the results of the following query". The second word `seq' is
the start of the query and is the `function' used to find exact
sequence matches. Any function like `seq' is always followed by a
parameter which tells DELPHOS what you want the function to find; in
the above case this is the SGKSIR peptide. Parameter strings are always
bounded by double quotes.
The query given above produces the output shown in figure 1. At the
top it says there are 14 entries in the `worklist'. Lists will be
described more fully later. In the meantime the worklist can be
regarded as a store cupboard within DELPHOS where the results of
your query are kept. This means that, if you want to see the results
of your query again, all you need do is type
DELPHOS> display
Typing `display' on its own always shows what is in the worklist. Try it.
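The idea of an exact sequence search feeding a worklist can be sketched
in a few lines of Python. This is only an illustration of the behaviour
described above, not the way DELPHOS is implemented (DELPHOS uses index
files). The toy database, its pcodes and its sequences are invented, and
the function names simply mirror the DELPHOS words `seq' and `display'.

```python
# Toy stand-in for a sequence database: pcode -> (title, sequence).
# These entries are invented for illustration; they are not OWL data.
TOY_DB = {
    "TOY1": ("Example protein one", "MSTAGKVIKCKAAVLWESGKSIRTILGH"),
    "TOY2": ("Example protein two", "MKKLLPTAAAGLLLLAAQPAMA"),
    "TOY3": ("Example protein three", "GGSGKSIRAA"),
}

worklist = []  # holds the pcodes found by the most recent query


def seq(probe):
    """Return pcodes whose sequence contains the probe exactly.

    The probe is folded to upper case first (an assumption of this sketch).
    """
    probe = probe.upper()
    return [pcode for pcode, (_, sequence) in TOY_DB.items() if probe in sequence]


def display(result=None):
    """Store a new result in the worklist (if one is given), then list it."""
    global worklist
    if result is not None:
        worklist = result
    lines = [f"WORKLIST ENTRIES ({len(worklist)}):"]
    lines += [f"{pcode} {TOY_DB[pcode][0]}" for pcode in worklist]
    return "\n".join(lines)


print(display(seq("sgksir")))  # compare: display seq "sgksir"
print(display())               # compare: display (re-shows the worklist)
```

Both calls print the same two entries, because the second call finds
the result of the first still sitting in the worklist.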
What does the output mean?
By default the `display' command shows only brief information about
proteins which match your query. This information consists of the
protein code (pcode) and a summary line. Every sequence in the
database must have a unique identifying name; this name is the pcode.
For example, the pcode for ovine opsin is `OOSH', similarly the pcode
for Factor VIII is `EZHU'. Whenever you want to explicitly refer to
a particular protein you use its pcode. The summary line, also called
the title line, tells you the English name of the protein and often
tells you the source e.g. `Cytochrome C - Human'.
DELPHOS does, of course, allow you to display protein information in
detail. How to display sequences and bibliographic information is
described later.
2.3 A sample title search
Something you'll frequently want to do is to find a protein, or group
of proteins, in the database given its name. There are two ways of
doing this in DELPHOS. The first uses the `title' function; the second
method which uses the `text' function is described in the next section.
Suppose you want to find all opsins in the database. Type
DELPHOS> display title "opsin" (fig 2)
Figure No.2
WORKLIST ENTRIES (11):
OOBO Rhodopsin - Bovine
OOFF Rhodopsin - Fruit fly
OOFF2 Opsin 2 - Fruit fly
OOHUB Blue-sensitive opsin - Human
OOHUR Red-sensitive opsin - Human
OOHUG Green-sensitive opsin - Human
OPS3_DROME OPSIN RH3 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH3 OR
RH92CD). - FRUIT FLY (DROSOPHILA MELANOGASTER).
OPS4_DROME OPSIN RH4 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH4). -
FRUIT FLY (DROSOPHILA MELANOGASTER).
OPSD_HUMAN RHODOPSIN. - HUMAN (HOMO SAPIENS).
OPSD_MOUSE RHODOPSIN. - MOUSE (MUS MUSCULUS).
OPSD_OCTDO RHODOPSIN. - GIANT OCTOPUS (OCTOPUS DOFLEINI).
The output is shown in figure 2. The title function searches all the
title (summary) lines for any occurrences of the character string `opsin'.
The search does not differentiate between upper and lower case letters so
it will find for example both `OPSIN' and `oPsIn' given the query
above. Note that the database text describing a protein may contain
the word `opsin' within it but, if the word doesn't appear in the
title line, the `title' function will not find it.
DELPHOS treats textual information in a special way. It allows you
to search for any alphanumeric characters (A-Z, a-z and 0-9) but
ignores all other characters (e.g. % & - $ etc). It even ignores any
space characters within the database. This is called `free text
searching' and has many advantages. As an example type
DELPHOS> display title "p450" (fig 3)
Figure No.3
WORKLIST ENTRIES (10):
O4HU6 Cytochrome P450IA1 - Human (fragment)
O4RBM4 Cytochrome P450IA2, isosafrole-inducible - Rabbit (fragments)
CP41_RAT CYTOCHROME P450 IVA1 (P-450-LA-OMEGA) (LAURIC ACID OMEGA-
HYDROXYLASE) (EC 1.14.15.3) (P452) (GENE NAME: CYP4A1). - RAT (
RATTUS NORVEGICUS).
CPAX_HUMAN CYTOCHROME P450 IIA (EC 1.14.14.1). - HUMAN (HOMO SAPIENS).
CPD1_RAT CYTOCHROME P450 IID1 (P450 DB1) (P450 CMF1A) (DEBRISOQUINE 4-
HYDROXYLASE) (EC 1.14.14.1) (GENE NAME: CYP2D1). - RAT (RATTUS
NORVEGICUS).
CPM1_PIG CYTOCHROME P450 XIA1 (P450(SCC)) (EC 1.14.15.6), MITOCHONDRIAL (
CHOLESTEROL SIDE-CHAIN CLEAVAGE ENZYME) (GENE NAME: CYP11A1). -
PIG (SUS SCROFA).
RABCT450G RABCT450G Rabbit cytochrome P-450Bc2 DNA, 5' flanking region. -
Oryctolagus cuniculus Eukaryota
RATP45GMS RATP45GMS Rat polymorphic, male-specific cytochrome P-450g mRNA,
complete cds. - Rattus norvegicus Eukaryota
PAHDN3 PAHDN3 Plasmid pAH-delta-N3, junction area between the ADH1
promoter and cytochrome P-450 (pHP3) cDNA. - Artificial gene
Artificial sequences
NRL_2CPP1 CYTOCHROME P450CAM (CAMPHOR MONOOXYGENASE) (E.C.1.14.15.1) WITH
BOUND CAMPHOR - (PSEUDOMONAS PUTIDA)
You'll see that it picks up proteins with both `p450' and `p-450' (fig 3) in
their title lines. This is because it ignores the hyphen character.
In other query languages you'd have had to use two queries to pick up
all the proteins you wanted. Another example of the effectiveness of
free text searching would be `Cytochrome C'. There is obviously no
problem in searching for `cytochrome' but the letter `C' presents
some difficulty. If you gave other query languages this string they
would certainly pick up any proteins which contained `cytochrome' but
would be happy if the title line also contained a letter `C' anywhere
within it i.e. not immediately after the word `cytochrome'. Because
DELPHOS ignores spaces the query can be formulated precisely using
DELPHOS> display title "cytochromec"
In the last example the `c' will always be in the right place!
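The free text rule described above (keep A-Z, a-z and 0-9, ignore
everything else including spaces, and ignore case) can be sketched in
Python as follows. This is a minimal illustration under those stated
rules and not the DELPHOS indexing code; the helper names `normalise'
and `title_matches' are made up for the sketch. The titles are taken
from the figures in this guide.

```python
import re


def normalise(text):
    """Keep only A-Z, a-z and 0-9, and fold to lower case."""
    return re.sub(r"[^A-Za-z0-9]", "", text).lower()


def title_matches(query, title):
    """True if the normalised query occurs anywhere in the normalised title."""
    return normalise(query) in normalise(title)


titles = [
    "Cytochrome P450IA1 - Human (fragment)",
    "Rabbit cytochrome P-450Bc2 DNA, 5' flanking region.",
    "Cytochrome c - Human",
    "Alcohol dehydrogenase (EC 1.1.1.1) alpha chain - Human",
]

for query in ("p450", "cytochromec"):
    hits = [t for t in titles if title_matches(query, t)]
    print(query, "->", hits)
```

In this sketch `p450' finds both the `P450' and the `P-450' title, and
`cytochromec' finds only `Cytochrome c - Human'. Note that, under the
same rule, `cytochromec' would also match a title such as
`Cytochrome c6', since the characters following the match are not
examined.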
Another feature of free text searching is that you don't have to give
an entire word; just a part of it will do. An example would be
DELPHOS> display title "osai" (fig 4)
Figure No.4
WORKLIST ENTRIES (13):
Y1_FMV HYPOTHETICAL PROTEIN 1. - FIGWORT MOSAIC VIRUS (FMV).
Y16K_BGMV HYPOTHETICAL 15.6 KD PROTEIN. - BEAN GOLDEN MOSAIC VIRUS.
Y30K_BGMV POTENTIAL 29.7 KD PROTEIN (PUTATIVE INSECT TRANSMISSION PRODUCT).
- BEAN GOLDEN MOSAIC VIRUS.
Y40K_BGMV HYPOTHETICAL 40.2 KD PROTEIN. - BEAN GOLDEN MOSAIC VIRUS.
MBGBCG MBGBCG Bean golden mosaic virus (BGMV), DNA B, complete sequence.
- Bean golden mosaic virus Viridae
MBGBCG1 MBGBCG Bean golden mosaic virus (BGMV), DNA B, complete sequence.
- Bean golden mosaic virus Viridae
MBGBCG2 MBGBCG Bean golden mosaic virus (BGMV), DNA B, complete sequence.
- Bean golden mosaic virus Viridae
MBGBCG3 MBGBCG Bean golden mosaic virus (BGMV), DNA B, complete sequence.
- Bean golden mosaic virus Viridae
MBGBCG5 MBGBCG Bean golden mosaic virus (BGMV), DNA B, complete sequence.
- Bean golden mosaic virus Viridae
MTGMVS1 MTGMVS1 Tomato golden mosaic virus subgenomic DNA derived from
DNA B cccds = covalently closed circular double-stranded
molecule. - Tomato golden mosaic virus Viridae
JU0041 Hypothetical 14.5K protein - Chloris striate mosaic virus (CSMV)
JU0044 Hypothetical 15.8K protein - Chloris striate mosaic virus (CSMV)
JU0043 Hypothetical 33.2K protein - Chloris striate mosaic virus (CSMV)
which will pick up, for example, all the entries with `mosaic' in the
title line (fig 4). Furthermore
DELPHOS> display title "iatemosaicvi" (fig 5)
Figure No.5
WORKLIST ENTRIES (3):
JU0041 Hypothetical 14.5K protein - Chloris striate mosaic virus (CSMV)
JU0044 Hypothetical 15.8K protein - Chloris striate mosaic virus (CSMV)
JU0043 Hypothetical 33.2K protein - Chloris striate mosaic virus (CSMV)
will pick up all those entries containing `strIATE MOSAIC VIrus'! (fig 5)
2.4 An example text search
The DELPHOS `title' function restricts the search to the title (summary)
lines; the `text' function doesn't. The scope of the `text' function
is the ENTIRE title, bibliographic, comment and feature information
within the database. Because DELPHOS is so fast the speed of retrieval
using `text' is virtually indistinguishable from `title' searches.
Try the query
DELPHOS> display text "opsin" (fig 6)
Figure No.6
WORKLIST ENTRIES (15):
OOBO Rhodopsin - Bovine
OOFF Rhodopsin - Fruit fly
OOFF2 Opsin 2 - Fruit fly
OOHUB Blue-sensitive opsin - Human
OOHUR Red-sensitive opsin - Human
OOHUG Green-sensitive opsin - Human
QRHYB2 Beta-2-adrenergic receptor - Hamster
GBT1_BOVIN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-1 SUBUNIT (
TRANSDUCIN ALPHA-1 CHAIN). - BOVINE (BOS TAURUS).
GBT1_HUMAN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-1 SUBUNIT (
TRANSDUCIN ALPHA-1 CHAIN) (GENE NAME: GNAT1). - HUMAN (HOMO
SAPIENS).
GBT2_BOVIN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-2 SUBUNIT (
TRANSDUCIN ALPHA-2 CHAIN). - BOVINE (BOS TAURUS).
OPS3_DROME OPSIN RH3 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH3 OR
RH92CD). - FRUIT FLY (DROSOPHILA MELANOGASTER).
OPS4_DROME OPSIN RH4 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH4). -
FRUIT FLY (DROSOPHILA MELANOGASTER).
OPSD_HUMAN RHODOPSIN. - HUMAN (HOMO SAPIENS).
OPSD_MOUSE RHODOPSIN. - MOUSE (MUS MUSCULUS).
OPSD_OCTDO RHODOPSIN. - GIANT OCTOPUS (OCTOPUS DOFLEINI).
The output from this query is shown in figure 6.
Again just the title lines of matching entries are shown. But wait!
What is entry QRHYB2 doing there? The title line doesn't contain the
word `opsin'! The answer of course is that `opsin' appears elsewhere
in the text for that protein. We'll deal with all the functionality
of the display command later but for now, just to prove QRHYB2 does
contain `opsin' type
DELPHOS> display/comment (fig 7)
Figure No.7
WORKLIST ENTRIES (15):
OOBO Rhodopsin - Bovine
Species: Bos primigenius taurus (cattle)
Accession: A03154
Introns: 121/1, 177/2, 232/3, 312/3
Superfamily: vertebrate rhodopsin
Keywords: photoreceptor; chromoprotein; glycoprotein;
acetylation; transmembrane protein
1/Modified site: acetylated amino end
2,15/Binding site: carbohydrate (Asn)
296/Binding site: retinal chromophore
OOFF Rhodopsin - Fruit fly
Species: Drosophila melanogaster
Accession: A22012
The domains were proposed from hydropathy indices.
Some or all of the carboxyl-terminal Ser or Thr residues may be
phosphorylated.
Map position: 3R66 (92B8-11)
Gene name: ninaE
Introns: 3/2, 190/2, 239/3, 332/2
Superfamily: vertebrate rhodopsin
Keywords: photoreceptor; chromoprotein; transmembrane protein
1-49/Domain: extracellular I
50-74/Domain: transmembrane I
75-86/Domain: intracellular I
87-109/Domain: transmembrane II
110-127/Domain: extracellular II
128-153/Domain: transmembrane III
154-160/Domain: intracellular II
161-181/Domain: transmembrane IV
182-215/Domain: extracellular III
216-243/Domain: transmembrane V
244-276/Domain: intracellular III
277-300/Domain: transmembrane VI
301-308/Domain: extracellular IV
309-332/Domain: transmembrane VII
333-373/Domain: intracellular IV
OOFF2 Opsin 2 - Fruit fly
Species: Drosophila melanogaster
Accession: A24058
This protein is specifically expressed in photoreceptor cell R8
of the Drosophila compound eye.
Map position: 3R (91D1-2)
Gene name: Rh2
Introns: 33/3, 339/2, 350/3
Superfamily: vertebrate rhodopsin
Keywords: photoreceptor; chromophore; transmembrane protein
326/Binding site: retinal chromophore (Lys) (by homology)
1-56/Domain: extracellular 1 (by homology)
57-81/Domain: transmembrane 1 (by homology)
82-93/Domain: intracellular 1 (by homology)
94-116/Domain: transmembrane 2 (by homology)
117-134/Domain: extracellular 2 (by homology)
135-160/Domain: transmembrane 3 (by homology)
161-167/Domain: intracellular 2 (by homology)
168-188/Domain: transmembrane 4 (by homology)
189-222/Domain: extracellular 3 (by homology)
223-250/Domain: transmembrane 5 (by homology)
251-283/Domain: intracellular 3 (by homology)
284-307/Domain: transmembrane 6 (by homology)
308-315/Domain: extracellular 4 (by homology)
316-339/Domain: transmembrane 7 (by homology)
340-381/Domain: intracellular 4 (by homology)
OOHUB Blue-sensitive opsin - Human
Species: Homo sapiens (man)
Accession: A03156
The source of this protein is retinal cones.
Map position: 7q22-qter
Gene name: BCP
Introns: 118/1, 174/2, 229/3, 309/3
Superfamily: vertebrate rhodopsin
Keywords: color vision; membrane protein
34-57,71-94,113-136,149-172,198-221,250-273,283-306/Region:
transmembrane segment (probable)
OOHUR Red-sensitive opsin - Human
Species: Homo sapiens (man)
Accession: A03157
The source of this protein is retinal cones.
Map position: Xq22-qter
Gene name: RCP
Introns: 38/1, 137/1, 193/2, 248/3, 328/3
Superfamily: vertebrate rhodopsin
Keywords: color vision; sex-linked inheritance; membrane protein
53-76,90-113,132-155,168-191,217-240,269-292,302-325/Region:
transmembrane segment (probable)
OOHUG Green-sensitive opsin - Human
Species: Homo sapiens (man)
Accession: A03158
The source of this protein is retinal cones.
Map position: Xq22-q28
Gene name: GCP
Introns: 38/1, 137/1, 193/2, 248/3, 328/3
Superfamily: vertebrate rhodopsin
Keywords: color vision; sex-linked inheritance; membrane protein
53-76,90-113,132-155,168-191,217-240,269-292,302-325/Region:
transmembrane segment (probable)
QRHYB2 Beta-2-adrenergic receptor - Hamster
Species: Cricetinae gen. sp. (hamster)
Accession: A03159
This protein may have up to seven hydrophobic membrane-spanning
helices, as does rhodopsin, but the exact limits have not yet
been determined.
This protein was isolated from the lung.
Superfamily: vertebrate rhodopsin
Keywords: transmembrane protein; glycoprotein; receptor; lung;
phosphoprotein; rhodopsin homolog
6,15/Binding site: carbohydrate (Asn) (putative)
261,262,345,346,347/Binding site: phosphate (putative)
GBT1_BOVIN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-1 SUBUNIT (
TRANSDUCIN ALPHA-1 CHAIN). - BOVINE (BOS TAURUS).
Species: BOVINE (BOS TAURUS).
Accession: P04695
13-AUG-1987 (REL. 05, CREATED)
13-AUG-1987 (REL. 05, LAST SEQUENCE UPDATE)
01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE)
-!- FUNCTION: GUANINE NUCLEOTIDE-BINDING PROTEINS (G PROTEINS)
ARE
INVOLVED AS A MODULATOR OR TRANSDUCER IN VARIOUS TRANSMEMBRANE
SIGNALING SYSTEMS.
-!- FUNCTION: TRANSDUCIN IS AN AMPLIFIER AND ONE OF THE
TRANSDUCERS OF
A VISUAL IMPULSE THAT PERFORMS THE COUPLING BETWEEN RHODOPSIN AND
CGMP-PHSOPHODIESTERASE.
-!- SUBUNIT: G PROTEIN ARE COMPOSED OF 3 UNITS (ALPHA, BETA &
GAMMA).
THE BETA AND GAMMA UNITS APPEAR TO BE COMMON TO ALL G PROTEINS.
-!- TRANSDUCIN ALPHA-1 CHAIN IS FOUND IN ROD.
EMBL; K03253; BTTRA.
EMBL; K03254; BTTRNAM.
EMBL; X02440; BTTRDAR.
BINDING 174 174 ADP-RIBOSE (BY ACTION OF CHOLERA TOXIN).
BINDING 347 347 ADP-RIBOSE (BY ACTION OF IAP).
NP_BIND 31 50 GTP (PROBABLE).
NP_BIND 80 101 GTP (PROBABLE).
NP_BIND 113 116 GTP (PROBABLE).
NP_BIND 208 222 GTP (PROBABLE).
NP_BIND 265 268 GTP (PROBABLE).
Keywords: GTP-BINDING; TRANSDUCER; MULTIGENE FAMILY.
GBT1_HUMAN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-1 SUBUNIT (
TRANSDUCIN ALPHA-1 CHAIN) (GENE NAME: GNAT1). - HUMAN (HOMO
SAPIENS).
Species: HUMAN (HOMO SAPIENS).
Accession: P11488
01-OCT-1989 (REL. 12, CREATED)
01-OCT-1989 (REL. 12, LAST SEQUENCE UPDATE)
01-OCT-1989 (REL. 12, LAST ANNOTATION UPDATE)
-!- FUNCTION: GUANINE NUCLEOTIDE-BINDING PROTEINS (G PROTEINS)
ARE
INVOLVED AS A MODULATOR OR TRANSDUCER IN VARIOUS TRANSMEMBRANE
SIGNALING SYSTEMS.
-!- FUNCTION: TRANSDUCIN IS AN AMPLIFIER AND ONE OF THE
TRANSDUCERS OF
A VISUAL IMPULSE THAT PERFORMS THE COUPLING BETWEEN RHODOPSIN AND
CGMP-PHSOPHODIESTERASE.
-!- SUBUNIT: G PROTEIN ARE COMPOSED OF 3 UNITS (ALPHA, BETA &
GAMMA).
THE BETA AND GAMMA UNITS APPEAR TO BE COMMON TO ALL G PROTEINS.
-!- TRANSDUCIN ALPHA-1 CHAIN IS FOUND IN ROD.
EMBL; X15088; HSGNAT1.
BINDING 174 174 ADP-RIBOSE (BY ACTION OF CHOLERA TOXIN).
BINDING 347 347 ADP-RIBOSE (BY ACTION OF IAP).
NP_BIND 31 50 GTP (PROBABLE).
NP_BIND 80 101 GTP (PROBABLE).
NP_BIND 113 116 GTP (PROBABLE).
NP_BIND 208 222 GTP (PROBABLE).
NP_BIND 265 268 GTP (PROBABLE).
Keywords: GTP-BINDING; TRANSDUCER; MULTIGENE FAMILY.
GBT2_BOVIN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-2 SUBUNIT (
TRANSDUCIN ALPHA-2 CHAIN). - BOVINE (BOS TAURUS).
Species: BOVINE (BOS TAURUS).
Accession: P04696
13-AUG-1987 (REL. 05, CREATED)
13-AUG-1987 (REL. 05, LAST SEQUENCE UPDATE)
01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE)
-!- FUNCTION: GUANINE NUCLEOTIDE-BINDING PROTEINS (G PROTEINS)
ARE
INVOLVED AS A MODULATOR OR TRANSDUCER IN VARIOUS TRANSMEMBRANE
SIGNALING SYSTEMS
-!- FUNCTION: TRANSDUCIN IS AN AMPLIFIER AND ONE OF THE
TRANSDUCERS OF
A VISUAL IMPULSE THAT PERFORMS THE COUPLING BETWEEN RHODOPSIN AND
CGMP-PHSOPHODIESTERASE.
-!- SUBUNIT: G PROTEIN ARE COMPOSED OF 3 UNITS (ALPHA, BETA &
GAMMA).
THE BETA AND GAMMA UNITS APPEAR TO BE COMMON TO ALL G PROTEINS.
-!- TRANSDUCIN ALPHA-2 CHAIN IS FOUND IN OUTER SEGMENTS.
EMBL; M11116; BTNA2.
MOD_RES 2 2 ACETYLATION (BY HOMOLOGY WITH RAS).
BINDING 178 178 ADP-RIBOSE (BY ACTION OF CHOLERA TOXIN).
BINDING 351 351 ADP-RIBOSE (BY ACTION OF IAP).
NP_BIND 260 276 GTP (PROBABLE).
Keywords: GTP-BINDING; TRANSDUCER; MULTIGENE FAMILY.
OPS3_DROME OPSIN RH3 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH3 OR
RH92CD). - FRUIT FLY (DROSOPHILA MELANOGASTER).
Species: FRUIT FLY (DROSOPHILA MELANOGASTER).
Accession: P04950
13-AUG-1987 (REL. 05, CREATED)
13-AUG-1987 (REL. 05, LAST SEQUENCE UPDATE)
01-JAN-1990 (REL. 13, LAST ANNOTATION UPDATE)
-!- FUNCTION: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES
THAT
MEDIATE VISION. THEY CONSIST OF AN APOPROTEIN, OPSIN, COVALENTLY
LINKED TO CIS-RETINAL.
-!- EACH DROSOPHILA EYE IS COMPOSED OF 800 FACETS OR OMMATIDIA.
EACH
OMMATIDIUM CONTAINS 8 PHOTORECEPTOR CELLS (R1-R8), THE R1 TO R6
CELLS ARE OUTER CELLS, WHILE R7 AND R8 ARE INNER CELLS.
-!- OPSIN RH3 IS SENSITIVE TO UV LIGHT.
-!- SOME OR ALL OF THE CARBOXYL-TERMINAL SER OR THR RESIDUES MAY
BE
PHOSPHORYLATED.
-!- SIMILARITY: TO ALL OTHER G-PROTEIN COUPLED RECEPTORS.
EMBL; Y00043; DMRH92CD.
EMBL; M17718; DMRH3A.
PROSITE; PS00237; G_PROTEIN_RECEPTOR.
PROSITE; PS00238; OPSIN.
CARBOHYD 13 13 PROBABLE.
BINDING 328 328 RETINAL CHROMOPHORE.
DOMAIN 1 62 EXTRACELLULAR.
TRANSMEM 63 83
DOMAIN 84 95 CYTOPLASMIC.
TRANSMEM 96 115
DOMAIN 116 130 EXTRACELLULAR.
TRANSMEM 131 151
DOMAIN 152 171 CYTOPLASMIC.
TRANSMEM 172 192
DOMAIN 193 219 EXTRACELLULAR.
TRANSMEM 220 240
DOMAIN 241 288 CYTOPLASMIC.
TRANSMEM 289 309
DOMAIN 310 319 EXTRACELLULAR.
TRANSMEM 320 340
DOMAIN 341 383 CYTOPLASMIC.
Keywords: PHOTORECEPTOR; RETINAL PROTEIN; TRANSMEMBRANE;
PHOSPHORYLATION; GLYCOPROTEIN; G-PROTEIN COUPLED RECEPTOR;
VISION.
OPS4_DROME OPSIN RH4 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH4). -
FRUIT FLY (DROSOPHILA MELANOGASTER).
Species: FRUIT FLY (DROSOPHILA MELANOGASTER).
Accession: P08255
01-AUG-1988 (REL. 08, CREATED)
01-AUG-1988 (REL. 08, LAST SEQUENCE UPDATE)
01-JAN-1990 (REL. 13, LAST ANNOTATION UPDATE)
-!- FUNCTION: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES
THAT
MEDIATE VISION. THEY CONSIST OF AN APOPROTEIN, OPSIN, COVALENTLY
LINKED TO CIS-RETINAL.
-!- EACH DROSOPHILA EYE IS COMPOSED OF 800 FACETS OR OMMATIDIA.
EACH
OMMATIDIUM CONTAINS 8 PHOTORECEPTOR CELLS (R1-R8), THE R1 TO R6
CELLS ARE OUTER CELLS, WHILE R7 AND R8 ARE INNER CELLS.
-!- OPSIN RH4 IS SENSITIVE TO UV LIGHT.
-!- SOME OR ALL OF THE CARBOXYL-TERMINAL SER OR THR RESIDUES MAY
BE
PHOSPHORYLATED.
-!- SIMILARITY: TO ALL OTHER G-PROTEIN COUPLED RECEPTORS.
EMBL; M17719; DMRH4A1.
EMBL; M17730; DMRH4A2.
PROSITE; PS00237; G_PROTEIN_RECEPTOR.
PROSITE; PS00238; OPSIN.
CARBOHYD 6 6 PROBABLE.
BINDING 324 324 RETINAL CHROMOPHORE.
DOMAIN 1 58 EXTRACELLULAR.
TRANSMEM 59 79
DOMAIN 80 91 CYTOPLASMIC.
TRANSMEM 92 111
DOMAIN 112 126 EXTRACELLULAR.
TRANSMEM 127 147
DOMAIN 148 167 CYTOPLASMIC.
TRANSMEM 168 188
DOMAIN 189 215 EXTRACELLULAR.
TRANSMEM 216 236
DOMAIN 237 284 CYTOPLASMIC.
TRANSMEM 285 305
DOMAIN 306 315 EXTRACELLULAR.
TRANSMEM 316 336
DOMAIN 337 378 CYTOPLASMIC.
Keywords: PHOTORECEPTOR; RETINAL PROTEIN; TRANSMEMBRANE;
PHOSPHORYLATION; GLYCOPROTEIN; G-PROTEIN COUPLED RECEPTOR;
VISION.
OPSD_HUMAN RHODOPSIN. - HUMAN (HOMO SAPIENS).
Species: HUMAN (HOMO SAPIENS).
Accession: P08100
01-AUG-1988 (REL. 08, CREATED)
01-AUG-1988 (REL. 08, LAST SEQUENCE UPDATE)
01-JAN-1990 (REL. 13, LAST ANNOTATION UPDATE)
-!- FUNCTION: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES
THAT
MEDIATE VISION. THEY CONSIST OF AN APOPROTEIN, OPSIN, COVALENTLY
LINKED TO CIS-RETINAL.
-!- RHODOPSIN IS FOUND IN ROD SHAPED PHOTORECEPTOR CELLS WHICH
MEDIATES VISION IN DIM LIGHT.
-!- RHODOPSIN HAS AN ABSORPTION MAXIMA AT 495 NM.
-!- SOME OR ALL OF THE CARBOXYL-TERMINAL SER OR THR RESIDUES MAY
BE
PHOSPHORYLATED.
-!- SIMILARITY: TO ALL OTHER G-PROTEIN COUPLED RECEPTORS.
EMBL; K02281; HSOPS.
PROSITE; PS00237; G_PROTEIN_RECEPTOR.
PROSITE; PS00238; OPSIN.
MOD_RES 1 1 ACETYLATION (BY HOMOLOGY).
CARBOHYD 2 2 BY HOMOLOGY.
CARBOHYD 15 15 BY HOMOLOGY.
BINDING 296 296 RETINAL CHROMOPHORE.
BINDING 322 322 PALMITYL (BY HOMOLOGY).
BINDING 323 323 PALMITYL (BY HOMOLOGY).
DOMAIN 1 36 EXTRACELLULAR.
TRANSMEM 37 61
DOMAIN 62 73 CYTOPLASMIC.
TRANSMEM 74 98
DOMAIN 99 113 EXTRACELLULAR.
TRANSMEM 114 140
DOMAIN 141 152 CYTOPLASMIC.
TRANSMEM 153 176
DOMAIN 173 202 EXTRACELLULAR.
TRANSMEM 203 230
DOMAIN 231 252 CYTOPLASMIC.
TRANSMEM 253 276
DOMAIN 277 284 EXTRACELLULAR.
TRANSMEM 285 309
DOMAIN 310 348 CYTOPLASMIC.
Keywords: PHOTORECEPTOR; RETINAL PROTEIN; TRANSMEMBRANE;
GLYCOPROTEIN; VISION; PHOSPHORYLATION; LIPOPROTEIN; ACETYLATION;
G-PROTEIN COUPLED RECEPTOR.
OPSD_MOUSE RHODOPSIN. - MOUSE (MUS MUSCULUS).
Species: MOUSE (MUS MUSCULUS).
Accession: P15409
01-APR-1990 (REL. 14, CREATED)
01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE)
01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE)
-!- FUNCTION: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES
THAT
MEDIATE VISION. THEY CONSIST OF AN APOPROTEIN, OPSIN, COVALENTLY
LINKED TO CIS-RETINAL.
-!- RHODOPSIN IS FOUND IN ROD SHAPED PHOTORECEPTOR CELLS WHICH
MEDIATES VISION IN DIM LIGHT.
-!- RHODOPSIN HAS AN ABSORPTION MAXIMA AT 495 NM.
-!- SOME OR ALL OF THE CARBOXYL-TERMINAL SER OR THR RESIDUES MAY
BE
PHOSPHORYLATED.
-!- SIMILARITY: TO ALL OTHER G-PROTEIN COUPLED RECEPTORS.
PIR; S01656; S01656.
PROSITE; PS00237; G_PROTEIN_RECEPTOR.
PROSITE; PS00238; OPSIN.
CARBOHYD 2 2 BY HOMOLOGY.
CARBOHYD 15 15 BY HOMOLOGY.
BINDING 296 296 RETINAL CHROMOPHORE.
BINDING 322 322 PALMITYL (BY HOMOLOGY).
BINDING 323 323 PALMITYL (BY HOMOLOGY).
DOMAIN 1 36 EXTRACELLULAR.
TRANSMEM 37 61
DOMAIN 62 73 CYTOPLASMIC.
TRANSMEM 74 98
DOMAIN 99 113 EXTRACELLULAR.
TRANSMEM 114 140
DOMAIN 141 152 CYTOPLASMIC.
TRANSMEM 153 176
DOMAIN 173 202 EXTRACELLULAR.
TRANSMEM 203 230
DOMAIN 231 252 CYTOPLASMIC.
TRANSMEM 253 276
DOMAIN 277 284 EXTRACELLULAR.
TRANSMEM 285 309
DOMAIN 310 348 CYTOPLASMIC.
OPSD_OCTDO RHODOPSIN. - GIANT OCTOPUS (OCTOPUS DOFLEINI).
Species: GIANT OCTOPUS (OCTOPUS DOFLEINI).
Accession: P09241
01-MAR-1989 (REL. 10, CREATED)
01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE)
01-JAN-1990 (REL. 13, LAST ANNOTATION UPDATE)
-!- FUNCTION: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES
THAT
MEDIATE VISION. THEY CONSIST OF AN APOPROTEIN, OPSIN, COVALENTLY
LINKED TO CIS-RETINAL.
-!- RHODOPSIN IS FOUND IN ROD SHAPED PHOTORECEPTOR CELLS WHICH
MEDIATES VISION IN DIM LIGHT.
-!- RHODOPSIN HAS AN ABSORPTION MAXIMA AT 495 NM.
-!- SOME OR ALL OF THE CARBOXYL-TERMINAL SER OR THR RESIDUES MAY
BE
PHOSPHORYLATED.
-!- SIMILARITY: TO ALL OTHER G-PROTEIN COUPLED RECEPTORS.
EMBL; X07797; PDRHOD.
PROSITE; PS00237; G_PROTEIN_RECEPTOR.
PROSITE; PS00238; OPSIN.
CARBOHYD 9 9 PROBABLE.
CARBOHYD 15 15 PROBABLE.
BINDING 306 306 RETINAL CHROMOPHORE.
BINDING 337 337 PALMITYL (BY HOMOLOGY).
BINDING 338 338 PALMITYL (BY HOMOLOGY).
DOMAIN 1 36 EXTRACELLULAR.
TRANSMEM 37 61
DOMAIN 62 73 CYTOPLASMIC.
TRANSMEM 74 98
DOMAIN 99 107 EXTRACELLULAR.
TRANSMEM 108 131
DOMAIN 132 152 CYTOPLASMIC.
TRANSMEM 153 176
DOMAIN 173 200 EXTRACELLULAR.
TRANSMEM 201 224
DOMAIN 225 262 CYTOPLASMIC.
TRANSMEM 263 287
DOMAIN 288 299 EXTRACELLULAR.
TRANSMEM 300 323
DOMAIN 324 455 CYTOPLASMIC.
Keywords: PHOTORECEPTOR; RETINAL PROTEIN; TRANSMEMBRANE;
GLYCOPROTEIN; VISION; PHOSPHORYLATION; LIPOPROTEIN; G-PROTEIN
COUPLED RECEPTOR.
This command will show all comment information for every entry in the
worklist as well as the title lines. Look in the comment information
for entry QRHYB2 and you'll find `opsin' (figure 7).
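The difference in scope between `title' and `text' can be pictured with
a small Python sketch. It is an illustration only: the record layout
(a dictionary with `title' and `comment' fields) is an assumption made
for the example, the comment strings are abridged from figure 7, and
the empty comment for CCHU is a placeholder rather than real data.

```python
import re


def normalise(text):
    """Keep only letters and digits, folded to lower case."""
    return re.sub(r"[^A-Za-z0-9]", "", text).lower()


# Abridged, illustrative records; the field names are hypothetical.
ENTRIES = {
    "OOFF2": {
        "title": "Opsin 2 - Fruit fly",
        "comment": "Superfamily: vertebrate rhodopsin",
    },
    "QRHYB2": {
        "title": "Beta-2-adrenergic receptor - Hamster",
        "comment": "This protein may have up to seven hydrophobic "
                   "membrane-spanning helices, as does rhodopsin",
    },
    "CCHU": {"title": "Cytochrome c - Human", "comment": ""},  # placeholder
}


def title(query):
    """Search the title lines only."""
    q = normalise(query)
    return [p for p, e in ENTRIES.items() if q in normalise(e["title"])]


def text(query):
    """Search every text field of each entry, one field at a time."""
    q = normalise(query)
    return [p for p, e in ENTRIES.items()
            if any(q in normalise(field) for field in e.values())]


print(title("opsin"))  # only entries with `opsin' in the title line
print(text("opsin"))   # also entries with `opsin' elsewhere in the text
```

Here `title' returns OOFF2 alone, while `text' returns OOFF2 and
QRHYB2, because `opsin' occurs inside `rhodopsin' in the QRHYB2
comment.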
2.5 An example fuzzy sequence search
Sometimes you'll not want to look for exact sequence matches. Instead,
you may want some latitude in the search. A fuzzy sequence match
allows you to specify mismatches in the query sequence; the DELPHOS
`fseq' function is used in these cases. Type
DELPHOS> display/info fseq "(c)aq(ch) 1" (fig 8)
Figure No.8
Matches for FSEQ probe (C)AQ(CH) are:
CCHU 14 GDVEKGKKIFIMK CSQCH TVEKGGKHKTGPNLHG
Cytochrome c - Human
CCOS 14 GDIEKGKKIFVQK CSQCH TVEKGGKHKTGPNLDG
Cytochrome c - Ostrich
CCSF 14 GQVEKGKKIFVQR CAQCH TVEKAGKHKTGPNLNG
Cytochrome c - Common European starfish
CCAB 22 APPGBAKAGEKIFKTK CAQCH TVEKGAGHKQGPNLNG
Cytochrome c - Chingma mallow
CCBF6 14 ADIENGERIFTAN CAACH AGGNNVIMPEKTLKKD
Cytochrome c6 - Bumilleriopsis filiformis
RDC8_CANFA 71 VGVLAIPFAITISTGF CAACH NCLFFACFVLVLTQSS
PROBABLE G PROTEIN-COUPLED RECEPTOR RDC8 (GENE NAME: RDC8). -
CYTO84 22 APPGNPKAGEKIFKTK CAQCH TVEKGAGHKQGPNLNG
CYTOCHROME C TOMATO - TOMATO (LYCOPERSICON ESCULENTUM)
NRL_1CYC1 14 GDVAKGKKTFVQK CAQCH TVENGGKHKVGPNLWG
FERROCYTOCHROME C - BONITO (KATSUWONUS PELAMIS, LINNAEUS)
NRL_3CYT1 14 GDVAKGKKTFVQK CAQCH TVENGGKHKVGPNLWG
CYTOCHROME C (OXIDIZED) - ALBACORE TUNA (THUNNUS ALALUNGA) HEA
WORKLIST ENTRIES (9):
CCHU Cytochrome c - Human
CCOS Cytochrome c - Ostrich
CCSF Cytochrome c - Common European starfish
CCAB Cytochrome c - Chingma mallow
CCBF6 Cytochrome c6 - Bumilleriopsis filiformis
RDC8_CANFA PROBABLE G PROTEIN-COUPLED RECEPTOR RDC8 (GENE NAME: RDC8). - DOG
(CANIS FAMILIARIS).
CYTO84 CYTOCHROME C TOMATO - TOMATO (LYCOPERSICON ESCULENTUM)
NRL_1CYC1 FERROCYTOCHROME C - BONITO (KATSUWONUS PELAMIS, LINNAEUS)
NRL_3CYT1 CYTOCHROME C (OXIDIZED) - ALBACORE TUNA (THUNNUS ALALUNGA) HEART
This query consists of two parts. The first part specifies the search
sequence in single letter amino acid codes with optional parentheses.
Mismatches are only allowed for those letters which are not enclosed
by parentheses. The second part is a positive integer (or a zero)
which specifies the maximum number of allowed mismatches. The query
above therefore gets all entries which contain
cys-ala-anything-cys-his or
cys-anything-gln-cys-his
This is because 1 mismatch is allowed and there are only 2 residues
where this may happen (fig 8).
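The matching rule described above can be sketched in Python. This is only
an illustration of the rule, not the DELPHOS implementation: the function
names `parse_probe' and `fseq_matches' are hypothetical, and the treatment
of the letter `x' as "any residue" anticipates the description that
follows.

```python
def parse_probe(probe):
    """Split a probe such as '(c)aq(ch)' into (residue, fixed) pairs.

    Residues inside parentheses are 'fixed': they must match exactly.
    """
    items = []
    fixed = False
    for ch in probe:
        if ch == "(":
            fixed = True
        elif ch == ")":
            fixed = False
        else:
            items.append((ch.upper(), fixed))
    return items


def fseq_matches(probe, max_mismatches, sequence):
    """Return the 1-based start positions at which the probe matches."""
    items = parse_probe(probe)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(items) + 1):
        mismatches = 0
        for offset, (residue, fixed) in enumerate(items):
            actual = sequence[start + offset]
            if residue == "X" or residue == actual:
                continue                      # wildcard or exact match
            if fixed:
                mismatches = max_mismatches + 1   # fixed residue failed
                break
            mismatches += 1                   # tolerated mismatch
        if mismatches <= max_mismatches:
            hits.append(start + 1)
    return hits


# Fragment of human cytochrome c (CCHU) as shown in figure 8.
cchu = "GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHG"
print(fseq_matches("(c)aq(ch)", 1, cchu))
```

With one mismatch allowed the probe finds `CSQCH' in this fragment; with
zero mismatches allowed it does not, because `S' differs from `A'.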
Fseq also allows you to specify that you don't mind what a particular
amino acid is; this is done using the letter `x'. Type
DELPHOS> display/info fseq "(c)xx(ch) 0"
Figure No.9
Matches for FSEQ probe (C)XX(CH) are:
CCHU 14 GDVEKGKKIFIMK CSQCH TVEKGGKHKTGPNLHG
Cytochrome c - Human
CCOS 14 GDIEKGKKIFVQK CSQCH TVEKGGKHKTGPNLDG
Cytochrome c - Ostrich
CCSF 14 GQVEKGKKIFVQR CAQCH TVEKAGKHKTGPNLNG
Cytochrome c - Common European starfish
CCAB 22 APPGBAKAGEKIFKTK CAQCH TVEKGAGHKQGPNLNG
Cytochrome c - Chingma mallow
CCNA5A 13 GDVEAGKAAFNK CKACH EIGESAKNKVGPELDG
Cytochrome c550 - Nitrobacter winogradskyi
CCBF6 14 ADIENGERIFTAN CAACH AGGNNVIMPEKTLKKD
Cytochrome c6 - Bumilleriopsis filiformis
CCDS7 26 KGNVTFDHKAHAEKLG CDACH EGTPAKIAIDKKSAHK
Cytochrome c7 (c551.5) - Desulfuromonas acetoxidans
CCDS7 49 TPAKIAIDKKSAHKDA CKTCH KSNNGPTKCGGCHIK
Cytochrome c7 (c551.5) - Desulfuromonas acetoxidans
CCDS7 62 KDACKTCHKSNNGPTK CGGCH IK
Cytochrome c7 (c551.5) - Desulfuromonas acetoxidans
CCRFCX 117 GEASAFGPALKKLGGT CKACH DDYRAEH
Cytochrome c' - Rhodopseudomonas sp.
C553_DESVH 34 LAVSGVAADGAALYKS CIGCH GADGSKAAMGSAKPVK
CYTOCHROME C553 PRECURSOR. - DESULFOVIBRIO VULGARIS (STRAIN HI
CYCR_RHOVI 107 LRTMTAITEWVSPQEG CTYCH DENNLASEAKYPYVVA
CYTOCHROME C SUBUNIT OF THE PHOTOSYNTHETIC REACTION CENTER PRE
CYCR_RHOVI 152 AINTNWTQHVAQTGVT CYTCH RGTPLPPYVRYLEPTL
CYTOCHROME C SUBUNIT OF THE PHOTOSYNTHETIC REACTION CENTER PRE
CYCR_RHOVI 264 ATFALMMSISDSLGTN CTFCH NAQTFESWGKKSTPQR
CYTOCHROME C SUBUNIT OF THE PHOTOSYNTHETIC REACTION CENTER PRE
CYCR_RHOVI 325 LPASRLGRQGEAPQAD CRTCH QGVTKPLFGASRLKDY
CYTOCHROME C SUBUNIT OF THE PHOTOSYNTHETIC REACTION CENTER PRE
RDC8_CANFA 71 VGVLAIPFAITISTGF CAACH NCLFFACFVLVLTQSS
PROBABLE G PROTEIN-COUPLED RECEPTOR RDC8 (GENE NAME: RDC8). -
PDECYT550 35 AAQDGDAAKGEKEFNK CKACH MIQAPDGTDIIKGGKT
PDECYT550 cytochrome c550 precursor - Paracoccus denitrificans
PALMT13 14 RKVHAKGASLFFI CMYCH IGRGLYYG
PALMT13 cytochrome b (AA at 1) - Mitochondrion Paracentrotus l
CYTO84 22 APPGNPKAGEKIFKTK CAQCH TVEKGAGHKQGPNLNG
CYTOCHROME C TOMATO - TOMATO (LYCOPERSICON ESCULENTUM)
NRL_155C1 15 NEGDAAKGEKEFNK CKACH MIQAPDGTDIKGGKTG
CYTOCHROME C550 - (PARACOCCUS DENITRIFICANS) ATCC 13543
NRL_1CYC1 14 GDVAKGKKTFVQK CAQCH TVENGGKHKVGPNLWG
FERROCYTOCHROME C - BONITO (KATSUWONUS PELAMIS, LINNAEUS)
NRL_2C2C1 14 EGDAAAGEKVSKK CLACH TFDQGGANKVGPNLFG
CYTOCHROME C2 (OXIDIZED) - (RHODOSPIRILLUM RUBRUM)
NRL_2CCY1 117 AGPDALKAQAAATGKV CKACH EEFKQD
CYTOCHROME C' - (RHODOSPIRILLUM MOLISCHIANUM)
NRL_2CDV1 30 TKQPVVFNHSTHKAVK CGDCH HPVNGKENYQKCATAG
CYTOCHROME C3 - (DESULFOVIBRIO VULGARIS MIYAZAKI IAM 12604)
NRL_2CDV1 79 KGYYHAMHDKGTKFKS CVGCH LETAGADAAKKKELTG
CYTOCHROME C3 - (DESULFOVIBRIO VULGARIS MIYAZAKI IAM 12604)
NRL_351C1 12 EDPEVLFKNKG CVACH AIDTKMVGPAYKDVAA
CYTOCHROME C551 (OXIDIZED) - (PSEUDOMONAS AERUGINOSA)
NRL_3CYT1 14 GDVAKGKKTFVQK CAQCH TVENGGKHKVGPNLWG
CYTOCHROME C (OXIDIZED) - ALBACORE TUNA (THUNNUS ALALUNGA) HEA
WORKLIST ENTRIES (21):
CCHU Cytochrome c - Human
CCOS Cytochrome c - Ostrich
CCSF Cytochrome c - Common European starfish
CCAB Cytochrome c - Chingma mallow
CCNA5A Cytochrome c550 - Nitrobacter winogradskyi
CCBF6 Cytochrome c6 - Bumilleriopsis filiformis
CCDS7 Cytochrome c7 (c551.5) - Desulfuromonas acetoxidans
CCRFCX Cytochrome c' - Rhodopseudomonas sp.
C553_DESVH CYTOCHROME C553 PRECURSOR. - DESULFOVIBRIO VULGARIS (STRAIN
HILDENBOROUGH).
CYCR_RHOVI CYTOCHROME C SUBUNIT OF THE PHOTOSYNTHETIC REACTION CENTER
PRECURSOR (C558/C559). - RHODOPSEUDOMONAS VIRIDIS.
RDC8_CANFA PROBABLE G PROTEIN-COUPLED RECEPTOR RDC8 (GENE NAME: RDC8). - DOG
(CANIS FAMILIARIS).
PDECYT550 PDECYT550 P.denitrificans cytochrome c550 gene, complete cds, and
iso-cytochrome oxidase subunit I (iso-COI) gene, 5' end. -
Paracoccus denitrificans Prokaryota
PALMT13 PALMT13 P.lividus mitochondrial (Bam2 B fragment) cytochrome b,
partial cds. - Mitochondrion Paracentrotus lividus Eukaryota
CYTO84 CYTOCHROME C TOMATO - TOMATO (LYCOPERSICON ESCULENTUM)
NRL_155C1 CYTOCHROME C550 - (PARACOCCUS DENITRIFICANS) ATCC 13543
NRL_1CYC1 FERROCYTOCHROME C - BONITO (KATSUWONUS PELAMIS, LINNAEUS)
NRL_2C2C1 CYTOCHROME C2 (OXIDIZED) - (RHODOSPIRILLUM RUBRUM)
NRL_2CCY1 CYTOCHROME C' - (RHODOSPIRILLUM MOLISCHIANUM)
NRL_2CDV1 CYTOCHROME C3 - (DESULFOVIBRIO VULGARIS MIYAZAKI IAM 12604)
NRL_351C1 CYTOCHROME C551 (OXIDIZED) - (PSEUDOMONAS AERUGINOSA)
NRL_3CYT1 CYTOCHROME C (OXIDIZED) - ALBACORE TUNA (THUNNUS ALALUNGA) HEART
2.6 Looking at one protein and introducing DISPLAY flexibility
You can look at any protein you want in the database by using the
`code' function. Type
DELPHOS> display code "oobo" (fig 10)
Figure No.10
WORKLIST ENTRIES (1):
OOBO Rhodopsin - Bovine
This will put just one entry into the worklist: the entry with the pcode
`oobo', namely bovine rhodopsin. Once again you'll get just the title
line. Obviously a database entry contains more information than the
title line, the sequence for example!
To display the sequence of this entry, which is now in the worklist,
type
DELPHOS> display/sequence (fig 11)
Figure No.11
WORKLIST ENTRIES (1):
OOBO Rhodopsin - Bovine
Ala A 29 Cys C 10 Asp D 5 Glu E 17
Phe F 31 Gly G 23 His H 6 Ile I 22
Lys K 11 Leu L 28 Met M 16 Asn N 15
Pro P 20 Gln Q 12 Arg R 7 Ser S 15
Thr T 27 Val V 31 Trp W 5 Tyr Y 18
Mol. wt. (calc) = 38962 Residues = 348
1 M N G T E G P N F Y V P F S N K T G V V R S P F E A P Q Y Y
31 L A E P W Q F S M L A A Y M F L L I M L G F P I N F L T L Y
61 V T V Q H K K L R T P L N Y I L L N L A V A D L F M V F G G
91 F T T T L Y T S L H G Y F V F G P T G C N L E G F F A T L G
121 G E I A L W S L V V L A I E R Y V V V C K P M S N F R F G E
151 N H A I M G V A F T W V M A L A C A A P P L V G W S R Y I P
181 E G M Q C S C G I D Y Y T P H E E T N N E S F V I Y M F V V
211 H F I I P L I V I F F C Y G Q L V F T V K E A A A Q Q Q E S
241 A T T Q K A E K E V T R M V I I M V I A F L I C W L P Y A G
271 V A F Y I F T H Q G S D F G P I F M T I P A F F A K T S A V
301 Y N P V I Y I M M N K Q F R N C M V T T L C C G K N P L G D
331 D E A S T T V S K T E T S Q V A P A
Similarly, to display authors and papers, alternative names and comment
information try typing
DELPHOS> display/author (fig 12)
Figure No.12
WORKLIST ENTRIES (1):
OOBO Rhodopsin - Bovine
Nathans, J., and Hogness, D.S.Cell 34, 807-814, 1983 (Sequence
translated from the DNA sequence)
Ovchinnikov, Y.A.FEBS Lett. 148, 179-191, 1982 (Complete sequence)
Koike, S., Nabeshima, Y., Ogata, K., Fukui, T., Ohtsuka, E.,
Ikehara, M., and Tokunaga, F.Biochem. Biophys. Res. Commun. 116,
563-567, 1983 (Sequence of residues 205-348 translated from the
mRNA sequence)
This sequence differs from that shown in having 213-Val.
Hargrave, P.A.submitted to the Protein Sequence Database, June
1984 (Carbohydrate binding sites)
Mullen, E., and Akhtar, M.Biochem. J. 211, 45-54, 1983 (Retinal
binding site)
DELPHOS> display/alternative (oobo has no alternative names) (fig 13)
Figure No.13
WORKLIST ENTRIES (1):
OOBO Rhodopsin - Bovine
DELPHOS> display/comment (fig 14)
Figure No.14
WORKLIST ENTRIES (1):
OOBO Rhodopsin - Bovine
Species: Bos primigenius taurus (cattle)
Accession: A03154
Introns: 121/1, 177/2, 232/3, 312/3
Superfamily: vertebrate rhodopsin
Keywords: photoreceptor; chromoprotein; glycoprotein;
acetylation; transmembrane protein
1/Modified site: acetylated amino end
2,15/Binding site: carbohydrate (Asn)
296/Binding site: retinal chromophore
The commands above list each type of information separately. If you want
the whole lot in one go type
DELPHOS> display/full (fig 15)
Figure No.15
WORKLIST ENTRIES (1):
OOBO Rhodopsin - Bovine
Species: Bos primigenius taurus (cattle)
Accession: A03154
Nathans, J., and Hogness, D.S.Cell 34, 807-814, 1983 (Sequence
translated from the DNA sequence)
Ovchinnikov, Y.A.FEBS Lett. 148, 179-191, 1982 (Complete sequence)
Koike, S., Nabeshima, Y., Ogata, K., Fukui, T., Ohtsuka, E.,
Ikehara, M., and Tokunaga, F.Biochem. Biophys. Res. Commun. 116,
563-567, 1983 (Sequence of residues 205-348 translated from the
mRNA sequence)
This sequence differs from that shown in having 213-Val.
Hargrave, P.A.submitted to the Protein Sequence Database, June
1984 (Carbohydrate binding sites)
Mullen, E., and Akhtar, M.Biochem. J. 211, 45-54, 1983 (Retinal
binding site)
Introns: 121/1, 177/2, 232/3, 312/3
Superfamily: vertebrate rhodopsin
Keywords: photoreceptor; chromoprotein; glycoprotein;
acetylation; transmembrane protein
1/Modified site: acetylated amino end
2,15/Binding site: carbohydrate (Asn)
296/Binding site: retinal chromophore
Ala A 29 Cys C 10 Asp D 5 Glu E 17
Phe F 31 Gly G 23 His H 6 Ile I 22
Lys K 11 Leu L 28 Met M 16 Asn N 15
Pro P 20 Gln Q 12 Arg R 7 Ser S 15
Thr T 27 Val V 31 Trp W 5 Tyr Y 18
Mol. wt. (calc) = 38962 Residues = 348
1 M N G T E G P N F Y V P F S N K T G V V R S P F E A P Q Y Y
31 L A E P W Q F S M L A A Y M F L L I M L G F P I N F L T L Y
61 V T V Q H K K L R T P L N Y I L L N L A V A D L F M V F G G
91 F T T T L Y T S L H G Y F V F G P T G C N L E G F F A T L G
121 G E I A L W S L V V L A I E R Y V V V C K P M S N F R F G E
151 N H A I M G V A F T W V M A L A C A A P P L V G W S R Y I P
181 E G M Q C S C G I D Y Y T P H E E T N N E S F V I Y M F V V
211 H F I I P L I V I F F C Y G Q L V F T V K E A A A Q Q Q E S
241 A T T Q K A E K E V T R M V I I M V I A F L I C W L P Y A G
271 V A F Y I F T H Q G S D F G P I F M T I P A F F A K T S A V
301 Y N P V I Y I M M N K Q F R N C M V T T L C C G K N P L G D
331 D E A S T T V S K T E T S Q V A P A
you'll get all the information (authors, alternative names, comments
and the sequence) available in the database for this protein.
Two other qualifiers to the display command are `output' and `printer'.
The printer option sends everything that appears on the screen to the
printer associated with your computer enabling you to keep a
permanent record on paper of the results of your query. Some sites
do not have a direct computer-printer connection; in this case the
`output' qualifier is the most useful. The `output' qualifier sends
everything that appears on the screen to a specified disc file as well.
This file can then be sent to any printer you wish. We recommend the
use of `output' rather than `printer'. Try typing
DELPHOS> display/output=oobo.title
Everything appears as before, the title line for OOBO is shown, but if you
now leave DELPHOS by typing
DELPHOS> quit
you'll find a file called `oobo.title' in your directory. If you're
on a VMS system you can look at the file by typing
$ type oobo.title
or, if you're on a UNIX machine, by typing
% cat oobo.title
In many cases the title line information is not enough and you'll want
full information. DELPHOS allows any combination of qualifiers to the
display command. Reenter DELPHOS and type
DELPHOS> display code "oobo"
DELPHOS> display/full/output=oobo.full
The resulting file `oobo.full' will contain all available database
information on bovine rhodopsin. Leave DELPHOS again, examine the
file, then reenter DELPHOS.
The beauty of the display qualifiers is that they can be used with
any display command. The file `oobo.full' was created using two steps in
the last example but it could have been created in one step by typing
DELPHOS> display/full/output=oobo.full code "oobo"
The use of the display qualifiers is not restricted to the `code'
function; they can be used with `seq', `title', `text' and `fseq' as
well. For example, try
DELPHOS> display/sequence seq "vpfsn"
DELPHOS> display/author text "hogness"
DELPHOS> display/comment/author/output=opsin.dat title "opsin"
The one restriction on the use of the display qualifiers is that the
`output' and `printer' qualifiers are mutually exclusive.
One display qualifier not yet mentioned is `info'. This qualifier is
rather special and is described in greater detail later. However, one
of its properties is that it displays context information with the
`seq' function. Try typing
DELPHOS> display seq "vpfsn" and then
DELPHOS> display/info seq "vpfsn" (fig 16)
Figure No.16
Matches for SEQ probe VPFSN are:
No. of matches = 4
OOBO 11 MNGTEGPNFY VPFSN KTGVVRSPFEAPQYYL
COX2_PARLI 214 EICGANHSFMPILIES VPFSN FENWVAQYIEE
OPSD_HUMAN 11 MNGTEGPNFY VPFSN ATGVVRSPFEYPQYYL
OPSD_MOUSE 11 MNGTEGPNFY VPFSN VTGVGRSPFEQPQYYL
WORKLIST ENTRIES (4):
OOBO Rhodopsin - Bovine
COX2_PARLI CYTOCHROME C OXIDASE POLYPEPTIDE II (EC 1.9.3.1) (GENE NAME: COII)
. - SEA URCHIN (PARACENTROTUS LIVIDUS).
OPSD_HUMAN RHODOPSIN. - HUMAN (HOMO SAPIENS).
OPSD_MOUSE RHODOPSIN. - MOUSE (MUS MUSCULUS).
The matching bits of sequence are only displayed if you use the
`info' qualifier.
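The context display produced by `info' can be imitated with a few lines
of Python. This is a sketch only: the function name `seq_with_context'
is hypothetical, and the flank width of 16 residues is an assumption
read off figure 16 rather than a documented DELPHOS setting.

```python
def seq_with_context(code, sequence, probe, flank=16):
    """List exact matches of probe with up to `flank` residues each side."""
    sequence = sequence.upper()
    probe = probe.upper()
    lines = []
    start = sequence.find(probe)
    while start != -1:
        before = sequence[max(0, start - flank):start]
        after = sequence[start + len(probe):start + len(probe) + flank]
        # Positions are reported 1-based, as in the figures.
        lines.append(f"{code} {start + 1} {before} {probe} {after}")
        start = sequence.find(probe, start + 1)
    return lines


# First 35 residues of bovine rhodopsin (OOBO), taken from figure 11.
oobo_start = "MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPW"
for line in seq_with_context("OOBO", oobo_start, "vpfsn"):
    print(line)
```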
2.7 Introducing NOT
There are some circumstances where you'll want to find all the
database entries which DO NOT have a particular characteristic.
In these circumstances you use the `not' function. This function
can be put before any of the other functions i.e. `seq', `title',
`code', `text' and `fseq'. Try typing
DELPHOS> display not title "opsin"
What you'll get is all the database entries which DON'T contain the
word `opsin' in their title lines, 489 of them! The `not' function is
particularly useful in the complex queries described later.
2.8 Multiple parameters
Up to now the tutorial has only shown simple queries: those which
have only one parameter. Keeping with our opsin examples, let's assume
you're only interested in one particular opsin, bovine rhodopsin.
Further, let's assume you don't know the pcode of bovine rhodopsin in
the database. The two queries
DELPHOS> display title "rhodopsin"
DELPHOS> display title "bovine"
are not good enough for what you need. To find the sequence you want you'd
probably have to correlate the results from both queries. DELPHOS
provides the means to look for bovine rhodopsin in one go using
compound parameters. Type
DELPHOS> display title "rhodopsin bovine"
What you get back in the worklist are only those entries which have
BOTH the words `rhodopsin' and `bovine' in the title line. The order
of the parameters is unimportant, type
DELPHOS> display title "bovine rhodopsin"
and you'll get the same result. Note that if you'd typed
DELPHOS> display title "bovinerhodopsin" or abbreviated it to
DELPHOS> display title "vinerhod"
then you'd miss those entries which contained the word `rhodopsin' before
the word `bovine'. Also note that, in the spirit of free text
searching,
DELPHOS> display title "ovin hodops"
would work equally well and would take less time to find the entries;
after all, there is less to search for.
You can use compound parameters with the `seq' and `text' functions as
well. They have the same meaning as for the `title' function i.e.
DELPHOS> display seq "vpfsn tetsq"
DELPHOS> display text "opsin hogness"
would, in the first example, find only those proteins which contained
both val-pro-phe-ser-asn and thr-glu-thr-ser-gln in the same sequence.
The second example would find only those proteins which contained
both the words `opsin' and `hogness' within the same entry.
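The meaning of a compound parameter for `title', `text' and `seq' can be
summarised in a short Python sketch. The function name `matches_all' is
hypothetical, and case-insensitive substring matching is an assumption
based on the examples in this tutorial, not a statement about how
DELPHOS is implemented.

```python
def matches_all(parameter, field):
    """True if every space-separated part of parameter occurs in field.

    The order of the parts is irrelevant, and each part may be any
    fragment of a word (free text searching).
    """
    field = field.lower()
    return all(part in field for part in parameter.lower().split())


title = "Rhodopsin - Bovine"
print(matches_all("rhodopsin bovine", title))
print(matches_all("bovine rhodopsin", title))
print(matches_all("ovin hodops", title))
```

All three calls succeed for this title, whereas a title such as
`RHODOPSIN. - HUMAN (HOMO SAPIENS).' fails because `bovine' is absent.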
Multiple parameters when used with the `code' function have a different
meaning e.g. try
DELPHOS> display code "opsd_human oobo"
It would be meaningless to find those entries which had both the pcode
`opsd_human' and the pcode `oobo'; this would be a paradox as all pcodes are
unique! Instead, what happened was that DELPHOS put BOTH the entries
in the worklist. Multiple parameters to `code' therefore do what
you'd intuitively expect.
`Fseq' is the only exception as far as multiple parameters are
concerned. YOU CANNOT USE MULTIPLE PARAMETERS WITH FSEQ.
At this point we can reintroduce the `info' qualifier to the display
command. You may want a running commentary on the hits DELPHOS
finds for each parameter in a multiple parameter query. Try typing
DELPHOS> display/info seq "vpfsn tetsq" (fig 17)
Figure No.17
Matches for SEQ probe VPFSN are:
No. of matches = 4
OOBO 11 MNGTEGPNFY VPFSN KTGVVRSPFEAPQYYL
COX2_PARLI 214 EICGANHSFMPILIES VPFSN FENWVAQYIEE
OPSD_HUMAN 11 MNGTEGPNFY VPFSN ATGVVRSPFEYPQYYL
OPSD_MOUSE 11 MNGTEGPNFY VPFSN VTGVGRSPFEQPQYYL
Matches for SEQ probe TETSQ are:
No. of matches = 3
OOBO 340 GKNPLGDDEASTTVSK TETSQ VAPA
OPSD_HUMAN 340 GKNPLGDDEASATVSK TETSQ VAPA
OPSD_MOUSE 340 GKNPLGDDDASATASK TETSQ VAPA
WORKLIST ENTRIES (3):
OOBO Rhodopsin - Bovine
OPSD_HUMAN RHODOPSIN. - HUMAN (HOMO SAPIENS).
OPSD_MOUSE RHODOPSIN. - MOUSE (MUS MUSCULUS).
DELPHOS will show all the database proteins which contained `vpfsn'
and all the proteins which contained `tetsq', only then will it
show the worklist (as usual) which contains the entries possessing
both pentapeptides.
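The behaviour just described (report the hits for each probe, then keep
only the entries common to all of them) can be sketched in Python using
the hits shown in figure 17. The function name `multi_probe' is
hypothetical and the code is an illustration, not the DELPHOS source.

```python
def multi_probe(hits_by_probe):
    """Return (commentary lines, worklist) for a multiple parameter query."""
    commentary = []
    worklist = None
    for probe, hits in hits_by_probe.items():
        commentary.append(f"Matches for SEQ probe {probe}: {len(hits)}")
        # The worklist is the running intersection of all hit sets.
        worklist = set(hits) if worklist is None else worklist & set(hits)
    return commentary, (worklist or set())


# Hits taken from figure 17.
hits = {
    "VPFSN": {"OOBO", "COX2_PARLI", "OPSD_HUMAN", "OPSD_MOUSE"},
    "TETSQ": {"OOBO", "OPSD_HUMAN", "OPSD_MOUSE"},
}
commentary, worklist = multi_probe(hits)
print("\n".join(commentary))
print(sorted(worklist))
```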
N.B. DELPHOS only saves the worklist obtained from a multiple parameter
query with the `info' qualifier and not the intermediate results. If you
want to keep a record of the intermediate results use the `output'
qualifier to save all the displayed information to a disc file.
2.9 Complex queries
By now you should be able to give DELPHOS any simple query with single
or multiple parameters and display all or part of the database
information by using qualifiers. You now know how to redirect this
information to a disc file or to a printer.
Biological queries however are usually not as cut-and-dried as the
examples given above. Multiple parameters allow you to
relate sequence information to other sequence information and text
information to other text information. However, multiple parameters
do not allow you to relate sequence information to text information.
This is one of the reasons why DELPHOS allows complex queries. As an
example type
DELPHOS> display seq "vpfsn" and text "opsin" (fig 18)
Figure No.18
WORKLIST ENTRIES (3):
OOBO Rhodopsin - Bovine
OPSD_HUMAN RHODOPSIN. - HUMAN (HOMO SAPIENS).
OPSD_MOUSE RHODOPSIN. - MOUSE (MUS MUSCULUS).
This query will put in the worklist only those protein entries which
contain BOTH the sequence `val-pro-phe-ser-asn' AND the text `opsin',
one or the other will not do.
In other words, the above query allows sequence/text correlations.
We now have to introduce the concept of the `operator'. Operators are
words such as `and' which join parts of a complex query together. Let's
give another example using the `or' operator. Type
DELPHOS> display seq "vpfsn" or text "opsin" (fig 19)
Figure No.19
WORKLIST ENTRIES (16):
OOBO Rhodopsin - Bovine
OOFF Rhodopsin - Fruit fly
OOFF2 Opsin 2 - Fruit fly
OOHUB Blue-sensitive opsin - Human
OOHUR Red-sensitive opsin - Human
OOHUG Green-sensitive opsin - Human
QRHYB2 Beta-2-adrenergic receptor - Hamster
COX2_PARLI CYTOCHROME C OXIDASE POLYPEPTIDE II (EC 1.9.3.1) (GENE NAME: COII)
. - SEA URCHIN (PARACENTROTUS LIVIDUS).
GBT1_BOVIN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-1 SUBUNIT (
TRANSDUCIN ALPHA-1 CHAIN). - BOVINE (BOS TAURUS).
GBT1_HUMAN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-1 SUBUNIT (
TRANSDUCIN ALPHA-1 CHAIN) (GENE NAME: GNAT1). - HUMAN (HOMO
SAPIENS).
GBT2_BOVIN GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-2 SUBUNIT (
TRANSDUCIN ALPHA-2 CHAIN). - BOVINE (BOS TAURUS).
OPS3_DROME OPSIN RH3 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH3 OR
RH92CD). - FRUIT FLY (DROSOPHILA MELANOGASTER).
OPS4_DROME OPSIN RH4 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NAME: RH4). -
FRUIT FLY (DROSOPHILA MELANOGASTER).
OPSD_HUMAN RHODOPSIN. - HUMAN (HOMO SAPIENS).
OPSD_MOUSE RHODOPSIN. - MOUSE (MUS MUSCULUS).
OPSD_OCTDO RHODOPSIN. - GIANT OCTOPUS (OCTOPUS DOFLEINI).
This query will retrieve the protein entries which contain EITHER
the peptide `vpfsn' or the word `opsin' or BOTH. That is, any
entry which contains one or the other or both gets put in the
worklist. Note how the `or' operator differs from the `and' operator.
Just to make it clearer we have provided an operator called `add'
which does just the same as `or'. For example
DELPHOS> display seq "vpfsn" add text "opsin"
is precisely equivalent to the preceding query. You can relate any
function to any other function using these operators. Other operators
available to you include `xor' which stands for `exclusive or'. The
query
DELPHOS> display seq "vpfsn" xor seq "tetsq" (fig 20)
Figure No.20
WORKLIST ENTRIES (1):
COX2_PARLI CYTOCHROME C OXIDASE POLYPEPTIDE II (EC 1.9.3.1) (GENE NAME: COII)
. - SEA URCHIN (PARACENTROTUS LIVIDUS).
will put in the worklist those sequences which contain either `vpfsn'
or `tetsq' BUT NOT THOSE WHICH CONTAIN BOTH.
Another operator is `subtract'. What this does is to subtract the
results of one query from the results of another. For example, the
query
DELPHOS> display seq "vpfsn" subtract text "opsin" (fig 21)
Figure No.21
WORKLIST ENTRIES (1):
COX2_PARLI CYTOCHROME C OXIDASE POLYPEPTIDE II (EC 1.9.3.1) (GENE NAME: COII)
. - SEA URCHIN (PARACENTROTUS LIVIDUS).
will put into the worklist only those proteins which contain the
peptide `vpfsn' but contain no mention of `opsin' within the text.
You can think of `subtract' as being equivalent to `and not'. Therefore
the query
DELPHOS> display seq "vpfsn" and not text "opsin"
is completely equivalent to the preceding example but is arguably more
difficult to understand. You can add the `not' operator after any
other operator. For example
DELPHOS> display seq "vpfsn" or not seq "tetsq"
will put in the worklist any sequence which contains `vpfsn' and also
any sequence which doesn't contain `tetsq'.
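It may help to think of each operator as an operation on sets of pcodes.
The Python sketch below uses the `vpfsn' and `tetsq' hits from figure 17;
the `opsin' set is inferred from figures 19 and 21, and the "database"
used for `not' is simply the union of these sets, a stand-in for
illustration only. The variable and function names are hypothetical.

```python
VPFSN = {"OOBO", "COX2_PARLI", "OPSD_HUMAN", "OPSD_MOUSE"}
TETSQ = {"OOBO", "OPSD_HUMAN", "OPSD_MOUSE"}
OPSIN = {
    "OOBO", "OOFF", "OOFF2", "OOHUB", "OOHUR", "OOHUG", "QRHYB2",
    "GBT1_BOVIN", "GBT1_HUMAN", "GBT2_BOVIN", "OPS3_DROME",
    "OPS4_DROME", "OPSD_HUMAN", "OPSD_MOUSE", "OPSD_OCTDO",
}
DATABASE = VPFSN | TETSQ | OPSIN   # stand-in for the whole database


def not_(hits):
    """`not': every entry in the database that is absent from hits."""
    return DATABASE - hits


both = VPFSN & OPSIN          # seq "vpfsn" and text "opsin"
either = VPFSN | OPSIN        # `or' (and its synonym `add')
exclusive = VPFSN ^ TETSQ     # `xor': one or the other, not both
minus = VPFSN - OPSIN         # `subtract'
and_not = VPFSN & not_(OPSIN)  # `and not', same result as `subtract'

print(sorted(both), len(either), sorted(exclusive), sorted(minus))
```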
Re-read this section and make sure you understand it before proceeding.
2.10 Very complex queries
The complex query is not the limit of DELPHOS; it allows very complex
queries as well. Very complex queries can be defined as those with
more than two functions. Using functions and operators you can make any
arbitrarily complex query. For very complex queries you can make the
meaning clear by adding parentheses! Type the following
query
DELPHOS> display (seq "vpfsn" and seq "tetsq") or text "hogness"
What does it do? Well, this example is relatively easy and does
precisely what you'd expect. The term in parentheses finds those
entries which contain BOTH peptides, the term outside the
parentheses finds all entries containing `hogness', the sum of both
terms then forms the worklist. To put it another way, the worklist will
contain all the `hogness' entries plus those sequences with both `vpfsn'
and `tetsq' in them.
What about...
DELPHOS> display seq "vpfsn" and (seq "tetsq" or text "hogness")
... this is obviously a different kettle of fish but again, if you look
closely it also does what you'd expect. First, it puts together
those proteins which contain either the peptide `tetsq' or the
name `hogness' or both; it then selects from this group only those sequences
which contain the peptide `vpfsn' and puts them in the worklist.
This raises the question... what does the following query do? Try it
DELPHOS> display seq "vpfsn" and seq "tetsq" or text "hogness"
It could do one or the other of the last two examples. It is actually
equivalent to
DELPHOS> display seq "vpfsn" and (seq "tetsq" or text "hogness")
This is an important point about DELPHOS. If you don't put parentheses
round terms in a very complex query DELPHOS works things out from
right to left. It is good practice to use parentheses to make the
meaning of a query entirely clear. You can put in as many as you
like providing they balance. For example
DELPHOS> display (seq "vpfsn") and ((seq "tetsq" or text "hogness"))
is perfectly acceptable and is equivalent to the last example.
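The right-to-left rule can be made concrete with a small evaluator
written in Python. It is a sketch under stated assumptions: each function
call (for example seq "vpfsn") is replaced by a single token naming a
set of hits, the `not' operator is left out, and the `hogness' set,
including the pcode `HYPO1', is invented for illustration. None of the
names below come from DELPHOS itself.

```python
OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "add": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "subtract": lambda a, b: a - b,
}


def evaluate(tokens, hits):
    """Evaluate a tokenised query, grouping from right to left."""
    result, rest = _expr(list(tokens), hits)
    if rest:
        raise ValueError("unbalanced parentheses")
    return result


def _expr(tokens, hits):
    left, tokens = _term(tokens, hits)
    if tokens and tokens[0] in OPS:
        op = tokens[0]
        # Everything to the right is worked out first.
        right, tokens = _expr(tokens[1:], hits)
        return OPS[op](left, right), tokens
    return left, tokens


def _term(tokens, hits):
    if not tokens:
        raise ValueError("unexpected end of query")
    if tokens[0] == "(":
        value, tokens = _expr(tokens[1:], hits)
        if not tokens or tokens[0] != ")":
            raise ValueError("missing closing parenthesis")
        return value, tokens[1:]
    return hits[tokens[0]], tokens[1:]


hits = {
    "vpfsn": {"OOBO", "COX2_PARLI", "OPSD_HUMAN", "OPSD_MOUSE"},
    "tetsq": {"OOBO", "OPSD_HUMAN", "OPSD_MOUSE"},
    "hogness": {"OOBO", "HYPO1"},   # illustrative only
}
print(sorted(evaluate("vpfsn and tetsq or hogness".split(), hits)))
print(sorted(evaluate("( vpfsn and tetsq ) or hogness".split(), hits)))
```

The unparenthesised query gives the same answer as
vpfsn and ( tetsq or hogness ), while the explicitly parenthesised
form also lets in the entry that only mentions `hogness'.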
DELPHOS allows you to nest parenthesised queries to any depth. In
practice though you can make things easier on yourself by breaking
up the query into manageable chunks and fitting everything together
using the DELPHOS list commands.
2.11 Complex queries made easy using lists
`Display' is only one of many DELPHOS commands. It is the one most
used for browsing through the database. Some other commands deal with
lists. DELPHOS lists make life easy for you.
There are two lists available to the DELPHOS user. One you've already
met, the WORKLIST. The worklist, as its name implies, contains
the set of proteins you're currently working on. Typically this will be
the results of the last query but not necessarily so. This is because
of the existence of the STORELIST. This list is just what it says, a
list which can act as a temporary store of a set of protein entries.
The WORKLIST and the STORELIST can hold a set of protein pcodes (the
unique identifiers of the database proteins). You can transfer
information from one list to another in several ways. Type
DELPHOS> display text "opsin"
DELPHOS> storework
DELPHOS> display code "xxx"
What this has done is to put all the opsins in the WORKLIST using the
`display' command. The `storework' command copied the contents of the
WORKLIST to the STORELIST overwriting anything that was there before
(if anything). The final `display' command looked for a pcode which
doesn't exist. This leaves you with no entries in the WORKLIST, you
can verify this by typing
DELPHOS> display
The `display' command works exclusively on the WORKLIST. You can
recover the list of opsins, currently held in the STORELIST, by
typing
DELPHOS> recallwork
DELPHOS> display
The `recallwork' command copies the contents of the STORELIST to the
WORKLIST overwriting what was there before (in this case nothing)
and then the `display' command shows you what is in the WORKLIST. At
this moment the WORKLIST and STORELIST contain precisely the same
set of pcodes. Now type
DELPHOS> display seq "vpfsn"
The WORKLIST now contains all the protein entries which contain the
pentapeptide "vpfsn", the STORELIST contains all the opsins. You can
interchange the contents of the two lists by typing
DELPHOS> swapwork
then type
DELPHOS> display
to verify the interchange. Type
DELPHOS> swapwork
DELPHOS> display
and you're back where you started.
You can also save the contents of the WORKLIST or STORELIST to a
disc file and load it back in again later. The commands to use are
`worksave', `workread', `storesave' and `storeread'. Type
DELPHOS> display title "rhodopsin"
DELPHOS> worksave work.tmp
DELPHOS> display title "cytochrome"
DELPHOS> storework
DELPHOS> display/full code "oobo"
DELPHOS> workread work.tmp
DELPHOS> display
You end up with the rhodopsins in the WORKLIST and the cytochromes in the
STORELIST after having had a quick look at pcode `oobo'! Note that if
you don't give the list save and read commands the name of a disc
file (work.tmp in the last example) they will prompt you for one.
Having described the lists we can now explain how they can help you
break down complex queries into manageable chunks. Take as an example
the query we used earlier.
DELPHOS> display (seq "vpfsn" and seq "tetsq") or text "hogness"
First of all, you can type
DELPHOS> display seq "vpfsn" and seq "tetsq"
DELPHOS> storework
This takes the first part of the query (the section in parentheses),
works it out and puts it as usual into the WORKLIST. The second
command makes a copy of the WORKLIST in the STORELIST. Now you're
ready to type
DELPHOS> display text "hogness"
So now you've got the first part of the query in the STORELIST and
the last part of the complex query in the WORKLIST. All you need do
now is to `or' them. You do this by typing
DELPHOS> orlists
This command performs the `or' of the STORELIST with the WORKLIST
and leaves the result in the WORKLIST. The STORELIST remains
unchanged. You can now examine the WORKLIST by typing
DELPHOS> display
The other list operator commands are `andlists' and `xorlists'. They
both, like `orlists', put the result in the WORKLIST and leave the
STORELIST unchanged. So, for example, the query
DELPHOS> display seq "vpfsn" and (seq "tetsq" or text "hogness")
can be broken down into the following steps.
DELPHOS> display seq "tetsq" or text "hogness"
DELPHOS> storework
DELPHOS> display seq "vpfsn"
DELPHOS> andlists
DELPHOS> display
Another useful command is `negwork'. This command performs a `not' on
the WORKLIST i.e. it replaces whatever was in the WORKLIST with
whatever wasn't! For example, if you wished, you could emulate the
query
DELPHOS> display not text "cytochrome"
by typing
DELPHOS> display text "cytochrome"
DELPHOS> negwork
DELPHOS> display
To summarise, using the DELPHOS lists and list operations you can
break down any arbitrarily complex query into small chunks. You can
also save either or both lists to disc, leave DELPHOS, do something else,
return to DELPHOS, load back the lists from disc and carry on where you
left off.
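The list commands can be modelled as operations on two sets. The Python
class below is a toy model for following the examples in this section,
not a description of how DELPHOS stores its lists. The class name
`Lists', the small stand-in database and the `hogness' hits (including
the pcode `HYPO1') are all hypothetical.

```python
class Lists:
    """Toy model of the WORKLIST and STORELIST as sets of pcodes."""

    def __init__(self, database):
        self.database = set(database)   # every pcode, needed by negwork
        self.worklist = set()
        self.storelist = set()

    def display(self, hits):
        self.worklist = set(hits)       # a query replaces the WORKLIST

    def storework(self):
        self.storelist = set(self.worklist)

    def recallwork(self):
        self.worklist = set(self.storelist)

    def swapwork(self):
        self.worklist, self.storelist = self.storelist, self.worklist

    # The list operators leave the STORELIST unchanged.
    def orlists(self):
        self.worklist = self.storelist | self.worklist

    def andlists(self):
        self.worklist = self.storelist & self.worklist

    def xorlists(self):
        self.worklist = self.storelist ^ self.worklist

    def negwork(self):
        self.worklist = self.database - self.worklist


VPFSN = {"OOBO", "COX2_PARLI", "OPSD_HUMAN", "OPSD_MOUSE"}
TETSQ = {"OOBO", "OPSD_HUMAN", "OPSD_MOUSE"}
HOGNESS = {"OOBO", "HYPO1"}             # illustrative only

# seq "vpfsn" and (seq "tetsq" or text "hogness"), in small steps.
lists = Lists(VPFSN | TETSQ | HOGNESS)
lists.display(TETSQ | HOGNESS)
lists.storework()
lists.display(VPFSN)
lists.andlists()
print(sorted(lists.worklist))
```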
2.12 Shortcuts
DELPHOS allows you to abbreviate commands and qualifiers down to the
point of no ambiguity with other commands or qualifiers. For example
DELPHOS> swapwork
can be replaced by,
DELPHOS> sw
It cannot be replaced by simply `s' as this could be confused with
`storework', `storesave' and `storeread' and you'd be given a rude
message. Similarly, the query
DELPHOS> display/author/comment/output=a.a title "opsin"
can be replaced by
DELPHOS> d/au/c/o=a.a title "opsin"
as there is no conflict with any other command or qualifier. Remember
though that YOU CANNOT ABBREVIATE FUNCTIONS (e.g. `seq') OR OPERATORS
(e.g. `and').
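The abbreviation rule amounts to unique-prefix matching, which can be
sketched in Python as follows. The function name `resolve' is
hypothetical, and the command and qualifier lists contain only the names
mentioned in this tutorial, so they are not necessarily complete.

```python
COMMANDS = [
    "display", "quit", "storework", "recallwork", "swapwork",
    "worksave", "workread", "storesave", "storeread",
    "orlists", "andlists", "xorlists", "negwork",
    "minuswork", "pluswork", "createseq", "createdb",
]
QUALIFIERS = [
    "alternative", "author", "comment", "full",
    "info", "output", "printer", "sequence",
]


def resolve(abbreviation, names):
    """Expand an abbreviation to the single name it is a prefix of."""
    abbreviation = abbreviation.lower()
    if abbreviation in names:
        return abbreviation             # an exact name always wins
    candidates = [name for name in names if name.startswith(abbreviation)]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError(f"unknown: {abbreviation}")
    raise ValueError(f"ambiguous: {abbreviation} could be {candidates}")


print(resolve("sw", COMMANDS))
print([resolve(q, QUALIFIERS) for q in ("au", "c", "o")])
```

Calling resolve("s", COMMANDS) raises an error in this sketch because
four of the listed commands begin with `s'.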
The command `display' holds a privileged place in DELPHOS. If there is
no other command given and there is no ambiguity anywhere else then
DELPHOS assumes the `display' command has been given, so the following
queries are all equivalent.
DELPHOS> display/author/comment/output=a.a title "opsin"
DELPHOS> d/au/c/o=a.a title "opsin"
DELPHOS> /au/c/o=a.a title "opsin"
As are ..
DELPHOS> display seq "vpfsn"
DELPHOS> seq "vpfsn"
If you're at the DELPHOS> prompt though, to redisplay the worklist
you have to type at least
DELPHOS> d
Another useful facility is the ability to execute an operating system
command from within DELPHOS. You do this by preceding the command
with a dollar (`$') symbol. For example,
DELPHOS> $directory (VMS)
DELPHOS> $ls (UNIX)
will list the current directory. If you just type..
DELPHOS> $
.. then a subprocess will be created and you'll be returned to the
operating system. You can type commands as normal then, when you've
finished, `logout' and you'll be returned to DELPHOS.
The rest of this documentation does not use these abbreviations for
the sake of clarity. It is extremely useful to remember these shortcuts
as they save a lot of finger-ache. Try using them in the remainder of
the tutorial.
2.13 Tailoring the worklist
Occasionally a query will give a false positive result in which case
it is advantageous to be able to remove an entry, or set of entries,
from the worklist. You can do this in DELPHOS using the `minuswork'
command. This command can accept a query just like `display'.
Assume for example you've got all the opsins by typing
DELPHOS> display title "opsin"
but then remember you wanted all the opsins that don't contain the
sequence `vpfsn'. All you need to type is
DELPHOS> minuswork seq "vpfsn"
As another example, assume you've got all the opsins using the query
DELPHOS> display title "opsin"
but then remember that you didn't want the one with pcode `ooff'.
You can just type
DELPHOS> minuswork code "ooff"
The command `pluswork' has the opposite effect to `minuswork': it adds
the results of a query to the worklist. For example,
DELPHOS> pluswork code "oobo ooff"
will add the two pcodes to the worklist, assuming they're not already
there.
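If it helps, think of the worklist as a set of pcodes. The Python
sketch below models `minuswork' as set difference and `pluswork' as set
union, which matches the command summary in section 3.1. The function
bodies are an illustration only, and the pcode `oohu' is made up.

```python
def minuswork(worklist, query_result):
    """Worklist AND NOT query result."""
    return worklist - query_result


def pluswork(worklist, query_result):
    """Worklist OR query result; entries already present are not duplicated."""
    return worklist | query_result


# 'oobo' and 'ooff' appear in the tutorial; 'oohu' is a hypothetical pcode.
worklist = {"oobo", "ooff", "oohu"}
worklist = minuswork(worklist, {"ooff"})          # minuswork code "ooff"
worklist = pluswork(worklist, {"oobo", "ooff"})   # pluswork code "oobo ooff"
print(sorted(worklist))
```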
2.14 Creating database subsets
Typically a researcher is interested in a related group of proteins.
If a new sequence comes along and, for example, a similarity search
of this sequence against a predefined set of other proteins is wanted then it
does not make sense, either scientifically or with regard to computer
time, to compare the sequence against the whole database. In cases
like these it is advantageous to create a database subset and do
the similarity search against the subset.
The examples given in the tutorial so far have shown how easy it is
to create a worklist containing sequences of interest. DELPHOS makes
it easy to extract these sequences onto disc by providing the
`createseq' and `createdb' commands. `Createseq' extracts
just the sequences into a file; `createdb' creates sequence,
reference and title files in NBRF-PIR format. `Createdb' will be
infrequently required by a user; `createseq' is the command to use
to create a database subset for similarity searching.
Both the `create' commands can accept a query, just like `display'.
If no query is given then the contents of the worklist will form the
database. Either way you are prompted for the name to give the disc
file(s). The following commands produce the same effect:
DELPHOS> createseq title "opsin"
Enter a name for the sequence file: opsin.seq
or
DELPHOS> display title "opsin"
DELPHOS> createseq
Enter a name for the sequence file: opsin.seq
Both of the above operations will create the file `opsin.seq' which
contains the sequences of all proteins which had the word `opsin' in
their title line. This database is in the correct format for the
ISIS similarity searching program SWEEP.
DELPHOS also provides the `copy' command. This outputs the contents
of the worklist to a disc file in the NBRF-PIR PSQ COPY format.
2.15 Reading SWEEP hit lists
DELPHOS can also read hit lists produced from the SERPENT similarity
searching program SWEEP. Refer to the SERPENT documentation for a
description of SWEEP. It loads in the pcodes from any specified
block in the hit list file. The unsegmented sequence block produced
by SWEEP is referred to as `block 1', the first segmented sequence
block is referred to as `block 2' etc. These blocks can be read
into either the worklist or the storelist using the `hitwork'
and `hitstore' commands respectively. These commands ask for the
name of the SWEEP hits file and the block number of interest.
3. A summary of DELPHOS commands, qualifiers, functions and operators
3.1 Commands
Command              Function
DISPLAY       (Q,W)  Query Result -> standard output & worklist
CREATEDB      (Q,W)  Worklist -> NBRF-PIR .seq .ref and .ttl files
CREATESEQ     (Q,W)  Worklist -> NBRF-PIR .seq file
STOREWORK            Worklist -> Storelist
RECALLWORK           Storelist -> Worklist
SWAPWORK             Transposes Storelist and Worklist
WORKSAVE      (D)    Worklist -> Disc file
WORKREAD      (D)    Disc file -> Worklist
STORESAVE     (D)    Storelist -> Disc file
STOREREAD     (D)    Disc file -> Storelist
HITWORK       (D)    SWEEP Hit-list -> Worklist
HITSTORE      (D)    SWEEP Hit-list -> Storelist
ANDLISTS             Storelist AND Worklist -> Worklist
ORLISTS              Storelist OR Worklist -> Worklist
XORLISTS             Storelist XOR Worklist -> Worklist
PLUSWORK      (Q)    Worklist OR Query Result -> Worklist
MINUSWORK     (Q)    Worklist AND NOT Query Result -> Worklist
NEGWORK              NOT Worklist -> Worklist
NOT                  NOT Query -> Worklist
COPY          (Q,W)  Worklist -> NBRF-PIR Copy-format disc file
HELP                 Brief help -> standard output
QUIT/EXIT/BYE        Return to operating system
Key: Q = accepts query
     W = accepts current worklist contents (see tutorial)
     D = accepts data
The DELPHOS commands are summarised above. There follows a more
detailed description of each.
DISPLAY: This is the default command. Its use is assumed if no other
command is given. This command may accept a query.
The default output of this command is to send only the title
line of each matching protein in the worklist to the screen.
The output can be expanded and redirected using the qualifiers
presented in the next table.
If no query is given then the current contents of the worklist
are redisplayed.
CREATEDB: This command may accept a query.
The contents of the worklist are used to create an NBRF-PIR
format database of SEQ, REF and TTL files. If no query is
given then the current contents of the worklist are used.
The command prompts for a name for the database files.
CREATESEQ: Similar to CREATEDB but only the NBRF-PIR SEQ file is
produced.
STOREWORK: Overwrites the storelist with the current worklist. The
worklist is unaffected.
RECALLWORK: Overwrites the worklist with the storelist contents. The
storelist is unaffected.
SWAPWORK: Transposes the worklist and storelist entries.
WORKSAVE: Saves the worklist as protein identification codes (pcodes),
to a named file. This command expects a filename but will
prompt if one is not given.
WORKREAD: Reads a file of pcodes into the worklist.
STORESAVE: As for WORKSAVE but acts on the storelist.
STOREREAD: As for WORKREAD but acts on the storelist.
HITWORK: Loads a block of pcodes from a SWEEP hitlist into the worklist.
Expects a filename and a block number but will prompt if either
or both are missing. The SWEEP unsegmented sequence block
is `block 1', the first segmented sequence block is `block 2'
etc.
HITSTORE: As for HITWORK but acts on the storelist.
ANDLISTS: Performs the boolean AND of the storelist and the worklist
leaving the result in the worklist. The storelist is
unaffected.
Only those pcodes common to both lists form the new worklist.
ORLISTS: As for ANDLISTS but the boolean OR is performed. The pcodes
in both lists are added to form the new worklist.
XORLISTS: As for ANDLISTS but the boolean eXclusive OR is performed.
Only pcodes that were in one or other list, but not both,
form the new work list.
NEGWORK: The worklist entries are replaced by all the other entries
in the database.
PLUSWORK: This command expects a query and will prompt if none is given.
The results of the query are added to the current contents
of the worklist.
MINUSWORK: This command expects a query and will prompt if none is given.
The results of the query are subtracted from the
current contents of the worklist.
COPY: Emulates the NBRF-PIR PSQ COPY command.
You are asked whether text information is required.
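The list commands described above behave like ordinary set operations
on two lists of pcodes. The following Python sketch models them; the
class `Lists' and its attribute names are invented for illustration and
say nothing about how DELPHOS stores its lists internally.

```python
class Lists:
    """A toy model of the DELPHOS worklist and storelist (hypothetical)."""

    def __init__(self, database, worklist=()):
        self.database = set(database)   # every pcode in the database
        self.work = set(worklist)
        self.store = set()

    def storework(self):    # worklist -> storelist, worklist unaffected
        self.store = set(self.work)

    def recallwork(self):   # storelist -> worklist, storelist unaffected
        self.work = set(self.store)

    def swapwork(self):     # transpose the two lists
        self.work, self.store = self.store, self.work

    def andlists(self):     # pcodes common to both lists
        self.work = self.store & self.work

    def orlists(self):      # pcodes in either list
        self.work = self.store | self.work

    def xorlists(self):     # pcodes in one list or the other, but not both
        self.work = self.store ^ self.work

    def negwork(self):      # all the other entries in the database
        self.work = self.database - self.work


lists = Lists(database={"a", "b", "c", "d"}, worklist={"a", "b"})
lists.storework()
lists.work = {"b", "c"}     # stands in for the result of a new query
lists.xorlists()
print(sorted(lists.work))
```

Note that in this model, as in the descriptions above, only the
worklist changes when the two lists are combined.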
3.2 Qualifiers
These are only available for use with the DISPLAY command
/AUTHOR            Enables author information display
/ALTERNATIVE       Enables alternative name display
/COMMENT           Enables comment display
/INFO              Displays results of subqueries
/FULL              Equivalent to /AUTHOR/ALTERNATIVE/COMMENT/SEQUENCE
/OUTPUT=filename   Sends screen information to a named file
/PRINTER           Sends screen information to the default printer
/SEQUENCE          Enables protein sequence display
3.3 Functions
NOT
TITLE
TEXT
SEQ
CODE
FSEQ
SEQ: Searches the database for any sequences which exactly
match a given peptide.
If multiple parameters are given (e.g. seq "xxx yyy") then
only those sequences are returned which contain all the
peptides specified.
TITLE: Searches the database titles for entries which contain the given
text. Multiple parameters have the same interpretation as for
the SEQ function.
TEXT: This function searches all textual information in the database
for entries which contain a given string. Multiple parameters
(e.g. text "xxx yyy") have the same interpretation as for the
SEQ function.
CODE: Searches the database for given pcodes.
Multiple pcodes can be given.
FSEQ: Searches the database for fuzzy sequence matches.
The parameter takes the form "probe mismatches". The 'probe' term
contains the sequence with parentheses enclosing invariant
residues. The 'mismatches' term is an integer giving the number
of allowed mismatches in the sequence. For example
FSEQ "a(c)def 1"
will find those entries containing the above pentapeptide with
one allowed mismatch in residues 'a','d','e' or 'f' (a 'c' must
always be the second relative residue).
Multiple parameters are not currently implemented for this
function.
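The FSEQ rule can be sketched in Python as follows. This is an
illustration of the matching rule as described above, not the DELPHOS
implementation. It assumes that "one allowed mismatch" means "at most
one", and the names `parse_probe' and `fseq_match' are invented.

```python
def parse_probe(probe):
    """Split a probe such as 'a(c)def' into (residue, invariant) pairs."""
    pairs = []
    invariant = False
    for ch in probe.lower():
        if ch == "(":
            invariant = True
        elif ch == ")":
            invariant = False
        else:
            pairs.append((ch, invariant))
    return pairs


def fseq_match(sequence, probe, mismatches):
    """True if some window of the sequence matches the probe.

    Invariant residues must match exactly; the other positions may
    differ in at most `mismatches` places (an assumption).
    """
    pairs = parse_probe(probe)
    seq = sequence.lower()
    width = len(pairs)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        bad = 0
        acceptable = True
        for (residue, invariant), found in zip(pairs, window):
            if residue != found:
                if invariant:
                    acceptable = False
                    break
                bad += 1
        if acceptable and bad <= mismatches:
            return True
    return False


# Hypothetical sequences: the second differs from 'acdef' at the 'e' only.
print(fseq_match("mkacdefg", "a(c)def", 1))
print(fseq_match("mkacdxfg", "a(c)def", 1))
```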
3.4 Operators
AND
OR
XOR
ADD (equivalence = 'or')
SUBTRACT (equivalence = 'and not')
4. A strategy for creating database subsets from text queries
This section describes the development, using DELPHOS, of an effective
strategy for a typical problem of retrieval. The task is to...
"Create a specialised database of all Class I and Class II Major
Histocompatibility antigens, including any Tla/Qa or CD1
homologues."
The objective is to define a set of DELPHOS commands that will allow
such a database to be built FROM FUTURE OWL RELEASES without the
need for database similarity searching at every new release.
This task requires an initial stage of research into the retrieval power
of both text parameters and sequence similarity searches. Once a
strategy has been developed it can be applied very quickly to update the
specialised subset database following each new release of OWL.
This mini-database would have many research applications, for example in
the construction of multiple sequence alignments and phylogenies.
Detailed modelling of the structure and interactions of important parts
of this set of homologous molecules, such as the peptide presentation
groove, requires as much information as possible from sequence
alignments about alternative amino acids at different positions in the
structure and their functional effects. Sequence pattern discriminators
could be more readily refined by rapid evaluation using the small
subset of the OWL database. The information in the specialised database
could also be readily extended, using the `browsing' facilities of
DELPHOS to include, for example, other MHC-encoded polypeptides or
proteins that interact with MHC Class I and Class II antigens.
Particular problems arise in this case for achieving complete and
specific retrieval of only the relevant protein entries. A few hundred
of the Class I and Class II antigens have been sequenced. Retrieval by
sequence homology is likely to be relatively non-specific because of
many other homologues of individual domains of these antigens, for
example immunoglobulins, beta-2-microglobulins and other immune system
surface receptors. Retrieval by text strings is likely to be incomplete
because of the numerous synonyms of MHC antigen names present in the
component databases of OWL.
These problems can be overcome by
(a) Using connected combinations of many text search parameters
(b) Using the list manipulations of DELPHOS to integrate information
from both sequence similarity searches and text retrieval.
4.1 Gather information from sequence similarity searches
A good approach is to run a typical Class I and a typical Class II
protein through the SWEEP similarity searching program and gather
the top 500 or so preliminary alignments. In this example more than
250 homologous Class I and Class II proteins will be present in the
hit lists. A few other homologues of the immunoglobulin domain
will also appear in the hit lists; with OWL 7.0 these predominated
from position 270 downwards.
Load the two lists into DELPHOS using the `hitstore' and `hitwork'
commands and intersect using `andlists' to check homology between
the two hitlists. The two lists can then be combined using `orlists'.
Both the intersected and combined lists can be saved to disc.
4.2 Work out effective simple text discriminators for the homologues found
by sequence similarity searching
In the given example, it can be shown by experiment that many relevant
text strings such as "mhc", "hla", "classi", "transplantation" and
"antigen" show little specificity for the MHC proteins. For example,
"mhc" is also the gene designation for "myosin heavy chain" and "hla"
is a frequent substring of irrelevant words such as "chlamydia".
It was found that greater retrieval efficiency was obtained with
character strings from the word "histocompatibility": only a few
relevant entries (13 in OWL 7.0) did not contain a version of this word
and only a few unwanted entries (25 in OWL 7.0) were retrieved. The truncated
string "histocompat" gave more efficient retrieval than the full
word because of the occurrence of the mis-spelled word
"histocompatability" in some database entries. The words "histoco"
and "istocom" gave identical results to "histocompat" but shorter
strings were less discriminating.
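The behaviour of these strings is easier to see with a small Python
sketch of free text matching. It assumes that both the parameter and
the database text are reduced to lower-case letters and digits before
comparison, which is inferred from strings such as "classi" in this
guide rather than documented, and that every space-separated term must
be present. The function names and the example titles are invented.

```python
def squeeze(text):
    """Lower-case the text and drop everything except letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def text_match(parameter, entry_text):
    """True if every space-separated term occurs in the squeezed entry text."""
    haystack = squeeze(entry_text)
    return all(squeeze(term) in haystack for term in parameter.split())


# Hypothetical entry texts, for illustration only.
good = "Class I histocompatibility antigen"
misspelled = "Class II histocompatability antigen"
unrelated = "Chlamydia trachomatis outer membrane protein"

print(text_match("histocompat", good), text_match("histocompat", misspelled))
print(text_match("histocompatibility", misspelled))
print(text_match("hla", unrelated))
```

The truncated string matches both spellings, the full word misses the
mis-spelled entry, and "hla" matches an unrelated entry because it is a
substring of "chlamydia".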
The diagnostic success of "histocompatibility", compared to other
words, is a consequence of its appearance as a standard term in one
or more textual fields of the NBRF, Swissprot and GenBank source databases.
The concordance between different databases may be fortuitous:
"histocompatibility" is used by NBRF in both titles and keywords, by
Swissprot as a standard term in the feature tables and by GenBank
as a keyword in most of the relevant entries. GenBank is internally
inconsistent because some of the author-submitted entries lack the
term.
Such results demonstrate why a standard vocabulary ought to be adopted
by source databases and also why DELPHOS free text searching scores over
other query languages. The discriminatory power of "histocompat"
DEPENDS on the free text ability since the keyword is not restricted
to a common field, or to any one field, in the source databases.
4.3 See if there are more complex text discriminators which give
better retrieval
In the given example, although
DELPHOS> display text "histocompat"
gave 90% recall it was useful to try to achieve better retrieval using
more complex queries to avoid source database problems of synonymy and
inconsistent usage. The following alternative designations were
commonly present in the titles of some of the Class I and Class II
proteins: "Class I histocompatibility", "MHC class I", "MHC class II",
"Class I MHC", "Class II MHC", "HLA class I", "HLA class II",
"HLA-DR class II", "RLA class I", "RT1 class II", "H-2 class I" and
"H-2 class II". There were also less frequent variations such as
"Qa/Tla class I", "MHC HLA DQ", "H-2 L-D gene product", "Q7b antigen",
"T1 antigen", "CD1 thymocyte" and "CD1 histocompatibility".
These many synonyms, which contain key substrings in different
sequences, can be diagnosed only with a relatively complex query. The
example below was formulated as follows. First, the string "classi" was
used since this was common to many of the synonyms. Second, the less
frequent names were reduced to their shortest effective substrings.
The most concise form of a DELPHOS expression to retrieve the great
majority of the required proteins would be:
DELPHOS> display (text "classi" and (text "mhc" or text "hla" or
text "rt1" or text "h2c" or text "histocompat")) or (text "cd1"
and (text "thymo" or text "histocompat")) or text "q7b" or
text "tlantigen"
Although theoretically very efficient in CPU time, this query should be
subdivided for clarity (as suggested in the tutorial) and possibly
because of computer memory constraints, as the multiple `or' operations
generate very large lists. The above query is far more readable as:
DELPHOS> display text "classi mhc" or text "classi hla" or text "classi rt1"
DELPHOS> pluswork text "classi h2c" or text "classi histocompat"
DELPHOS> pluswork text "cd1 thymo" or text "cd1 histocompat"
DELPHOS> pluswork text "q7b" or text "tlantigen"
With OWL 7.0 this took only 23.1 seconds of cpu time on a micro VAX
3600 computer. Retrospective analysis showed that recall was almost
complete with only 3 out of 299 relevant entries being missed.
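The retrieval figures quoted in this section can be checked with a few
lines of Python. Recall is the fraction of relevant entries that were
retrieved and precision is the fraction of retrieved entries that were
relevant. The second calculation assumes that the same total of 299
relevant entries applies to the simple "histocompat" search, which the
text does not state explicitly.

```python
def recall(relevant_found, relevant_total):
    """Fraction of the relevant entries that were retrieved."""
    return relevant_found / relevant_total


def precision(relevant_found, unwanted_found):
    """Fraction of the retrieved entries that were relevant."""
    return relevant_found / (relevant_found + unwanted_found)


RELEVANT_TOTAL = 299            # relevant entries in OWL 7.0 (from the text)

# Complex query: 3 relevant entries missed.
print(f"complex query recall: {recall(RELEVANT_TOTAL - 3, RELEVANT_TOTAL):.1%}")

# Simple query text "histocompat": 13 missed, 25 unwanted (section 4.2).
# ASSUMPTION: the same 299 relevant entries are the baseline here.
found = RELEVANT_TOTAL - 13
print(f"simple query recall:    {recall(found, RELEVANT_TOTAL):.1%}")
print(f"simple query precision: {precision(found, 25):.1%}")
```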
A few irrelevant proteins designated as MHC "Class III" had also been
retrieved. Although MHC-encoded, these are complement proteins and do
not belong to the Class I and Class II set specified. They were
removed using the command
DELPHOS> minuswork text "classiiig" or text "classiiih" or
text "classiiir" or text "glycoproteincd4"
This procedure took only 13.6 seconds of cpu on the same computer.
4.3 Integrate the results from the sequence similarity search with
those from the text retrieval
It is good practice to save the worklist as you go along. In the HLA
example, the `andlists' and `orlists' worklists from the SWEEP search
were held in the files HLA_AND.WK and HLA_OR.WK respectively. The
worklist resulting from the complex text discriminators shown in the
last section was held in MHC_TEXT.WK.
First, you can identify all the entries common to the two methods
of retrieval using DELPHOS by typing
DELPHOS> workread MHC_TEXT.WK
DELPHOS> storeread HLA_OR.WK
DELPHOS> andlists
DELPHOS> worksave MHC_AND.WK
In our case the result contained only 2 unwanted entries and missed only
24 giving good reassurance of the high concordance between the two
independent methods of sequence similarity searching and text
retrieval.
Second, identify the entries differing in the two lists. The entries
retrieved by sequence similarity but not by text retrieval were
obtained by typing
DELPHOS> storeread MHC_AND.WK
DELPHOS> workread HLA_OR.WK
DELPHOS> xorlists
DELPHOS> worksave HLA_XOR.WK
and then the entries obtained by text retrieval but not by similarity
searching by typing
DELPHOS> workread MHC_TEXT.WK
DELPHOS> xorlists
DELPHOS> worksave MHC_XOR.WK
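The three saved lists can be pictured as set operations. In the Python
sketch below the sets stand in for the worklist files; the pcodes and
the function name `compare' are made up for illustration. Because the
common list is contained in both original lists, each `xorlists' step
leaves exactly the entries found by one method only.

```python
def compare(text_hits, seq_hits):
    """Return (common, sequence-only, text-only) pcode sets."""
    common = text_hits & seq_hits     # andlists  -> MHC_AND.WK
    seq_only = common ^ seq_hits      # xorlists  -> HLA_XOR.WK
    text_only = common ^ text_hits    # xorlists  -> MHC_XOR.WK
    return common, seq_only, text_only


# Hypothetical pcodes standing in for MHC_TEXT.WK and HLA_OR.WK.
text_hits = {"p1", "p2", "p3", "p4"}
seq_hits = {"p3", "p4", "p5"}

common, seq_only, text_only = compare(text_hits, seq_hits)
print(sorted(common), sorted(seq_only), sorted(text_only))
```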
The resulting lists were valuable for completing the list of relevant
entries and for identifying the remaining unwanted entries from the
complex text discriminator list MHC_TEXT.WK. The relatively short lists
derived from the XOR operations were quickly perused and new diagnostic
character strings readily devised. Call the resulting worklist MHC.WK.
Some relevant entries might have been missed because of a combination
of low sequence homology and anomalies in the text strings of the
database annotations and titles.
Such possibilities should be explored by `browsing' the database
using DELPHOS.
4.4 Browse
An effective method of browsing is to search the OWL database with
text or title strings that are exploratory rather than diagnostic.
Such strings may have relatively broad specificity for aspects of a
range of related proteins or functions that might be cross-annotated in
the textual fields of protein entries. In the context of MHC antigens,
individual exploratory strings might include "mhc", "hla", "rla",
"rt1" etc., or perhaps the names of likely authors such as "hoodl"
(for "Hood, L.").
To explore OWL for relevant entries not in our MHC.WK these strings
were used one at a time and the results of each search compared
with MHC.WK by typing e.g.
DELPHOS> display text "mhc"
DELPHOS> storework
DELPHOS> workread MHC.WK
DELPHOS> xorlists
DELPHOS> andlists
DELPHOS> display
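It may not be obvious why `xorlists' is followed by `andlists' here.
The Python sketch below replays the sequence of commands on sets and
shows that the final worklist holds the entries found by the
exploratory string that are not already in MHC.WK. The function name
and the pcodes are invented for illustration.

```python
def new_entries(query_result, reference):
    """Replay the browsing commands on sets of pcodes."""
    store = set(query_result)   # display text "mhc" ; storework
    work = set(reference)       # workread MHC.WK
    work = store ^ work         # xorlists: in one list but not both
    work = store & work         # andlists: keep those from the query
    return work


# Hypothetical pcodes: 'a' is the only query hit missing from the reference.
print(sorted(new_entries({"a", "b", "c"}, {"b", "c", "d"})))
```

The result is the same as subtracting the reference list from the query
result, which is what the two commands achieve together.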
Only one more relevant entry was found, with the diagnostic text of this
Class I antigen being "CW1 antigen". This diagnostic text was added
to the complex query and the result saved as the final list.
This list was used to create the specialised database using the
CREATEDB command.
4.5 Updating
Once a retrieval strategy has been established for the creation of
a specialised database, it can be applied quickly and repeatedly to
update this database with each new release of OWL. The research
process, as described above for the MHC proteins, is only necessary
for the initial development and evaluation of the retrieval strategy.
Ease of updating is crucial since the amount of protein data is
doubling every 18 months. For some purposes a simplified strategy
such as
DELPHOS> createdb text "histocompat"
is sufficient as it gave greater than 90% precision and recall for
OWL 7.0 (January 1990). If completeness of retrieval is critical then
more complex queries and expert assessment of the list entries, as
described above, are required. Either way, DELPHOS queries typically
take only a few seconds to perform on a VAX minicomputer and rarely
more than a minute for very complex queries.
5.0 Theory
The context-free grammar of DELPHOS is defined by the replacement rules
given below, which define the set of all regular DELPHOS expressions.
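<expression> ::= <action> | <query> | <action> <query> | NULL
<action>     ::= <command> | <command> <qualifiers>
<query>      ::= <probe> | <query> # <query> | (<query>)
<probe>      ::= <function> <parameter>
Key. Metasymbols
       ::=   'is a'
       |     'or'
     Non-terminal symbols are enclosed in <>
     Terminal symbol
       #     boolean operator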
A DELPHOS expression consists of a command (with optional qualifiers)
followed by a query. Queries are any arbitrarily complex associations of
function/parameter pairs separated by boolean operators. Subqueries may
be parenthesised to any depth in order to force precedence. Actions may
be abbreviated unless such an abbreviation produces ambiguity with
another action; functions and boolean operators may not be abbreviated.
Actions must be separated from queries by at least one space character.
Similarly probes must be separated from boolean operators and functions
separated from parameters by at least one space character. All
parameters to functions are delimited by double quotation marks. Typical
regular expressions are:
DELPHOS is case-insensitive and ignores all punctuation in function
parameters.
DELPHOS internally converts all regular expressions to postfix.
The conversion to reverse-polish notation removes all parentheses and
makes the query unambiguous. A corollary is that unparenthesised
queries are parsed from right to left.
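The conversion to postfix can be sketched in Python with a simple
operator stack. This is an illustration, not the DELPHOS parser. It
assumes that all the boolean operators have equal precedence and group
from right to left, as the paragraph above suggests, and it leaves out
the NOT function. The names `tokenize', `to_postfix' and `evaluate' and
the lookup table of probe results are invented.

```python
import re

OPERATORS = {
    "and": set.__and__, "or": set.__or__, "xor": set.__xor__,
    "add": set.__or__, "subtract": set.__sub__,
}
# A token is a parenthesis, a function/parameter probe, or a bare word.
TOKEN = re.compile(r'\(|\)|[A-Za-z]+\s+"[^"]*"|[A-Za-z]+')


def tokenize(query):
    """Split a query into parentheses, operators and probes."""
    return TOKEN.findall(query.lower())


def to_postfix(query):
    """Convert an infix query to postfix (reverse-polish) order."""
    output, stack = [], []
    for token in tokenize(query):
        if token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                  # discard the "("
        elif token in OPERATORS:
            stack.append(token)          # equal precedence, right to left
        else:
            output.append(token)         # a probe such as: text "mhc"
    while stack:
        output.append(stack.pop())
    return output


def evaluate(postfix, lookup):
    """Evaluate postfix tokens; lookup maps each probe to a set of pcodes."""
    stack = []
    for token in postfix:
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(lookup[token])
    return stack.pop()


print(to_postfix('text "a" or text "b" and text "c"'))
print(to_postfix('(text "a" or text "b") and text "c"'))
```

With these assumptions the unparenthesised query groups as
"a or (b and c)", so parentheses are needed to force the other reading.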
6.0 References
Pearson, W.R. and Lipman, D.J. (1988) PNAS USA 85, 2444-2448
George et al.