THE SMITE USER GUIDE
An interactive query language for
the PRINTS fingerprint database.
Alan Bleasby
EBI, Hinxton Genome Campus, UK
ajb@ebi.ac.uk
CONTENTS
1.0 Introduction
1.1 Smite Tutorial
a) How to examine a PRINTS entry
b) Simple queries
c) Complex queries
d) Multiple parameters
e) Very complex queries
f) Other display commands
g) Other commands which can accept a query.
h) How LISTS make life easy
i) NEGWORK
j) How can I see what entries are in the PRINTS database?
k) EXTRACT
l) Shortcuts
1.2 Summary of SMITE commands, qualifiers, functions and operators
a) Commands
b) Functions
c) Operators
d) Qualifiers
2.0 References
2.1 Applications
1.0 Introduction
SMITE is a query language for the PRINTS database. It uses the same
general syntax as DELPHOS, the query language for the OWL database.
The program allows you to examine the database and also to extract
motif sets in ADSP format. This brief guide first provides a tutorial
on the use of SMITE followed by command descriptions. It assumes
you're using the example PRINTS database.
1.1 Smite Tutorial
a) How to examine a PRINTS entry
The description of the PRINTS database format has shown that every
entry has a unique identifier code. This section shows how to examine
individual entries. The SMITE command `DISPLAY' is used to query the
PRINTS database. Type
SMITE> display code "lyslact"
You'll see the following information displayed on your screen.
WORKLIST ENTRIES (1):
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
The SMITE command translates to `display the PRINTS database entry
with the identifier code lyslact'. DISPLAY is the command, CODE is a
function. The CODE function allows you to select database entries for
examination. The default action of the DISPLAY command is just to
provide you with a very brief description of what the fingerprint
represents (in this case a lysozyme/lactalbumin fingerprint entry).
What you've typed is a `query'. SMITE stores the entry codes which
match your query in a list called the WORKLIST. After the above query
it shows that there is one entry in the worklist, namely LYSLACT. That
is to be hoped, as you've asked SMITE for a unique entry! You can
redisplay the contents of the worklist at any time by typing
SMITE> display
Try it now. You should get the same results as for your original
query. There is obviously more information in a PRINTS database entry
than just its title. To get more information from the DISPLAY command
you use `qualifiers'. These qualifiers are listed in section 1.2 d). To
see what qualifiers are available type
SMITE> help
As an example of their use type ...
SMITE> display/brief code "lyslact"
This will give you the following output on your screen:-
WORKLIST ENTRIES (1):
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
Type of feature: COMPOSITE with 6 elements
Prosite code: PS00128 LACTALBUMIN_LYSOZYME; PATTERN
Created by D.N.PERKINS, 29-MAY-1991 (UPDATE M.E.BECK, 5-APR-1993)
1. SHEWALE, J.G., SUDHIR, K.S. and BREW, K.
Evolution of alpha-lactalbumins.
J.BIOL.CHEM. 259 4947-4956 (1984).
2. IRWIN, D.M. and WILSON, A.C.
Multiple cDNA sequences and the evolution of bovine stomach lysozyme.
J.BIOL.CHEM. 264 11387-11393 (1989).
3. STUART, D.I., ACHYARA, K.R., WALKER, N.P.C., SMITH, S.G., LEWIS M.
and PHILLIPS D.C.
Alpha-lactalbumin possesses a novel calcium binding loop.
NATURE 324 84-87 (1986).
4. NITTA, K., HIDEAKI, H., SHINTARO, S. and SHIMAZAKI, K.
The calcium binding property of equine lysozyme.
FEBS LETTERS 223 405-408 (1987).
Lysozyme C and alpha-lactalbumin and are similar both in terms of primary
sequence and structure, and probably evolved from a common ancestral
protein. There is, however, no similarity in function as lactalbumin
promotes the conversion of galactosyltransferase to lactose synthase and is
essential for milk production [1], while lysozyme catalyses the hydrolysis
of bacterial cell wall polysaccharides; it has also been recruited for a
digestive role in certain ruminants and colobine monkeys [2]. Another
significant difference between the 2 enzymes is that all lactalbumins have
the ability to bind calcium [3], while this property is restricted to only
a few lysozymes [4]. The binding site was deduced using high resolution
X-ray structure analysis and was shown to consist of 3 aspartic acid
residues. It was first suggested that the calcium bound to lactalbumin
stabilised the structure, but recently it has been claimed that calcium
controls the release of lactalbumin from the golgi membrane and that the
pattern of ion binding may also affect the catalytic properties of the
lactose synthetase complex.
LYSLACT is a 6-element fingerprint that provides a signature for the
lysozyme/alpha-lactalbumin superfamily. The fingerprint was derived from
an initial alignment of 12 sequences: motif 5 encodes the calcium binding
region, and together with motif 4 contains 3 of the 8 cysteine residues
that are conserved in both lysozymes and lactalbumins (cf. PROSITE pattern
LACTALBUMIN_LYSOZYME (PS00128)). Two iterations on OWL10.1 were required to
reach convergence, at which point a true set comprising 81 sequences was
identified (cf. signatures LYSOZYME and LACTALBUMIN).
An update on OWL19.1 identified a true set containing 98 sequences,
together with a number of partial matches, all of which are fragments.
SUMMARY INFORMATION
98 codes involving 6 elements
0 codes involving 5 elements
1 codes involving 4 elements
2 codes involving 3 elements
7 codes involving 2 elements
COMPOSITE FINGERPRINT INDEX
6| 98 98 98 98 98 98
5| 0 0 0 0 0 0
4| 0 0 1 1 1 1
3| 0 0 2 2 2 0
2| 6 6 1 0 0 1
--+-------------------------------
| 1 2 3 4 5 6
Qualifiers are only used with the DISPLAY, ISHOW and FSHOW commands.
They must always immediately follow the command i.e. they must appear
before any `functions' such as CODE.
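The placement rule can be pictured with a short Python sketch. This is
not SMITE's own parser; the function name split_statement and the shape
of its return value are assumptions made purely for illustration. It
only models the rule that qualifiers hang off the command with `/' and
that the query follows the first space.

```python
def split_statement(line):
    """Split a SMITE-style statement into (command, qualifiers, query).

    Illustrative only: qualifiers are attached to the command with '/',
    and everything after the first space is treated as the query.
    """
    head, _, query = line.strip().partition(" ")
    command, *quals = head.split("/")
    qualifiers = {}
    for qual in quals:
        name, _, value = qual.partition("=")
        # A qualifier such as /output=LYS.BRIEF carries a value; others do not.
        qualifiers[name.upper()] = value or None
    return command.upper(), qualifiers, query.strip()


print(split_statement('display/type/comment code "lyslact"'))
print(split_statement("display/brief"))
```

In this sketch a qualifier typed after the function (for example
`display code "lyslact" /brief') would simply end up inside the query
text, which is one way of seeing why the qualifiers have to come first.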
Now that LYSLACT is in the worklist you can redisplay these results by
typing
SMITE> display/brief
Try it. The /brief qualifier gives you the type of fingerprint (simple
or composite), the PROSITE code (if any), the author and creation date
of the fingerprint, bibliographic references, comments, summary
information and the composite fingerprint index. These could have been
selected individually by using the /TYPE, /AUTHOR, /REFERENCE,
/COMMENT, /SUMMARY and /CFI qualifiers. Assuming you still have
LYSLACT in your worklist, try typing
SMITE> display/author
SMITE> display/type etc.
You should get selected extracts from the information you obtained by
typing /BRIEF. You can also combine qualifiers; try typing e.g.
SMITE> display/type/comment code "lyslact" or just
SMITE> display/type/comment (if lyslact is already in
the worklist)
Try several combinations.
Some information is not displayed by /BRIEF, notably the scan history,
protein codes and titles for motif sets (true/false positives,
true/false negatives, subfamily positives and negatives), initial
motif sets and final motif sets. To get the full information about a
fingerprint type
SMITE> display/full code "lyslact"
(or just SMITE> display/full if lyslact is in the worklist).
As usual the history, pcode information, initial and final motif sets
can be displayed individually by typing
SMITE> display/history code "lyslact"
SMITE> display/title
SMITE> display/imotif
SMITE> display/fmotif
Try them. Three qualifiers shown by `SMITE> help' have not yet been
mentioned. These are /INFO, /OUTPUT=fn and /PRINTER. The /INFO
qualifier will be described later. The /OUTPUT and /PRINTER qualifiers
allow you to send SMITE output to somewhere other than the screen.
This is very useful for getting hardcopy. Try typing
SMITE> display/brief/output=LYS.BRIEF code "lyslact"
Leave SMITE by typing
SMITE> quit
and examine the file LYS.BRIEF that has been produced. It should
contain all the information that would otherwise be sent to the
screen. The /PRINTER option is available on some systems and will send
SMITE output directly to the default system printer at your site.
Note that the DISPLAY command, with or without qualifiers, shows
everything in the worklist. If you have more than one entry in the
worklist (see later) and you use /BRIEF (for example) you'll get brief
information on all the entries in the worklist.
This subsection has covered how you can display entries in the PRINTS
database; subsequent subsections show how you can query the database
using SMITE.
b) Simple queries
In this part of the tutorial we'll again restrict the examples to the
DISPLAY command.
SMITE contains many functions other than CODE. You can query the
database on the basis of general text, pcode text, pcodes, sequence
and the number of elements in a fingerprint by using these functions.
Let's start off with a text query. You want to know which entries in
the PRINTS database mention `calcium'. To find out type...
SMITE> display text "calcium"
... and you'll get the following representative output:-
WORKLIST ENTRIES (3):
AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
CANODO NODO CALCIUM BINDING SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
The default output, as usual, just gives the title lines for the
matching fingerprints. Only one of them (CANODO) has the word `calcium'
in the title line; in the others the word could be in the PROSITE
name, author, bibliography or comment fields. The TEXT function looks
at all these fields. Try typing
SMITE> display/brief text "calcium"
(or just `display/brief' if you've just performed the TEXT query)
and spot where `calcium' occurs in the descriptions for AAMYLASE and
LYSLACT.
NB: Like DELPHOS, all text queries use the idea of FREE TEXT
SEARCHING. Only the numerals 0-9 and letters A-Z (case
insensitive) are significant. All query probes must be at
least 3 letters long.
Free text searching means that you don't have to use complete words in
your query, for example
SMITE> display/brief text "alciu"
would be a valid query if looking for the occurrence of the word
calcium.
Also, because all punctuation (INCLUDING SPACES) is ignored you can
use, for example, the two equivalent queries
SMITE> display/brief text "alphaamylase"
SMITE> display/brief text "phaamyl"
to detect both `ALPHA-AMYLASE' and `ALPHA AMYLASE' occurrences. The
queries
SMITE> display/brief text "alpha-amylase"
SMITE> display/brief text "pha-amyl"
have EXACTLY the same effect as either of the previous queries as all
punctuation is removed from your query before the search is started.
This approach has numerous advantages when you consider the lack of
linguistic standardisation in molecular biology nomenclature. Try them
all.
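The free text rules described above can be sketched in a few lines of
Python. This is only a model of the behaviour described in this guide,
not SMITE's actual code: the names normalise and free_text_match are
invented for the example, and the 3-character minimum is assumed to
apply to the probe after punctuation has been stripped.

```python
def normalise(text):
    """Keep only the letters A-Z and digits 0-9, folded to upper case."""
    return "".join(ch for ch in text.upper() if ch.isascii() and ch.isalnum())


def free_text_match(probe, field):
    """Return True if the normalised probe occurs in the normalised field."""
    key = normalise(probe)
    if len(key) < 3:
        # Assumption: the minimum length is checked after stripping punctuation.
        raise ValueError("probe must contain at least 3 significant characters")
    return key in normalise(field)


title = "ALPHA-AMYLASE FAMILY SIGNATURE"
for probe in ("alphaamylase", "pha-amyl", "alpha amylase"):
    print(probe, free_text_match(probe, title))
```

Because both the probe and the searched text go through the same
normalisation, hyphens, spaces and case differences cannot cause a
miss.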
As the scope of the TEXT function is ALL the general text it is a very
versatile function. For example, to find the database entry
corresponding to a PROSITE code you just have to type e.g.
SMITE> display text "ps00128"
The PTEXT function has precisely the same use as the TEXT function
but, whereas TEXT looks at the general text, PTEXT looks at the text
in the title lines of the pcodes. To see the difference type
SMITE> display text "prothrombin" and
SMITE> display ptext "prothrombin"
The TEXT query doesn't find anything but the PTEXT query gives the
following output:-
WORKLIST ENTRIES (1):
KRINGLE KRINGLE DOMAIN SIGNATURE
This shows that at least one of the pcode titles in the KRINGLE
fingerprint contains the word `prothrombin'. Again, only the default
fingerprint title is given by this query. To get a list of the pcode
titles you could have typed
SMITE> display/title ptext "prothrombin"
however, the word could be a little difficult to spot, especially if
there are a lot of pcodes containing the search string. Because of
this problem the /INFO qualifier is provided. This qualifier causes
SMITE to display each matching hit (in this case a pcode plus title)
as it finds them. The /INFO qualifier is only active when the query is
being performed; it cannot be used for redisplay of the worklist. Type
SMITE> display/info ptext "prothrombin"
and you'll get the following output:-
Matches for PTEXT probe PROTHROMBIN are:
KRINGLE THRB_BOVIN
PROTHROMBIN PRECURSOR (EC 3.4.21.5). - BOS TAURUS (BOVINE).
KRINGLE BOVTHBNM
BOVTHBNM preprothrombin - Bos taurus
KRINGLE THRB_HUMAN
PROTHROMBIN PRECURSOR (EC 3.4.21.5) (COAGULATION FACTOR II) - HO
KRINGLE THRB_MOUSE
PROTHROMBIN PRECURSOR (EC 3.4.21.5). - MUS MUSCULUS (MOUSE).
KRINGLE THRB_RAT
PROTHROMBIN PRECURSOR (EC 3.4.21.5). - RATTUS NORVEGICUS (RAT).
WORKLIST ENTRIES (1):
KRINGLE KRINGLE DOMAIN SIGNATURE
This shows, for each occurrence, the database codename, the pcode and
the pcode title line.
The PCODE function allows you to find which database entries contain a
particular pcode. Type
SMITE> display pcode "5pti"
and you'll get
WORKLIST ENTRIES (1):
HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
If you use the /INFO qualifier you'll get the pcode title line as
well. Type
SMITE> display/info pcode "5pti"
and you'll get
5PTI Trypsin Inhibitor (Crystal Form II) - Bovine (Bos Taurus) Pan
Matches for PCODE probe 5PTI are:
No. of matches = 1
HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
WORKLIST ENTRIES (1):
HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
SMITE also allows you to search the final motif sets on the basis of
sequence information. This is done using the SEQ function. Type
SMITE> display seq "iwg"
To get...
WORKLIST ENTRIES (2):
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
This shows that two fingerprints contain at least one occurrence of
the peptide ile-trp-gly.
Use the /INFO qualifier to see the sequence in context by typing
SMITE> display/info seq "iwg"
and you'll get.....
Matches for SEQ probe IWG are:
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPC1_RABIT 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPC1_RAT 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPC2_BOVIN 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPC2_HUMAN 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPC2_RABIT 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPC2_RAT 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG RATPKCB1 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG RATPKCII 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG A37237 55 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCA_HUMAN 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCA_MOUSE 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCA_RABIT 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCA_RAT 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG MMUV25PKC 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCG_BOVIN 34 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCG_HUMAN 49 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCG_RABIT 49 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCG_RAT 49 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG B37237 47 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCTDF IWG KPCA_BOVIN 50 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCKDF IWG KPC1_DROME 59 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSLCRDF IWG APLPKCB 190 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCREF IWG KPC3_DROME 85 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCRDF IWG KPCE_MOUSE 183 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCRDF IWG KPCE_RABIT 183 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCRDF IWG KPCE_RAT 183 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCREF IWG KPCL_MOUSE 185 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCRDF IWG HSPKCE 183 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCREF IWG RNPKCETA 185 1
DAGPE2 DAG/PE element II - 5
Length = 10
CGHCKDF IWG KPC2_DROME 85 1
DAGPE2 DAG/PE element II - 5
Length = 10
CSHCREF IWG HUMPKCL 184 1
DAGPE2 DAG/PE element II - 5
Length = 10
CGQCSER IWG KPCZ_RAT 144 1
DAGPE2 DAG/PE element II - 5
Length = 10
CGQCSER IWG S25605 136 1
DAGPE2 DAG/PE element II - 5
Length = 10
CGQCSER IWG MUSPROKINC 144 1
GPCRRHOD4 GPCR transmembrane motif IV - 18
Length = 22
LVKFICLS IWG LSLLLALPVLL IL8B_HUMAN 155 12
GPCRRHOD4 GPCR transmembrane motif IV - 18
Length = 22
WAKLYSLV IWG CTLLLSSPMLV BRB2_HUMAN 145 13
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG ACFAKSAACYNPIVYGISHPKYG OPS1_CALVI 308 12
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG SVFAKANSCYNPIVYGISHPRYK CRBOPLE 309 13
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG SVFAKANSCYNPIVYGISHPRYK CRBOPM 309 13
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG ACFAKSAACYNPIVYGISHPKYR OPS1_DROPS 311 12
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG ACFAKSAACYNPIVYGISHPKYR OPS1_DROME 310 12
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG ATFAKTSAVYNPIVYGISHPNDR OPS2_DROPS 317 12
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG ATFAKTSAVYNPIVYGISHPNDR DPRH2OP 317 12
GPCRRHOD7 GPCR transmembrane motif VII - 18
Length = 27
T IWG ATFAKTSAVYNPIVYGISHPKYR OPS2_DROME 317 12
The final function is the ELEMENT function. This allows you to select
database entries on the basis of how many elements make up the
fingerprint. Type
SMITE> display element "4"
and you'll get...
WORKLIST ENTRIES (4):
CANODO NODO CALCIUM BINDING SIGNATURE
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE
These are all the entries whose fingerprints contain 4 elements. To see
this you could type
SMITE> display/type element "4"
The ELEMENT function also allows you to specify greater-than or
less-than parameters. Type
SMITE> display element ">4"
to get....
WORKLIST ENTRIES (4):
AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE
These are all the database entries whose fingerprints are made up of
more than 4 elements. Similarly type
SMITE> display element "<5"
to get...
WORKLIST ENTRIES (6):
HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
CANODO NODO CALCIUM BINDING SIGNATURE
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
FERREDOXIN PLANT FERREDOXIN SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE
These are all the database entries whose fingerprints are made up of
fewer than 5 elements.
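The three forms of ELEMENT parameter can be modelled with a small
Python sketch. It is a guess at the behaviour, not SMITE's
implementation; element_query and ELEMENTS are names invented here.
Only the six entries whose element counts are stated in this guide are
included, so the results are a subset of the worklists shown above.

```python
# Element counts stated in this guide for the example database.
ELEMENTS = {"LYSLACT": 6, "AAMYLASE": 5, "CANODO": 4,
            "DAGPE": 4, "KRINGLE": 4, "SENSOR": 4}


def element_query(param, counts=ELEMENTS):
    """Select codes whose element count satisfies "N", ">N" or "<N"."""
    param = param.strip()
    if param.startswith(">"):
        n = int(param[1:])
        test = lambda count: count > n
    elif param.startswith("<"):
        n = int(param[1:])
        test = lambda count: count < n
    else:
        n = int(param)
        test = lambda count: count == n
    return sorted(code for code, count in counts.items() if test(count))


print(element_query("4"))
print(element_query(">4"))
```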
The SMITE functions can all be preceded by the word NOT. This negates
the result of the function. For example, to find all the database entries which are
not composed of exactly 4 elements type
SMITE> display not element "4"
Try it on the other functions as well.
c) Complex queries
You now know how to use simple queries and how to display results. We
can now add another level of complexity and show further flexibility
of the SMITE query language.
SMITE allows you to use multiple functions in a query and to combine
the results of such queries. This introduces the idea of `OPERATORS'.
The operators available are AND, OR, XOR, ADD, SUBTRACT and NOT. They
are easy to use and have their intuitive meanings so don't be put off!
To see what the operators do, work through the following examples. Type
SMITE> display ptext "mouse"
This will give....
WORKLIST ENTRIES (7):
AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
FERREDOXIN PLANT FERREDOXIN SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE
Now type
SMITE> display ptext "rat"
which will give....
WORKLIST ENTRIES (9):
HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
CANODO NODO CALCIUM BINDING SIGNATURE
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
FERREDOXIN PLANT FERREDOXIN SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE
SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE
As you can see, some entries are common to both lists and others
are unique to one of them. Now type
SMITE> display ptext "mouse" or ptext "rat"
To give...
WORKLIST ENTRIES (10):
AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
CANODO NODO CALCIUM BINDING SIGNATURE
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
FERREDOXIN PLANT FERREDOXIN SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE
SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE
This query has asked for `all entries which contain the ptext mouse OR
the ptext rat OR BOTH'. This is as you'd intuitively expect. Now try
SMITE> display ptext "mouse" and ptext "rat"
to give....
WORKLIST ENTRIES (6):
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
FERREDOXIN PLANT FERREDOXIN SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE
This has shortened the worklist considerably. The query has asked for
`all entries which contain BOTH mouse AND rat in their ptext fields'.
Now try
SMITE> display ptext "mouse" xor ptext "rat"
to give....
WORKLIST ENTRIES (4):
AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
CANODO NODO CALCIUM BINDING SIGNATURE
SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE
This time the list contains 4 entries, none of which appeared in the
AND result. The operator XOR stands for `exclusive or'. The query has
asked for `all entries which contain EITHER mouse OR rat BUT *NOT* BOTH'.
The operator called ADD is exactly the same as the OR operator; it is
just provided for ease of understanding.
The SUBTRACT operator again does what you'd expect. Type
SMITE> display ptext "mouse" subtract ptext "rat"
to give....
WORKLIST ENTRIES (1):
AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
This query has asked for a list of `all entries which contain mouse in
the ptext fields EXCEPT those which contain rat in the ptext fields'.
Again, SUBTRACT is added for clarity; it is actually equivalent to
`AND NOT'. Try typing
SMITE> display ptext "mouse" and not ptext "rat"
This will give the same answer as the previous query. This is because
a list is created of all those entries which contain `mouse', another
list is created of all those entries which *don't* contain `rat' and
the two lists are ANDed together. The ability to use constructs like
`AND NOT' or `XOR NOT' is a powerful feature of SMITE but it is
advised that you gain experience with SMITE before using NOT in
earnest. Above all DON'T PANIC! Use SUBTRACT instead of AND NOT if it
is easier for you to understand.
Finally, using operators you can relate the results of ANY SMITE
function with ANY other one.
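If you know some Python, the operators in this section behave like
ordinary set operations. The sketch below uses the two PTEXT worklists
shown earlier; the variable names are invented for the example and the
mapping onto Python sets is an analogy, not a description of how SMITE
is implemented.

```python
# Worklists copied from the PTEXT examples in this section.
MOUSE = {"AAMYLASE", "DAGPE", "FERREDOXIN", "GPCRRHOD",
         "KRINGLE", "LYSLACT", "SUGARTRAN"}
RAT = {"HELIX1N", "CANODO", "DAGPE", "FERREDOXIN", "GPCRRHOD",
       "KRINGLE", "LYSLACT", "SENSOR", "SUGARTRAN"}
# In the example database these two lists together cover all ten entries.
DATABASE = MOUSE | RAT

print("OR / ADD :", sorted(MOUSE | RAT))
print("AND      :", sorted(MOUSE & RAT))
print("XOR      :", sorted(MOUSE ^ RAT))
print("SUBTRACT :", sorted(MOUSE - RAT))
# NOT is the complement with respect to the whole database,
# so AND NOT gives the same answer as SUBTRACT.
print("AND NOT  :", sorted(MOUSE & (DATABASE - RAT)))
```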
d) Multiple parameters
The use of operators makes a complex query easy to read but you can
use shortcuts. This is because SMITE functions can accept multiple
parameters. Type
SMITE> display code "kringle" or code "lyslact" then type
SMITE> display code "kringle lyslact"
You'll see that the result is the same. If the functions CODE, PCODE
or ELEMENT are given multiple parameters there is an implied OR. This
is sensible as an implied AND would result in nothing being selected
by the CODE function!
The other SMITE functions have an implied AND. Type
SMITE> display text "amylase" and text "calcium" then type
SMITE> display text "amylase calcium"
Again the results are the same. Only those entries which contain both
words are selected. This implied AND is used by the TEXT, PTEXT and
SEQ functions.
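The implied operators can be summarised in a short Python sketch. The
function combine and the two lookup sets are hypothetical names, and
the per-parameter result sets are assumed to have been found already;
the sketch only shows how they would be merged.

```python
# Which functions imply OR and which imply AND, as described above.
IMPLIED_OR = {"CODE", "PCODE", "ELEMENT"}
IMPLIED_AND = {"TEXT", "PTEXT", "SEQ"}

# PTEXT worklists taken from the examples in section c).
MOUSE = {"AAMYLASE", "DAGPE", "FERREDOXIN", "GPCRRHOD",
         "KRINGLE", "LYSLACT", "SUGARTRAN"}
RAT = {"HELIX1N", "CANODO", "DAGPE", "FERREDOXIN", "GPCRRHOD",
       "KRINGLE", "LYSLACT", "SENSOR", "SUGARTRAN"}


def combine(function, results):
    """Merge the per-parameter result sets of one function call."""
    name = function.upper()
    if name in IMPLIED_OR:
        return set().union(*results)
    if name in IMPLIED_AND:
        return set.intersection(*results)
    raise ValueError("unknown function: " + function)


# code "kringle lyslact" keeps both entries (implied OR).
print(sorted(combine("code", [{"KRINGLE"}, {"LYSLACT"}])))
# ptext "mouse rat" keeps only entries matching both words (implied AND).
print(sorted(combine("ptext", [MOUSE, RAT])))
```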
e) Very complex queries
SMITE allows very complex queries. These are characterised by having
more than one operator in the query. As an example type...
SMITE> display ptext "mouse" or (seq "iwg" and ptext "rat")
to give...
WORKLIST ENTRIES (7):
AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
FERREDOXIN PLANT FERREDOXIN SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE
This query says `get me all entries which contain both the sequence
ile-trp-gly and the text rat in the ptext field PLUS those entries
which contain mouse in the ptext field'. Just like an arithmetic
expression the parentheses tell SMITE in which order to perform its
operations. Now type
SMITE> display (ptext "mouse" or seq "iwg") and ptext "rat"
to give....
WORKLIST ENTRIES (6):
DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
FERREDOXIN PLANT FERREDOXIN SIGNATURE
GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
KRINGLE KRINGLE DOMAIN SIGNATURE
LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE
You can see how the position of the parentheses alters the meaning of
the query. This one says `get me all entries which contain rat in the
ptext field and also contain (either mouse in the ptext field or the
peptide ile-trp-gly in the final motif set or both)'. It is strongly
recommended that you use parentheses in very complex queries but you
don't have to; the following query is valid
SMITE> display ptext "mouse" or seq "iwg" and ptext "rat"
Try it to see which of the previous two queries it resembles. The
answer shows that very complex expressions in SMITE are worked out
from right to left. Parentheses avoid confusion!
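Right-to-left evaluation of an unparenthesised query can be sketched
in Python as follows. This is a model built only from the statement
above; eval_right_to_left is an invented name, the operands are assumed
to be result sets that have already been looked up, and NOT is left
out to keep the sketch short.

```python
import operator

MOUSE = {"AAMYLASE", "DAGPE", "FERREDOXIN", "GPCRRHOD",
         "KRINGLE", "LYSLACT", "SUGARTRAN"}
RAT = {"HELIX1N", "CANODO", "DAGPE", "FERREDOXIN", "GPCRRHOD",
       "KRINGLE", "LYSLACT", "SENSOR", "SUGARTRAN"}
IWG = {"DAGPE", "GPCRRHOD"}          # result of seq "iwg"

OPS = {"or": operator.or_, "and": operator.and_,
       "xor": operator.xor, "subtract": operator.sub}


def eval_right_to_left(items):
    """Evaluate [set, op, set, op, set, ...] starting from the right."""
    result = items[-1]
    # Walk leftwards two items at a time: an operator and its left operand.
    for i in range(len(items) - 2, 0, -2):
        result = OPS[items[i]](items[i - 1], result)
    return result


# ptext "mouse" or seq "iwg" and ptext "rat"
result = eval_right_to_left([MOUSE, "or", IWG, "and", RAT])
print(len(result), sorted(result))
```

The result matches the query with the parentheses around the last two
functions, which is why writing the parentheses explicitly is the safer
habit.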
The last two sections have shown how you can build up any arbitrarily
complex query using SMITE. In normal use you would try and keep each
query simple and therefore easy to understand. The use of LISTS,
explained later, enables you to use several simple queries instead of
one very complex query.
f) Other display commands
The two commands ISHOW and FSHOW allow you to display initial and
final motif sequence blocks. They both have the same syntax. These
commands do NOT accept a query but focus instead on those database
entries that are already in the worklist. First of all get a single
entry in the worklist by typing
SMITE> display code "aamyl"
This is a fingerprint with 5 elements. ISHOW and FSHOW typed on their
own will show all the sequence motif blocks for this code. Type
SMITE> ishow and
SMITE> fshow
to show this. These commands are made flexible by allowing a `range'
to be specified. Type
SMITE> ishow 4
This will show only motif 4 in the initial motif set. Now type
SMITE> fshow 2-4
This will show motifs 2, 3 and 4 in the final motif set. Now type
SMITE> ishow -3
This will show initial motifs 1, 2 and 3. Finally type
SMITE> fshow 4-
This will show final motifs from block 4 to the end i.e. motifs 4 and 5.
The /OUTPUT qualifier can be used with these commands.
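The four forms of range shown above can be expanded with a few lines
of Python. The function motif_range is a hypothetical helper written
for this guide, not part of SMITE, and it does no checking that the
numbers lie within the fingerprint.

```python
def motif_range(spec, n_motifs):
    """Expand an ISHOW/FSHOW-style range into a list of motif numbers."""
    spec = spec.strip()
    if not spec:                      # no range given: every motif
        return list(range(1, n_motifs + 1))
    if "-" not in spec:               # single motif, e.g. "4"
        return [int(spec)]
    start, _, end = spec.partition("-")
    first = int(start) if start else 1        # "-3" starts at motif 1
    last = int(end) if end else n_motifs      # "4-" runs to the last motif
    return list(range(first, last + 1))


# AAMYLASE has 5 elements in the example database.
for spec in ("4", "2-4", "-3", "4-", ""):
    print(repr(spec), motif_range(spec, 5))
```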
g) Other commands which can accept a query.
These are the commands PLUSWORK and MINUSWORK. Just like DISPLAY they
can accept simple, complex and very complex queries. PLUSWORK adds the
results of a query to the worklist whereas MINUSWORK subtracts the
results of a query. Unlike DISPLAY they do not show the contents of
the worklist after completion. Try the following two pairs of
examples; within each pair the two forms are equivalent.
SMITE> display code "kringle" or code "aamyl" is equivalent to
SMITE> display code "kringle"
SMITE> pluswork code "aamyl"
SMITE> display
similarly
SMITE> display ptext "mouse" subtract ptext "rat" is equivalent to
SMITE> display ptext "mouse"
SMITE> minuswork ptext "rat"
SMITE> display
The final DISPLAY commands are just so you can confirm what is in the
worklist. One interesting and useful feature of MINUSWORK is that, if
typed without a query, it will clear the worklist.
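In Python terms PLUSWORK and MINUSWORK behave like a set union and a
set difference on the worklist. The two functions below are a sketch
under that assumption; they are illustrative names, not SMITE
internals, and the query results are passed in as ready-made sets.

```python
MOUSE = {"AAMYLASE", "DAGPE", "FERREDOXIN", "GPCRRHOD",
         "KRINGLE", "LYSLACT", "SUGARTRAN"}
RAT = {"HELIX1N", "CANODO", "DAGPE", "FERREDOXIN", "GPCRRHOD",
       "KRINGLE", "LYSLACT", "SENSOR", "SUGARTRAN"}


def pluswork(worklist, result):
    """Add the results of a query to the worklist."""
    return worklist | result


def minuswork(worklist, result=None):
    """Subtract the results of a query; with no query, clear the worklist."""
    if result is None:
        return set()
    return worklist - result


print(sorted(pluswork({"KRINGLE"}, {"AAMYLASE"})))   # kringle, then aamyl
print(sorted(minuswork(MOUSE, RAT)))                 # mouse minus rat
print(minuswork(MOUSE))                              # cleared worklist
```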
h) How LISTS make life easy
Lists allow you to break up complex and very complex queries into
simple steps. SMITE, like DELPHOS, contains two lists. You've already
met the WORKLIST. The results of every query go into the worklist. The
other list, the STORELIST, is provided for your benefit; use it!
Several commands operate on these lists.
STOREWORK: makes another copy of the worklist in the storelist
RECALLWORK: makes another copy of the storelist in the worklist
SWAPWORK: transposes the two lists
ORLISTS: ORs the storelist with the worklist leaving the result
in the worklist
ANDLISTS: ANDs the storelist with the worklist leaving the result
in the worklist
XORLISTS: XORs the storelist with the worklist leaving the result
in the worklist.
WORKSAVE: saves the entry codes in the worklist to a file of your
choice.
WORKREAD: reads entry codes from one of your files into the
worklist
STORESAVE: same as WORKSAVE but uses the storelist
STOREREAD: same as WORKREAD but uses the storelist
As an example of how to use these lists consider the very complex
query we used earlier i.e.
SMITE> display (ptext "mouse" or seq "iwg") and ptext "rat"
Try this again and then try the following series of commands which are
equivalent
SMITE> display ptext "mouse"
SMITE> storework
SMITE> display seq "iwg"
SMITE> orlists
SMITE> storework
SMITE> display ptext "rat"
SMITE> andlists
SMITE> display
To an experienced user this may seem like using a sledgehammer to
crack a nut but the flow of operations is clearer for the novice. Next
try
SMITE> display code "kringle"
SMITE> storework
SMITE> minuswork
SMITE> display
SMITE> recallwork
SMITE> display
SMITE> display code "aamyl"
SMITE> swapwork
SMITE> display
SMITE> swapwork
SMITE> display
to see the effects of moving lists around. To see the save and read
operations in action try the following.
SMITE> display code "kringle"
SMITE> worksave MYWORK.DAT
SMITE> minuswork (clear the worklist)
SMITE> display (the worklist will be empty)
SMITE> storeread MYWORK.DAT (load the data into the storelist)
SMITE> display (the worklist will still be empty)
SMITE> recallwork
SMITE> display (the worklist is restored)
The above sequence could have been shortened by reading the file
directly into the worklist by using WORKREAD.
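The interplay of the two lists may be easier to follow as a toy Python
model. The class Lists and its method names simply mirror the SMITE
commands; it is a sketch of the behaviour described in this
subsection, with query results supplied as ready-made sets, and it
leaves out the file commands.

```python
MOUSE = {"AAMYLASE", "DAGPE", "FERREDOXIN", "GPCRRHOD",
         "KRINGLE", "LYSLACT", "SUGARTRAN"}
RAT = {"HELIX1N", "CANODO", "DAGPE", "FERREDOXIN", "GPCRRHOD",
       "KRINGLE", "LYSLACT", "SENSOR", "SUGARTRAN"}
IWG = {"DAGPE", "GPCRRHOD"}


class Lists:
    """Toy model of the SMITE worklist and storelist as Python sets."""

    def __init__(self):
        self.work, self.store = set(), set()

    def display(self, result):       # a query replaces the worklist
        self.work = set(result)

    def storework(self):             # copy worklist to storelist
        self.store = set(self.work)

    def recallwork(self):            # copy storelist to worklist
        self.work = set(self.store)

    def swapwork(self):              # transpose the two lists
        self.work, self.store = self.store, self.work

    def orlists(self):
        self.work = self.store | self.work

    def andlists(self):
        self.work = self.store & self.work

    def xorlists(self):
        self.work = self.store ^ self.work


# Replay of the eight-command sequence shown earlier in this subsection.
lists = Lists()
lists.display(MOUSE)
lists.storework()
lists.display(IWG)
lists.orlists()
lists.storework()
lists.display(RAT)
lists.andlists()
print(sorted(lists.work))
```

The final worklist is the same as the one produced by the single query
with parentheses.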
i) NEGWORK
This command, as its name implies, negates the worklist. What that
means is that, after executing this command, all entries which were in
the worklist are removed and replaced by those entries which were NOT
in the worklist before. As an example, the query
SMITE> display not ptext "rat"
could be replaced by the sequence of commands
SMITE> display ptext "rat"
SMITE> negwork
SMITE> display
j) How can I see what entries are in the PRINTS database?
Try typing the following and then think how they work.
SMITE> minuswork
SMITE> negwork
SMITE> display
or
SMITE> display element "<20"
Answer: The first example clears the worklist using MINUSWORK and
then negates the worklist. This guarantees all entries in
the database will be in the worklist.
The second example relies on the fact that there won't be
any fingerprints of 20 elements (or more!) in the database.
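The first trick can be checked with a small Python sketch in which
NEGWORK is modelled as a set complement. The ten codes are those of
the example database used throughout this tutorial; the function
negwork is an illustrative stand-in, not SMITE's own routine.

```python
# The ten entries of the example database used in this tutorial.
DATABASE = {"AAMYLASE", "HELIX1N", "CANODO", "DAGPE", "FERREDOXIN",
            "GPCRRHOD", "KRINGLE", "LYSLACT", "SENSOR", "SUGARTRAN"}
RAT = DATABASE - {"AAMYLASE"}        # worklist from ptext "rat"


def negwork(worklist, database=DATABASE):
    """Replace the worklist by every database entry that is not in it."""
    return database - worklist


print(sorted(negwork(RAT)))          # same as: display not ptext "rat"
print(len(negwork(set())))           # empty worklist negated: everything
```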
k) EXTRACT
The EXTRACT command allows you to extract the final motif sets from
entries in the worklist into MOT files suitable for use by ADSP. Try
typing
SMITE> display code "lyslact"
SMITE> extract
and look at the files produced.
l) Shortcuts
Commands and qualifiers in SMITE may be abbreviated down to the point
of no conflict with other instructions. For example
SMITE> display/brief code "kringle" could be abbreviated to
SMITE> d/b code "kringle"
Also, if no command is given, the DISPLAY command is assumed, so
the following two queries are equivalent
SMITE> display ptext "rat"
SMITE> ptext "rat"
To redisplay the worklist using qualifiers, only the abbreviated
qualifier is necessary, so the following SMITE statements
are equivalent
SMITE> display/history
SMITE> /h
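One way to picture `abbreviated down to the point of no conflict' is
the Python sketch below, which accepts an abbreviation only if exactly
one name starts with it. The function expand is hypothetical, the name
lists are taken from section 1.2, and how SMITE itself resolves or
reports a clash is not documented here, so treat the error handling as
an assumption.

```python
COMMANDS = ["DISPLAY", "FSHOW", "ISHOW", "STOREWORK", "RECALLWORK",
            "SWAPWORK", "WORKSAVE", "WORKREAD", "STORESAVE", "STOREREAD",
            "ANDLISTS", "ORLISTS", "XORLISTS", "NEGWORK", "PLUSWORK",
            "MINUSWORK", "EXTRACT", "HELP", "BYE", "EXIT", "QUIT"]
QUALIFIERS = ["AUTHOR", "CFI", "COMMENT", "FMOTIF", "HISTORY", "IMOTIF",
              "REFERENCE", "SUMMARY", "TITLE", "TYPE", "BRIEF", "FULL",
              "INFO", "OUTPUT", "PRINTER"]


def expand(abbrev, names):
    """Return the single name that starts with abbrev, else raise."""
    key = abbrev.upper()
    if key in names:                  # an exact name always wins
        return key
    hits = [name for name in names if name.startswith(key)]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise ValueError("no match for " + abbrev)
    raise ValueError(abbrev + " is ambiguous: " + ", ".join(hits))


print(expand("d", COMMANDS), expand("b", QUALIFIERS), expand("h", QUALIFIERS))
```

With these lists `d/b' expands to DISPLAY/BRIEF, whereas an
abbreviation such as `s' would be rejected because several commands
begin with that letter.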
1.2 Summary of SMITE commands, qualifiers, functions and operators
a) Commands
DISPLAY [qual] [query]   Default command to display the results of a query
FSHOW [range]            Display final motif blocks
ISHOW [range]            Display initial motif blocks
STOREWORK                Copy worklist to storelist
RECALLWORK               Copy storelist to worklist
SWAPWORK                 Transpose worklist and storelist
WORKSAVE [file]          Save worklist to a file
WORKREAD [file]          Recreate worklist from a file
STORESAVE [file]         Save storelist to a file
STOREREAD [file]         Recreate storelist from a file
ANDLISTS                 storelist AND worklist -> worklist
ORLISTS                  storelist OR worklist -> worklist
XORLISTS                 storelist XOR worklist -> worklist
NEGWORK                  Negate worklist
PLUSWORK query           Add results of query to worklist
MINUSWORK [query]        Subtract results of query from worklist
EXTRACT                  Extract final motif sets to MOT files
HELP                     Brief help sheet
BYE EXIT QUIT            Leave SMITE
b) Functions
CODE       Select a database entry code
PCODE      Select a protein code (pcode)
TEXT       Search general text
PTEXT      Search pcode text
SEQ        Search final motif set polypeptides
ELEMENT    Select entries based on the number of elements in the fingerprint
c) Operators
AND        Perform a boolean AND
OR         Perform a boolean OR
XOR        Perform a boolean exclusive-OR
ADD        Same as OR
SUBTRACT   Perform a boolean AND NOT
NOT        Negate the results of a function
d) Qualifiers
/AUTHOR         Display the fingerprint creator + date
/CFI            Display the composite fingerprint index
/COMMENT        Display comment information
/FMOTIF         Display final motifs
/HISTORY        Display scan history
/IMOTIF         Display initial motifs
/REFERENCE      Display bibliography
/SUMMARY        Display summary information
/TITLE          Display pcodes
/TYPE           Display type of fingerprint
/BRIEF          Show brief information
/FULL           Show all information
/INFO           Show context in PTEXT, PCODE and SEQ queries
/OUTPUT=file    Redirect screen output to a file
/PRINTER        Redirect screen output to the default system printer
2.0 References
1. Attwood, T.K., Beck, M.E., Bleasby, A.J. and Parry-Smith, D.J. (1994)
PRINTS - A database of protein motif fingerprints. Nucleic Acids Research,
in press.
2. Attwood, T.K. and Beck, M.E. (1994) PRINTS - A protein motif
fingerprint database. Protein Engineering, 7 (7), 841-848.
3. Parry-Smith, D.J. and Attwood, T.K. (1992) ADSP - A new package for
computational sequence analysis. CABIOS 8 (5) 451-459.
4. Bleasby, A.J., Akrigg, D. and Attwood, T.K. (1994) OWL - A non-
redundant composite protein sequence database. Nucleic Acids Research,
in press.
5. Bleasby, A.J. and Wootton, J.C. (1990) Construction of validated,
non-redundant composite protein sequence databases. Protein Engineering 3
(3) 153-159.
6. Parry-Smith, D.J. and Attwood, T.K. (1991) SOMAP - A novel interactive
approach to multiple protein sequence alignment. CABIOS 7 (2) 233-235.
7. Akrigg, D., Attwood, T.K, Bleasby, A.J., Findlay, J.B.C., Maughan, N.A.,
North, A.C.T., Parry-Smith, D.J., Perkins, D.N. and Wootton, J.C. (1992)
SERPENT: An information storage and analysis resource for protein sequences.
CABIOS, 8 (3), 295-296.
8. Perkins, D.N. and Attwood, T.K. (1994) VISTAS - A package for VIsualising
STructures And Sequences of proteins. J.Mol.Graph., submitted.
2.1 Applications
1. Attwood, T.K. and Findlay, J.B.C. (1994) Fingerprinting G-Protein-Coupled
Receptors. Protein Engineering, 7 (2), 195-203.
2. Attwood, T.K. and Findlay, J.B.C. (1993) Design of a discriminating
fingerprint for G-protein-coupled receptors. Protein Engineering, 6 (2),
167-176.
3. Flower, D.R., North, A.C.T. and Attwood, T.K. (1993) Structure and
Sequence Relationships in the Lipocalins and Related Proteins. Protein
Science, 2, 753-761.
4. Boguski, M.S., Bairoch, A., Attwood, T.K. and Michaels, G.S. (1992)
Proto-vav and Gene Expression. Nature, 358, 113.
5. Flower, D.R., North, A.C.T. and Attwood, T.K. (1991) Mouse oncogene
protein 24p3 is a member of the Lipocalin protein family. Biochemical and
Biophysical Research Communications, 180 (1), 69-74.
------ * ------