Explanation of input

SCOP identifier (Protein domain)

A SCOP identifier consists of seven characters, dPPPPCN. Here d is the prefix SCOP uses for domain identifiers, PPPP is the identifier assigned by the Protein Data Bank (PDB), C is the chain identifier, and N is the serial number of the domain on that protein chain. The SCOP identifier must exist in the SCOP 1.73 database. If it does not, upload the three-dimensional structure of the protein domain as a file in PDB format.
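Because the dPPPPCN layout is fixed-width, an identifier can be split by position. The Python sketch below is illustrative only. The helper `parse_scop_sid` and the `ScopSid` type are hypothetical and are not part of this server. In SCOP identifiers, `_` is commonly used in the chain or domain position for a blank chain or for a chain that forms a single domain.

```python
from typing import NamedTuple


class ScopSid(NamedTuple):
    pdb_id: str  # PPPP: 4-character PDB identifier
    chain: str   # C: chain identifier
    domain: str  # N: serial number of the domain on the chain


def parse_scop_sid(sid: str) -> ScopSid:
    """Split a 7-character SCOP identifier of the form dPPPPCN (hypothetical helper)."""
    if len(sid) != 7 or not sid.startswith("d"):
        raise ValueError(f"expected a dPPPPCN identifier, got {sid!r}")
    return ScopSid(pdb_id=sid[1:5], chain=sid[5], domain=sid[6])


print(parse_scop_sid("d1dlwa_"))
# ScopSid(pdb_id='1dlw', chain='a', domain='_')
```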

PDB identifier and chain (Protein chain)

A PDB (Protein Data Bank) identifier is the 4-character code assigned by the PDB. The chain identifier is the single character that the PDB file uses for the desired chain. If no chain identifier is provided, the protein is assumed to contain only one chain, with a blank chain identifier.
The PDB code must correspond to an entry that was present in the PDB before February 24, 2006. Otherwise, upload the three-dimensional structure of the protein as a file in PDB format.

Upload files by user (New protein structure)

If the query protein is a new structure or is not available in our SADB, upload a protein structure file in PDB format (*.ent or *.pdb) and enter the chain identifier. If no chain identifier is provided, the uploaded file is assumed to contain only one chain, with a blank chain identifier. At a minimum, the uploaded file must contain HEADER and ATOM records in PDB format. The file format is described in the PDB format guide, and an example PDB file is available for download.
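Before uploading, a file can be checked locally against the two rules above: HEADER and ATOM records must both be present, and a missing chain identifier selects the blank chain. The Python sketch below is a minimal check under those assumptions. The helper `select_chain_atoms` is hypothetical, not the server's own validator. It relies on the PDB format layout, where the record name occupies columns 1-6 and the chain identifier occupies column 22 of an ATOM record.

```python
from __future__ import annotations


def select_chain_atoms(pdb_text: str, chain: str = "") -> list[str]:
    """Return the ATOM records of one chain from PDB-format text (hypothetical helper).

    An empty chain identifier is treated as the blank chain, mirroring the
    rule for single-chain files described above.
    """
    chain = chain or " "
    lines = pdb_text.splitlines()
    # Record names occupy columns 1-6 of each line.
    record_names = {line[:6].strip() for line in lines}
    if "HEADER" not in record_names or "ATOM" not in record_names:
        raise ValueError("file must contain at least HEADER and ATOM records")
    atoms = [line for line in lines if line[:6].strip() == "ATOM"]
    # The chain identifier is column 22 (index 21); a short line counts as blank.
    selected = [a for a in atoms if (a[21] if len(a) > 21 else " ") == chain]
    if not selected:
        raise ValueError(f"no ATOM records found for chain {chain!r}")
    return selected


# Truncated ATOM lines: only the record name and chain column are read here.
example = "\n".join([
    "HEADER    EXAMPLE",
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
])
print(len(select_chain_atoms(example, "A")))  # 2
```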

Choose database

The following structural databases are currently available:

  • PDB (30-Jul-08): RCSB Protein Data Bank entries up to July 30, 2008. There are 117,513 chains in 52,103 proteins.
  • nr-PDB-90: The non-redundant PDB chain set maintained by the RCSB PDB, built by clustering the protein chains in the PDB weekly with cd-hit. These clusters are used by the "remove similar sequences" feature on the PDB web sites; nr-PDB-90 uses the clusters at 90% sequence identity. There are 18,638 chains in 17,419 proteins.
  • SCOP all: All domains in SCOP 1.73. There are 95,410 domains.
  • SCOP 95%: SCOP 1.73 domains sharing less than 95% sequence identity with one another. There are 15,233 domains.
  • SCOP 40%: SCOP 1.73 domains sharing less than 40% sequence identity with one another. There are 9,527 domains.

The list of proteins in these structural databases can be downloaded from the Download page.

E-value

The E-value measures the statistical significance of an alignment and so indicates how reliable a match is. This setting is the threshold for reporting matching protein structures from the structural database. According to the stochastic model of Karlin and Altschul (1990), the E-value of a match is the number of matches at least as good that are expected to be found merely by chance. The default threshold is 10^-10, so a match is reported only if no more than 10^-10 equally good matches are expected by chance. A match whose E-value is greater than the threshold is not reported. Lower thresholds are more stringent and cause fewer matches to be reported.
See the E-value page for more information.
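In the Karlin and Altschul model, the E-value of a raw score S is E = K · m · n · e^(-λS). Here m is the query length, n is the database length, and λ and K are parameters of the scoring system. The Python sketch below applies this formula and the reporting rule above. The λ, K and length values are hypothetical placeholders, not the parameters used with this server's substitution matrix. Real BLAST also corrects m and n for edge effects, which this sketch omits.

```python
import math


def karlin_altschul_evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """E = K * m * n * exp(-lambda * S): expected number of chance hits scoring >= S.

    m is the query length and n the total database length; lam and k are the
    Karlin-Altschul parameters of the scoring system.
    """
    return k * m * n * math.exp(-lam * raw_score)


# Hypothetical parameters for illustration only; NOT the values used by the server.
LAM, K = 0.3, 0.1
query_len, db_len = 150, 20_000_000
threshold = 1e-10  # the default E-value threshold described above
for score in (60, 120, 180):
    e = karlin_altschul_evalue(score, query_len, db_len, LAM, K)
    print(score, f"{e:.2e}", "reported" if e <= threshold else "not reported")
```

Because E falls exponentially with the score, each additional unit of raw score shrinks the expected number of chance matches by a constant factor of e^(-λ).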

Maximum number of hit structures

This setting limits the number of matching protein structures that are reported. A larger value shows more matches but takes more processing time.
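The E-value threshold and the maximum number of hits act together on the candidate matches. The Python sketch below assumes that hits are first filtered by E-value, then ranked best first, then truncated. This ordering, the `Hit` type and the helper `report_hits` are hypothetical, and the subject names are placeholders rather than real identifiers.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str   # placeholder for a PDB chain or SCOP domain identifier
    score: float
    evalue: float


def report_hits(hits: list, evalue_threshold: float, max_hits: int) -> list:
    """Keep hits at or below the E-value threshold, best first, at most max_hits of them."""
    kept = [h for h in hits if h.evalue <= evalue_threshold]
    # Most significant first; ties are broken by the higher score.
    kept.sort(key=lambda h: (h.evalue, -h.score))
    return kept[:max_hits]


hits = [Hit("s3", 40.0, 1e-3), Hit("s1", 210.0, 1e-40), Hit("s2", 95.0, 1e-12)]
print([h.subject for h in report_hits(hits, 1e-10, 2)])  # ['s1', 's2']
```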


Explanation of output


"Length" is the length of the alignment between the query structure and a subject structure, obtained by BLAST using structural alphabets.


"Score" is the score of the alignment between the query structure and a subject structure, computed by BLAST with the structural alphabet substitution matrix (SASM). In general, more meaningful alignments have lower E-values and higher scores.
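A gapped alignment score is commonly computed as the sum of substitution-matrix scores over aligned letter pairs, minus affine gap costs. The Python sketch below scores an already-aligned pair of strings this way. It does not perform BLAST's search. The two-letter matrix `TOY`, the gap costs and the helper `alignment_score` are hypothetical stand-ins, not the real SASM or this server's parameters.

```python
def alignment_score(query_aln: str, subject_aln: str, matrix: dict,
                    gap_open: int = 10, gap_extend: int = 1) -> int:
    """Raw score of a gapped alignment: substitution scores minus affine gap costs.

    A gap of length L costs gap_open + L * gap_extend; '-' marks a gap.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    score = 0
    gap_side = None  # which string had a gap in the previous column, if any
    for a, b in zip(query_aln, subject_aln):
        if a == "-" or b == "-":
            side = "query" if a == "-" else "subject"
            if side != gap_side:  # a new gap opens
                score -= gap_open
            score -= gap_extend
            gap_side = side
        else:
            score += matrix[a][b]
            gap_side = None
    return score


# Toy two-letter matrix; NOT the real SASM.
TOY = {"A": {"A": 5, "B": -2}, "B": {"A": -2, "B": 4}}
print(alignment_score("AB-A", "ABBA", TOY))  # 3
```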


"%Iden" is the structural alphabet identity of the BLAST alignment between the query structure and a subject structure, analogous to sequence identity in protein sequence alignment. It is computed as (number of identical structural alphabet letters) / (alignment length), expressed as a percentage.


"%Gaps" is the percentage of gaps in the BLAST alignment between the query structure and a subject structure using structural alphabets. It is computed as (number of gap positions) / (alignment length), expressed as a percentage.
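Length, %Iden and %Gaps can all be read off a pair of aligned strings. The Python sketch below computes them from the definitions above, using '-' for gaps. The helper `alignment_stats` and the letters in the example are hypothetical; the real structural alphabet uses its own letter set.

```python
def alignment_stats(query_aln: str, subject_aln: str) -> tuple:
    """Return (Length, %Iden, %Gaps) for two aligned structural alphabet strings."""
    if len(query_aln) != len(subject_aln) or not query_aln:
        raise ValueError("aligned strings must be non-empty and of equal length")
    length = len(query_aln)
    pairs = list(zip(query_aln, subject_aln))
    # Identical letters in the same column (gap columns never count).
    identical = sum(1 for a, b in pairs if a == b and a != "-")
    # Columns where either string has a gap.
    gaps = sum(1 for a, b in pairs if a == "-" or b == "-")
    return length, 100.0 * identical / length, 100.0 * gaps / length


print(alignment_stats("ABC-A", "ABDBA"))  # (5, 60.0, 20.0)
```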

Structure alignment

Performs a detailed structure alignment between the query and subject structures for structure visualisation and superimposition.


The description of each hit is summarized from the TITLE/HEADER records of the corresponding PDB or SCOP entry.