The goal of defining a structural alphabet is to encode 3D fragments of
protein backbones, so that a 3D protein structure can be represented as a
sequence of structural alphabet letters. Each letter represents the pattern
profile of a group of five-residue backbone fragments derived from the pair
database. Therefore, a protein structure of L residues is described by a
structural alphabet sequence of L-4 letters, one for each overlapping
five-residue window. We developed a nearest-neighbor clustering (NNC)
algorithm to cluster 225,523 3D protein fragments into 23 groups, each of
which is represented by its own structural alphabet letter.
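This section does not spell out how the NNC algorithm works, so the following is only a generic sketch of greedy, threshold-based nearest-neighbor clustering, not the authors' method. It assumes each fragment has already been reduced to a fixed-length numeric vector and compares vectors by Euclidean distance; the function name `nearest_neighbor_clustering` and the `threshold` parameter are hypothetical. A vector joins the cluster with the nearest centroid if that centroid lies within `threshold`, and otherwise starts a new cluster.

```python
import math


def nearest_neighbor_clustering(vectors, threshold):
    """Greedy threshold-based nearest-neighbor clustering (generic sketch).

    Each vector joins the cluster whose centroid is nearest, provided that
    distance is at most `threshold`; otherwise it seeds a new cluster.
    Centroids are maintained as running means of their members.
    """
    centroids = []  # running-mean centroid of each cluster
    members = []    # indices of the input vectors assigned to each cluster
    for idx, vec in enumerate(vectors):
        # Find the nearest existing centroid.
        best, best_dist = None, math.inf
        for c, centroid in enumerate(centroids):
            d = math.dist(vec, centroid)
            if d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            members[best].append(idx)
            n = len(members[best])
            # Incremental mean update: new_mean = old_mean + (x - old_mean) / n
            centroids[best] = [m + (x - m) / n for m, x in zip(centroids[best], vec)]
        else:
            # No centroid close enough: this vector starts a new cluster.
            centroids.append(list(vec))
            members.append([idx])
    return centroids, members


# Toy 2D example (hypothetical data): two well-separated groups.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.2, 9.9), (0.1, 0.4)]
centroids, members = nearest_neighbor_clustering(points, threshold=2.0)
print(members)  # [[0, 1, 4], [2, 3]]
```

In this sketch the number of clusters is controlled only indirectly by `threshold`, and the result can depend on input order, so reproducing a specific count such as 23 groups would require tuning `threshold` or merging clusters afterwards.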
We found that these 23 letters can represent the profiles of most 3D
fragments and can be roughly divided into five categories: helix letters
(A, Y, B, C, and D), helix-like letters (G, I, and L), strand letters
(E, F, and H), strand-like letters (K and N), and others. Representative
segments in the same category have similar 3D shapes. For example, the
segments for the helix letters resemble one another, as do the segments for
the strand letters. These 3D fragment shapes and their structural alphabet
letters are shown in the following figure.
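The encoding described at the start of this section, in which a structure of L residues becomes a string of L-4 letters, can be sketched as follows, together with a tally of the five letter categories listed above. The section does not say how a fragment is compared with a letter's profile, so this sketch makes several assumptions: the input is a list of C-alpha coordinates, each five-residue window is described by its 10 pairwise C-alpha distances (invariant to rotation and translation), and each window gets the letter whose representative descriptor is nearest in Euclidean distance. The function names and the representative values in the usage example are hypothetical; only the letters and their groupings come from the text.

```python
import math
from collections import Counter
from itertools import combinations

FRAGMENT_LENGTH = 5  # five-residue backbone fragments, as in the text

# Letter groupings as listed in the text; any other letter counts as "other".
CATEGORIES = {
    **dict.fromkeys("AYBCD", "helix"),
    **dict.fromkeys("GIL", "helix-like"),
    **dict.fromkeys("EFH", "strand"),
    **dict.fromkeys("KN", "strand-like"),
}


def fragment_descriptor(ca_coords):
    """Assumed descriptor: the 10 pairwise C-alpha distances in one window."""
    return [math.dist(p, q) for p, q in combinations(ca_coords, 2)]


def encode_structure(ca_coords, centroids):
    """Encode L C-alpha positions as L - 4 letters (one per sliding window).

    `centroids` maps each letter to a hypothetical representative descriptor.
    """
    letters = []
    for start in range(len(ca_coords) - FRAGMENT_LENGTH + 1):
        desc = fragment_descriptor(ca_coords[start:start + FRAGMENT_LENGTH])
        # Pick the letter whose representative descriptor is closest.
        best = min(centroids, key=lambda letter: math.dist(desc, centroids[letter]))
        letters.append(best)
    return "".join(letters)


def category_counts(sa_sequence):
    """Count how many letters of an encoded sequence fall in each category."""
    return Counter(CATEGORIES.get(letter, "other") for letter in sa_sequence)


# Usage with made-up representatives: "E" for a fully extended fragment
# (C-alpha atoms 3.8 angstroms apart on a line), "A" for an arbitrary compact one.
extended = [(3.8 * i, 0.0, 0.0) for i in range(FRAGMENT_LENGTH)]
centroids = {"E": fragment_descriptor(extended), "A": [5.0] * 10}
chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]  # L = 10 residues
sa = encode_structure(chain, centroids)
print(sa)                   # EEEEEE  (L - 4 = 6 letters)
print(category_counts(sa))  # Counter({'strand': 6})
```

A distance-based descriptor is used here only because it avoids superimposing each fragment onto each representative; an implementation following the original profiles might compare fragments differently.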