Substitution matrix (SASM)

Substitution matrices are a key component of protein alignment methods. We developed a new structural alphabet substitution matrix (SASM) for 3D-BLAST, analogous to the BLOSUM62 matrix that BLAST uses for protein sequence searches. The 23 × 23 SASM offers insights into the substitution preferences of 3D segments between homologous structures with low sequence identity.

The highest substitution score in this matrix (11) is for aligning the alphabet letter W with W. The shape of W's representative segment resembles a β-turn, a motif that allows the peptide backbone to fold back and is of great significance for protein structure and function. According to the PROMOTIF tool, most of the segments assigned to W (95.25%) are β-turns. Substitution scores are high when two identical structural alphabet letters (i.e., diagonal entries) are aligned; for example, the scores are high when I is aligned to I and when S is aligned to S. Most substitution scores are positive when two letters from the same category, e.g., the helix letters (A, Y, B, C, and D), are aligned together, because the shapes of their representative segments are similar. Conversely, the lowest substitution score in the SASM (-15) is for aligning Y (a helix letter) with E (a strand letter), and all scores are low when helix letters (A, Y, B, C, and D) are aligned to strand letters (E, F, and H). These relationships agree with well-known patterns, showing that the SASM embodies conventional knowledge about secondary structure conservation in proteins.
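To illustrate how a substitution matrix such as the SASM is used, the following minimal sketch scores an ungapped alignment of two structural alphabet strings by summing per-position substitution scores, the same way BLOSUM62 scores are summed for amino acids. Only two entries come from the text above (W/W = 11 and Y/E = -15); every other value, and the names `PARTIAL_SASM`, `pair_score`, and `ungapped_score`, are hypothetical placeholders and not part of 3D-BLAST.

```python
# Hypothetical excerpt of a structural alphabet substitution matrix.
# Only W/W = 11 and Y/E = -15 are taken from the text; all other
# values are illustrative placeholders, NOT real SASM entries.
PARTIAL_SASM = {
    ("W", "W"): 11,   # highest SASM score (beta-turn-like letter)
    ("Y", "E"): -15,  # lowest SASM score (helix letter vs strand letter)
    ("A", "A"): 5,    # placeholder: identical helix letters
    ("A", "Y"): 2,    # placeholder: two helix letters
    ("E", "E"): 5,    # placeholder: identical strand letters
    ("A", "E"): -8,   # placeholder: helix letter vs strand letter
}


def pair_score(matrix, a, b):
    """Look up a substitution score, treating the matrix as symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    raise KeyError(f"no score for {a}/{b}")


def ungapped_score(matrix, s1, s2):
    """Sum per-position substitution scores of two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("an ungapped alignment needs equal-length strings")
    return sum(pair_score(matrix, a, b) for a, b in zip(s1, s2))


if __name__ == "__main__":
    # A/A + Y/E + W/W = 5 + (-15) + 11
    print(ungapped_score(PARTIAL_SASM, "AYW", "AEW"))
    # A/A + A/Y + W/W = 5 + 2 + 11
    print(ungapped_score(PARTIAL_SASM, "AAW", "AYW"))
```

Storing each pair only once and looking it up in both orders reflects the assumption that the matrix is symmetric, as BLOSUM-style matrices are. A single helix-versus-strand mismatch (Y/E) is enough to cancel most of the contribution of the matching positions.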

Structural alphabet substitution matrix (SASM) of 3D-BLAST. Scores are high when similar letters are aligned, e.g., when helix letters (A, Y, B, C, and D) are aligned to other helix letters. In contrast, scores are low when helix letters are aligned to strand letters.