Modeling the proteins

Binding motifs can be widely distributed along the protein sequence and involve only a few, if any, consecutive amino acids. Thus we cannot rely on the sequence and have to focus on the 3D structure alone. Moreover, the types of the amino acids themselves can vary, since only some of the residues are actually involved in binding.

Thus we model the protein as an unordered set of the geometric configurations of its amino acids. However, requiring the amino acid types to be conserved makes the recognition process much easier.

Each amino acid has 4 atoms that belong to the backbone of the protein, 3 of which are always in the same geometric configuration relative to one another. We use these 3 atoms to define the configuration of the amino acid in space. Each amino acid is thus modeled as a frame, and a protein is modeled as an unordered set of frames.
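As an illustration of how three atoms can define a frame, here is a minimal Python sketch. It assumes the three atoms are the backbone N, C-alpha and C atoms, places the origin at C-alpha, and builds the axes by Gram-Schmidt orthogonalization. The function name residue_frame and the axis convention are hypothetical choices made for this example, not necessarily the ones used in this work.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def residue_frame(n, ca, c):
    """Build an orthonormal frame from three atom positions.

    Assumed convention (illustrative only): origin at C-alpha,
    first axis along C-alpha -> C, second axis in the N, C-alpha, C
    plane, third axis completing a right-handed basis.
    """
    e1 = normalize(sub(c, ca))
    v = sub(n, ca)
    # Remove the component of v along e1 (Gram-Schmidt step).
    proj = dot(v, e1)
    e2 = normalize(tuple(x - proj * y for x, y in zip(v, e1)))
    e3 = cross(e1, e2)
    return ca, (e1, e2, e3)

# Made-up coordinates, for illustration only.
origin, axes = residue_frame(n=(1.0, 1.0, 0.0),
                             ca=(0.0, 0.0, 0.0),
                             c=(1.0, 0.0, 0.0))
```

The three input atoms must not be collinear, otherwise the second axis is undefined and the normalization divides by zero.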

Backbone of the protein

Protein modeled as an unordered set of frames

To find similar substructures in two proteins, we now have to find two subsets of frames that are in the same configuration, up to a global rigid transformation.
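To make the criterion "same configuration, up to a global rigid transformation" concrete, the Python sketch below checks it for two lists of frames whose correspondence is assumed to be already known. Finding the matching subsets is the actual hard problem and is not addressed here. A frame is represented as a pair (R, t), with R a 3x3 rotation matrix whose columns are the frame axes and t the origin. This representation, the helper names and the tolerance are assumptions made for the example.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def apply(rot, v):
    return [sum(rot[i][k] * v[k] for k in range(3)) for i in range(3)]

def transform_between(f, g):
    """Rigid transformation (Q, s) that maps frame f onto frame g."""
    (rf, tf), (rg, tg) = f, g
    q = matmul(rg, transpose(rf))  # Q = Rg * Rf^T
    qt = apply(q, tf)
    s = [tg[i] - qt[i] for i in range(3)]  # s = tg - Q * tf
    return q, s

def same_configuration(frames_a, frames_b, tol=1e-6):
    """True if one rigid transformation maps frames_a[i] onto frames_b[i]."""
    if not frames_a or len(frames_a) != len(frames_b):
        return False
    # The first pair of frames fixes the candidate transformation.
    q, s = transform_between(frames_a[0], frames_b[0])
    for (ra, ta), (rb, tb) in zip(frames_a, frames_b):
        moved_r = matmul(q, ra)
        moved_t = [x + y for x, y in zip(apply(q, ta), s)]
        err_r = max(abs(moved_r[i][j] - rb[i][j])
                    for i in range(3) for j in range(3))
        err_t = max(abs(moved_t[i] - tb[i]) for i in range(3))
        if err_r > tol or err_t > tol:
            return False
    return True

# Two frames, then the same two frames rotated by 90 degrees about z
# and translated by (1, 2, 3).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
frames_a = [(identity, [0.0, 0.0, 0.0]), (identity, [1.0, 0.0, 0.0])]
frames_b = [(rot_z, [1.0, 2.0, 3.0]), (rot_z, [1.0, 3.0, 3.0])]
match = same_configuration(frames_a, frames_b)
```

Because a single pair of corresponding frames already determines the rigid transformation, the remaining pairs only need to be verified against it. With real, noisy coordinates an exact tolerance test like this one would be too strict, and an error model for the frames would be needed.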



xpennec@sophia.inria.fr