

Many steps lie between locating a gene locus and producing a three-dimensional model of the protein that it encodes.

STEP 1: Location of Transcription Start/Stop
A proper analysis to locate a genetic locus will usually have already pinpointed at least the approximate sites of the transcriptional start and stop.

Approximate transcription boundaries of this kind are usually sufficient for the purposes of determining protein structure. It is the start and stop codons for translation that must be determined accurately.


STEP 2: Location of Translation Start/Stop
The first translated codon in a messenger RNA sequence is almost always AUG (ATG in the corresponding DNA). While this reduces the number of candidate codons, the reading frame of the sequence must also be taken into consideration.

There are six possible reading frames for a given DNA sequence, three on each strand, and all of them must be considered unless further information is available.
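As a minimal sketch of the six-frame idea, the Python snippet below splits a DNA string into codons for each frame. The function names `reverse_complement` and `six_frames` and the frame labels (`+1` to `-3`) are illustrative choices, not part of any standard tool, and the code assumes an uppercase sequence containing only A, C, G, and T.

```python
# Complement table for DNA bases (uppercase A/C/G/T only; an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement, i.e. the opposite strand read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Map a frame label ('+1'..'+3', '-1'..'-3') to the list of complete codons in that frame."""
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            # Step through the strand three bases at a time, dropping any partial codon at the end.
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[label + str(offset + 1)] = codons
    return frames


if __name__ == "__main__":
    for name, codons in six_frames("ATGAAATAG").items():
        print(name, codons)
```

Knowing which strand is transcribed would let one keep only the three `+` or the three `-` frames.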

Since genes are usually transcribed away from their promoters, definitively locating the promoter identifies the transcribed strand and reduces the number of possible frames to three.

The sequence surrounding translation start codons is not strongly conserved between species. Hence the appropriate start codon is usually taken to be one that opens a frame in which there are no apparent premature stop codons.

Knowledge of a protein's predicted molecular mass can assist this analysis, since incorrect reading frames usually predict relatively short peptide sequences.
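The reasoning above can be sketched as a search for the longest stretch that starts with ATG and runs to the first in-frame stop codon. This is only an illustration: the function names are hypothetical, only the three forward frames are scanned (the opposite strand could be handled by passing in its reverse complement), and the figure of 110 Da per amino acid residue is a rough rule-of-thumb average used here as an assumption, not a measured value.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average mass per residue


def open_reading_frames(dna):
    """Yield (start_index, orf_sequence) for each ATG...stop stretch in the three forward frames."""
    for offset in range(3):
        start = None
        for i in range(offset, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG seen since the last stop in this frame
            elif start is not None and codon in STOP_CODONS:
                yield start, dna[start:i + 3]
                start = None


def longest_orf(dna):
    """Return the (start_index, orf_sequence) pair with the longest ORF, or None if there is none."""
    return max(open_reading_frames(dna), key=lambda item: len(item[1]), default=None)


def approximate_mass_da(orf):
    """Estimate the protein mass from ORF length, excluding the stop codon."""
    residues = len(orf) // 3 - 1
    return residues * AVERAGE_RESIDUE_MASS_DA


if __name__ == "__main__":
    hit = longest_orf("CCATGAAATTTGGGTAACC")
    if hit is not None:
        start, orf = hit
        print(start, orf, approximate_mass_da(orf))
```

Comparing the estimated mass with an experimentally known mass gives a quick sanity check on the chosen frame.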

It might therefore seem simple to ascertain the correct frame, and in bacteria this is frequently the case. However, eukaryotes add a new obstacle to this process: introns.

STEP 3: Detection of Intron/Exon Splice Sites
In eukaryotes, the reading frame is discontinuous at the level of the DNA because of the presence of introns. Unless one is analyzing a cDNA sequence, these introns must be removed from the sequence and the exons joined to give the sequence that actually codes for the protein.
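Once intron positions are known, joining the exons is a simple string operation. The sketch below assumes introns are supplied as 0-based, half-open `(start, end)` intervals; the function name `splice` and that coordinate convention are illustrative assumptions.

```python
def splice(genomic, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position:
            raise ValueError("intron intervals overlap")
        exons.append(genomic[position:start])  # exon lying before this intron
        position = end
    exons.append(genomic[position:])  # final exon after the last intron
    return "".join(exons)


if __name__ == "__main__":
    gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"  # exon + intron + exon
    print(splice(gene, [(6, 18)]))
```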

Intron/exon splice sites can be predicted on the basis of their common features. Most introns begin with the nucleotides GT and end with the nucleotides AG.

A branch sequence near the downstream (3') end of each intron is involved in the splicing event, and there is a moderate consensus around this branch site.
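A naive way to apply the GT...AG rule is to list every span that begins with GT and ends with AG. The sketch below does only that: the function name and the `min_length` cutoff are assumptions for illustration, the branch site is not checked, and the approach reports many false candidates on real sequences, so it should be read as a starting point rather than a splice-site predictor.

```python
import re


def candidate_introns(genomic, min_length=20):
    """Return 0-based, half-open (start, end) spans that begin with GT and end with AG."""
    donors = [m.start() for m in re.finditer("GT", genomic)]    # possible intron starts
    acceptors = [m.end() for m in re.finditer("AG", genomic)]   # possible intron ends
    return [
        (donor, acceptor)
        for donor in donors
        for acceptor in acceptors
        if acceptor - donor >= min_length
    ]


if __name__ == "__main__":
    gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
    for start, end in candidate_introns(gene, min_length=10):
        print(start, end, gene[start:end])
```

Each candidate can then be tested by removing it and checking whether the joined exons give an open reading frame without premature stop codons.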

STEP 4: Prediction of 3-D Structure
With the completed primary amino acid sequence in hand, the challenge of modeling the three-dimensional structure of the protein awaits. This process uses a wide range of data and CPU-intensive computer analysis.

Most often, one is only able to obtain a rough model of the protein, and several conformations of the protein may exist that are equally probable.

The best analyses will utilize data from all the following sources:

  • Pattern Comparison: Alignment to known homologues whose conformation is more secure.
  • X-ray Diffraction Data: Ideal when some data are available for the protein of interest. However, diffraction data from homologous proteins are also very valuable.
  • Physical Forces/Energy States: Biophysical data and analyses of an amino acid sequence can be used to predict how it will fold in space.

All of this information is used to determine the most probable positions of the protein's atoms in space and the angles of the bonds between them. Graphical programs can then use these data to depict a three-dimensional model of the protein on the two-dimensional computer screen.
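To illustrate how atomic positions and bond angles relate, the sketch below computes the angle at a central atom from three sets of Cartesian coordinates. The function name `bond_angle` and the tuple-based coordinate format are assumptions for illustration, and the example coordinates are made up rather than taken from a real structure.

```python
import math


def bond_angle(a, b, c):
    """Return the angle in degrees at atom b between the bonds b-a and b-c."""
    v1 = [a[i] - b[i] for i in range(3)]  # vector from b to a
    v2 = [c[i] - b[i] for i in range(3)]  # vector from b to c
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    # Made-up coordinates for three bonded atoms.
    print(bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```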





Last Updated: 10 August 2006.

Copyright © 2018 Biohealthmatics.com. All Rights Reserved.
