SEARCHING FOR GENES
Collecting, organizing and indexing sequence
information into a database is a challenging task. The result provides the scientist with a
wealth of information, but that information is of limited use on its own. The power of a database comes not
from the collection of information, but from its analysis. A sequence of DNA does
not necessarily constitute a gene. It may constitute only a fragment of a gene
or, alternatively, it may contain several genes.
Fortunately, in agreement with evolutionary principles, scientific research to date has
shown that all genes share common elements. For many genetic elements, it has
been possible to construct consensus sequences: the sequences that best
represent the norm for a given class of organisms (e.g., bacteria,
eukaryotes).
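As a rough illustration of what a consensus sequence is, the sketch below takes a set of already aligned, equal-length sequences and keeps the most frequent base at each position. The function name `consensus` and the five input sequences are hypothetical toy data chosen for this example, not taken from any database, and a simple majority vote is only one of several ways a consensus can be derived.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent base at each position of aligned sequences.

    Assumes the sequences are already aligned and of equal length.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


# Toy data (illustrative only): five variants of a short promoter-like element.
examples = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATACT"]
print(consensus(examples))
```

No single input has to equal the consensus; it only summarizes what is most common at each position.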
Common genetic elements include promoters, enhancers,
polyadenylation signal sequences and protein binding sites. These elements have
also been subdivided into more specific subelements.
Genetic elements share common sequences, and it is this fact that allows mathematical
algorithms to be applied to the analysis of sequence data. A computer program
for finding genes will contain at least the following elements:
- Algorithms for pattern recognition:
Probability formulae are used to determine if two sequences are statistically
similar.
- Data Tables:
These tables contain information on consensus sequences for various genetic
elements. More information enables a better analysis.
- Taxonomic Differences:
Consensus sequences vary between different taxonomic classes of organisms.
Inclusion of these differences in an analysis speeds processing and minimizes
error.
- Analysis Rules:
These programming instructions define how algorithms are applied. They define
the degree of similarity accepted and whether entire sequences and/or fragments
thereof will be considered in the analysis. A good program design enables users
to adjust these variables.
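The four elements listed above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not how any particular gene-finding program works: `CONSENSUS_TABLE`, `AnalysisRules`, `identity` and `scan` are hypothetical names, the table holds only a few simplified textbook consensus sequences, and a plain fraction of matching positions stands in for the probability formulae a real program would use.

```python
from dataclasses import dataclass

# Data table, split by taxonomic class (simplified, illustrative values only).
CONSENSUS_TABLE = {
    "bacteria": {"-10 box": "TATAAT", "-35 box": "TTGACA"},
    "eukaryotes": {"TATA box": "TATAAA", "polyadenylation signal": "AATAAA"},
}


@dataclass
class AnalysisRules:
    """User-adjustable rules: which taxon to use and how similar a hit must be."""
    taxon: str
    min_identity: float = 0.8


def identity(window, consensus):
    """Pattern recognition stand-in: fraction of positions that match."""
    matches = sum(a == b for a, b in zip(window, consensus))
    return matches / len(consensus)


def scan(sequence, rules):
    """Slide each consensus for the chosen taxon along the sequence."""
    sequence = sequence.upper()
    hits = []
    # Only the selected taxon's table is searched, which limits work and false hits.
    for name, consensus in CONSENSUS_TABLE[rules.taxon].items():
        width = len(consensus)
        for start in range(len(sequence) - width + 1):
            score = identity(sequence[start:start + width], consensus)
            if score >= rules.min_identity:
                hits.append((name, start, score))
    return hits


# Toy sequence (made up): a -35 box, a 17-base spacer, then a -10 box.
toy = "CC" + "TTGACA" + "C" * 17 + "TATAAT" + "CC"
for hit in scan(toy, AnalysisRules(taxon="bacteria", min_identity=1.0)):
    print(hit)
```

Lowering `min_identity` accepts weaker matches and returns more candidate sites, which is the kind of user-adjustable variable the analysis rules describe.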