
THE CREATION OF SEQUENCE DATABASES

Most biological databases consist of long strings of nucleotides (guanine, adenine, thymine, cytosine and uracil) and/or amino acids (threonine, serine, glycine, etc.). Each sequence of nucleotides or amino acids represents a particular gene or protein (or section thereof), respectively. Sequences are represented in shorthand, using single-letter designations (for example, A for adenine or S for serine). This decreases the space necessary to store information and increases processing speed for analysis.
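As a small illustration of the single-letter shorthand, the sketch below converts a peptide written with three-letter amino acid codes into one-letter form. The function name `to_one_letter` and the example peptide are hypothetical, and the lookup table deliberately covers only a handful of residues.

```python
# Partial table of standard amino acid codes (three-letter -> one-letter).
# Only a few residues are listed; a real table would cover all of them.
THREE_TO_ONE = {
    "Thr": "T",
    "Ser": "S",
    "Gly": "G",
    "Ala": "A",
    "Lys": "K",
    "Met": "M",
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


long_form = "Met-Thr-Ser-Gly-Ala-Lys"   # hypothetical example peptide
short_form = to_one_letter(long_form)

print(short_form)                        # MTSGAK
print(len(long_form), len(short_form))   # 23 6
```

The same six residues take 23 characters in the long form and 6 in the short form, which is the storage saving the paragraph above describes.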

While most biological databases contain nucleotide and protein sequence information, there are also databases that include taxonomic information, such as the structural and biochemical characteristics of organisms. The power and ease of using sequence information have, however, made it the method of choice in modern analysis.



In the last three decades, contributions from the fields of biology and chemistry have facilitated an increase in the speed of sequencing genes and proteins. The advent of cloning technology allowed foreign DNA sequences to be easily introduced into bacteria. In this way, rapid mass production of particular DNA sequences, a necessary prelude to sequence determination, became possible. Oligonucleotide synthesis provided researchers with the ability to construct short fragments of DNA with sequences of their own choosing. These oligonucleotides could then be used in probing vast libraries of DNA to extract genes containing that sequence. Alternatively, these DNA fragments could also be used in polymerase chain reactions to amplify existing DNA sequences or to modify these sequences. With these techniques in place, progress in biological research increased exponentially.

For researchers to benefit from all this information, however, two additional things were required:

  • Ready access to the collected pool of sequence information and
  • Ways to extract from this pool only those sequences of interest to a given researcher.

Simply collecting, by hand, all necessary sequence information of interest to a given project from published journal articles quickly became a formidable task. After collection, the organization and analysis of this data still remained. It could take weeks to months for a researcher to search sequences by hand in order to find related genes or proteins.
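The second requirement, pulling only the sequences of interest out of a larger pool, can be sketched in a few lines. The records, identifiers, and the `find_with_motif` helper below are invented for illustration and do not come from any real database.

```python
# Hypothetical in-memory "database": identifier -> nucleotide sequence.
SEQUENCES = {
    "seq1": "ATGGCCATTGTAATGGGCCGC",
    "seq2": "TTGACAGCTAGCTCAGTCCTA",
    "seq3": "GGGCCCATGGCCTTTAAACCC",
}


def find_with_motif(db, motif):
    """Return the sorted identifiers of sequences that contain the motif."""
    return sorted(name for name, seq in db.items() if motif in seq)


hits = find_with_motif(SEQUENCES, "ATGGCC")
print(hits)  # ['seq1', 'seq3']
```

A search like this runs in a fraction of a second on a computer, in contrast to the weeks or months a manual search through journal articles could take.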

Computer technology has provided the obvious solution to this problem. Not only can computers be used to store and organize sequence information into databases, but they can also be used to analyze sequence data rapidly. The evolution of computing power and storage capacity has, so far, been able to outpace the increase in sequence information being created. Theoretical scientists have derived new and sophisticated algorithms that allow sequences to be readily compared using probability theory. These comparisons become the basis for determining gene function, developing phylogenetic relationships and simulating protein models.
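To give a flavor of automated sequence comparison, here is a minimal sketch that computes percent identity between two sequences of equal length. This is a much simpler measure than the probability-based algorithms mentioned above, and it assumes the sequences are already aligned; the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (already aligned)")
    if not a:
        raise ValueError("sequences must not be empty")
    # Count positions where both sequences carry the same letter.
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two made-up nucleotide sequences differing at two of seven positions.
print(percent_identity("GATTACA", "GACTATA"))
```

Real comparison tools also handle insertions and deletions and weigh how likely a match is to occur by chance, which this sketch does not attempt.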

The physical linking of a vast array of computers in the 1970s provided a few biologists with ready access to the expanding pool of sequence information. This web of connections, now known as the Internet, has evolved and expanded so that nearly everyone has access to this information and the tools necessary to analyze it.

 

 





Last Updated: 10 August 2006.



Copyright © 2017 Biohealthmatics.com. All Rights Reserved.