Now that you have characterized the functional properties of alpha amylase, in this exercise you will compare the molecular structure of the enzyme in the three species. The tutorial will guide you through finding the gene sequences and comparing them with the BLAST and ClustalW tools.
Bioinformatics is the acquisition, storage, arrangement, identification, analysis, and communication of information related to biology. The term came into common use around 1990, as computers were increasingly applied to DNA sequence analysis. Think of it as the "theoretical" branch of molecular biology: it relates to molecular biology much as theoretical physics relates to the general field of physics.
You will be using the online DNA and protein sequence databases that are the core of bioinformatics. There are two general types of sequence databases. Primary databases contain experimental results in an accessible format; each entry is a sequence as submitted by a researcher, not a consensus for the population. DDBJ, EMBL, and GenBank are primary databases. Secondary databases are curated to reflect consensus sequences from multiple experiments and usually use the primary databases as their sources.
DDBJ – DNA Databank of Japan
EMBL – European Molecular Biology Laboratory
NCBI – National Center for Biotechnology Information
BLAST – Basic local alignment search tool
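The standard sequence format is called FASTA. Every FASTA sequence starts with a definition line, which begins with a ">" character followed by the sequence identifier and a short description. The sequence itself follows on the lines after it.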
Every coding sequence also has a unique protein number assigned to it, starting with AA.
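To make the FASTA layout concrete, here is a minimal Python sketch that splits FASTA text into its definition line and sequence. The function names and the example record are made up for illustration and do not come from any database. The sketch assumes the identifier is the first word after ">".

```python
def _make_record(header, chunks):
    # Split the definition line at the first space:
    # the part before it is the identifier, the rest is the description.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one string.
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


# Hypothetical record, invented for this example only.
example = """>EXAMPLE_1 hypothetical amylase fragment
ATGAAGCTCTTT
TGGTTGCTT
"""

records = parse_fasta(example)
for identifier, description, sequence in records:
    print(identifier, "|", description, "|", len(sequence), "bases")
```

In practice you will copy and paste FASTA text between web tools rather than parse it yourself, but the sketch shows why the definition line and the sequence lines must stay together.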
Reference sequences (which undergo continuing curation) are the most complete and up-to-date. Their accession numbers start with a two-letter prefix and an underscore, such as NT_ for genomic DNA, NM_ for mRNA, or NP_ for protein. Hint: these are the ones you want to use if possible.
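As a quick illustration of the prefix rule, the sketch below checks whether an accession number looks like a reference sequence and reports its molecule type. It covers only the three prefixes named above, and the accession numbers in the example are placeholders, not real database entries.

```python
# Only the three prefixes mentioned in this handout; other prefixes exist.
REFSEQ_PREFIXES = {
    "NT": "genomic DNA",
    "NM": "mRNA",
    "NP": "protein",
}


def refseq_type(accession):
    """Return the molecule type for a reference sequence accession, or None."""
    prefix, underscore, _ = accession.strip().upper().partition("_")
    if not underscore:
        return None  # no underscore, so not in reference sequence format
    return REFSEQ_PREFIXES.get(prefix)


# Placeholder accession numbers, invented for this example.
for acc in ["NM_000000.1", "NP_000000.1", "AAA00000.1"]:
    kind = refseq_type(acc)
    print(acc, "->", kind if kind else "not one of the listed reference prefixes")
```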
You are now ready to start the tutorial.