Introduction To Bioinformatics Its Applications And Types Of Biological Databases
Ghanshyam Sahu, Shikha Bishnoi
Ph.D. Scholar, Animal Biochemistry Division, Indian Veterinary Research Institute
Izatnagar, Bareilly-243122
Introduction
Bioinformatics is the application of computer science and information technology to the study of biology. Closely associated with genetics and genomics, it collects, stores, analyses, and disseminates biological data, such as DNA and amino acid sequences and the annotations describing those sequences. Researchers and medical professionals use databases that organize and index these data to better understand health and disease and, in some circumstances, as a component of patient care. The field took shape in the early 1990s, when scientists began using computers to assemble and analyze DNA sequence data. Early bioinformatics tools were often very basic because there was no standardized way to store and analyze the data. Since then the field has advanced quickly, and researchers now have access to a wide range of sophisticated tools.
Bioinformatics is a multidisciplinary field that develops methods and software for understanding biological data. Medical researchers have long used it to better understand the genetic causes of disease, and its tools support the development of new medicines as well as improvements in disease diagnosis and therapy.
Bioinformatics has many distinct applications. The list below includes a few of the more typical ones.
- Sequence alignment involves aligning two or more DNA or protein sequences to find regions of similarity. This can be used to locate new therapeutic targets, identify shared genetic sequences among many species, and understand the evolutionary links between them.
- Sequence analysis is the process of examining a DNA or protein sequence to find features such as genes, regulatory elements, and protein domains. This information can be used to investigate how genes and proteins work and to find possible therapeutic targets.
- Gene expression analysis is the measurement of gene expression levels in different cells or tissues. This can be used to categorize cell types, examine how genes work, and find new therapeutic targets.
- Genome sequencing is the process of sequencing a species’ whole genome. This can be used to investigate a species’ genetic make-up, find genes and genetic variants, and investigate the evolutionary relationships between species.
- Transcriptome sequencing is the sequencing of the transcriptomes of different cells or tissues. This can be used to categorize cell types, examine how genes work, and find new therapeutic targets.
- Proteome sequencing is the process of sequencing the proteomes of different cells.
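To make the idea of sequence alignment concrete, the following is a minimal sketch of global alignment scoring using the Needleman-Wunsch dynamic-programming approach. The article does not name a specific algorithm, so this choice, the function name `global_alignment_score`, and the scoring values (match +1, mismatch -1, gap -2) are assumptions made purely for illustration. Real tools use more elaborate scoring schemes.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Illustrative scoring only: the match, mismatch, and gap values
    are assumptions, not values taken from any particular tool.
    """
    # prev[j] is the best score for aligning the part of `a` processed
    # so far with the first j characters of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first i characters of `a` against an empty prefix of `b`
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            gap_in_b = prev[j] + gap        # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap    # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "ACT"))
```

A higher score indicates greater similarity under the chosen scoring scheme. Identical sequences score their full length, while each mismatch or gap lowers the total.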
Biological Databases
Biological databases are libraries of biological information collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analysis. By 2014, 1552 biological databases were publicly accessible online, and by 2016 the number had reached 1665.
Basis of Classification
A. According to scope of data coverage
- Comprehensive databases: include a variety of data types from a wide range of species. Typical examples are GenBank, the European Molecular Biology Laboratory (EMBL) database, and the DNA Data Bank of Japan (DDBJ).
- Specialized databases: contain particular kinds of data or data from particular species. For example, RiceWiki is a community-curated database of rice genes, while WormBase covers nematode biology and genomics.
B. According to level of data curation
- Primary databases: raw data
- Secondary databases: processed data
- Derivative / hybrid databases
C. According to method of biocuration
- Expert-curated databases
- Community-curated databases
D. According to type of data managed
- DNA: ex. NCBI RefSeq
- RNA: ex. RNAcentral
- Protein: ex. PDB
- Expression: ex. NCBI GEO
- Pathway: ex. Reactome
- Disease: ex. KEGG DISEASE Database
- Nomenclature: ex. HGNC, Genew
- Literature: ex. NCBI PubMed
- Standards and ontology: ex. Gene Ontology (GO), which covers molecular function, cellular component, and biological process
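The classification by data type can be sketched as a simple lookup table. This is only an illustrative data structure. The database names come from the list above, while the variable and function names (`DATABASES_BY_DATA_TYPE`, `databases_for`, `data_type_of`) are hypothetical and do not correspond to any real library.

```python
from typing import List, Optional

# Example databases grouped by the type of data they manage.
# Entries are taken from the list in the text; the list is not exhaustive.
DATABASES_BY_DATA_TYPE = {
    "DNA": ["NCBI RefSeq"],
    "RNA": ["RNAcentral"],
    "Protein": ["PDB"],
    "Pathway": ["Reactome"],
    "Disease": ["KEGG DISEASE"],
    "Nomenclature": ["HGNC"],
    "Literature": ["PubMed"],
}


def databases_for(data_type: str) -> List[str]:
    """Return example databases for a data type (case-insensitive)."""
    wanted = data_type.strip().lower()
    for key, names in DATABASES_BY_DATA_TYPE.items():
        if key.lower() == wanted:
            return list(names)  # copy, so callers cannot alter the table
    return []


def data_type_of(database: str) -> Optional[str]:
    """Return the data type a database is listed under, or None if unknown."""
    wanted = database.strip().lower()
    for key, names in DATABASES_BY_DATA_TYPE.items():
        if any(name.lower() == wanted for name in names):
            return key
    return None


if __name__ == "__main__":
    print(databases_for("rna"))
    print(data_type_of("Reactome"))
```

The same structure could be repeated for the other classification axes (scope, curation level, curation method), since a single database can be classified along each of them independently.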
Primary databases: Data from experiments, such as nucleotide sequences, protein sequences, and macromolecular structures, are entered into primary databases. Once a record has been assigned a database accession number, its data are not modified, because they are considered part of the scientific record.
Examples: PDB, NCBI-SRA
Secondary databases: By contrast, secondary databases contain data derived from analyzing primary data. They often draw on several sources, including other databases (primary and secondary), controlled vocabularies, and the scientific literature. They are the result of data curation, which typically combines computational algorithms with manual analysis and interpretation to derive new knowledge from the public record of science.
Examples: InterPro, UniProtKB, Ensembl
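The relationship between primary and secondary data can be illustrated with a small sketch. Here the primary records are immutable once created, and a secondary table is derived from them by computation. The class `PrimaryRecord`, the function `derive_secondary`, and the accessions `DEMO0001` and `DEMO0002` are all hypothetical, and GC content stands in for the far richer analyses that real secondary databases perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class PrimaryRecord:
    accession: str  # hypothetical accession, not a real database identifier
    sequence: str   # raw nucleotide sequence as submitted


def gc_content(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def derive_secondary(records):
    """Build a derived (secondary) table from primary records.

    The primary records are only read, never changed.
    """
    return {
        record.accession: {
            "length": len(record.sequence),
            "gc_content": gc_content(record.sequence),
        }
        for record in records
    }


# Made-up example data for illustration only.
PRIMARY = [
    PrimaryRecord("DEMO0001", "ATGCGC"),
    PrimaryRecord("DEMO0002", "ATATAT"),
]

if __name__ == "__main__":
    for accession, summary in derive_secondary(PRIMARY).items():
        print(accession, summary)
```

Because `PrimaryRecord` is declared frozen, an attempt to assign a new sequence to an existing record raises an error, which loosely mirrors the idea that primary entries are treated as a fixed part of the scientific record.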
Conclusion
Bioinformatics has become a significant and promising discipline for academia, research, and industry, with applications in data storage, data warehousing, sequence analysis, and more. It has influenced scientific, engineering, and economic development worldwide.
References
https://www.genome.gov/genetics-glossary/Bioinformatics
Application of Omics Tools in Veterinary & Animal Science Research