Key Points
- Literature-mining tools are becoming essential to researchers because of the growth of the scientific literature and the shift from studying individual genes and proteins to entire systems.
- Currently, information-retrieval tools such as PubMed are by far the most commonly used literature-mining methods among biologists.
- Methods for identifying the genes, proteins and other entities that are mentioned in the literature (known as entity recognition) are key components of most complex literature-mining systems.
- Recently, methods for extracting biomedical facts from text have improved considerably. Such methods will probably soon become mainstream tools for the annotation and analysis of large-scale experimental data sets.
- By combining facts that have been extracted from several papers, text-mining methods can both discover global trends and generate new hypotheses that are based on the existing literature.
- To realize the full discovery potential of literature mining, it should be integrated with other data types. Protein networks are well suited for unifying large-scale experimental data with knowledge that has been extracted from the biomedical literature.
- Data-integration methods have also been developed for ranking candidate genes for inherited diseases and for associating genes with phenotypic characteristics.
- Bridging the gap between biologists and computational linguists will be crucial to the success of approaches that integrate literature mining with high-throughput experimental data. We hope that this review will inspire more biologists to become actively involved in the development of literature-mining tools.
Abstract
For the average biologist, hands-on literature mining currently means a keyword search in PubMed. However, methods for extracting biomedical facts from the scientific literature have improved considerably, and the associated tools will probably soon be used in many laboratories to automatically annotate and analyse the growing number of system-wide experimental data sets. Owing to the increasing body of text and the open-access policies of many journals, literature mining is also becoming useful for both hypothesis generation and biological discovery. However, the latter will require the integration of literature and high-throughput data, which should encourage close collaborations between biologists and computational linguists.
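To make the idea of extracting facts from text more concrete, the following is a minimal sketch of two steps named in this summary: dictionary-based entity recognition, followed by combining mentions across sentences through simple co-occurrence counting. It is not the method of any particular system. The synonym dictionary, the example sentences and the function names `recognize_entities` and `cooccurrence_counts` are all assumptions made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical synonym dictionary: lower-cased surface form -> canonical name.
SYNONYMS = {
    "swe1": "SWE1",
    "wee1": "SWE1",
    "cdc28": "CDC28",
    "cdk1": "CDC28",
    "msn2": "MSN2",
}

# Toy sentences invented for illustration; not quoted from any abstract.
SENTENCES = [
    "Swe1 inhibits Cdc28 during the morphogenesis checkpoint.",
    "Phosphorylation of Wee1 by Cdk1 promotes its degradation.",
    "Msn2 activates stress-responsive genes.",
]


def recognize_entities(sentence):
    """Return the set of canonical names whose synonyms occur in the sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return {SYNONYMS[token] for token in tokens if token in SYNONYMS}


def cooccurrence_counts(sentences):
    """Count how many sentences mention each pair of canonical names together."""
    counts = Counter()
    for sentence in sentences:
        entities = sorted(recognize_entities(sentence))
        # Every unordered pair of distinct entities in a sentence counts once.
        for pair in combinations(entities, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(SENTENCES)
for (gene_a, gene_b), n in counts.most_common():
    print(gene_a, gene_b, n)
```

Mapping synonyms to one canonical name before counting is what lets mentions written with different names in different sentences support the same pair. A real system would also need to handle ambiguous names and would score pairs statistically rather than by raw counts.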
Acknowledgements
The authors would like to thank T. Doerks and S. Hooper for help with figures, and other group members of P.B.'s group at the European Molecular Biology Laboratory and I. Rojas's group at EML Research for valuable discussions. J.S. is funded by the Klaus Tschira Foundation. This work was supported by grants from the European Community and the German Ministry for Education and Science through Nationales Genomforschungsnetz (NGFN).
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
DATABASES
OMIM
FURTHER INFORMATION
An extended bibliography of biological literature-mining papers
Glossary
- Machine learning: The ability of a machine to learn from experience or extract knowledge from examples in a database. Artificial neural networks and support-vector machines are two commonly used types of machine-learning method.
- Gene Ontology: A set of controlled vocabularies that are used to describe the molecular functions of a gene product, the biological processes in which it participates and the cellular components in which it can be found.
- Syntax: The orderly manner in which words are put together to form phrases and sentences.
- Semantics: The meaning that is implied by words and sentences. If an information-extraction method extracts the right facts from a sentence, it has interpreted the semantics correctly.
- Anaphoric relationships: Back-references to previously mentioned entities. A protein that is mentioned in an earlier sentence might, for example, subsequently be referred to as 'it'.
- Corpus: A collection of texts. A corpus might consist of raw text only (for example, Medline) or be tagged so that, for instance, gene and protein names are labelled (for example, GENIA).
- Study bias: Study biases arise because some proteins (or other molecules) are more studied than others. For example, if a protein is known to be phosphorylated, it is also more likely to have been studied in other respects, and is therefore more likely to be known to be regulated by expression.
- MeSH terms: A controlled vocabulary that is used for annotating Medline abstracts. Several classes of MeSH term exist, the most relevant for literature mining being 'Chemicals and Drugs' (MeSH-D) and 'Diseases' (MeSH-C).
- Linkage mapping: A method for localizing genes that is based on the co-inheritance of genetic markers and phenotypes in families over several generations.
Cite this article
Jensen, L., Saric, J. & Bork, P. Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet 7, 119–129 (2006). https://doi.org/10.1038/nrg1768
DOI: https://doi.org/10.1038/nrg1768