Introduction to BioCyc and Pathway Tools
BioCyc is a database collection and website that couples rich and high-quality data with extensive bioinformatics tools. The Pathway Tools software -- the software that underlies BioCyc.org -- can be installed at your site to create BioCyc-like databases for genomes of interest.
BioCyc Benefits: Better Science, Faster
BioCyc has applications in basic science, drug discovery, synthetic biology, and biotechnology.
BioCyc accelerates your science:
- BioCyc curators author mini-reviews for genes [example] and pathways [example] from the literature, saving scientists the time otherwise spent searching, reading, and synthesizing the literature and reconciling its errors and disagreements
- BioCyc integrates many types of data to form a one-stop shop, placing genes in their pathway and regulatory context and including protein features, GO terms, and gene essentiality. Our Guided Tour demonstrates some of these data and how to access them.
- BioCyc visualization tools speed information uptake and produce publication-quality figures for pathways, genome comparisons, and more
- Comprehensive and high-quality information curated from 146,000 publications enables:
  - Better hypothesis generation
  - More complete experimental interpretation
  - Higher quality publications and grant proposals
- BioCyc's tools enable unique analyses:
  - Pathway-based analysis of metabolomics data
  - Visualization of transcriptomics data on zoomable metabolic charts
  - Comparative genomics and pathway operations
BioCyc Database Collection
The BioCyc collection of Pathway/Genome Databases (PGDBs) provides a reference on the genomes, metabolic pathways, and (in some cases) regulatory networks of thousands of sequenced organisms. Each database combines:
- Computational inferences: Our Pathway Tools
software predicts the metabolic pathways of an
organism, predicts which genes code for missing enzymes in metabolic
pathways, predicts protein complexes, and predicts operons.
- Imported data: BioCyc integrates information
from other bioinformatics databases, such as protein feature and Gene
Ontology information from UniProt, gene-essentiality datasets from OGEE,
and regulatory information from RegTransBase.
- Manual curation: The Tier 1 and Tier 2 databases have received
literature-based curation to enter new gene functions, pathways,
protein complexes, regulation, and more. Curated PGDB entries include
mini-review summaries and thousands of literature citations.
- The free EcoCyc DB is the result of more than 20 person-years of effort to enter
information from 44,000+ E. coli articles about gene function, metabolism, transport, and regulatory processes.
- The MetaCyc DB describes metabolic pathways, enzymes, and metabolites from all domains of life, curated from 76,000+ publications.
Also free is the database for the cyanobacterium Arthrospira platensis NIES-39 as an example of a Tier 3 database. The other BioCyc databases are available via subscription, which supports their curation. To obtain free access to BioCyc for teaching purposes, click here.
BioCyc data files may be downloaded to your site, and BioCyc data can be queried via web services.
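As a rough illustration of what a web-service query might look like, the sketch below builds a request URL for a single database object and, optionally, fetches it. The base URL, the `id` and `detail` parameter names, and the example identifiers `ECOLI` and `TRP` are assumptions made for illustration and should be checked against the BioCyc web-services documentation; databases that require a subscription will likely also require an authenticated session, which this sketch does not handle.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: endpoint and parameter names are illustrative; confirm them
# against the current BioCyc web-services documentation before relying on them.
BASE_URL = "https://websvc.biocyc.org/getxml"


def build_object_url(org_id, object_id, detail="low"):
    """Build a URL requesting one object (gene, pathway, compound) as XML."""
    query = urlencode({"id": f"{org_id}:{object_id}", "detail": detail})
    return f"{BASE_URL}?{query}"


def fetch_object_xml(org_id, object_id, detail="low", timeout=30):
    """Fetch the XML text for one object. Requires network access and,
    for subscription databases, presumably a logged-in session (not shown)."""
    url = build_object_url(org_id, object_id, detail)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; no network request is made.
print(build_object_url("ECOLI", "TRP"))
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before anything is sent.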
BioCyc.org Bioinformatics Tools
BioCyc.org provides a suite of bioinformatics tools (see Tools menu) for accessing and analyzing the BioCyc databases.
- Search: Multiple search tools enable users to find genes, pathways, and metabolites of
interest, which are presented in corresponding information pages. Most searches apply to the currently selected organism database.
There are two ways to search across multiple databases:
(1) use Tools → Search → Cross Organism Search, or (2) in commands such as Tools → Search → Search Genes, Proteins, and RNAs,
select "Search across multiple organisms/databases" under the list of buttons.
- Visualization: A variety of visualization tools are provided, such as metabolic-pathway diagrams and zoomable diagrams depicting the complete metabolic network of each organism [example].
- Genome Browser: The BioCyc genome browser [example] enables visual genome exploration and analysis of positional genome datasets via tracks.
- Omics Data Analysis: Tools include enrichment analysis and visualization of gene expression, proteomics, or metabolomics data on metabolic-chart diagrams [example] and on the Omics Dashboard [example].
- SmartTables: Provide biologist-friendly analysis capabilities for groups of genes or metabolites that are stored in your BioCyc account.
- Metabolic Route Search: Search for reaction paths connecting specified
metabolites in the metabolic network. Pathway Tools enables design of novel pathways by adding reactions
from MetaCyc.
- Comparative Analysis: Tools include comparison of pathways, metabolites, transporters,
and regulatory networks -- see menu command Analysis → Comparative Analysis and the new Comparative Genome Dashboard at Analysis → Comparative Genome Dashboard.
- Sequence Analysis: Extract sequences, run BLAST searches and sequence pattern searches, and perform multiple alignments.
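To make the idea behind Metabolic Route Search concrete, here is a minimal sketch of finding a reaction path between two metabolites with a breadth-first search. It is not the algorithm BioCyc uses, which is not described here; the toy network, the reaction names R1 to R5, and the function `find_route` are all hypothetical, and the sketch treats every reaction as one-directional and ignores co-substrates.

```python
from collections import deque

# Hypothetical toy network: reaction name -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"A"}, {"D"}),
    "R4": ({"D"}, {"C", "E"}),
    "R5": ({"C"}, {"F"}),
}


def find_route(reactions, start, goal):
    """Return a shortest list of reaction names leading from start to goal,
    or None if the goal metabolite cannot be reached."""
    if start == goal:
        return []
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        metabolite, path = queue.popleft()
        for name, (substrates, products) in reactions.items():
            if metabolite not in substrates:
                continue
            # Sort products so the traversal order is deterministic.
            for product in sorted(products):
                if product in visited:
                    continue
                if product == goal:
                    return path + [name]
                visited.add(product)
                queue.append((product, path + [name]))
    return None


print(find_route(REACTIONS, "A", "F"))  # ['R1', 'R2', 'R5']
```

Breadth-first search returns a route with the fewest reactions. A practical tool would also need to weigh factors such as reaction reversibility and whether a reaction is native to the organism or borrowed from a reference database.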
Pathway Tools Software
Pathway Tools is an enterprise genome and pathway data management tool and is among the most extensive bioinformatics software packages. It is the software used to create BioCyc databases and it powers the BioCyc.org website and several additional websites. Its capabilities are described in detail here. Please click here to see Pathway Tools testimonials.
Pathway Tools can run both as a desktop application and as a web server. It is freely available to academics interested in creating PGDBs for organisms of interest to them.
Pathway Tools consists of several components that provide advantages when installed at your site:
- The Pathway/Genome Navigator supports querying, visualization, and analysis of PGDBs, including private local BioCyc PGDBs on your intranet. It also provides extensive search, visualization, and analysis tools for your locally stored data.
- PathoLogic enables creation of new PGDBs from your own genome data, generating a metabolic reconstruction (pathway prediction), operon inferences, and more.
- The Pathway/Genome Editors support interactive updating
and refinement of PGDBs, such as adding new gene functions and pathways.
- MetaFlux enables creation of quantitative metabolic models from PGDBs using Flux-Balance Analysis.
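For readers unfamiliar with Flux-Balance Analysis, the standard textbook formulation is a linear program over steady-state reaction fluxes. The symbols below are conventional notation introduced here for illustration, not taken from the MetaFlux documentation, and MetaFlux's own formulation may differ in detail.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u
\end{aligned}
```

Here S is the stoichiometric matrix (rows are metabolites, columns are reactions), v is the vector of reaction fluxes, c selects the objective (commonly a biomass reaction), and l and u are lower and upper flux bounds. The constraint S v = 0 expresses the steady-state assumption that each internal metabolite is produced and consumed at equal rates.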
How to Learn More About BioCyc
The following additional information exists on the BioCyc site:
- Guided Tour: Presents the different information types present in BioCyc
- Webinars: Online videos describing how to use BioCyc
- Website User's Guide: Instructions for using the BioCyc Website
- BioCyc User Guide: Information about the data content of BioCyc DBs
- PGDB Concepts Guide: The ideas behind BioCyc
- Publications: Articles about BioCyc databases and the Pathway Tools software
Definitions of Terminology on the BioCyc Website
Here we define a few key terms. See the glossary for more definitions.
Pathway/Genome Database (PGDB). A database that describes:
- The genome of an organism -- its chromosome(s), genes, and genome sequence
- The product of each gene
- The metabolic network of the organism -- its pathways, reactions, enzymes, and metabolites
- The regulatory network of the organism, including its operons, transcription factors, and the interactions between transcription factors and their small-molecule ligands and DNA binding sites
Tier 1 PGDB. PGDBs in Tier 1, such as EcoCyc, MetaCyc, and HumanCyc, have received at least one year of literature-based curation by scientists.
Tier 2 PGDB. PGDBs in Tier 2 were generated by Pathway Tools, which predicted their metabolic pathways; their operons (for bacteria only); protein complexes; and some missing enzymes in their predicted pathways (pathway hole fillers). The resulting PGDBs underwent manual review by a biologist to remove false-positive pathway predictions, and to perform refinements such as defining protein complexes. Then literature-based curation resulted in entry of new gene functions and of metabolic pathways that had been experimentally elucidated in the organism but were not predicted computationally. [list of Tier 2 PGDBs]
Tier 3 PGDB. PGDBs in Tier 3 were generated computationally in the same manner as for Tier 2. The resulting PGDBs did not undergo manual curation. [list of Tier 3 PGDBs]
BioCyc: The collection of PGDBs at URL https://BioCyc.org/ is called the BioCyc Database Collection. EcoCyc and MetaCyc are component databases within the BioCyc collection.