BioCyc and Pathway Tools Blog: visualization

Friday, October 12, 2018

Pan-Genome PGDBs Unify Genomic and Metabolic Data across Related Strains

Pan-Genome PGDBs are a relatively new feature of BioCyc. These BioCyc databases combine in one place information about multiple sequenced genomes for a given species. For example, the Helicobacter pylori Pan-Genome database covers 158 sequenced strains.

The Pan-Genome PGDBs contain one gene object for each orthologous group of genes in the organism. They also contain the union of all metabolic pathways across all the strains. Thus, a Pan-Genome PGDB allows you to quickly assess the full set of gene functions and metabolic pathways across the known strains. For example, the gene page shows all orthologs across all strains in the Pan-Genome. The page for the ftsX gene illustrates this for a gene with relatively few orthologs.
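
To make the idea of "one gene object per ortholog group" and "the union of all pathways" concrete, here is a minimal Python sketch. It is not how BioCyc builds or stores Pan-Genome PGDBs; the strain names, gene ids, pathway assignments, and function names are all hypothetical toy values chosen for illustration.

```python
# Toy data (hypothetical): strain name -> set of pathway names.
strain_pathways = {
    "strain_A": {"glycolysis", "TCA cycle"},
    "strain_B": {"glycolysis", "urea degradation"},
    "strain_C": {"glycolysis", "TCA cycle", "urea degradation"},
}

# Toy ortholog groups (hypothetical): one entry per group, listing the
# member genes as (strain, gene id) pairs.
ortholog_groups = {
    "ftsX": [("strain_B", "B_0417"), ("strain_A", "A_0001")],
    "ureA": [("strain_B", "B_0073"), ("strain_C", "C_0070")],
}


def pan_genome_pathways(pathways_by_strain):
    """Return the union of the pathway sets of all strains."""
    union = set()
    for pathways in pathways_by_strain.values():
        union |= pathways
    return union


def orthologs_of(group_name, groups):
    """Return the member genes of one ortholog group, sorted by strain."""
    return sorted(groups[group_name])


all_pathways = pan_genome_pathways(strain_pathways)
print(sorted(all_pathways))
print(orthologs_of("ftsX", ortholog_groups))
```

A pathway present in only one strain still appears in the union, which is why the Pan-Genome view can show more pathways than any single strain database.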

Other gene pages list hundreds of orthologs and synonyms.

You can visit a page for the orthologous genes by following the links in the 'Relationship Links' area near the bottom of the page.

Pan-Genomes provide a way to visualize genes in the genome browser as well. It's easier to understand this visualization if you know that Pan-Genomes are constructed starting with the PGDB of a base strain and adding other members of the collection of strains one by one. For example, the H. pylori Pan-Genome was constructed by starting with strain 26695. The visualization is based on dividing groups of orthologous genes into two sets: ortholog groups that include genes that occur in the base strain, and ortholog groups that only include genes in other strains.

Ortholog groups that include genes from the base strain are collected on a 'chromosome' that preserves the location and direction of the genes from the base strain. The remaining ortholog groups are mapped in arbitrary order onto an 'artificial replicon'. The names in the artificial replicon display are based on the (arbitrary) order that PGDBs were added to the Pan-Genome.
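
The split described above can be sketched in a few lines of Python. This is only an illustration of the partitioning rule, not the Pathway Tools implementation: the group names, gene ids, and the strains other than 26695 are hypothetical, and the sketch does not model gene positions or directions on the base-strain chromosome.

```python
def partition_ortholog_groups(groups, base_strain):
    """Split ortholog groups by whether they contain a base-strain gene.

    groups maps a group name to a list of (strain, gene id) pairs.
    Returns (chromosome_groups, artificial_replicon_groups).
    """
    chromosome, artificial = [], []
    for name, members in groups.items():
        strains = {strain for strain, _gene in members}
        if base_strain in strains:
            chromosome.append(name)   # shown on the base-strain 'chromosome'
        else:
            artificial.append(name)   # shown on the 'artificial replicon'
    return chromosome, artificial


# Hypothetical toy groups; only the base strain name 26695 comes from the post.
groups = {
    "grpA": [("26695", "g1"), ("strain_X", "x9")],
    "grpB": [("strain_X", "x2")],
    "grpC": [("26695", "g7")],
    "grpD": [("strain_Y", "y4"), ("strain_X", "x5")],
}

on_chromosome, on_artificial = partition_ortholog_groups(groups, "26695")
print(on_chromosome)   # groups containing a gene from strain 26695
print(on_artificial)   # groups found only in other strains
```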

You can search for genes either by name (e.g., 'abc') or by an identifier that combines a strain name and an id joined with an underscore (e.g., HPHPP11_0013), by entering it in the search box at the top right. Either quick search or gene search will take you to the page containing the gene and its orthologs in the Pan-Genome PGDB.
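
If you handle such identifiers in your own scripts, a small helper like the following can separate the two parts. It is a sketch under one assumption that the post does not state: that the strain name and the id are separated by the last underscore in the identifier. The function name is hypothetical.

```python
def split_gene_identifier(identifier):
    """Split 'STRAIN_ID' into (strain, id) at the last underscore.

    Assumption: the last underscore is the separator; identifiers
    without an underscore (plain gene names such as 'abc') are rejected.
    """
    strain, sep, gene_id = identifier.rpartition("_")
    if not sep or not strain or not gene_id:
        raise ValueError(f"not a strain_id identifier: {identifier!r}")
    return strain, gene_id


print(split_gene_identifier("HPHPP11_0013"))
```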

The cellular overview diagram provides a way to visualize reactions associated with genes shared by all members of the Pan-Genome. You can also visualize those reactions that are unique to a single organism within the Pan-Genome. In the screenshot below, the reactions shared by all organisms in the H. pylori Pan-Genome are shown in red and those unique to a single organism are shown in purple.

To create a diagram like this on the BioCyc website, select a Pan-Genome PGDB and bring up the cellular overview. In the operations menu, choose Highlight Genes -> By Pan-Genome Core Genes. The core genes are the set of genes shared by all organism databases in the Pan-Genome. Choose Highlight Genes -> By Pan-Genome Unique Genes to highlight reactions associated with genes that are unique to a single organism database.
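
The two definitions (core genes are shared by all organism databases, unique genes occur in exactly one) can be expressed as a simple counting exercise. The sketch below is illustrative only; the database names and gene-group assignments are hypothetical toy data, and it says nothing about how BioCyc computes these sets internally.

```python
from collections import Counter


def core_and_unique(groups_by_db):
    """Return (core, unique) ortholog groups.

    groups_by_db maps an organism database name to the set of ortholog
    groups it contains. Core groups occur in every database; unique
    groups occur in exactly one.
    """
    counts = Counter()
    for groups in groups_by_db.values():
        counts.update(set(groups))  # count each group once per database
    total = len(groups_by_db)
    core = {g for g, n in counts.items() if n == total}
    unique = {g for g, n in counts.items() if n == 1}
    return core, unique


# Hypothetical toy assignment of ortholog groups to three databases.
groups_by_db = {
    "db_1": {"ftsX", "ureA", "cagA"},
    "db_2": {"ftsX", "ureA"},
    "db_3": {"ftsX", "vacA"},
}

core, unique = core_and_unique(groups_by_db)
print(sorted(core))
print(sorted(unique))
```

Note that a group present in two of the three databases (ureA in this toy data) is neither core nor unique, so it would not be highlighted by either operation.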

In the desktop version of Pathway Tools, show the Cellular Overview, then choose Overviews -> Highlight -> Highlight the core genome, followed by Overviews -> Highlight the unique genes, to achieve the same highlighting.

More details on how Pan-Genome PGDBs are created and how to use them are provided here.

You can select a Pan-Genome PGDB by entering the phrase “pan-genome” in the change organism database dialogue. Here's our current list of Pan-Genome PGDBs and the number of strains that each one contains. Over time we will be adding Pan-Genome PGDBs for additional species, and regenerating existing Pan-Genome PGDBs to include additional strains.

Clostridioides difficile 10 strains

Escherichia coli 374 strains

Helicobacter pylori 158 strains

Listeria monocytogenes 35 strains

Mycobacterium tuberculosis 24 strains

Pseudomonas aeruginosa 24 strains

Salmonella enterica 113 strains

Shigella flexneri 9 strains

Vibrio cholerae 81 strains

Thursday, March 10, 2016

Introducing Pathway Collages...

Figure 1

Pathway Tools has long been recognized for the quality of our automatically generated individual metabolic pathway diagrams, which are intuitive to biologists, can be shown at varying levels of detail, and can be customized in various ways, including with the overlay of omics data. When a more global view is called for, our cellular overview diagram depicts the entire metabolic network for an organism, with capabilities for selective highlighting and overlay of omics data. However, to understand some biochemical situations, viewing a single pathway is insufficient, whereas viewing the entire metabolic network results in information overload. Pathway Collages, new in Pathway Tools version 19.5, are an attempt to bridge this gap, allowing users to create high-quality, customized, user-manipulable diagrams containing collections of user-specified pathways.

Pathway Collages can be explored and edited via the Pathway Collage Viewer web browser application. This application, implemented using the Cytoscape.js open-source JavaScript graph visualization library, supports panning, zooming, and all the editing and customization operations described in this post and the documentation embedded within the Pathway Collage Viewer itself. Feel free to experiment yourself with the example pathway collage online at http://biocyc.org/cytoscape-js/ovsubset.html?graph=example1&showHelp=T, or create your own following the instructions below.
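
If you want to generate or inspect links like the one above from a script, Python's standard library is enough. The sketch below only takes apart and rebuilds the example URL; the parameter names graph and showHelp come from that URL, while the helper name is hypothetical and the sketch makes no claim about which other parameter values the viewer accepts.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://biocyc.org/cytoscape-js/ovsubset.html"


def collage_url(graph, show_help="T"):
    """Build a viewer URL with the two query parameters seen in the example."""
    return f"{BASE_URL}?{urlencode({'graph': graph, 'showHelp': show_help})}"


example_url = collage_url("example1")
print(example_url)

# Split the query string back into its parameters.
params = parse_qs(urlsplit(example_url).query)
print(params)
```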

Figure 2

Three example Pathway Collage figures are illustrated here. Figure 1 depicts a Pathway Collage consisting of four E. coli pathways overlaid with gene expression data. This diagram has already been manually adjusted by repositioning the pathways relative to each other and tweaking node font sizes and shapes. Metabolites that are shared between pathways are indicated by drawing connecting lines between them.

Figure 2 shows a collage consisting of two E. coli pathways overlaid with predicted reaction flux data. In this diagram, rather than drawing connecting lines, compounds that are shared between the two pathways are merged, showing glycolysis flowing seamlessly into fermentation.
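
The difference between drawing connecting lines (Figure 1) and merging shared compounds (Figure 2) can be illustrated with a toy graph model. This is a sketch, not the Pathway Collage Viewer's data model: each pathway is reduced to an ordered list of metabolites, the pathway contents are heavily abbreviated, and the function name is hypothetical.

```python
def build_collage(pathways, merge_shared):
    """Return (nodes, reaction_edges, connectors) for a set of pathways.

    pathways maps a pathway name to an ordered list of metabolite names.
    With merge_shared=True a metabolite shared between pathways becomes a
    single node; otherwise each pathway keeps its own copy and the copies
    are linked by connector edges.
    """
    nodes, edges, connectors = set(), [], []
    for name, metabolites in pathways.items():
        ids = [m if merge_shared else f"{name}:{m}" for m in metabolites]
        nodes.update(ids)
        edges.extend(zip(ids, ids[1:]))  # consecutive metabolites are linked
    if not merge_shared:
        names = list(pathways)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                for m in pathways[first]:
                    if m in pathways[second]:
                        connectors.append((f"{first}:{m}", f"{second}:{m}"))
    return nodes, edges, connectors


# Heavily abbreviated toy pathways sharing one metabolite.
pathways = {
    "glycolysis": ["glucose", "glyceraldehyde 3-phosphate", "pyruvate"],
    "fermentation": ["pyruvate", "acetyl-CoA", "ethanol"],
}

connected = build_collage(pathways, merge_shared=False)
merged = build_collage(pathways, merge_shared=True)
print(len(connected[0]), connected[2])  # separate copies plus one connector
print(len(merged[0]), merged[2])        # one shared pyruvate node, no connectors
```

In the merged form the edge list runs straight through the shared node, which is the toy equivalent of glycolysis flowing into fermentation.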

Figure 3

Figure 3 depicts a collage containing a larger number of pathways at a lower zoom level, so metabolite, enzyme and gene names are automatically suppressed (the font size of the pathway labels has been increased so those labels remain visible). In addition to manually repositioning pathways, merging some common nodes, and changing the default colors, some metabolites of interest have been highlighted in purple.

Now that you've seen what you can do with a Pathway Collage, how can you create one for yourself? Pathway Collages can be created from either the BioCyc website (or other Pathway Tools-based website) or from desktop Pathway Tools. There are five basic steps.

1. Specify the set of pathways to be included. The simplest and most reliable way to specify a set of pathways is to generate a SmartTable containing the desired pathways, and then export the SmartTable to a Pathway Collage. This works both for the desktop and web versions of Pathway Tools, and enables you to keep your list of pathways around in case you ever want to edit it or regenerate your collage. There are other ways to specify a set of pathways, such as by interactively clicking on them in the cellular overview diagram (desktop only), from an omics dataset (web only), or by creating a seed collage from a single pathway and then interactively adding more (web only). We may add additional options to specify pathways in the future. Consult the documentation for more details.
2. Export to Pathway Collage Viewer. Pathway Tools will compute automatic layouts of the individual pathways within the collage, then position those diagrams next to one another horizontally, and send that initial layout of the collage to the Pathway Collage Viewer application in your web browser.
3. Interactively refine and customize the collage. This can involve repositioning items, showing connections, adding, deleting or merging elements, editing labels, highlighting elements of interest, and/or customizing node and edge styles. By default, only the metabolites along the main backbone of a pathway are included in the diagrams, but side metabolites can be added interactively. Additional pathways involving a metabolite of interest can also be added interactively.
4. Import omics data to be visualized on the collage (optional). Omics data can be added either before or after the collage is generated. The collage can display omics data associated with genes, metabolites, or reactions. When multi-timepoint gene expression data is displayed, the display of enzyme names is suppressed.
5. Save or export the collage. At any time, a pathway collage can be saved as a JSON-format graph file on your computer; that file can later be loaded back into the collage viewer (not all browsers support this operation; we recommend using Chrome or Firefox). A pathway collage can also be exported to a PNG-format image file for use in presentations or publications. The image will be generated with a resolution comparable to that of the display at the time the image is created (up to some maximum), so the highest-quality images are obtained if the collage is displayed at a high zoom level when exporting.
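
Two of the steps above, the initial side-by-side placement and the JSON save and reload, can be mimicked with a short Python sketch. It is purely illustrative: the pathway names, widths, gap, and the JSON structure are hypothetical and are not the file schema used by the Pathway Collage Viewer.

```python
import json


def layout_horizontally(widths, gap=50):
    """Place diagrams left to right; return each diagram's left-edge x offset.

    widths maps a pathway name to the width of its already laid-out diagram.
    """
    positions = {}
    x = 0
    for name, width in widths.items():
        positions[name] = x
        x += width + gap  # the next diagram starts after this one plus a gap
    return positions


# Hypothetical diagram widths in arbitrary layout units.
widths = {"pathway_1": 300, "pathway_2": 200, "pathway_3": 400}
positions = layout_horizontally(widths)
print(positions)

# Round-trip through JSON, standing in for saving and reloading a collage.
saved = json.dumps(
    {"pathways": [{"id": n, "x": x, "y": 0} for n, x in positions.items()]},
    indent=2,
)
restored = json.loads(saved)
print(restored["pathways"][1])
```

Because the saved form is plain JSON, it can be kept under version control or inspected with any text editor before loading it back.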

For more information on Pathway Collages, see the Pathway Tools Website User Guide or the help documentation within the Pathway Collage Viewer itself.
