Guide to EcoCyc

1 EcoCyc Project Overview

2 How to Cite EcoCyc

3 The Roles of EcoCyc in Microbial Genome Annotation

4 Conditions of E. coli Growth and Non-Growth

5 Essential Gene Information

6 EcoCyc Metabolic Flux Model

7 Update Frequency

8 Data Sources Incorporated into EcoCyc
8.1 UniProt Features
8.2 Gene Ontology
8.3 RefSeq Collaboration
8.4 MetaCyc

9 EcoCyc Accession Numbers
9.1 Gene Accession Numbers

10 Other E. coli and Shigella PGDBs in BioCyc

11 We Encourage Your Feedback

12 How to Learn More

13 Acknowledgments

1 EcoCyc Project Overview

EcoCyc¹ is a bioinformatics database that describes the genome and the biochemical machinery of E. coli K-12 MG1655. The long-term goal of the project is to describe the molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists, and for biologists who work with related microorganisms.

In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc.

This chapter provides an overview of the data content of EcoCyc, and of the procedures by which these data have been and continue to enter EcoCyc.

EcoCyc is designed for several different modes of interactive use via the EcoCyc.org web site and in conjunction with the downloadable Pathway Tools [1] software (Section 12 lists resources for learning to use the web site and software):

It is designed as an encyclopedic reference to provide information about the biological role of a particular gene, metabolite, or pathway. A variety of visualization tools are provided such as a genome browser, metabolic map display, and regulatory network diagram.
It is designed to assist in the analysis of high-throughput data such as gene-expression and metabolomics data. Analysis tools include enrichment analysis and visualization of omics data onto a metabolic map diagram, complete genome diagram, and regulatory network diagram. See http://ecocyc.org/PToolsWebsiteHowto.shtml#node_sec_7 for more details.
The EcoCyc metabolic flux model can predict growth or no-growth of wild-type and knock-out strains under different nutrient conditions.

EcoCyc data are also available for download in multiple file formats [2] and can be queried programmatically via web services [3].

Genome. EcoCyc contains the complete genome sequence of E. coli, and describes the nucleotide position and function of every E. coli gene. A staff of five full-time curators updates the annotation of the E. coli genome on an ongoing basis using a literature-based curation strategy (see below). Mini-review summaries of E. coli gene products can be found in EcoCyc protein and RNA pages. Users can retrieve the nucleotide sequence of a gene, and the amino-acid sequence of a gene product.

Regulation. EcoCyc describes several types of E. coli cellular regulation:

Substrate-level enzyme regulation: Provided for hundreds of E. coli enzymes.
Transcriptional regulation and operon organization: EcoCyc contains the most complete description of the regulatory network of any organism, including E. coli operons, promoters, transcription factors, transcription-factor binding sites, attenuators, and small-RNA regulators. The transcriptional regulatory information in EcoCyc and RegulonDB is curated by the group of Dr. Julio Collado-Vides at the UNAM; both databases therefore have the same data content on transcriptional regulation of gene expression. Actual curation of the data occurs within EcoCyc, and the information is periodically propagated to RegulonDB.
Attenuation: Curation of regulation by attenuation began in 2008.
Regulation by small RNAs: Curation of small RNA-based regulation began in 2008.

Membrane transporters. EcoCyc annotates E. coli transport proteins, and the associated transport reactions that they mediate.

Metabolism. EcoCyc describes all known metabolic pathways and signal-transduction pathways of E. coli. It describes each metabolic enzyme of E. coli, including its cofactors, activators, inhibitors, and subunit structure. See also the MetaCyc project.

Database links. EcoCyc is linked to other biological databases containing protein and nucleic acid sequence data, bibliographic data, protein structures, and descriptions of different E. coli strains.

Literature-Based Curation. Curation is the process of manually refining and updating a bioinformatics database. The EcoCyc project uses a literature-based curation approach in which database updates are based on evidence in the experimental literature. EcoCyc is largely up to date with respect to its curation activities. As of March 2013, EcoCyc has encoded information from more than 43,542 publications.

Curators collect gene, protein, pathway, and compound names and synonyms. They classify genes and gene products using the Gene Ontology and MultiFun ontology, and they classify pathways within the Pathway Tools pathway ontology. Protein complex components and the stoichiometry of these subunits are captured; cellular localization of polypeptides and protein complexes is entered, as are experimentally determined protein molecular weights; enzyme activities and any enzyme prosthetic groups, cofactors, activators, or inhibitors are captured. Operon structure and gene regulation information are encoded.

Textual summaries with extensive citations are authored by curators. Within the summaries for proteins, RNAs, pathways, and operons, curators capture additional information not captured in the highly structured database fields of EcoCyc. For example, curators use the free-text summary sections to capture phenotypes caused by mutation, depletion, or overproduction of each gene product; any genetic interactions known; protein domain architecture and structural studies; similarity to other proteins; or any functional complementation experiments that have been described. Summaries can also be used to note cases in which the published reports present contradictory results. In such cases, both viewpoints will be presented with proper attribution. This approach assures that no information is lost.

Underlying software. The Pathway Tools software that underlies EcoCyc is not specific to E. coli, but has been applied to manage genomic and biochemical data for hundreds of organisms.

2 How to Cite EcoCyc

Please cite EcoCyc in publications that benefited from the use of the EcoCyc database or web site. Please cite EcoCyc as:

Keseler et al., Nuc Acids Res, 39:D583–90, 2011.

3 The Roles of EcoCyc in Microbial Genome Annotation

The EcoCyc database can impact two aspects of microbial genome annotation: annotation of gene function, and annotation of metabolic pathways.

We suggest that microbial genome annotation pipelines include a BLAST search (or a search by other sequence similarity tools) against all proteins with experimentally defined functions from EcoCyc. As discussed in our article "Multidimensional annotation of the Escherichia coli K-12 genome" [15], E. coli contains more proteins of experimentally determined function than any other organism. When assigning functions to newly sequenced genes, strong similarity hits to these experimentally characterized proteins should be preferred over hits against other proteins, to minimize the chances of annotation errors due to transitive annotations.
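The preference described above can be sketched as a small selection rule. This is only an illustration, not part of EcoCyc or of any particular annotation pipeline: the `Hit` record, its `experimental` flag, and the e-value cutoff used to define a "strong" hit are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff for what counts as a "strong" similarity hit.
STRONG_EVALUE = 1e-30

@dataclass
class Hit:
    subject_id: str     # identifier of the matched protein
    function: str       # functional description of the matched protein
    evalue: float       # similarity-search e-value (smaller is stronger)
    experimental: bool  # True if the function was determined experimentally

def choose_annotation(hits: List[Hit]) -> Optional[Hit]:
    """Prefer strong hits to experimentally characterized proteins;
    otherwise fall back to the best hit overall."""
    if not hits:
        return None
    strong_experimental = [
        h for h in hits if h.experimental and h.evalue <= STRONG_EVALUE
    ]
    pool = strong_experimental or hits
    return min(pool, key=lambda h: h.evalue)

# Placeholder hits: the non-experimental hit scores better, but the
# experimentally characterized protein is still strong, so it wins.
hits = [
    Hit("other_db_protein", "putative kinase", 1e-80, False),
    Hit("ecocyc_protein", "galactokinase", 1e-60, True),
]
best = choose_annotation(hits)
print(best.subject_id, "->", best.function)
```

A real pipeline would also need to decide how to weigh alignment coverage and identity; the sketch only shows the ordering of preferences.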

4 Conditions of E. coli Growth and Non-Growth

As of 2011, EcoCyc incorporates media that have been shown experimentally to support or not support growth of both wild-type and knock-out strains of E. coli K–12. This work has two goals. The first is to assemble a comprehensive encyclopedia of E. coli growth conditions for experimentalists. The spectrum of environmental conditions supporting the growth of a bacterium is among its most important phenotypic traits. We cannot expect to understand the functions of all genes in an organism unless we understand the full range of environments in which the cell can grow. Second, a comprehensive collection of E. coli growth media will drive more accurate systems biology modeling of E. coli. The larger the set of growth media against which these models are validated, the more accurate and comprehensive the models will be.

EcoCyc captures approximately 20 media that are commonly used by E. coli laboratories. It also describes media used in the following high-throughput experiments from Biolog Phenotype Microarrays (PMs) that support respiration in E. coli.

B. Bochner and X. Lei, personal communication, 2012.
Strain: E. coli BW30270 (rph+ (RNase PH) derivative of MG1655; the strains also show a PyrE deficiency. Found to be fnr+ as well, according to Datsenko and Wanner, unpublished results.)
“Genome Scale Reconstruction of a Salmonella Metabolic Model”
AbuOun et al 2009 [4]
Strain: E. coli MG1655
Method: Grown for 16 hours on LB plates, then suspended in Biolog Inoculating Fluid (IF-0). Formazan formation monitored at 15-minute intervals for 26 hours. Experiments performed at least twice each under aerobic conditions.
“The evolution of metabolic networks of E. coli”
Baumler et al 2011 [5]
Strain: E. coli MG1655
Method: Grown overnight on Sheep Blood Agar Plates, and then inoculated into Biolog plates for respiration testing under aerobic conditions.
A. Mackie and I. Paulsen, personal communication, 2012.
Strain: E. coli MG1655
“Comparative multi-omics systems analysis of Escherichia coli strains B and K-12”
Yoon et al 2012 [6]
Strain: E. coli MG1655
Method: Grown overnight on BUG+B agar, then suspended in Biolog Inoculating Fluid (IF-0). Cell suspensions were inoculated into Biolog plates at 100 μl/well and incubated at 37 degrees C for 48 hours. The areas beneath the growth-time curves were averaged over four independent tests, and the negative control values were subtracted.

These data on growth conditions can be accessed from the EcoCyc Web site by invoking the command Tools → Search → Growth Media, then clicking on the button “All Growth Media for this Organism.” Individual media are shown in the initial table; PM data are shown in the following tables. The coloring of each cell indicates the degree of growth observed under that condition. Three levels of growth can be recorded: no growth, low growth, and growth (see legend that indicates the colors associated with each level of growth). Click on any growth medium to request a page describing its composition, and to see genes that are essential or not essential for growth under that condition.

5 Essential Gene Information

As of 2011, EcoCyc incorporates several large-scale datasets on gene essentiality in E. coli. Gene essentiality information is useful for

Predicting antibiotic targets for pathogenic bacteria
Guiding the design of minimal genomes
Providing clues regarding the functions of genes of unknown function
Validating genome-scale metabolic flux models: because those models can simulate the effects of knock-outs, their results can be compared to the experimental data to assess model accuracy
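The model-validation use in the last item can be illustrated with a short tally of agreement between predicted and observed knock-out phenotypes. This is a minimal sketch under stated assumptions: the function name, the dictionary layout (gene name mapped to True for growth), and the gene names are hypothetical and do not reflect EcoCyc's file formats or any real results.

```python
from typing import Dict

def compare_to_experiment(predicted: Dict[str, bool],
                          observed: Dict[str, bool]) -> Dict[str, float]:
    """Tally agreement between predicted and observed growth
    (True = growth) over the knock-outs present in both datasets."""
    shared = sorted(set(predicted) & set(observed))
    tally = {"both_growth": 0, "both_no_growth": 0,
             "predicted_growth_only": 0, "observed_growth_only": 0}
    for gene in shared:
        if predicted[gene] and observed[gene]:
            tally["both_growth"] += 1
        elif not predicted[gene] and not observed[gene]:
            tally["both_no_growth"] += 1
        elif predicted[gene]:
            tally["predicted_growth_only"] += 1
        else:
            tally["observed_growth_only"] += 1
    agree = tally["both_growth"] + tally["both_no_growth"]
    tally["accuracy"] = agree / len(shared) if shared else 0.0
    return tally

# Made-up placeholder knock-outs, not real essentiality results.
predicted = {"geneA": True, "geneB": False, "geneC": True, "geneD": False}
observed = {"geneA": True, "geneB": False, "geneC": False, "geneD": True}
result = compare_to_experiment(predicted, observed)
print(result)
```

Keeping the two kinds of disagreement separate is useful because they point to different problems: a model that predicts growth where none is observed may be missing a constraint, while the reverse may indicate a missing reaction.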

EcoCyc incorporates data on essentiality from the following publications:

Experimental Determination and System Level Analysis of Essential Genes in Escherichia coli MG1655
Gerdes et al [7]
Strain: E. coli MG1655 (F⁻ λ ⁻ ilvG rfb-50 rph-1)
Media: Enhanced Luria-Bertani medium (described in EcoCyc) with Kanamycin
Conditions: Aerobic growth
Method: Genetic footprinting via the Tn5 transposome system. Note that this means that in some cases reported as "no growth," the experimental result is that no transposon insertions were identified in the gene in question.
Experimental and computational assessment of conditionally essential genes in Escherichia coli
Joyce et al [8]
Strain: E. coli BW25113 (rpoS(Am) rph-1 λ ⁻ rrnB3 ΔlacZ4787 hsdR514 Δ(araBAD)567 Δ(rhaBAD)568 rph-1) (the same as in [9])
Media: M9 medium with 1% glycerol and kanamycin
Conditions: Aerobic growth at 37 degrees C with agitation
Method: This study used the deletion collection described in [9].
Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection
Baba et al [9]
Strain: E. coli BW25113 (rpoS(Am) rph-1 λ ⁻ rrnB3 ΔlacZ4787 hsdR514 Δ(araBAD)567 Δ(rhaBAD)568 rph-1)
Media: LB and 0.4% glucose MOPS medium with 2 mM inorganic phosphate and kanamycin
Conditions: Aerobic growth at 37 degrees C without shaking
Method: Deletions were made by use of the FLP recombinase system, replacing target genes with in-frame kanamycin resistance genes. Note that in some cases there were secondary impacts from single-gene deletions, such as compensating suppressor mutations. There were also errors in some of the mutants described in this paper, which were later corrected.
A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information
Feist et al [10]
This publication re-interpreted the data from Baba et al [9] using different thresholds. Feist et al distinguished growth from non-growth as follows (J. Reed, personal communication): Baba et al had apparently selected all mutants whose growth under LB medium was less than 1/3 of the average optical density (OD), and retested those for growth. 200 mutants were retested. All strains showing less than 1/3 of the average OD of those 200 mutants were considered no-growth. There were 119 such strains. Therefore, in this MOPS minimal glucose medium, Feist et al selected the 119 slowest growing strains and declared them to be no-growth. This dataset is included in EcoCyc to provide a record of this essential gene set and to facilitate benchmarking of computational predictions of essentiality from the EcoCyc model with computations from the model of Feist et al.
Multicopy suppression underpins metabolic evolvability.
Patrick et al 2007 [11]
Strain: E. coli BW25113 (rpoS(Am) rph-1 λ ⁻ rrnB3 ΔlacZ4787 hsdR514 Δ(araBAD)567 Δ(rhaBAD)568 rph-1)
Media: M9 medium with 0.4% glycerol and kanamycin
Method: This study used the deletion collection described in [9].
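The thresholding rule attributed above to Feist et al [10], in which strains below one third of the average optical density are called no-growth, can be sketched as follows. The function name, the strain labels, and the OD values are hypothetical placeholders; only the one-third-of-average cutoff comes from the description in this section.

```python
from statistics import mean
from typing import Dict

def classify_growth(od_by_strain: Dict[str, float],
                    fraction: float = 1 / 3) -> Dict[str, str]:
    """Label each strain 'no-growth' if its optical density is below
    `fraction` of the average OD of the tested strains, else 'growth'."""
    cutoff = fraction * mean(od_by_strain.values())
    return {
        strain: ("no-growth" if od < cutoff else "growth")
        for strain, od in od_by_strain.items()
    }

# Placeholder OD readings, not taken from any of the cited studies.
ods = {"mutant_a": 0.90, "mutant_b": 0.60, "mutant_c": 0.10}
labels = classify_growth(ods)
print(labels)
```

Because the cutoff is relative to the average of the tested set, the same strain can be classified differently depending on which other strains are included, which is one reason the re-interpreted dataset is recorded separately from that of Baba et al [9].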

When essentiality data are available for a given gene, the EcoCyc gene page includes a table of the conditions under which that gene has been found to be essential, or not essential, for growth. Clicking on a condition navigates to a growth-medium page that lists all essentiality information under that growth condition.

6 EcoCyc Metabolic Flux Model

A quantitative steady-state metabolic flux model has been derived from EcoCyc using Flux-Balance Analysis (FBA). By running this model with different parameters, scientists can model the growth of E. coli under different nutrient conditions and under different gene knock-outs. Every time the model is executed, the model is freshly generated from EcoCyc, meaning that as the reactions in EcoCyc are updated due to curation, the model evolves to reflect those changes.
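For readers unfamiliar with Flux-Balance Analysis, the optimization problem it solves is commonly written in the following textbook form. This is a general statement of FBA, not a description of the exact objective or constraints used by the EcoCyc model, which are not given here; the symbols S, v, c, l, and u are introduced only for this illustration.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, and c weights the fluxes being optimized, typically a biomass-producing reaction. The equality S v = 0 is the steady-state condition. Under this formulation, a nutrient condition is expressed through the bounds on uptake fluxes, and a gene knock-out is expressed by setting l_j = u_j = 0 for each reaction that can no longer be catalyzed.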

To run the model, use the Tools → Metabolism → Run Metabolic Model command. MetaFlux is described in the Metabolic Models section of the website user guide.

7 Update Frequency

The EcoCyc.org and BioCyc.org Web sites and downloadable files are updated approximately three times per year. A faster, more powerful EcoCyc that you can install locally on your computer (Macintosh, PC/Windows, PC/Linux) is released semiannually (see the EcoCyc release history).

8 Data Sources Incorporated into EcoCyc

8.1 UniProt Features

UniProt protein features (the UniProt KB term is sequence annotations) from the complete proteome of E. coli K-12 MG1655 in SwissProt are imported into EcoCyc for every EcoCyc release. We import all protein features with experimental or non-experimental evidence qualifiers except for the following types: turn, helix, beta strand, and coiled‑coil. The chain type is only imported if it does not span the entire length of the protein. Examples of imported feature types include catalytic domains, phosphorylation sites, and metal ion binding sites. We import citations associated with UniProt protein features if they have an associated PubMed ID. The import of protein features into EcoCyc is done via the UniProt Feature Importer tool within the Pathway Tools software (which can be applied to any PGDB).
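The import rules in this section can be summarized as a small predicate. This is a sketch of the stated rules only, not the UniProt Feature Importer itself: the `Feature` record, the lowercase type strings, and the assumption that a full-length chain runs from residue 1 to the protein length are illustrative and do not reproduce UniProt's actual feature keys.

```python
from dataclasses import dataclass

# Feature types excluded from import, written as in the text above.
EXCLUDED_TYPES = {"turn", "helix", "beta strand", "coiled-coil"}

@dataclass
class Feature:
    type: str   # feature type name (illustrative spelling)
    start: int  # first residue covered, 1-based
    end: int    # last residue covered, inclusive

def should_import(feature: Feature, protein_length: int) -> bool:
    """Apply the stated rules: skip excluded secondary-structure types,
    and skip a chain that spans the entire protein."""
    ftype = feature.type.lower()
    if ftype in EXCLUDED_TYPES:
        return False
    if ftype == "chain":
        spans_whole = feature.start == 1 and feature.end == protein_length
        return not spans_whole
    return True

# Placeholder features on a hypothetical 300-residue protein.
features = [
    Feature("helix", 10, 25),
    Feature("chain", 1, 300),
    Feature("chain", 23, 300),
    Feature("metal ion binding site", 112, 112),
]
imported = [f for f in features if should_import(f, 300)]
print([(f.type, f.start, f.end) for f in imported])
```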

8.2 Gene Ontology

For several years, EcoCyc and EcoliWiki have been collaborating on improving and maintaining the GO annotations for E. coli. Since the summer of 2008, we have been periodically generating a file containing all E. coli K-12 GO term annotations, called ecocyc.gaf, that may be obtained from the Gene Ontology Consortium.

GO annotation has become a standard part of EcoCyc's manual literature-based curation process. The GO annotations are added to the database objects that represent the functional gene products or protein complexes, not directly to the gene objects, so as to model the biology as accurately as possible. In parallel, manual annotation of E. coli genes with GO is ongoing at EcoliWiki. On a regular basis, the GO annotations are merged. The latest UniProt and EcoliWiki annotations are imported into EcoCyc. Because electronic annotations are not accepted by the GO consortium as part of the gene association file if they are more than one year old, these UniProt annotations are reimported into EcoCyc on a regular basis.

EcoCyc incorporates many electronic and experimental GO term annotations of E. coli K-12 gene products obtained from the “UniProt [multispecies] GO Annotations @ EBI” file downloaded from the Gene Ontology Consortium. When this import was first performed in 2007, about 30,000 new IEA (“Inferred from Electronic Annotation”) GO term assignments were added to EcoCyc, along with approximately 1,000 assignments with experimental evidence codes including assignments from high-throughput protein-interaction studies. During the import of GO terms from UniProt into EcoCyc, a filtering operation is applied to prune GO term annotations that have solely computational (IEA) evidence if the EcoCyc gene product already has more specific GO annotations (in other words, GO terms that are children of the GO term being imported) that are supported by experimental evidence. For example, if a gene product already contained an experimental annotation of the term “galactose kinase,” the software would not add the computational annotation “carbohydrate kinase.” This filtering leads to the removal of about 1,000 of these less specific and redundant annotations.

A gene association file is generated from the quarterly releases of EcoCyc. This file is sent to the EcoliWiki team at Texas A&M for further processing. At EcoliWiki, annotations made in the wiki-based community annotation system since the last EcoCyc update are added to the file, along with annotations containing qualifiers (mainly contributes_to) not yet supported by EcoCyc. Only those annotations that are complete by GO consortium standards are extracted from EcoliWiki; incomplete annotations are left in place with the hope that community members will eventually complete them. EcoliWiki runs the GO consortium validation scripts and deposits the file with the GO consortium via their Concurrent Versioning System.
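The filtering of redundant IEA annotations described in this section can be sketched as follows. This is an illustration of the rule, not the import software: the three-term ontology fragment, the function names, and the representation of annotations as (term, evidence code) pairs are hypothetical. The set of experimental evidence codes listed is the standard GO set.

```python
from typing import Dict, List, Set, Tuple

# Toy ontology fragment: each term maps to its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "galactose kinase": {"carbohydrate kinase"},
    "carbohydrate kinase": {"kinase"},
    "kinase": set(),
}

# GO evidence codes that denote experimental evidence.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

def ancestors(term: str) -> Set[str]:
    """Return all terms reachable by following parent links from `term`."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen

def keep_import(new_term: str, new_code: str,
                existing: List[Tuple[str, str]]) -> bool:
    """Drop an IEA annotation when the gene product already has an
    experimentally supported annotation to a more specific (child) term."""
    if new_code != "IEA":
        return True
    for term, code in existing:
        if code in EXPERIMENTAL_CODES and new_term in ancestors(term):
            return False
    return True

# The example from the text: an experimental "galactose kinase" annotation
# makes the computational "carbohydrate kinase" annotation redundant.
existing = [("galactose kinase", "IDA")]
print(keep_import("carbohydrate kinase", "IEA", existing))
```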

8.3 RefSeq Collaboration

EcoCyc is involved in a collaboration to update the genome annotation of the GenBank (U00096.3) and RefSeq (NC_000913.3) entries for E. coli K-12 MG1655 on an ongoing basis. The primary collaborators include EcoCyc, EcoGene, UniProtKB/Swiss-Prot, and NCBI. The collaborators routinely share their data and resolve conflicts among the data. Updates of gene names, gene positions, and gene product names are shared among all partners.

8.4 MetaCyc

The EcoCyc and MetaCyc databases exchange data as part of the release processes for both databases. Updates that have occurred to enzymes, genes, pathways, reactions, and metabolites are exchanged between the databases based on automated comparisons of update dates to ensure that the latest information and corrections are propagated between databases.
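The idea of comparing update dates can be illustrated with a few lines of code. This is only a sketch of the general approach; the function name, the dictionary layout, and the object identifiers are hypothetical, and the actual exchange procedure is not described in enough detail here to reproduce.

```python
from datetime import date
from typing import Dict

def propagation_source(ecocyc_updated: Dict[str, date],
                       metacyc_updated: Dict[str, date]) -> Dict[str, str]:
    """For each object present in both databases, report which database
    holds the more recently updated copy."""
    result: Dict[str, str] = {}
    for obj_id in sorted(set(ecocyc_updated) & set(metacyc_updated)):
        eco = ecocyc_updated[obj_id]
        meta = metacyc_updated[obj_id]
        if eco > meta:
            result[obj_id] = "EcoCyc"
        elif meta > eco:
            result[obj_id] = "MetaCyc"
        else:
            result[obj_id] = "in sync"
    return result

# Placeholder identifiers and dates for illustration only.
eco = {"PATHWAY-1": date(2013, 2, 1), "ENZYME-2": date(2012, 6, 15)}
meta = {"PATHWAY-1": date(2012, 11, 20), "ENZYME-2": date(2013, 1, 10)}
sources = propagation_source(eco, meta)
print(sources)
```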

9 EcoCyc Accession Numbers

9.1 Gene Accession Numbers

Three systems of accession numbers are typically available for genes within EcoCyc. Any of these accession numbers may be used when querying EcoCyc genes “by name,” and in the Web site Quick Search.

EcoCyc ID: The EcoCyc project assigns unique identifiers to each gene that for historical reasons are of variable syntax, and are of the form “Gnnnn,” “EGnnnnn,” or “G0-nnnnn”. EcoCyc IDs are stored as the frame id of the EcoCyc gene object.
B-numbers: Originally assigned by the Blattner laboratory as part of the E. coli genome project, the b-number identifiers are of the form “bnnnn.” B-numbers were originally assigned sequentially along the genome. When a gene object is removed from the genome because of a decision that insufficient evidence for the existence of that gene is available, that b-number is retired and is not reused. When new genes are added to the genome, they are assigned the next highest available b-number. Thus, b-numbers are no longer purely sequential along the genome. B-numbers are stored in the EcoCyc slot Accession-1.
ECK numbers: ECK numbers were assigned to the E. coli K-12 MG1655 and W3110 genomes in 2005 in an attempt to provide shared accession numbers for genes common to the two genomes [12]. ECK numbers are stored in the EcoCyc slot Accession-2. The b-number and ECK number are the same for only the first 18 or so genes in the E. coli K-12 MG1655 genome; for subsequent genes the numbers have diverged.
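When processing gene lists outside EcoCyc, it can help to recognize which accession system an identifier belongs to. The sketch below classifies identifiers by their syntax. It reads the patterns “Gnnnn,” “EGnnnnn,” “G0-nnnnn,” and “bnnnn” literally, with each n as exactly one digit, which is an assumption; the ECK pattern (the letters ECK followed by four digits) is likewise an assumption, since the format is not spelled out above.

```python
import re
from typing import Optional

# Patterns follow the forms quoted in the text; digit counts are assumed
# to be exact, and the ECK pattern is an assumed format.
PATTERNS = [
    ("EcoCyc ID", re.compile(r"(G\d{4}|EG\d{5}|G0-\d{5})")),
    ("b-number", re.compile(r"b\d{4}")),
    ("ECK number", re.compile(r"ECK\d{4}")),
]

def classify_accession(accession: str) -> Optional[str]:
    """Return the name of the accession system whose syntax matches the
    whole identifier, or None if no pattern matches."""
    for system, pattern in PATTERNS:
        if pattern.fullmatch(accession):
            return system
    return None

# Identifiers shaped like each pattern.
for acc in ["EG10001", "G0-10001", "G6001", "b0001", "ECK0001", "xyz"]:
    print(acc, "->", classify_accession(acc))
```

Matching on syntax alone says nothing about whether an identifier is currently assigned; retired b-numbers, for example, still match the pattern.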

10 Other E. coli and Shigella PGDBs in BioCyc

EcoCyc is part of the larger BioCyc collection of Pathway/Genome Databases (PGDBs). BioCyc version 16.0 (2012) included more than 130 E. coli and Shigella PGDBs. Most of these PGDBs were generated computationally and lack the extensive manual literature-based curation of the EcoCyc K-12 database. Two of these PGDBs have undergone additional curation: the BioCyc PGDBs for strains W3110 and for E. coli B str. REL606. Both strains underwent a computational annotation normalization procedure in which gene names, product names, heteromultimeric protein complexes, and Gene Ontology terms were propagated from EcoCyc to their orthologous genes in these other two strains. This procedure was performed under the assumption that genome annotation pipelines typically introduce syntactically large but semantically insignificant variation in the naming of genes and gene products. In addition, E. coli B str. REL606 is undergoing literature-based curation to incorporate experimental information regarding the genes and pathways present in this strain but not in the EcoCyc strain MG1655. This curation is supported by the PortEco (formerly EcoliHub) project.

To select a given genome for querying in the BioCyc Web site, click on the word “change” under the Quick Search and Gene Search buttons in the upper right corner of most Web pages.

11 We Encourage Your Feedback

Feedback from the scientific community has been invaluable to improving EcoCyc during its many years of development. Please email suggestions or questions to biocyc-support at ai dot sri dot com. We strongly encourage your comments and suggestions for improvements in areas including the following:

The database content of EcoCyc – if you see an error or omission within EcoCyc, please report it using the “Report Errors or Provide Feedback” link at the bottom of every data page.
The presentation of information within the EcoCyc Web site
The analysis tools provided in conjunction with EcoCyc
The performance of the EcoCyc Web site

At every EcoCyc release we email a summary of new developments to our biocyc-users mailing list. To subscribe to this mailing list, please see http://biocyc.org/subscribe.shtml.

12 How to Learn More

Downloadable instructional videos on how to use EcoCyc
Publications on EcoCyc: [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
BioCyc User’s Guide
MetaCyc User’s Guide
Pathway/Genome Database Concepts Guide
How to Use a Pathway Tools Website such as EcoCyc
Guide to the Pathway Tools Schema
How to download Pathway Tools and organism flat-file databases

13 Acknowledgments

The development of EcoCyc is funded by NIH grants GM77678 and GM71962 from the NIH National Institute of General Medical Sciences.

Contributors to EcoCyc are listed on the credits page.

References

[1]	P. D. Karp, S. M. Paley, M. Krummenacker, M. Latendresse, J.M. Dale, T. Lee, P. Kaipa, F. Gilham, A. Spaulding, L. Popescu, T. Altman, I. Paulsen, I.M. Keseler, and R. Caspi. Pathway Tools version 13.0: Integrated software for pathway/genome informatics and systems biology. Brief Bioinform, 11:40–79, 2010. http://bib.oxfordjournals.org/cgi/content/abstract/bbp043.
[2]	BioCyc and Pathway Tools Download Information. https://biocyc.org/download.shtml.
[3]	Pathway Tools Web Services. https://biocyc.org/web-services.shtml.
[4]	M. AbuOun, P. F. Suthers, G. I. Jones, B. R. Carter, M. P. Saunders, C. D. Maranas, M. J. Woodward, and M. F. Anjum. Genome scale reconstruction of a Salmonella metabolic model: comparison of similarity and differences with a commensal Escherichia coli strain. J Biol Chem, 284(43):29480–8, 2009.
[5]	D. J. Baumler, R. G. Peplinski, J. L. Reed, J. D. Glasner, and N. T. Perna. The evolution of metabolic networks of E. coli. BMC Systems Biology, 5:182, 2011.
[6]	S. H. Yoon, M. J. Han, H. Jeong, C. H. Lee, X. X. Xia, D. H. Lee, J. H. Shim, S. Y. Lee, T. K. Oh, and J. F. Kim. Comparative multi-omics systems analysis of Escherichia coli strains B and K–12. Genome Biol, 13(5):R37, 2012.
[7]	S. Y. Gerdes, M. D. Scholle, J. W. Campbell, G. Balazsi, E. Ravasz, M. D. Daugherty, A. L. Somera, N. C. Kyrpides, I. Anderson, M. S. Gelfand, A. Bhattacharya, V. Kapatral, M. D’Souza, M. V. Baev, Y. Grechkin, F. Mseeh, M. Y. Fonstein, R. Overbeek, A. L. Barabasi, Z. N. Oltvai, and A. L. Osterman. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. Journal of Bacteriology, 185(19):5673–5684, Oct 2003.
[8]	A. R. Joyce, J. L. Reed, A. White, R. Edwards, A. Osterman, T. Baba, H. Mori, S. A. Lesley, B. Ø. Palsson, and S. Agarwalla. Experimental and computational assessment of conditionally essential genes in Escherichia coli. Journal of Bacteriology, 188(23):8259–8271, 2006.
[9]	T. Baba, T. Ara, M. Hasegawa, Y. Takai, Y. Okumura, M. Baba, K. A. Datsenko, M. Tomita, B. L. Wanner, and H. Mori. Construction of Escherichia coli K–12 in-frame, single-gene knockout mutants: The Keio collection. Mol Systems Biology, 2:2006.0008, 2006.
[10]	A.M. Feist, C.S. Henry, J.L. Reed, M. Krummenacker, A.R. Joyce, P. D. Karp, L.J. Broadbelt, V. Hatzimanikatis, and B.Ø. Palsson. A genome-scale metabolic reconstruction for Escherichia coli K–12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Systems Biology, 3:121–38, 2007. http://www.nature.com/doifinder/10.1038/msb4100155.
[11]	W. M. Patrick, E. M. Quandt, D. B. Swartzlander, and I. Matsumura. Multicopy suppression underpins metabolic evolvability. Mol Biol Evol, 24(12):2716–22, 2007.
[12]	M. Riley, T. Abe, M. B. Arnaud, M. K. Berlyn, F. R. Blattner, R. R. Chaudhuri, J. D. Glasner, T. Horiuchi, I. M. Keseler, T. Kosuge, H. Mori, N. T. Perna, G. Plunkett, K. E. Rudd, M. H. Serres, G. H. Thomas, N. R. Thomson, D. Wishart, and B. L. Wanner. Escherichia coli K-12: A cooperatively developed annotation snapshot–2005. Nuc Acids Res, 34(1):1–9, 2006.
[13]	I. M. Keseler, J. Collado-Vides, A. Santos-Zavaleta, M. Peralta-Gil, S. Gama-Castro, L. Muniz-Rascado, C. Bonavides-Martinez, S. Paley, M. Krummenacker, T. Altman, P. Kaipa, A. Spaulding, J. Pacheco, M. Latendresse, C. Fulcher, M. Sarker, A. G. Shearer, A. Mackie, I. Paulsen, R. P. Gunsalus, and P. D. Karp. EcoCyc: A Comprehensive Database of Escherichia coli biology. Nuc Acids Res, 39:D583–90, 2011.
[14]	I.M. Keseler, C. Bonavides-Martinez, J. Collado-Vides, S. Gama-Castro, R.P. Gunsalus, D. Aaron Johnson, M. Krummenacker, L.M. Nolan, S. M. Paley, I.T. Paulsen, M. Peralta-Gil, A. Santos-Zavaleta, A.G. Shearer, and P. D. Karp. EcoCyc: A comprehensive view of E. coli biology. Nuc Acids Res, 37:D464–70, 2009. http://nar.oxfordjournals.org/cgi/reprint/gkn751?ijkey=7epgizfnGFYQHCe&keytype=ref.
[15]	P. D. Karp, I.M. Keseler, A. Shearer, M. Latendresse, M. Krummenacker, S. M. Paley, I.T. Paulsen, J. Collado-Vides, S. Gama-Castro, M. Peralta-Gil, A. Santos-Zavaleta, M.I. Penaloza-Spinola, C. Bonavides-Martinez, and J. Ingraham. Multidimensional annotation of the Escherichia coli K-12 genome. Nuc Acids Res, 35:7577–90, 2007. http://nar.oxfordjournals.org/cgi/content/full/35/22/7577.
[16]	I.M. Keseler, J. Collado-Vides, S. Gama-Castro, J. Ingraham, S. Paley, I.T. Paulsen, M. Peralta-Gil, and P. D. Karp. EcoCyc: A comprehensive database resource for E. coli. Nuc Acids Res, 33:D334–7, 2005. http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D334?ijkey=80p4BbGpEFjLQ&keytype=ref.
[17]	P. D. Karp, M. Arnaud, J. Collado-Vides, J. Ingraham, I.T. Paulsen, and M.H. Jr. Saier. The E. coli EcoCyc database: No longer just a metabolic pathway database. ASM News, 70(1):25–30, 2004.
[18]	P. D. Karp, M. Riley, M. Saier, I.T. Paulsen, S. Paley, and A. Pellegrini-Toole. The EcoCyc database. Nuc Acids Res, 30(1):56–8, 2002.
[19]	P. D. Karp, M. Riley, M. Saier, I.T. Paulsen, S. Paley, and A. Pellegrini-Toole. The EcoCyc and MetaCyc databases. Nuc Acids Res, 28(1):56–59, 2000.
[20]	P. D. Karp. Using the EcoCyc database. In Nucleic Acid and Protein Databases and How To Use Them, pages 269–280. Academic Press, London, 1999.
[21]	P. D. Karp and M. Riley. EcoCyc: The resource and the lessons learned. In Bioinformatics Databases and Systems, pages 47–62. Kluwer Academic Publishers, Norwell, MA, 1999.
[22]	P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker. EcoCyc: Electronic encyclopedia of E. coli genes and metabolism. Nuc Acids Res, 27(1):55–58, 1999.
[23]	P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker. EcoCyc: Electronic encyclopedia of E. coli genes and metabolism. Nuc Acids Res, 26(1):50–53, 1998.
[24]	P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker. EcoCyc: Electronic encyclopedia of E. coli genes and metabolism. Nuc Acids Res, 25(1):43–50, 1997.
[25]	P. Karp, M. Riley, S. Paley, and A. Pellegrini-Toole. EcoCyc: Electronic encyclopedia of E. coli genes and metabolism. Nuc Acids Res, 24(1):32–40, 1996.

¹ “EcoCyc” is pronounced “eeko-sike”. It sounds like “ecology” and like “encyclopedia”.