Main

Antibiotic persistence is the ability of rare bacterial cells to survive high concentrations of antibiotics, often by assuming a dormant or non-growing state5. One mode of persistence occurs following stress; persisters are cells that exhibit a long lag phase before resuming fast growth6. Selection after lethal antibiotic exposure yields mutants, including metG* and hipA7, with high levels of persistence triggered by starvation7,8,9,10,11. The hipA7 mutation was also found in 5% of a small sample of clinical E. coli isolates12.

Characterizing the cell state of rare persisters has been a longstanding challenge8,13,14 that is now tractable given the recent development of prokaryotic single-cell RNA sequencing2,15 (scRNA-seq). A recent study used scRNA-seq to identify an enriched population of Klebsiella pneumoniae persisters following antibiotic treatment16. However, it is also critical to agnostically define such persister cell states in antibiotic-naive populations and determine the genetic basis of their emergence.

We thus set out to precisely characterize the emergence of antibiotic persisters, capture their transcriptional states, and contextualize those states relative to others. To that end, we generated a single-cell atlas of E. coli growth transitions. Within this atlas, we identified a unique persister cluster that is common across five genetic and physiological models of persistence. This persister cluster is distinct from all other growth phases and is primarily defined by translational deficiency. To understand which of the transcriptional markers of persisters may be causal to their formation, we carried out a comprehensive CRISPR-interference (CRISPRi) screen3,17,18 in three genetic models. We identified several gene products that contribute to persister formation, including Lon protease and YqgE, an uncharacterized protein that has not previously been implicated in these phenomena.

A distinct persister state

We focused first on metG*, a hyper-persistent mutant of MG1655 with a 12-bp deletion in the methionine-tRNA ligase gene8. We hypothesized that metG* hyper-persistence would be lag-dependent. Thus, we grew cells in a bioreactor for more than 12 h (Fig. 1a) to establish a uniform exponential population. During exponential phase, metG* and wild-type (MG1655) cells treated with ampicillin or ciprofloxacin had survival rates below 0.01% both when assayed directly from the bioreactor (Extended Data Fig. 1a, time <2 h) and after dilution into fresh medium (Fig. 1b, time <2 h). As cells reached stationary phase, metG* cells showed a marked increase in lag phase persistence, increasing 100-fold in minutes (Fig. 1b, left, first versus second red arrow) before plateauing at 30–60% survival (Extended Data Fig. 1b, left). By contrast, survival of wild-type cells only reached 0.01–0.02% (Fig. 1b and Extended Data Fig. 1b, right). Concurrent with increased antibiotic survival, colony appearance times—a proxy for lag time19—became longer for metG* cells but not for wild-type cells (Fig. 1c).

Fig. 1: A single-cell RNA atlas of growth phases in E. coli reveals a distinct persister cell state.

a, Schematic of bioreactor experiment. b, Left, metG* cells exhibit lag-dependent hyper-persistence upon dilution from stationary phase. n = 19 measurements from 4 biological replicates. Top, optical density at 600 nm (OD600) measurements over time. Bottom, survival after antibiotic treatment. Red arrows indicate timepoints used for PETRI-seq. Right, OD600 and survival measurements for wild-type E. coli. n = 14 from 3 biological replicates. Cipro, ciprofloxacin; WT, wild type. c, Cumulative distributions of colony appearance times. metG* cells (left) exhibit extended and heterogeneous lag times relative to wild-type cells (right). d, Top, PETRI-seq of metG* cells before and after critical shift to hyper-persistence. Bottom, wild-type cells at matched timepoints. Cells were either sampled directly from the bioreactor (undiluted) or diluted into fresh medium and grown for 20 min or 2 h (n = 1,500 representative cells per population). In the bottom middle panel, the wild-type population was not sampled at 2 h. e, PETRI-seq of wild-type E. coli with higher resolution of growth states, including cells undiluted or after an indicated amount of time in fresh medium. n = 1,500 representative cells per population. f, Unsupervised clustering of combined metG* and wild-type populations; 2,000 representative cells are shown for each cluster. Cluster names indicate corresponding phenotype and growth condition.

We sampled cells at critical timepoints (arrows in Fig. 1b) for high-throughput scRNA-seq by prokaryotic expression profiling by tagging RNA in situ and sequencing2 (PETRI-seq). We used an updated PETRI-seq protocol, which includes Cas9-driven depletion of ribosomal RNA and multiple stopping points for flexible use (Supplementary Fig. 1 and Methods). At the indicated timepoints, we sequenced cells taken directly from the bioreactor (‘undiluted’) and 20 min or 2 h after dilution into fresh medium. We used uniform manifold approximation and projection (UMAP) to visualize single cells in two dimensions. Before metG* cells became hyper-persistent, each population occupied a single area of the UMAP, which overlapped with matched wild-type samples (Fig. 1d, left). As metG* survival increased to 0.3% (Fig. 1d, top middle) and then 9% (Fig. 1d, top right), single-cell transcriptomes assayed after dilution became bimodal for the metG* cells, with one population resembling wild type and the other occupying a distinct space (best seen in the orange population in Fig. 1d, top right). We hypothesized that this distinct population corresponds to metG* persisters, as it emerges during lag phase precisely at the transition to hyper-persistence. This state was highly reproducible in replicate scRNA-seq of metG* cells (Extended Data Fig. 1d,e). Comparing these candidate persisters with co-existing non-persisters, we observed a marked reduction in absolute transcripts per cell (Extended Data Fig. 1f, purple violin and pie charts), which is similar to low mRNA abundance in dormant stationary cells2,20 (Extended Data Fig. 1f–h, undiluted). Given the large range in mRNA counts per cell, we downsampled every cell to a maximum of 38 mRNA counts for all analyses.
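The downsampling step described above can be illustrated with a minimal sketch. It assumes that downsampling means drawing transcripts uniformly at random without replacement from each cell until a cap is reached; the function name, the seed handling and the toy counts are hypothetical, and the procedure actually used is the one given in the Methods.

```python
import random

def downsample_cell(counts, max_counts, seed=0):
    """Cap a cell's total mRNA count by sampling transcripts without replacement.

    counts: dict mapping gene name -> integer mRNA count for one cell.
    Cells at or below the cap are returned unchanged.
    """
    total = sum(counts.values())
    if total <= max_counts:
        return dict(counts)
    # Expand to one entry per captured transcript, then draw max_counts of them.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    sampled = random.Random(seed).sample(pool, max_counts)
    out = {}
    for gene in sampled:
        out[gene] = out.get(gene, 0) + 1
    return out

# Hypothetical cells: gene names are taken from the text, counts are invented.
cap = 38  # maximum mRNA counts per cell quoted in the text
rich_cell = {"rmf": 40, "cysK": 25, "mdtK": 15}
sparse_cell = {"rmf": 5, "cysK": 2}
print(sum(downsample_cell(rich_cell, cap).values()))
print(downsample_cell(sparse_cell, cap))
```

Sampling without replacement keeps each gene's downsampled count at or below its observed count, so a cell's relative expression profile is preserved in expectation.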

To better contextualize the emergent persister state, we next performed a high-resolution PETRI-seq experiment using wild-type cells in stationary, lag and exponential phases (Fig. 1e). Consistent with previous work, transcriptional changes occurred less than 3 min into lag phase21. We brought all cells together to generate a single-cell atlas of E. coli growth phases (Fig. 1f). Unsupervised clustering found seven transcriptional states, which we labelled on the basis of the samples in which they appeared. Sequencing of metG* cells after ampicillin confirmed annotation of the persister cluster (Fig. 2a). As shown in Fig. 1f, the persister state is distinct from stationary phase, meaning that persisters transcriptionally respond upon dilution into fresh medium, which occurs even in ampicillin-containing media (Extended Data Fig. 1i–k). Persisters are also distinct from wild-type cells in early lag (wild-type cells 3 min after dilution) or late lag (wild-type cells 10 min after dilution) (Fig. 1f and Extended Data Fig. 1l,m). However, both early lag and persister cells are in a transitional state between the stationary and exponential clusters based on principal component 1 (PC1) (Extended Data Fig. 1n). We used the principal component loadings to broadly define the dominant expression patterns in the cell atlas (Extended Data Fig. 1o). Translation and cold shock response genes22 increase from stationary to persister to exponential cells, whereas amino acid biosynthetic genes follow the reverse trend. Cells from exponential to early stationary phase appear the most aerobic; persisters and early lag cells are the least aerobic.

Fig. 2: Convergence of persister states implicates translational deficiency as a defining feature of persister transcriptomes.

a, metG* cells before and after ampicillin treatment (n = 1,100 representative cells for each UMAP). Insets show the percentage of cells in the persister cluster. b, Lag phase antibiotic survival for wild-type cells after 6 days in stationary phase (WT 6-day stat) or for hipA7 cells after standard overnight culture. Survival of both populations is significantly higher than for wild-type cells after standard overnight culture (WT) (one-sided Mann–Whitney U test; WT: n = 14; WT 6-day stat: n = 3 (ampicillin) or n = 2 (cipro); hipA7: n = 4; metG*: n = 10). Data are mean ± s.d. c, 6-day WT and hipA7 cells have extended lag times. Left, cumulative distribution of colony appearance times. Right, mean and range of appearance times for hipA7 and 6-day wild-type populations are higher than for wild-type control cells (one-sided Mann–Whitney U test; P = 0.006; WT: n = 11; WT 6-day: n = 3; hipA7: n = 3; metG*: n = 1). Data are mean (of mean or range) ± s.d. of replicate populations. d, UMAP of hipA7 cells (n = 1,900 representative cells; 3 biological replicates). e, UMAP of wild-type cells (n = 1,900 representative cells; 1 replicate). f, Percentage of cells in the persister cluster (one-sided Mann–Whitney U test versus WT; WT: n = 7; WT ampicillin: n = 1; hipA7: n = 3; WT 6-day: n = 2; WT 6-day ampicillin: n = 1; tetracycline: n = 3; metG*: n = 3). In box plots, the centre line is the median, boxes delineate the 25th and 75th centiles and whiskers extend to minimum and maximum values. Tet, tetracycline. g, UMAP of wild-type cells in ampicillin (n = 36 cells in the entire dataset; 1 replicate). h, UMAP of 6-day wild-type cells (n = 1,900 representative cells; 2 biological replicates). i, UMAP of 6-day wild-type cells in ampicillin (n = 754 cells in the entire dataset; 1 replicate). j, UMAP of wild-type cells in tetracycline (n = 1,900 representative cells; 3 replicates). In all UMAPs, grey cells show all cells in the study for context. 
Points within the population of interest are coloured by cell cluster. *P < 0.05, **P < 0.005, ***P < 0.0005; NS, not significant (P > 0.05).
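Many comparisons in the figure legends are one-sided Mann–Whitney U tests on small groups of replicates. The following is a minimal sketch of an exact, permutation-based form of that test. The function names and the example survival values are hypothetical, and the sketch is not the authors' analysis code, which may rely on a library routine or a normal approximation.

```python
from itertools import combinations

def u_statistic(x, y):
    """Count (x_i, y_j) pairs with x_i > y_j, scoring ties as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def mann_whitney_greater(x, y):
    """Exact one-sided P value for the alternative that x tends to exceed y.

    Enumerates every way of relabelling the pooled values into groups of the
    original sizes and reports the fraction with U at least as large as observed.
    """
    observed = u_statistic(x, y)
    pooled = list(x) + list(y)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(x)):
        chosen_set = set(chosen)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if u_statistic(gx, gy) >= observed:
            hits += 1
    return hits / total

# Hypothetical percentage survival values, invented for illustration only.
mutant = [4.0, 6.5, 9.0]
control = [0.01, 0.02, 0.015, 0.012, 0.018, 0.011, 0.013,
           0.016, 0.014, 0.017, 0.019, 0.0105, 0.0125, 0.0155]
print(mann_whitney_greater(mutant, control))
```

Under this exact formulation, a comparison of 3 against 14 replicates has 680 possible relabellings, so the smallest attainable one-sided P value is 1/680.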

To specifically find markers of the metG* persister state, we compared gene expression of the persister cluster to the early exponential cluster (Extended Data Fig. 2a(i)), the predominant type of co-occurring non-persister (orange and blue in Fig. 2a, left). Genes that are most upregulated in metG* persisters include rmf, cysK and mdtK (Extended Data Fig. 2a(i)), expression of which we validated using transcriptional fusions (Extended Data Fig. 3a–e). However, many persister markers found by this analysis are also expressed in stationary cells (Extended Data Figs. 2a(ii),(iv) and 3a,b). To find markers that are uniquely upregulated in persisters, we compared persisters to multiple neighbouring clusters (Extended Data Fig. 1p). Of these unique markers, the most upregulated genes include yhaM, which is involved in cysteine detoxification23, and again mdtK, a putative drug efflux pump24. We looked for known pathways that were enriched in persisters relative to all surrounding clusters (stationary, early/late lag and early exponential). No known pathways met these stringent criteria, but a single gene set—genes upregulated by PspF—was significantly enriched in persisters in all comparisons except versus early lag (Extended Data Fig. 1q). Upregulation by PspF typically occurs during cell envelope stress25 and has been seen in E. coli persisters26 and biofilms27. By the same analysis, early lag cells exhibit much more marked upregulation of genes relative to proximal clusters, reflecting a distinct and coordinated expression programme (Extended Data Fig. 1r–u and Supplementary Table 1). Oxidative stress genes upregulated by OxyR and iron uptake genes have previously been discovered as lag phase markers21, whereas to our knowledge, upregulation of oligopeptide transport genes is a novel finding (Extended Data Fig. 1s–u). We conclude that metG* persister cells do not turn on lag-specific processes, which may contribute to their failure to resume growth. 
Knockout of oxyR has previously been shown to be sufficient to extend E. coli lag times28, whereas non-oxidizing anaerobic conditions decrease lag times29.

Convergence of persister transcriptomes

Having defined a distinct persister state in metG* cells (Fig. 2a), we tested whether this transcriptional state resembles other models of persistence. HipA is a kinase that targets glutamyl tRNA synthetase (gltX)30, and the hipA7 mutation confers lag-dependent hyper-persistence6,31 (Fig. 2b,c). We performed PETRI-seq on wild-type and hipA7 cells 45 min after dilution from a standard overnight (Fig. 2d,e and Extended Data Fig. 4a,b). For hipA7 cells, we observed significantly increased occupancy in the same persister cluster as metG* cells (Fig. 2f). We also analysed the same hipA7 cells independently and found two cell clusters, one of which clearly mapped to the metG* persister cluster (Extended Data Fig. 5a–d).

We next explored whether our persister cluster more broadly represents a state of persistence that may occur independently of mutations related to translation. In the wild-type populations sampled around 45 min after dilution from overnight growth, 1.2% of cells are in the persister cluster (Fig. 2f), which matches the persister fraction in ampicillin after the same amount of time (Extended Data Fig. 4b, kill curve). To determine whether wild-type cells in the persister cluster survive antibiotics, we treated them with ampicillin. Ampicillin increased the proportion of wild-type cells in the persister cluster by more than 40 times (Fig. 2f,g) and made persisters the most abundant cell cluster. Together, these data support the existence of shared transcriptional programmes in wild-type and metG* persisters, although resolution at this low persistence rate is limited.

To increase the persistence rate without mutation, we starved wild-type E. coli for 6 days, which increased the persistence rate after dilution and 4 h of antibiotics to nearly 1% with a concurrent increase in lag times (Fig. 2b,c). We applied PETRI-seq to these ‘6-day’ wild-type cells after dilution into fresh medium (Fig. 2h and Extended Data Fig. 4c). Remarkably, 7.4% of these cells were in the persister cluster (Fig. 2f), significantly more than after a standard overnight culture (Fig. 2e). We analysed these cells independently, but they did not clearly group into two clusters (Extended Data Fig. 5e), possibly owing to a more continuous spectrum of states rather than bimodality (also evident in Fig. 2d versus Fig. 2h and in lag times (Fig. 2c)). To confirm that co-clustering was not simply because of the high complexity of the atlas, we analysed a subset of samples and again found enrichment in the persister cluster (Extended Data Fig. 5f–h). We also confirmed that convergence of models to the same persister cluster was not an artefact of low mRNA capture by downsampling cells to exactly 30 mRNAs and discarding cells with fewer than 30 counts (Extended Data Fig. 5i,j). As validation that 6-day wild-type cells in the persister cluster survive antibiotics, ampicillin treatment increased persister cluster occupancy fourfold and made persisters the most abundant cell type (Fig. 2f,i).

Finally, we turned to the clinical uropathogenic E. coli (UPEC) isolate strain CFT073 (ref. 32) which exhibits modestly increased antibiotic survival and concurrent lag time increase (Extended Data Fig. 6a,b). We sequenced UPEC cells 1 h after dilution from overnight and 4 h after dilution into ampicillin (Extended Data Fig. 6c). The 1-h population clustered into 4 cell states based on expression of fimbriae or flagellin genes or lack thereof (cluster 0) (Extended Data Fig. 6d,e). None of these clusters could be defined as a persister state, although ampicillin treatment enriched cells in cluster 0 (Extended Data Fig. 6f). We then integrated UPEC cells into our full atlas and found that UPEC cells that remained after ampicillin were highly enriched in the same cluster as MG1655 persister models (Extended Data Fig. 6g–j).

Co-clustering of different persister types is an important signal of convergence of persister states but does not mean that gene expression is identical across models. Although we find that marker genes and pathways significantly overlap, many genes and pathways also differ between persister models (Extended Data Figs. 2, 5k,l and 6k). Similar to metG* persisters, hipA7 persisters do not upregulate lag-specific pathways. 6-day wild-type persisters upregulate one of three lag-specific pathways—oligopeptide transport genes (Extended Data Fig. 5l).

Low translation underlies the persister state

Given a lack of strongly upregulated known pathways to define the persister cluster, we considered an alternative approach to make sense of this transcriptional signature. Specifically, metG* and hipA7 mutations are translation-related, so we treated wild-type cells in lag phase with tetracycline, a bacteriostatic antibiotic that inhibits translation33. Tetracycline-treated cells cluster with persister cells (Fig. 2f,j and Extended Data Fig. 4d), implicating translational deficiency as a defining feature of persister transcriptomes across models. Bulk proteomics of metG* cells after dilution supports that these persister proteomes are minimally changed from stationary phase (Extended Data Fig. 7a–d), despite the transcriptome responding to fresh medium (Fig. 1f and Extended Data Fig. 7f). As further confirmation of translational deficiency in metG* persisters, we analysed metG* cells that expressed both pcysK-GFP, a reporter for persisters (Extended Data Fig. 3b) and pLlacO1-RFP, a proxy for protein expression. Assayed persister cells exhibited reduced translation relative to growing cells (Extended Data Fig. 3f,g). Previous studies have implicated low protein expression in persistence, specifically finding slow fluorescent protein production in hipA7 persisters34 and showing that wild-type cells with low protein expression are more likely to survive antibiotics13,35.

We next assayed the role of transcriptional deficiency in persistence. Inhibition of transcription by rifampicin, a bacteriostatic antibiotic that targets RNA polymerase36 (Extended Data Fig. 8a–c), revealed that cells without active transcription retain ribosomal RNA (rRNA) and thus can be captured by PETRI-seq (Extended Data Fig. 8d) but have substantially lower mRNA counts than non-rifampicin-treated cells (Extended Data Fig. 8e). Comparing rifampicin-treated cells to persister models shows active transcription in persister cells (Extended Data Fig. 8f). Although PETRI-seq captures fewer mRNAs per cell for persisters than for non-persisters, comparison of metG* persisters to tetracycline treatment indicates that translational deficiency is sufficient to explain lower transcript counts (Extended Data Fig. 8g).

Tetracycline pharmacologically recapitulates a persister-like state but does not lead to sustained cell dormancy (Fig. 3a,b). We thus reasoned that genes that were differentially expressed between self-maintained persister cells and tetracycline-induced tolerant cells could be critical for persister formation and maintenance. Genes upregulated in metG*, hipA7 and 6-day wild-type persister cells versus tetracycline-treated cells significantly overlap but are not identical (Extended Data Fig. 9a). We computed PC1 for the persister cluster alone, which weakly separated sample types (Fig. 3c). Multiple pathways were significantly correlated with PC1 and differentially expressed in persisters relative to tetracycline-treated cells (Fig. 3c and Extended Data Fig. 9b). Upregulation of aminoacyl-tRNA ligase activity across persister types suggests a compensatory response to disrupted tRNA aminoacylation in the mutants. Surprisingly, metG itself was not one of the genes that was upregulated in metG* persisters (Extended Data Fig. 9c), indicating failure to compensate for a hypomorphic mutation (Extended Data Fig. 9d). Other pathways that were upregulated in persister cells include protein metabolism, amino acid biosynthesis and tricarboxylic acid (TCA) cycle (metG* only) (Fig. 3c). In the other direction, cells treated with tetracycline expressed more ribosomal components, which is consistent with downstream inhibition of translation relative to persister cells.

Fig. 3: Translation inhibition as a reference to predict pathways causal to persistence.

a, Wild-type cells in tetracycline are highly antibiotic-tolerant but become susceptible after tetracycline is removed. Survival of cells in or after tetracycline is higher than for untreated cells (one-sided Mann–Whitney U test; P = 0.005; WT: n = 14; WT in tetracycline: n = 3; WT after tetracycline: n = 3), but cells in tetracycline survive better than cells after tetracycline treatment (P = 0.04). Data are mean ± s.d. b, After tetracycline treatment, wild-type cells do not extend lag times. Left, cumulative distribution of colony appearance times. Right, statistics for additional replicates (one-sided Mann–Whitney U test; after tetracycline: n = 2; no tet control: n = 11). Data are mean (of mean or range) ± s.d. of replicate populations. c, Top, distribution of cell types along PC1 calculated for the persister cluster alone. Bottom, moving average (size = 1,000 cells) of pathway expression along PC1 for terms that are significantly divergent between persisters and tetracycline-treated cells (iPAGE; Methods). Coloured asterisks indicate differential expression between each persister type versus cells in tetracycline.
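The moving average in panel c can be sketched as follows, assuming that cells are first ordered by their PC1 coordinate and that a fixed-size window of consecutive cells is then averaged. The function name and the toy data are hypothetical; the legend specifies a window of 1,000 cells, whereas the example uses a much smaller window so that it can be checked by hand.

```python
def moving_average_along_axis(pc1, expression, window):
    """Sort cells by PC1, then average expression over a sliding window of cells."""
    if window < 1 or window > len(pc1):
        raise ValueError("window must be between 1 and the number of cells")
    order = sorted(range(len(pc1)), key=lambda i: pc1[i])
    values = [expression[i] for i in order]
    running = sum(values[:window])
    averages = [running / window]
    for start in range(1, len(values) - window + 1):
        # Slide the window by one cell: add the entering value, drop the leaving one.
        running += values[start + window - 1] - values[start - 1]
        averages.append(running / window)
    return averages

# Hypothetical toy data: six cells with PC1 coordinates and a pathway score.
pc1 = [0.3, -1.2, 0.9, -0.4, 1.5, 0.1]
pathway_score = [2.0, 0.0, 3.0, 1.0, 5.0, 1.0]
print(moving_average_along_axis(pc1, pathway_score, window=2))
```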

CRISPRi to identify persistence drivers

Having identified pathways with expression patterns suggestive of a role in persistence, we then carried out a systematic genetic interrogation to agnostically identify factors that are causal to persister formation and maintenance. We used CRISPR adaptation-mediated library manufacturing (CALM) to generate a comprehensive set of CRISPR RNAs (crRNAs) covering all E. coli genes3. We then used CRISPRi to probe the contribution of every gene to lag phase duration and antibiotic survival in metG*, hipA7 and wild-type MG1655 cells (Fig. 4a). For reference, we measured the contribution of every gene to exponential growth and confirmed that essential genes were substantially depleted (Extended Data Fig. 9f,g). Fitness effects were highly correlated for metG* and wild-type cells in exponential phase, when metG* has no phenotype (Extended Data Fig. 9h).

Fig. 4: Genome-wide CRISPRi screen reveals genes and pathways that are causal to lag-dependent persistence.

a, Schematic of CRISPRi screen (Methods and Supplementary Table 5). Abx, antibiotics; amp, ampicillin; aTc, anhydrotetracycline. b, Correlation between gene perturbation effects on metG* lag versus wild-type lag (ρ = 0.84; n = 413 genes with false discovery rate (FDR) < 0.1 in at least 2 replicates of either comparison). c, Correlation between gene perturbation effects on hipA7 lag versus wild-type lag (ρ = 0.87; n = 215 genes with Bonferroni-corrected P value < 0.05 (see Methods; more stringent because there is only 1 replicate)). d, Gene perturbations enriched after lag phase outgrowth of metG* cells in all replicates. Solid circles indicate significant enrichment or depletion (Methods; FDR = 0.1). Data are mean ± s.d. Exp, exponential phase. e, Mean expression of genes in d for metG* persisters, hipA7 persisters or tetracycline-treated cells in the persister cluster. Genes that are significantly overexpressed in persisters versus tetracycline-treated cells are indicated (two-sided Mann–Whitney U test; n = 4,576 tet cells, 19,611 metG* cells and 975 hipA7 cells). Data are mean ± s.e.m. Expression across clusters is shown in Extended Data Fig. 9r.

To identify genes involved in lag-dependent persistence, we focused primarily on gene perturbations that shortened lag times. For metG*, knockdown of two genes, lon and yqgE, markedly shortened lag times (Fig. 4b), though these genes do not affect wild-type or hipA7 lag times (Fig. 4b–d). Lon is a heat shock-induced protease with a disputed role in persistence37,38, and YqgE is an uncharacterized protein that has not previously been implicated in persistence or lag phase. In addition to these genes, CRISPRi perturbation of five other genes also significantly shortened lag phase in all metG* replicates and reduced metG* antibiotic survival (Fig. 4d and Extended Data Fig. 10a). Notably, these other driver genes are shared across genotypes, indicating that mechanisms of persister formation are simultaneously distinct and overlapping (Extended Data Fig. 9i). Specifically, crRNA targeting of rpoH39, which encodes the heat shock-induced σ32, and sucA, which encodes TCA cycle enzyme 2-oxoglutarate decarboxylase40, shortened lag times across genotypes (Fig. 4d), as did perturbation of three genes encoding primosomal proteins (Extended Data Fig. 10a). The primosome is essential for propagating plasmids with the ColE1 origin41, including our crRNA plasmid, which may confound implication of these genes in persistence.

Notably, the only highly significant gene perturbation that specifically shortened hipA7 lag time was hipA itself (Fig. 4c). Conversely, repression of gltX, which encodes the target of the HipA kinase, specifically lengthened hipA7 lag times (Fig. 4c). Repression of metG similarly extended lag times more in metG* cells than in wild-type cells (Fig. 4b).

We tested the hypothesis that genes found by scRNA-seq to be upregulated in persister cells relative to tetracycline-treated cells would be candidate drivers of persistence. Indeed, lon, yqgE, rpoH and sucA are significantly upregulated in metG* persister cells, whereas rpoH, sucA and hipA are upregulated in hipA7 persisters (Fig. 4e and Extended Data Fig. 9e). To consider a larger set of genes, we relaxed our criteria to include all 51 genes that were significantly enriched in metG* lag in at least two replicates. Of these, 21 were significantly upregulated in metG* persister cells (Extended Data Fig. 9j). This highly significant concordance (P < 10⁻⁴) cross-validates the two independent approaches and supports our driver gene hypothesis. Considering these two datasets together gives key biological insight, as shared hits are likely to function in persister cells and/or proximal to their formation. Additional gene set comparisons substantiated that transcriptional markers significantly overlap with persistence drivers across genotypes (Extended Data Fig. 9k–n). Notably, many top marker genes found by PETRI-seq did not prove functionally important by CRISPRi (Extended Data Fig. 10b), demonstrating the importance of our dual approach and supporting a model in which causal genes are enriched but still a minority of those expressed.
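One common way to assess an overlap such as 21 of 51 genes is a hypergeometric upper-tail test, sketched below. This section does not state which test produced the reported P value, so the choice of test is an assumption. The background size and the number of upregulated genes in the example are hypothetical placeholders, and the printed value should not be read as a reproduction of the reported result.

```python
from math import comb

def hypergeom_upper_tail(overlap, population, marked, drawn):
    """P(X >= overlap) when `drawn` genes are picked at random from `population`
    genes, of which `marked` belong to the category of interest."""
    denominator = comb(population, drawn)
    numerator = sum(
        comb(marked, k) * comb(population - marked, drawn - k)
        for k in range(overlap, min(marked, drawn) + 1)
    )
    return numerator / denominator

# 51 screen hits and 21 shared genes come from the text.
# ASSUMPTIONS (hypothetical): 4,000 background genes, 600 of them upregulated.
p_value = hypergeom_upper_tail(overlap=21, population=4000, marked=600, drawn=51)
print(p_value)
```

The result depends strongly on the assumed background, so the gene universe should be restricted to genes that are detectable in both the screen and the expression data.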

On the pathway level, we similarly saw concordance between CRISPRi and expression data. crRNA targeting of TCA cycle-related genes shortened lag times in all cell types (Extended Data Fig. 10c). TCA cycle activity has previously been implicated in lag-dependent persistence in wild-type E. coli29,42. We also found genotype-specific causal pathways. Targeting protein metabolism genes significantly reduced metG* lag time (Extended Data Fig. 10d). This gene set is expressed in metG*, hipA7 and 6-day wild-type persisters (Fig. 3c); it includes proteases but not top driver gene lon. The targeting of some pathways, including amino acid biosynthesis, specifically reduced wild-type lag and antibiotic survival (Extended Data Fig. 10e). Amino acid biosynthetic genes are also expressed across persister types (Fig. 3c) and have previously been found to affect E. coli tolerance during stationary phase43. Other expressed pathways (Fig. 3c and Extended Data Fig. 9b) may be important for persister resuscitation, as their repression lengthened lag times, although this may be due to slower growth (Extended Data Fig. 10f).

lon and yqgE in metG* hyper-persistence

Of the 7 top drivers of metG* persistence (Fig. 4d and Extended Data Fig. 10a), we selected lon, yqgE and priA for validation by genomic deletion. Of the genes not selected, sucA is part of the 2-oxoglutarate dehydrogenase complex, loss of which has already been shown to reduce lag-dependent persistence in wild-type cells29. dnaC, dnaT and rpoH are essential genes44. Excluded genes are also functionally related to selected genes (dnaC/T is part of the primosome with priA, and lon is induced by rpoH). As detailed below, deletion of lon or yqgE significantly reduced metG* lag-dependent persistence. However, deletion of priA did not affect metG* lag and was not pursued further (Extended Data Fig. 11a).

Lon protease, the top hit identified by CRISPRi (Fig. 4b,d), has been implicated in persistence previously, through a specific SulA-dependent mechanism. SulA is a cell division inhibitor that accumulates during antibiotic treatment and prevents growth resumption unless cleared by Lon38,45. SulA accumulation can explain why lon perturbation reduces antibiotic survival across genotypes but not why it reduces lag time in metG* cells (Fig. 4d). As expected, single deletion of lon strongly reduced antibiotic survival in both wild-type and metG* cells (Extended Data Fig. 11b). Furthermore, lag phase antibiotic survival of metG*-ΔlonΔsulA cells decreased more than 100,000-fold relative to the metG* parental strain (Fig. 5a), indicating a major sulA-independent role for Lon. Similar to wild-type cells, metG*-ΔlonΔsulA cells exhibit short, uniform lag times (Extended Data Fig. 11c). In the wild-type background, deletion of lon and sulA decreased antibiotic survival 5.2-fold (Fig. 5a, grey versus black).

Fig. 5: Lon protease and YqgE have key roles in lag-dependent hyper-persistence.

a, lon deletion decreases lag phase persistence of wild-type and metG* cells (one-sided Mann–Whitney U test; n = 14 (wild type), n = 5 (ΔlonΔsulA, metG*-ΔlonΔsulA), n = 15 (metG*)) independent of sulA. Data are mean ± s.d. b, The Lon inhibitor bortezomib reduces metG* lag phase when added before or during stationary phase. Left, cartoon of the experiment. Bortezomib was added either when colonies were picked (black arrow), after overnight growth but 3 h before dilution (pink arrow), or at dilution (green arrow). Right, growth curves after dilution and overnight culture. Line colours correspond to the arrows in the left panel. Adding bortezomib before or during the overnight culture significantly reduced lag times (n = 3; P = 0.03; one-sided Mann–Whitney U test) but did not affect doubling time. c, yqgE deletion decreases lag phase persistence of metG* cells. Left, survival of metG*-ΔyqgE cells is strongly correlated with the length of the preceding culture (two-sided Spearman’s ρ = 0.9; P = 10⁻⁶), whereas survival of metG* cells is not (ρ = 0.1; P = 0.7). Right, after growth for more than 24 h, lag phase survival is lower for metG*-ΔyqgE cells than for metG* cells (one-sided Mann–Whitney U test; P = 0.002, n = 6 biological replicates; fold change = 6). Data are mean ± s.d. d, Left, representative cumulative distributions of colony appearance times. Right, mean (top) and range (bottom) of appearance times for replicate populations (one-sided Mann–Whitney U test; n = 6 (metG*-pRFP+), n = 10 (metG*-pYqgE+), n = 7 (metG*-ΔyqgE-pRFP+; metG*-ΔyqgE-pYqgE+), n = 4 (WT-pRFP+ and WT-pYqgE+); pRFP+ is omitted from labels for brevity). Data are mean (of mean or range) ± s.d. of replicate populations. e, Distribution of CRISPRi effects on lag times of wild-type cells overexpressing YqgE (n = 4,486 genes). f, Protein expression in stationary phase. Significance is shown for each cell type relative to wild type at the indicated timepoint (one-sided Mann–Whitney U test; n = 5 (metG*) or 6 (all others) biological replicates). Data are mean ± s.e.m. AU, absorbance units.

Lon can be pharmacologically inhibited by the small molecule bortezomib46. Bortezomib has been shown to reduce exponential phase E. coli survival in response to ciprofloxacin47 because of SulA accumulation. We found that without sulA, bortezomib also reduces antibiotic survival of metG* cells, further validating a sulA-independent mechanism and supporting protease inhibitors as potentially useful antibiotic adjuvants (Extended Data Fig. 11d,e).

We next used bortezomib to temporally place the role of Lon in persister formation and maintenance. Using metG*sulA cells, we added bortezomib either before cells reached stationary phase, during stationary phase, or at the start of lag phase (Fig. 5b, left). As a proxy for lag-dependent persistence, we measured population lag times. Bortezomib had the strongest effect when added before stationary phase but still had a significant effect during stationary phase (Fig. 5b, right, black and pink lines). However, inhibiting Lon at the start of lag phase had no effect (green line), indicating that Lon is key to persister formation during stationary phase. We then used proteomics to find candidate targets stabilized by deletion of Lon protease (Extended Data Fig. 12a,b and Supplementary Table 2). Proteomics implicates loss of iron–sulfur cluster assembly proteins IscS and IscU as well as reduction in iron–sulfur cluster binding proteins overall as potentially important during metG* stationary phase. Notably, Lon deletion also decreased expression of TCA cycle proteins, which could be an additional contributor to reduced persistence (Extended Data Figs. 10c and 12c,d).

yqgE, the other top hit in our screen, is uncharacterized and, to our knowledge, has not previously been implicated in persistence. We introduced yqgE deletions into metG* and wild-type cells. Antibiotic survival of metG*yqgE cells was between 5- and 2,000-fold lower than that of the parental metG* strain (Fig. 5c), with a substantial dependence on the length of the preceding overnight culture (Fig. 5c and Extended Data Fig. 11f). This strong correlation implicates YqgE activity as most important early in stationary phase, although after growth for over 24 h, metG*yqgE cells still exhibit significantly reduced lag phase antibiotic survival (Fig. 5c, right and Extended Data Fig. 11g). yqgE deletion shortened lag times in the metG* background (Fig. 5d, solid orange versus solid blue). We also expressed the endogenous yqgE locus on a high-copy plasmid (pYqgE+). In both metG* and metG*yqgE backgrounds, pYqgE+ extended the right tail of colony appearance times relative to each parental strain (Fig. 5d, dotted versus solid lines). Probably for a related reason, these overexpression strains gave rise to fewer colony-forming units (CFUs) after overnight growth (Extended Data Fig. 11h). The missing CFUs may be viable but nonculturable cells, a more extreme version of the persister state48. As expected, pYqgE+ increased antibiotic survival in metG*yqgE cells (Extended Data Fig. 11i).

In contrast to metG*, but matching our screen (Fig. 4d), deleting yqgE in wild-type cells did not affect lag phase antibiotic survival or lag times beyond a minor fitness decrease (Extended Data Fig. 11j–l). Similarly, yqgE deletion had no effect on persistence in 6-day wild-type cells (Extended Data Fig. 11m). However, yqgE overexpression in wild-type cells did increase lag times (Fig. 5d; dotted grey versus black) and lag phase antibiotic survival (Extended Data Fig. 11n), indicating that YqgE can promote dormancy and lag-dependent persistence beyond the metG* context. To find other factors critical for promotion of dormancy by YqgE, we carried out a CRISPRi screen in wild-type cells overexpressing yqgE. Resembling the metG* context, lon and rpoH were top hits for reversing YqgE-promoted dormancy (Fig. 5e). A deletion strain confirmed that lon is epistatic to YqgE-induced dormancy (Extended Data Fig. 11o,p). Although the enzymatic function of YqgE has yet to be shown experimentally, in silico prediction suggests that it has protein disulfide isomerase activity49 (Supplementary Fig. 2).

In investigating the timing of translational deficiency in metG* persisters, we found that metG* cells have reduced protein expression during stationary phase (Fig. 5f, orange versus black) prior to exhibiting lag-dependent hyper-persistence. Specifically, we transformed cells with an inducible RFP construct and grew cultures to stationary phase without inducer. Once cells reached stationary phase, we added isopropyl-β-d-thiogalactopyranoside (IPTG) to induce RFP expression. Given the apparent role of yqgE in promoting cell dormancy, we asked whether it contributes to downregulation of translation. By carrying out the same experiment with ΔyqgE and metG*yqgE cells, we found that the reduced protein expression in metG* indeed depends on yqgE (Fig. 5f, blue versus orange) and that stationary wild-type cells exhibit higher protein expression when yqgE is knocked out (Fig. 5f, purple versus black). We reason that under certain conditions, YqgE represses protein expression, which may be critical to extending post-starvation lag time. To determine whether yqgE could have such a role across multiple species, we searched for homologues in 2,421 microbial genomes from diverse taxa. We detected yqgE homologues in 35% of genomes, a higher frequency than for 70% of all MG1655 genes (Extended Data Fig. 11q). Previous work has implicated algH, the homologue of yqgE in Pseudomonas aeruginosa, in regulation of virulence factors in that species50, supporting a global regulatory role for YqgE-like proteins across Proteobacteria.

Discussion

Here we used unbiased systems approaches to characterize E. coli persister states and genetic factors that promote persister formation and maintenance. We applied PETRI-seq2 to wild-type and hyper-persistent E. coli to generate an atlas of growth states and discover convergent persister states defined primarily by translational deficiency. To discover driver genes underlying persistence, we carried out comprehensive CRISPRi screening3. In wild-type cells, our screen supported previously discovered driver pathways, including the TCA cycle29. In the metG* mutant, we found Lon protease and YqgE to be major drivers of hyper-persistence.

scRNA-seq facilitates a previously unattainable view of persister cell states and reveals that lag-dependent persister cells across multiple genotypes are in a transitional state between stationary and exponential phases (Fig. 1f and Extended Data Fig. 1n,o). Rather than markedly upregulating specific known pathways, persisters seem to be in a dysregulated state in which transcriptional patterns primarily reflect collateral responses to low translation (Fig. 2j). This convergent signature implicates translational deficiency as a core feature of persistence. Our work was conducted under laboratory conditions and primarily using non-pathogenic MG1655, but convergence of UPEC strain CFT073 persisters to the same persister cluster (Extended Data Fig. 6i,j) suggests broad relevance of these findings. Beyond basic phenomenology, scRNA-seq provides insight into causal genes and could be used to tailor drug adjuvants (Fig. 4d,e). As MG1655 and CFT073 strains have both been passaged for decades51, it will be critical for future research to extend scRNA-seq to fresh clinical isolates.

It is also important to note that PETRI-seq captures only a fraction of transcripts per cell, and limited resolution may affect our ability to detect upregulated genes and pathways. This is particularly true for the wild-type MG1655 and CFT073 persisters, which could only be isolated in small numbers after antibiotic enrichment (Fig. 2g and Extended Data Fig. 6i). Total transcript counts for each cell type are included in Supplementary Table 4 and show a range of transcriptome coverage. Notably, we detect more transcripts in metG* persisters than in early lag cells, which express multiple unique pathways found by PETRI-seq (Extended Data Fig. 1s–u). Thus, our ability to detect upregulated pathways in early lag cells but not in metG* persisters supports our key conclusion that the persister state is primarily defined by translational deficiency rather than specific upregulation of defined pathways.

Although we find that transcriptional states across persister models are convergent, CRISPRi reveals that major upstream mechanisms diverge (Fig. 4). For metG*, lon and yqgE are key drivers (Fig. 4b). By contrast, we find hipA itself to be the only major hipA7-specific driver gene (Fig. 4c). Both metG* and hipA7 affect tRNA aminoacylation, but CRISPRi hits overlap as much between these two strains as with the wild type (Extended Data Fig. 9i). This surprising result suggests that lag-dependent persisters are generated by diverse upstream mechanisms that nevertheless converge to similar transcriptional states. Notably, the metG* mutation does not seem to directly prevent translation, as lon deletion completely reverses hyper-persistence (Fig. 5a) and yqgE deletion recovers stationary phase translation (Fig. 5f).

Causal genes for metG*, hipA7 and wild-type persistence also overlap, as seen for the TCA cycle (Extended Data Figs. 9i and 10c). The effect of TCA cycle disruption on wild-type persistence is most profound. Previous work has implicated the TCA cycle in persistence and attributed this to self-digestion to generate reducing power29. It has also been found that lag-dependent persisters undergo more divisions as a population enters stationary phase, which may lead to stochastic generation of cells with very low protein levels52. Together, the combination of self-digestion and reductive division could lead to rare cells with extremely low protein abundance and translation rates. Once fresh nutrients are available, positive feedback from translation can exacerbate small differences and establish bimodality in the population. Compounding this divergence, translationally deficient persisters do not appear to express important lag pathways, including redox defence and iron uptake (Extended Data Figs. 1s–u and 5l).

In metG*, hyper-persistence requires Lon, a ubiquitous protease that has evolved to degrade a broad set of targets, including misfolded proteins53, antitoxins54, ribosomal proteins55 and sulfur-assimilation proteins56. Ribosomal protein degradation could explain the role of Lon in persistence but is not supported by our proteomics data (Extended Data Fig. 12e,f). Instead, degradation of other targets, such as iron–sulfur cluster assembly proteins (Extended Data Fig. 12a,b), may leave cells lacking key protein products that are needed to resume translation upon addition of fresh nutrients.

Finally, we identified YqgE as a major driver of metG* persistence that is sufficient to increase lag times and persistence in wild-type E. coli (Fig. 5d and Extended Data Fig. 11n). Lon is epistatic to YqgE activity (Fig. 5e and Extended Data Fig. 11o,p), suggesting that YqgE may modulate Lon-mediated proteolysis. Functional annotation of YqgE suggests that it has disulfide isomerase activity (Supplementary Fig. 2). Of note, a disulfide redox switch is known to tune Lon activity57. We hypothesize that YqgE acts as a checkpoint that responds to starvation, as its function is most important early in stationary phase (Fig. 5c). Truncated metG is likely to have reduced enzymatic activity58 that could become limiting upon nutrient depletion. Many possible signals could then trigger YqgE activity, including but not limited to activation of the stringent response59 or loss of translation fidelity60. YqgE could then promote Lon-mediated degradation of critical targets to reduce translation (Fig. 5f). Without YqgE, metG* cells remain partially hyper-persistent (Fig. 5c). In the absence of checkpoint YqgE, baseline Lon activity alone could ultimately lead to depletion of proteins critical for ramping up translation after starvation. In both cases, when fresh nutrients are added, low translation rates across the population lead to high persistence rates. As individual cells produce more proteins, positive feedback can drive a switch-like return to exponential growth, establishing a bimodal distribution of cell states.

Methods

Bacterial strains and culture conditions

E. coli MG1655, UPEC CFT073, and derivative mutant strains (Supplementary Table 6) were routinely grown at 37 °C with shaking. One millilitre of culture was grown in 14-ml round-bottom culture tubes shaking at 300 rpm, or larger volumes were grown in flasks at lower shaking speed. For all liquid experiments, we used supplemented M9 medium (SM9)8 (1× M9 salts (DF0485-17, Fisher Scientific), 0.4% glucose, 2 mM MgSO4, 0.1 mM CaCl2, 2 μM ferric citrate, 3.1 g l−1 Neidhardt Supplement Mixture (NSM01, ForMedium) and micronutrient supplement as previously described61). Neidhardt Supplement was autoclaved for 20 min, stirred for 10 min, and then the other medium components were added.

Semisolid medium was prepared as previously described62 but using SM9. To prepare semisolid SM9, Neidhardt Supplement Mixture was combined with SeaPrep agarose (3.5 g l−1, Lonza) and autoclaved for 22 min. Remaining medium components were added after autoclaving, as done for SM9 broth. Semisolid medium was then cooled to 37 °C. Cells were inoculated into semisolid medium at 37 °C and then briefly stirred. The resulting culture was placed in an ice bath for 30 min to let the medium gel. The ice bath must reach higher than the liquid level of the medium to evenly chill the entire volume. The semisolid culture was carefully transferred to 37 °C.

For strain construction and plasmid preparation, E. coli were grown in LB Miller Broth (DF0446-07-5, Fisher Scientific). LB Miller plates were used for growth on solid medium unless otherwise noted. For plasmid maintenance, plates or broth were supplemented with 25 μg ml−1 chloramphenicol, 50 μg ml−1 kanamycin, or 50 μg ml−1 carbenicillin. Cells were routinely pelleted by centrifugation at 5,000g for 5 min.

Plasmid construction

prmf-RFP and pmdtK-RFP were assembled by NEB HiFi assembly (E2621L) using pBbA6C-RFP63 as backbone (amplified with SB226 and SB227) and prmf-GFP64 or pmdtK-GFP64 as insert (amplified with SB224 and SB228).

All plasmids are listed in Supplementary Table 6. pBbS6C-dcas9, pBbS6C-metG, pBbS6C-metG*, and pBbS6C-yqgE were cloned by NEB HiFi assembly (E2621L) using a common vector backbone (pBbS6C-RFP63 amplified with SB167 and SB168) and the following gene inserts: pWJ445 amplified with SB170, SB171 (dCas9); MG1655 genomic DNA (gDNA) amplified with SB150, SB169 (metG); MG1655-metG* gDNA amplified with SB150, SB169 (metG*); MG1655 gDNA amplified with SB202, SB203 (yqgE).

To assemble pYqgE+ and pRFP+, intermediate plasmid pBbE6A-RFP was assembled by ligation of ZraI- and XhoI-digested pBbS6C-RFP63 and pBbE2A-RFP63. pBbE6A-RFP was amplified by PCR with SB168 and SB180 to generate the vector fragment. For pYqgE+, yqgE was amplified from MG1655 genomic DNA with SB178 and SB179. For pRFP+, RFP was amplified from pBbE2A-RFP63 with SB197 and SB198. Vector and inserts were assembled using the HiFi assembly kit (E2621L, New England Biolabs).

pBbS6A-yqgE (also called ‘i-pYqgE+’ for ‘inducible pYqgE+’) was assembled by NEB HiFi assembly (E2621L) using pBbS6A-RFP63 as backbone (amplified with SB212 and SB213) and pYqgE+ for the insert (amplified with SB180 and SB214).

Strain construction

hipA7 cells were constructed by transferring the hipA7 mutation from TH126931 to our MG1655 strain. All deletion strains were constructed using λ Red-mediated recombination65. The following primers were used to amplify the template from pKD4 for recombination: SB165 and SB166 (yqgE), SB183 and SB184 (lon), SB185 and SB186 (priA), SB191 and SB192 (sulA). For all strains except ΔpriA, the kanamycin resistance (kanr) cassette was removed with pCP20, which was subsequently lost after non-selective overnight growth at 42 °C. Strains were confirmed to have no remaining antibiotic resistance.

Removal of kanr was not successful for MG1655-ΔpriA or metG*priA, as cultures did not grow at 42 °C. Instead, experiments in Extended Data Fig. 11a were done with the kanamycin marker still present. After outgrowths (Extended Data Fig. 11a), deletion of priA was confirmed again, and dnaC was checked for compensatory mutations66. ΔpriA strains were grown in M9 for cloning steps and then in SM9 for growth curves (Extended Data Fig. 11a).

Bioreactor growth

Ten litres of SM9 medium was prepared in a carboy. Around 160 ml were transferred into an autoclaved vessel (DASGIP) with 500 ml capacity. A silicone heater (GBH0250-1, BriskHeat) was used to bring the temperature of the medium to 37 °C, after which 100 μl of overnight culture was inoculated into the vessel, and medium flow into the vessel was turned on. An outflow pump maintained a constant level of medium. The culture was stirred with a stirrer bar at 500 rpm and air was flowed in at 0.1 l min−1. Medium flow rate was manually tuned to be above the E. coli doubling time. After >12 h, medium flow was turned off. Using a sampling port and syringe, samples were taken for OD600 measurement, antibiotic treatment, ScanLag, and/or PETRI-seq.

Antibiotic survival assays

To measure antibiotic tolerance, cells were incubated in SM9 containing 200 μg ml−1 ampicillin and/or 5 μg ml−1 ciprofloxacin for 4 h (unless otherwise noted) with shaking at 37 °C. For lag phase antibiotic survival, cells were taken either from the bioreactor or from an overnight culture and diluted 1:50 or 1:100 into SM9 plus antibiotics. When important, the length of the ‘overnight’ culture was noted (as in Fig. 5c or for 6-day stationary in Fig. 2b), but typical overnight cultures were grown for 16–24 h. For stationary phase antibiotic survival (‘undiluted’ in Extended Data Fig. 1a), antibiotics were added directly to the overnight culture. To count colonies after treatment, cells were pelleted, resuspended in PBS, and then plated on LB. CFUs were counted after 48 h and compared to CFUs before antibiotic treatment. Unless otherwise noted, replicates were biological replicates from distinct single colonies picked for overnight cultures.
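
The survival calculation described above (CFUs after treatment compared with CFUs before treatment) can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study; the function names, the dilution and plated-volume parameters, and the example counts are all hypothetical. Both counts must refer to the same culture, so any dilution into antibiotic-containing medium has to be folded into the dilution factor.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Back-calculate CFU per ml of the original sample from one plate count."""
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def survival_fraction(cfu_before, cfu_after):
    """Fraction of the starting population that survives antibiotic treatment."""
    if cfu_before <= 0:
        raise ValueError("CFU before treatment must be positive")
    return cfu_after / cfu_before


# Hypothetical plate counts (illustrative only, not data from this study).
before = cfu_per_ml(colonies=120, dilution_factor=1e5, plated_volume_ml=0.1)
after = cfu_per_ml(colonies=48, dilution_factor=1e2, plated_volume_ml=0.1)
print(f"survival = {survival_fraction(before, after):.4%}")
```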

To assay survival ‘in tet’ or ‘after tet’ (Fig. 3a,b), overnight cultures were diluted into fresh medium containing 54 μg ml−1 tetracycline. Cultures were incubated in tetracycline for 30 min at 37 °C with shaking. Then, antibiotics were added (in tet), or cells were pelleted, washed twice in tetracycline-free fresh medium, then treated with antibiotics (after tet). Cells were kept in antibiotics for 4 h.

To assay antibiotic survival after rifampicin (Extended Data Fig. 8b), overnight cultures were diluted into fresh medium containing 200 μg ml−1 rifampicin. Cultures were incubated for 30 min at 37 °C with shaking, then ampicillin or ciprofloxacin was added and incubated for 4 h (‘WT in rifampicin’). For comparison, an overnight culture was diluted into antibiotic-containing medium (without rifampicin, ‘WT’ on plot). To assay survival in rifampicin alone (Extended Data Fig. 8c), overnight cultures were diluted into fresh medium containing 200 μg ml−1 rifampicin and incubated for 1 h at 37 °C with shaking. CFUs were counted before and after rifampicin.

Appearance time assays

Cells were taken either from the bioreactor or from a standard overnight culture and ~100 CFU were spread on an LB or SM9 agar plate. Unless otherwise noted, replicates were measured from distinct single colonies picked for inoculation. To maximize reproducibility, all plates contained 25 ml of medium. Colony appearance times were not different between LB and SM9 plates. As detailed previously19, plates were put on a scanner (Epson V500 Photo) in the 37 °C incubator and scanned at 15-min intervals for 24–48 h.

Scanners were controlled by ScanningManager software, and images were analysed using Matlab scripts previously published19. Appearance times were found using the appearance output of getAppearanceGrowth. Minimum colony size was set to 20 and maximum set to 100.
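
Once appearance times have been extracted, per-population summaries such as the mean, the range, and the cumulative distribution can be computed in a few lines. The sketch below is written in Python for illustration only; the study used the published Matlab scripts, and the function names and example times here are hypothetical.

```python
from statistics import mean


def summarize_appearance_times(times_min):
    """Return the mean and range (max minus min) of colony appearance times."""
    if not times_min:
        raise ValueError("at least one appearance time is required")
    return {"mean": mean(times_min), "range": max(times_min) - min(times_min)}


def empirical_cdf(times_min):
    """Return (time, cumulative fraction of colonies appeared) pairs, sorted by time."""
    ordered = sorted(times_min)
    n = len(ordered)
    return [(t, (i + 1) / n) for i, t in enumerate(ordered)]


# Hypothetical appearance times in minutes (multiples of the 15-min scan interval).
example = [600, 615, 630, 900]
print(summarize_appearance_times(example))
print(empirical_cdf(example))
```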

PETRI-seq library preparation

Growth conditions for all PETRI-seq samples are detailed in Supplementary Table 3.

PETRI-seq of E. coli cells was carried out as detailed previously2. A stepwise protocol is available at: https://tavazoielab.c2b2.columbia.edu/PETRI-seq/. In brief, cells were pelleted and fixed overnight in 4% formaldehyde. The following day, cells were washed twice in PBS with RNase inhibitor (PBS-RI) and then resuspended in 50% ethanol in PBS-RI. In 50% ethanol, cells could be stored at −20 °C for at least 2 weeks. Cells were washed twice in PBS-RI to remove the ethanol and then permeabilized with lysozyme. Cells were washed twice again and then treated with DNase. After DNase inactivation, cells were washed twice in PBS-RI. As a stopping point, cells could then be resuspended in 50% ethanol in PBS-RI and saved at −20 °C for at least 2 weeks; then they were washed twice again in PBS-RI before resuming. To continue cell preparation, the cell pellet was resuspended in PBS-RI and counted using a haemocytometer. Split-pool barcoding, cell lysis, and second strand synthesis were performed as described, yielding 20 μl purified cDNA2. For tagmentation, EZ-Tn5 (TNP92110, Biosearch Technologies) was loaded by annealing SB117 and SB118 (Supplementary Table 6), diluting the oligonucleotides to 5 μM each in 50% glycerol, and then adding 2 μl EZ-Tn5 to 8 μl of the oligonucleotides. EZ-Tn5 was incubated with the oligonucleotides for 30 min at room temperature; loaded EZ-Tn5 was stored at −20 °C. 0.125 μl of loaded EZ-Tn5, 24.875 μl TD buffer (FC-131–1096, Illumina), and 5 μl water were added to 20 μl purified cDNA and incubated at 55 °C for 5 min then brought to 10 °C. 12.5 μl NT (FC-131–1096, Illumina) was immediately added to stop the reaction. 
Tagmented cDNA was amplified in a 500 μl PCR with Q5 polymerase (M0491L, New England Biolabs): 100 μl 5× buffer, 10 μl 10 mM dNTPs (N0447L, New England Biolabs), 5 μl Q5 polymerase, 85 μl Q5 High GC Enhancer, 0.5 μM N70x (Nextera Index Kit v2 Set A, TG-131-2001, Illumina; or equivalent from Integrated DNA Technologies), 0.5 μM i50x (E7600S, New England Biolabs; or equivalent from Integrated DNA Technologies). Libraries were amplified until the early exponential phase (~16–18 cycles): 72 °C 3 min; 95 °C 30 s; cycle: 95 °C 10 s, 55 °C 30 s, 72 °C 30 s; 72 °C 5 min. PCR reactions were pooled (if the 500 μl reaction had been split into multiple PCR tubes), and 100 μl was taken, purified twice with AMPure XP beads (A63881, Beckman Coulter), and eluted in 30 μl water. The resulting libraries could be sequenced directly (non-depleted) or rRNA-depleted using Cas9.

rRNA depletion of PETRI-seq libraries by Cas9

PETRI-seq libraries were subjected to rRNA depletion by the canonical Cas9::crRNA::tracrRNA tripartite complex67. To prepare tracrRNA, a dsDNA template (C2425) was made by PCR of pWJ4023 with Q5 polymerase and primers W2031 and W2032. Alternatively, C2425, which is 96 bases long, could be made by ordering and annealing complementary oligonucleotides. C2425 was used for T7 in vitro transcription with the TranscriptAid T7 High Yield Transcription Kit (K0441, Thermo Scientific) by combining the following in a 20 μl reaction: 4 μl 5× reaction buffer, 2 μl 100 mM ATP, 2 μl 100 mM CTP, 2 μl 100 mM GTP, 2 μl 100 mM UTP, 1 μl T7 RNAP enzyme, 700 ng C2425. The reaction was incubated at 37 °C for 4 h, during which a white precipitate became visible. 1 μl DNase I (AMPD1, Millipore Sigma) was added and incubated at 37 °C for an additional 45 min to digest the DNA template. RNA was purified using the Norgen Biotek Total RNA purification kit (37500, Norgen) to generate J703 (tracrRNA). To prepare crRNAs, 45 μM W2034 (T7 promoter) and 45 μM W2035-W2141 (separate reaction for each) were combined in annealing buffer (10 mM Tris pH 7.5, 50 mM NaCl, 1 mM EDTA), heated to 95 °C for 5 min then cooled to room temperature. 1 μl of annealed product was used for T7 in vitro transcription (K0441, Thermo Scientific) by adding the following: 4 μl 5× reaction buffer, 2 μl 100 mM ATP, 2 μl 100 mM CTP, 2 μl 100 mM GTP, 2 μl 100 mM UTP, 1 μl T7 RNAP enzyme, 6 μl water. The reaction was incubated at 37 °C for 4 h, during which a white precipitate became visible. 1 μl DNase I (AMPD1, Millipore Sigma) was added and incubated at 37 °C for an additional 45 min to digest the DNA template. Each resulting crRNA was purified using the Norgen Biotek Total RNA purification kit (37500, Norgen).
To anneal tracrRNA to crRNA, 70 pmol tracrRNA (J703) and 70 pmol crRNA were combined in 10 μl of annealing buffer (10 mM Tris pH 7.5, 50 mM NaCl, 1 mM EDTA), heated to 95 °C for 5 min, then slowly cooled to room temperature to yield 7 pmol μl−1 tracrRNA::crRNA. All 59 annealed tracrRNA::crRNA were pooled in an equimolar ratio. rRNA was depleted by combining the following in a 50 μl reaction: 5 μl 10× reaction buffer (Z03386, GenScript), 0.74 μl tracrRNA::crRNA (5.18 pmol total tracrRNA::crRNA; 0.088 pmol of each), 10 μl Cas9 (Z03386, GenScript), 49–80 ng PETRI-seq library. The reaction was incubated at 37 °C for 90 min then purified twice with 1× AMPure beads. The library concentration was measured using the Agilent Bioanalyzer High Sensitivity DNA Kit (5067-4626, Agilent). Libraries were sequenced for 75 cycles (58 R1, 17 R2) using the NextSeq 500/550 High Output Kit v2.5 (20024906, Illumina). rRNA-depleted libraries were loaded at ~2× the recommended concentration to account for cleaved rRNA fragments without both Illumina adapters. Non-depleted libraries were loaded at ~1.5× the recommended concentration.
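
The pooling quantities quoted above follow from simple arithmetic, which the short sketch below reproduces as a consistency check. The input values are taken from the text; the function names are hypothetical, and the calculation assumes that annealing is complete and that the pool is exactly equimolar.

```python
TRACR_PMOL = 70.0        # tracrRNA (and crRNA) per annealing reaction, from the text
ANNEAL_VOLUME_UL = 10.0  # annealing reaction volume, from the text
N_GUIDES = 59            # number of pooled tracrRNA::crRNA duplexes, from the text
POOL_VOLUME_UL = 0.74    # pooled duplex volume added to the Cas9 reaction, from the text


def duplex_concentration(pmol, volume_ul):
    """Duplex concentration in pmol per microlitre, assuming complete annealing."""
    return pmol / volume_ul


def pool_amounts(conc_pmol_per_ul, pool_volume_ul, n_guides):
    """Total pmol of duplex added, and pmol of each guide in an equimolar pool."""
    total = conc_pmol_per_ul * pool_volume_ul
    return total, total / n_guides


conc = duplex_concentration(TRACR_PMOL, ANNEAL_VOLUME_UL)
total, per_guide = pool_amounts(conc, POOL_VOLUME_UL, N_GUIDES)
print(f"{conc:.1f} pmol/ul; {total:.2f} pmol total; {per_guide:.3f} pmol per guide")
```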

Our rRNA depletion strategy is in theory very similar to DASH68, which amplifies cDNA after Cas9 cleavage. Further optimization could include testing the differences between these techniques.

Fluorescence-activated cell sorting

metG* cells were transformed with fluorescent transcriptional reporters for rmf, cysK and mdtK promoters64 (Supplementary Table 6). Overnight cultures were diluted 1:100 (rmf, cysK, dual markers) or 1:50 (mdtK) into SM9 then grown for 3.5 (rmf), 3.17 (cysK), 2.5 (mdtK), 5.8 (dual cysK/rmf), or 5.25 (dual cysK/mdtK) hours at which point they reached OD600 of 0.401 (rmf), 0.336 (cysK), 0.238 (mdtK), 0.35 (dual cysK/rmf), or 0.59 (dual cysK/mdtK). Cells were centrifuged at 5,000g for 5 min and then resuspended in PBS. Using an S3e Cell Sorter (12007058, Bio-Rad), cells were analysed, gated by forward scatter versus side scatter (Bio-Rad ProSort; Extended Data Fig. 3h,i), then sorted by GFP expression (high or low) into PBS. Sorted cells were counted (CFU), inoculated into antibiotic-containing SM9, and/or used for ScanLag. For the protein expression assay shown in Extended Data Fig. 3f,g, metG*-pcysK-GFP cells were transformed with pBbA6C-RFP63 (35290, Addgene), which expresses RFP under the LlacO1 promoter. Overnight cultures were diluted 1:50 into SM9 + 500 µM IPTG then grown for 4.6 h (OD600 = 0.182). We noted that because of the stochasticity in metG* lag times, time to reach a particular OD600 after dilution from an overnight varied substantially by experiment. Cells were resuspended in PBS, analysed, gated by forward scatter versus side scatter (Extended Data Fig. 3h,i), then sorted by GFP and RFP expression into SM9 (Extended Data Fig. 3f). GFP- or RFP-only controls were used to compensate for overlapping emissions of GFP and RFP. Because the cells come out of the sorter in PBS, the final composition of medium was 71% SM9 in PBS. Cell density was too low to successfully pellet the cells and change medium. Cells were grown at 37 °C with shaking (300 rpm) and analysed at given timepoints over the next day. 
OD600 stayed below 0.01 for the duration of the experiment, likely reflecting high purity of cells with long lag times and possibly reduced growth rate from 29% PBS. To see RFP expression in persister cells (Extended Data Fig. 3g), populations were gated on high GFP (cysK+). To make Extended Data Fig. 3, FlowJo 10.8.1 was used. Distributions were plotted using the layout editor. metG* cells without a fluorescent protein expression vector were used to subtract background autofluorescence (for Extended Data Fig. 3g).

Generating crRNA library with CALM

E. coli crRNA libraries were generated using CALM, as previously described3 with one minor modification. C2185 (insert library) and C2184 (backbone) were assembled by Gibson reaction (E2621L, New England Biolabs) and transformed into MG1655 cells without pWJ445 (dCas9 plasmid). The library was grown in LB broth containing 50 μg ml−1 kanamycin at 37 °C until OD600 reached ~0.4 (about 4 h). The resulting library was pelleted and used to miniprep an assembled crRNA plasmid library, labelled M60. Sequencing of this library confirmed high coverage of the genome with each gene targeted on average by 56 unique crRNAs.

CRISPRi screen sample collection

Supplementary Table 5 includes details about each CRISPRi library. Generally, electrocompetent cells were prepared from the parental strain containing either pWJ445 (pTet-dCas9) or pBbS6C-dCas9 (pLlacO1-dCas9). Different inducers (IPTG or aTc) were used for replicates to avoid inducer-specific effects. ~200 ng of M60 (crRNA plasmid library) was electroporated with 50 μl of cells using the MicroPulser (Bio-Rad) set to the default E. coli program 1 (1 mm, 1.8 kV, 6.1 ms). Cells were recovered in 500 μl SOC medium for 1.5 h at 37 °C. A small volume was taken to count colonies on LB agar with or without selection antibiotics (kanamycin + chloramphenicol) in order to calculate transformation efficiency and ensure adequate library coverage. The crRNA library contains at most 500,000 crRNAs3, and transformations routinely yielded >100 million transformants. The remaining volume of recovered cells was transferred to a flask containing 225 ml SM9 with kanamycin + chloramphenicol. For YqgE/RFP overexpression screens (SBC308-SBC315; Supplementary Table 5), 500 μM IPTG was added 2 h later to induce yqgE or RFP. For all libraries, cells were grown at 37 °C until they reached an OD600 of ~0.4 (3–4 h). The resulting library (L) was divided for either late exponential/stationary dCas9 induction before lag phase assays or exponential dCas9 induction before exponential assays. Samples were also taken from L for SBC95 and SBC125 (Supplementary Table 5). Late exponential/stationary induction allowed for minimal loss of essential crRNAs (area under the curve = 0.48–0.52 for wild-type/metG* before and after induction), so all genes could be assayed in lag phase.
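
The coverage check described above can be sketched as follows: the total number of transformants is back-calculated from a plated aliquot and divided by the library size. This is an illustrative sketch only. The function names, the example plate count and dilution, and the 550 μl recovery volume (500 μl SOC plus 50 μl cells) are assumptions, not values reported as such in the text.

```python
LIBRARY_SIZE = 500_000  # upper bound on distinct crRNAs, from the text


def total_transformants(colonies, dilution_factor, plated_volume_ul, recovery_volume_ul):
    """Estimate total transformants in the recovery culture from one plate count."""
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    per_ul = colonies * dilution_factor / plated_volume_ul
    return per_ul * recovery_volume_ul


def fold_coverage(n_transformants, library_size=LIBRARY_SIZE):
    """Average number of independent transformants per crRNA in the library."""
    return n_transformants / library_size


# Hypothetical selective-plate count (illustrative only).
n = total_transformants(colonies=200, dilution_factor=1e4,
                        plated_volume_ul=10, recovery_volume_ul=550)
print(f"{n:.2e} transformants, ~{fold_coverage(n):.0f}x coverage")
```

With the figures given in the text (more than 100 million transformants for at most 500,000 crRNAs), the same division gives at least 200-fold coverage.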

Lag phase assays

For lag phase assays, 20 ml of cell library (L) was pelleted and resuspended in 20 ml SM9 containing kanamycin, chloramphenicol, and either anhydrotetracycline (aTc; 20 nM) or isopropyl-β-d-thiogalactopyranoside (IPTG; 500 μM). Centrifugation likely was not necessary here but was done every time for consistency. The culture was grown overnight (14–22 h) to induce dCas9 and reach stationary phase. The next day, cells were pelleted and resuspended in the same volume of SM9 without inducer or antibiotics. Samples were taken for SBC96, SBC126, SBC191, SBC205, SBC316, SBC308, SBC310, SBC312 and SBC314 (Supplementary Table 5).

For lag outgrowth, resuspended cells were either inoculated into 500 ml semisolid SM9 (~80 million cells per litre for SBC101, SBC131, SBC210, SBC318; ~2 × 10⁹ cells per litre for SBC100, SBC130) or diluted 100× into SM9 broth (SBC102, SBC132, SBC192, SBC309, SBC311, SBC313, SBC315). Outgrowth times are shown in Supplementary Table 5.
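
To make the semisolid inoculation densities concrete, the sketch below converts a target density into a number of cells and an inoculum volume. It is an illustration only: the function names and the stationary culture density of 10⁹ cells per ml are hypothetical, and the higher target density is read here as 2 × 10⁹ cells per litre.

```python
def cells_needed(target_density_per_litre, medium_volume_ml):
    """Number of cells required to reach a target density in a given volume."""
    return target_density_per_litre * medium_volume_ml / 1000.0


def inoculum_volume_ul(n_cells, culture_cells_per_ml):
    """Volume of resuspended culture (in microlitres) that contains n_cells."""
    if culture_cells_per_ml <= 0:
        raise ValueError("culture density must be positive")
    return n_cells / culture_cells_per_ml * 1000.0


CULTURE_DENSITY = 1e9  # hypothetical cells per ml in the resuspended culture

for label, density in [("low density", 80e6), ("high density", 2e9)]:
    n = cells_needed(density, medium_volume_ml=500)
    vol = inoculum_volume_ul(n, CULTURE_DENSITY)
    print(f"{label}: {n:.1e} cells in 500 ml, ~{vol:.0f} ul of culture")
```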

For lag antibiotic treatment, resuspended cells were diluted 100× into SM9 containing 200 μg ml−1 ampicillin and 5 μg ml−1 ciprofloxacin and incubated for the indicated amount of time (Supplementary Table 5). After antibiotic treatment, cells were pelleted, washed in PBS, then resuspended in SM9 and inoculated into 500 ml semisolid SM9 medium at a density of ~50 × 10⁶ cells per litre. After 2 days, cell samples were collected for SBC98, SBC99, SBC128, SBC129, SBC207, SBC208, SBC209 and SBC317. Semisolid medium was used to minimize interclone competition.

Exponential assays

For exponential assays, dCas9 was induced in exponential phase, and cells were not grown overnight to stationary phase. The cell library (L) was diluted 200× into SM9 containing kanamycin, chloramphenicol, and 20 nM aTc or 500 μM IPTG. Cells were grown for 3.5–4.5 h (OD600 of ~0.2–0.4). Samples were taken for SBC133 and SBC211 (Supplementary Table 5), and the cultures were also diluted 100× into SM9 containing kanamycin, chloramphenicol, and 20 nM aTc or 500 μM IPTG. These cells were grown for 3–4 h then sampled for SBC134 and SBC212.

CRISPRi library preparation

Collected cell samples (described above) were pelleted and miniprepped (Qiagen). 400 ng of DNA were amplified in a 60 μl PCR with Q5 polymerase (M0491L, New England Biolabs), 0.5 μM of forward primer (equimolar mixture of W1397, W1398, W1399, W1400), and 0.5 μM of reverse primer (W1699). The reaction was thermocycled as follows: 98 °C 30 s; 10× 98 °C 10 s, 55 °C 20 s, 72 °C 30 s; 72 °C 2 min. PCR products were purified by double-sided AMPure cleanup (left-side ratio = 0.8×; right-side ratio = 1.4×) then eluted in 40 μl H2O. 2.5 μl of purified DNA was used for a second PCR in 100 μl using Q5 polymerase, 0.5 μM forward primer (CRISPRi_PCR_2_F; Supplementary Table 6), and 0.5 μM reverse primer (CRISPRi_PCR_2_R; Supplementary Table 6). The reaction was thermocycled as follows: 98 °C 30 s; 6× 98 °C 10 s, 55 °C 20 s, 72 °C 30 s; 72 °C 2 min. PCR products were purified by two AMPure cleanups (first 0.9×, then 0.8×) and eluted in 30 μl. The library concentration was measured using the Agilent Bioanalyzer High Sensitivity DNA Kit (5067-4626, Agilent). Libraries were sequenced for 75 cycles using the NextSeq 500/550 High Output Kit v2.5 (20024906, Illumina). Single-end reads are ideal because only Read 1 is useful for mapping crRNAs. However, depending on the forward primer used for PCR 2, as few as 58 cycles can be allocated to read 1.

Antibiotic susceptibility with bortezomib

Bortezomib (5043140001, Millipore Sigma) stock was prepared by dissolving in DMSO. For the assays in Extended Data Fig. 11d,e, single colonies were picked into SM9 containing 1% DMSO and 100 μM bortezomib. Control (– bzmb) cultures were started in the same way but SM9 contained 1% DMSO and no bortezomib. After overnight culture, antibiotic survival was assayed as described in ‘Antibiotic survival assays’, but for lag phase assays, antibiotic-containing SM9 was supplemented with 1% DMSO ± 100 μM bortezomib.

Lag times after bortezomib treatment

For full growth with bortezomib (dotted black line in Fig. 5b), single colonies of metG*sulA cells were picked into 1 ml SM9 containing 1% DMSO and 100 μM bortezomib (5043140001, Millipore Sigma). For bortezomib addition during stationary phase (dotted pink line in Fig. 5b), single colonies of metG*sulA cells were picked into 1 ml SM9. After 20 h, bortezomib was added to 100 μM. For bortezomib treatment during lag phase (dotted green line in Fig. 5b) or not at all (control; grey line in Fig. 5b), single colonies of metG*sulA cells were picked into 1 ml SM9 containing 1% DMSO. After 24 h of growth, all cultures were diluted 100× into SM9 containing either 100 μM bortezomib and DMSO (green line in Fig. 5b) or only 1% DMSO (all other samples). Cells were grown on a plate reader (37 °C with continuous shaking; PowerWave XS2, BioTek) and OD600 measured at 10-min intervals.

Stationary phase translation assay

E. coli cells of the indicated genotype (Fig. 5f) containing pBbS6C-RFP were grown for 24 h in 1 ml SM9. Two-hundred microlitres of overnight culture were transferred to a 96-well plate and supplemented with 2.5 mM IPTG. Cells were grown on a plate reader (37 °C with continuous shaking; Synergy Neo2, BioTek). RFP (570 excitation, 620 emission) and OD600 were measured at 10-min intervals.

metG complementation of metG* mutation

Wild-type or metG* cells carrying either pBbS6C-metG or pBbS6C-metG* (Supplementary Table 6) were grown overnight in 1 ml SM9 containing chloramphenicol. Overnight cultures were diluted 1,000× into SM9 containing chloramphenicol and 500 μM IPTG. These cultures were grown overnight again. The following day, lag phase antibiotic survival was assayed with ampicillin and ciprofloxacin (Extended Data Fig. 9d).

Quantitative proteomics

Overnight cultures of 3 colonies each of MG1655, metG*, and metG*-Δlon-ΔsulA were grown in 1 ml SM9 for ~19 h at 37 °C. Stationary samples were taken directly from the overnight cultures. For wild-type (MG1655) exponential cells (Extended Data Fig. 7), MG1655 overnight cultures were diluted 200× into fresh SM9 and grown for 90 min (final OD600 = 0.2–0.23). For metG* lag/persister cells (Extended Data Fig. 7), metG* overnight cultures were diluted 100× into fresh SM9 and grown for 30 min. OD600 did not increase in that 30 min (replicate 1: ODinitial = 0.069, OD30min = 0.065; replicate 2: ODinitial = 0.07, OD30min = 0.065; replicate 3: ODinitial = 0.072, OD30min = 0.066).

Cells were collected in Eppendorf tubes and washed twice with ice-cold PBS. Cells were then lysed in lysis buffer containing 8 M urea, 0.1 M ammonium bicarbonate, and protease inhibitors (1 mini-Complete EDTA-free tablet). The lysate was cleared by centrifugation at 14,000 rpm for 30 min at 4 °C. The supernatant was transferred to a new tube, and the protein concentration was determined using a BCA assay (Pierce). Subsequently, 10 µg of total protein was subjected to disulfide bond reduction with 10 mM DTT (at 56 °C for 30 min) followed by alkylation with 10 mM iodoacetamide (at room temperature for 30 min in the dark). Excess iodoacetamide was quenched with 5 mM DTT (at room temperature for 15 min in the dark). Samples were then diluted sixfold with 50 mM ammonium bicarbonate and digested overnight at 37 °C with a trypsin/Lys-C mix (1:100). The next day, digestion was stopped by the addition of 1% TFA (final v/v), followed by centrifugation at 14,000g for 10 min at room temperature to pellet precipitated lipids. Cleared digested peptides were desalted on an SDB-RPS Stage-Tip disk69 and dried down in a speed-vac. Peptides were resuspended in 10 µL of 3% acetonitrile/0.1% formic acid and injected onto a Thermo Scientific Orbitrap Fusion Tribrid mass spectrometer using a DIA method for peptide MS/MS analysis.

The UltiMate 3000 UHPLC system coupled with an EASY-Spray PepMap RSLC C18 column was used to separate fractionated peptides with a gradient of 5–30% acetonitrile in 0.1% formic acid over 90 min at a flow rate of 300 nl min−1. After each gradient, the column was washed with 90% buffer B for 10 min and re-equilibrated with 98% buffer A (0.1% formic acid, 100% HPLC-grade water) for 30 min. Survey scans of peptide precursors were performed from 350–1,200 m/z at 120 K FWHM resolution with a 1 × 10⁶ ion count target and a maximum injection time of 60 ms. The instrument was set to run in top speed mode with 3-s cycles for the survey and MS/MS scans. After a survey scan, 26 m/z DIA segments were acquired from 200–2,000 m/z at 60 K FWHM resolution with a 1 × 10⁶ ion count target and a maximum injection time of 118 ms. HCD fragmentation was applied with 27% collision energy, and resulting fragments were detected using the rapid scan rate in the Orbitrap. The spectra were recorded in profile mode.

DIA data were analysed with the MaxDIA software platform within the MaxQuant software environment using a library-free approach70. The search was set up with the reference E. coli proteome database downloaded from UniProt. The false discovery rate (FDR) was set to 1% at the peptide precursor level and 1% at the protein level. Results obtained from MaxQuant were further analysed using the standard pipeline for differential analysis with the DEP package71. Proteins were filtered for inclusion in 2 out of 3 replicates of at least one condition. Data were normalized by variance stabilizing transformation. Missing data were imputed using the MinProb method with q = 0.01. Significantly enriched proteins were defined by alpha = 0.05 and lfc = log2(1.5) (Supplementary Table 2). For the principal components analysis (PCA) in Extended Data Fig. 7c, LFQ intensity (for included samples) was log-transformed and scaled with StandardScaler to centre each protein with mean of 0 and s.d. of 1. Principal components were calculated from all proteins using sklearn72. See next section for PCA and UMAP in Extended Data Fig. 7a,b.
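The scaling and projection steps used for the proteomics PCA can be illustrated with a minimal sketch. It assumes a samples × proteins matrix of strictly positive LFQ intensities and a log2 transform (the base is not stated above). The function name `scale_and_project` and the synthetic data are hypothetical. To stay dependency-light, the sketch uses a NumPy singular value decomposition as a stand-in for the sklearn classes named in the text; it is not the authors' code.

```python
import numpy as np


def scale_and_project(lfq, n_components=2):
    """Log-transform, z-score each protein, and project onto principal components.

    lfq: array of shape (n_samples, n_proteins); values must be > 0
    (imputation is assumed to have happened upstream).
    """
    logged = np.log2(lfq)                        # log base is an assumption
    centred = logged - logged.mean(axis=0)       # mean 0 per protein
    scaled = centred / logged.std(axis=0)        # population s.d. (ddof=0)
    # PCA via SVD of the scaled matrix
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return coords, var_ratio[:n_components]


# Synthetic placeholder data: 6 samples x 50 proteins (not from the study)
rng = np.random.default_rng(0)
example_lfq = rng.lognormal(mean=3.0, sigma=1.0, size=(6, 50))
coords, var_ratio = scale_and_project(example_lfq)
```

Dividing by the population standard deviation mirrors the default behaviour of StandardScaler; a protein with zero variance across samples would need to be dropped before this step.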

PETRI-seq analysis

Barcode demultiplexing was carried out as previously described2 with the following minor modification73: before extracting the unique molecular identifier (UMI) sequence, PEAR74 was used to merge reads 1 and 2 when they overlapped. Only non-overlapping reads were carried forward because read 2 should contain cDNA sequence, and the end of read 1 should contain barcode 1. Note that this may not apply when sequencing more than 75 cycles. Also, read 2 was trimmed if it matched the reverse complement of the end of read 1, an artefact we think occurs due to hairpin formation. The full pipeline uses trimmomatic75 (v0.33) to filter reads, Cutadapt76 (v1.18) to demultiplex, UMI-tools77 (v0.5.5) to extract UMIs, bwa78 (v0.7.17) to align, and featureCounts79 (v1.6.3) to annotate features.

Seurat (version 4.1.1)80 was used for normalization, dimensionality reduction, and clustering of PETRI-seq data. In brief, the matrices produced by demultiplexing and UMI collapsing were read into a Seurat object. All MG1655 samples in this study (Supplementary Table 3) were combined in the same Seurat object. For Extended Data Fig. 6g–j, a new Seurat object was made with all MG1655 cells plus CFT073 cells; accessory genes only in the CFT073 genome were omitted. For all analysis, rRNA counts were excluded except for Extended Data Fig. 8. Barcodes were filtered for more than 9 and fewer than 1,000 mRNA UMIs. All cells were then downsampled to 38 UMIs using the SampleUMI function (max.umi = 38, upsample = FALSE). UMI counts were log-normalized using the geometric mean of all cell UMI counts as a scale factor. Gene counts were scaled and centred to a mean of 0 and s.d. of 1 (Seurat ScaleData). Principal components were calculated with all genes. For the full cell atlas (Fig. 1f), principal components 1–10 were used to compute UMAP81 coordinates (default parameters) and to find neighbouring cells. Clusters were found using default parameters82 (Louvain algorithm) at resolution 0.32. For hipA7 cells alone (Extended Data Fig. 5a), principal components 1–5 were used to find neighbouring cells, and clusters were found at resolution 0.1. For extended stationary (6-day) wild-type cells with metG* and standard wild-type cells (Extended Data Fig. 5f), principal components 1–6 were used to find neighbouring cells, and clusters were found at resolution 0.16. For the full atlas downsampled to ~30 mRNA UMIs (Extended Data Fig. 5i), cells were downsampled as described with max.umi = 30. Then, only cells with exactly 29 or 30 mRNA UMIs were kept in the Seurat object. Cells were processed and clustered as with the full atlas (10 principal components, resolution = 0.34). For CFT073 cells alone (Extended Data Fig. 6d,e), CFT073 accessory genes were included, principal components 1–10 were used to find neighbouring cells, and clusters were found at resolution 0.31. For CFT073 with MG1655 cells (Extended Data Fig. 6g), principal components 1–10 were used to find neighbouring cells, and clusters were found at resolution 0.38.
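The filtering, downsampling, and log-normalization steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not a reimplementation of Seurat: the downsampler draws UMIs without replacement and therefore hits the target exactly, which may not match the sampling scheme of SampleUMI; the scale factor is taken to be the geometric mean of per-cell total UMI counts; and the normalization follows the usual log1p(count / total × scale) form. All function names, barcodes, and counts are hypothetical.

```python
import math
import random


def filter_cells(cells, low=9, high=1000):
    """Keep cells with more than `low` and fewer than `high` mRNA UMIs."""
    return {bc: g for bc, g in cells.items() if low < sum(g.values()) < high}


def downsample(gene_counts, max_umi, rng):
    """Keep at most max_umi UMIs per cell (sampling without replacement)."""
    pool = [gene for gene, n in gene_counts.items() for _ in range(n)]
    if len(pool) <= max_umi:
        return dict(gene_counts)          # no upsampling
    out = {}
    for gene in rng.sample(pool, max_umi):
        out[gene] = out.get(gene, 0) + 1
    return out


def log_normalize(cells):
    """log1p(count / cell total * geometric mean of all cell totals)."""
    totals = {bc: sum(g.values()) for bc, g in cells.items()}
    scale = math.exp(sum(math.log(t) for t in totals.values()) / len(totals))
    return {
        bc: {gene: math.log1p(n / totals[bc] * scale) for gene, n in g.items()}
        for bc, g in cells.items()
    }


# Hypothetical toy input: barcode -> {gene: UMI count}
rng = random.Random(0)
cells = {
    "c1": {"rpsA": 30, "lon": 20},   # 50 UMIs: kept, then downsampled to 38
    "c2": {"rpsA": 5},               # 5 UMIs: removed by the filter
    "c3": {"rpsA": 8, "yqgE": 4},    # 12 UMIs: kept unchanged
}
kept = filter_cells(cells)
down = {bc: downsample(g, 38, rng) for bc, g in kept.items()}
norm = log_normalize(down)
```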

To project proteomics samples with scRNA-seq (Extended Data Fig. 7a,b), proteomics samples were log-normalized using the geometric mean of the scRNA-seq library. Proteomics samples were then merged into a single Seurat object with downsampled, log-normalized scRNA-seq data. This entire Seurat object was scaled and centred with ScaleData. For the PCA (Extended Data Fig. 7a), loadings were extracted from the scRNA-seq Seurat object and used to project all cells and proteomic samples. For UMAP (Extended Data Fig. 7b) and clustering (Extended Data Fig. 7a,b), principal components 1–6 were used with n.neighbors = 50 and k.param = 50. Clusters were found at resolution 0.32. If principal components 1–10 and default n.neighbors and k.param are used (as with scRNA-seq alone), then the stationary and lag proteomes form their own cluster; exponential proteomes still cluster with early exponential transcriptomes.

When the full cell atlas is shown or analysed (Fig. 1 and Extended Data Figs. 1n–u and 5i), only cell samples relevant up to that point in the text are shown or included in expression analysis, but all cells (as listed in Supplementary Table 3) were used to compute principal components, UMAP coordinates, and cell clusters. See Supplementary Table 3 for details of which cell samples are included in each figure.

For Extended Data Fig. 8, which defines transcriptional deficiency, different thresholds were used to retain cells with very low mRNA counts. Specifically, in Extended Data Fig. 8f,g, all cells with total RNA above a library-specific threshold (between 16 and 64 total UMIs) were retained. By contrast, Fig. 2f includes only cells with at least 10 mRNAs, as these are the cells used for UMAP and clustering. rRNA depletion is also important to consider when defining transcriptional deficiency (Extended Data Fig. 8d–f). Extended Data Fig. 8d,e shows only libraries that were not subjected to rRNA depletion. In Extended Data Fig. 8f, all libraries are included with a slightly different threshold used for depleted or non-depleted libraries.

Differential expression analysis from scRNA-seq

To find genes differentially expressed between cell clusters or pre-defined populations, a custom pipeline combining edgeR83 and Seurat’s FindMarkers tool was used. EdgeR was used with TMM normalization to calculate log2(fold change) from pseudobulk samples. Pseudobulk samples are calculated by summing all counts from all single cells of a given population; single-cell transcriptomes are taken before downsampling. For P values, limma’s84 rankSumTestWithCorrelation (the default for Seurat’s FindMarkers; two-sided Wilcoxon–Mann–Whitney) was used with downsampled, log-transformed single-cell data as input. Using downsampled cells for significance testing gives the result most consistent with the centred edgeR data. Total UMI counts by sample (before and after downsampling) are provided in Supplementary Table 4.
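The pseudobulk step described above can be illustrated with a short sketch. It sums raw counts over all cells of each population and then computes a log2 fold change from library-size-normalized totals. This simple normalization with a small prior count is a swapped-in stand-in for edgeR's TMM normalization, and the rank-sum significance test is omitted. Function names, the prior value, and the toy counts are hypothetical.

```python
import math


def pseudobulk(cells):
    """Sum raw (non-downsampled) UMI counts per gene across cells."""
    total = {}
    for gene_counts in cells:
        for gene, n in gene_counts.items():
            total[gene] = total.get(gene, 0) + n
    return total


def log2_fold_change(bulk_a, bulk_b, prior=0.5):
    """log2 of (A / B) after dividing each pseudobulk by its library size.

    `prior` is a small count added to avoid log(0); its value is an assumption.
    """
    size_a = sum(bulk_a.values())
    size_b = sum(bulk_b.values())
    genes = set(bulk_a) | set(bulk_b)
    return {
        g: math.log2(
            ((bulk_a.get(g, 0) + prior) / size_a)
            / ((bulk_b.get(g, 0) + prior) / size_b)
        )
        for g in genes
    }


# Hypothetical toy populations (each dict is one cell)
population_a = [{"geneX": 1, "geneY": 3}, {"geneX": 1, "geneY": 3}]
population_b = [{"geneX": 3, "geneY": 1}, {"geneX": 3, "geneY": 1}]
lfc = log2_fold_change(pseudobulk(population_a), pseudobulk(population_b))
```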

CRISPRi analysis

CRISPRi sequencing reads were aligned to reference genomes for E. coli then to S. aureus (used for library manufacturing3). Functional spacers were identified as described3 based on presence of an “NGG” PAM sequence. Only functional E. coli spacers were used for downstream analysis.

For lag and exponential comparisons, spacer abundance post-outgrowth was compared to pre-outgrowth. For lag antibiotic treatment, spacer abundance post-antibiotics plus outgrowth was compared to after outgrowth only. For simplicity, consider post-antibiotics as the “post” condition and outgrowth only as the “pre” condition in the description below.

To compare CRISPRi libraries, spacers were filtered to remove any position with fewer than 10 reads in both post and pre libraries. Then, the frequency of each spacer was calculated by dividing the number of reads for that spacer by the total number of reads in the library. A pseudocount of 0.99 was added to spacers with 0 counts. Based on the assumption that spacers targeting intergenic regions outside of promoters would not affect phenotypes, we used these intergenic spacers to normalize spacer abundance in both pre and post libraries. All spacer frequencies were normalized as follows (for exemplified spacer labelled A):

$${{\rm{enrichment}}}_{{\rm{A}}}={\log }_{2}{({{\rm{spacer}}}_{{\rm{A}}}/{{\rm{GM}}}_{{\rm{null}}})}_{{\rm{post}}}-{\log }_{2}{({{\rm{spacer}}}_{{\rm{A}}}/{{\rm{GM}}}_{{\rm{null}}})}_{{\rm{pre}}}$$

where GMnull is the geometric mean of the frequencies of all null (intergenic) spacers.
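As a worked illustration of the enrichment formula, the following sketch applies the read filter, the 0.99 pseudocount for zero counts, and normalization to the geometric mean of null spacers. The function names, spacer identifiers, and read counts are hypothetical, and library totals are computed after filtering, which is an assumption.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_log2(counts, null_ids, pseudocount=0.99):
    """log2(spacer frequency / geometric mean of null spacer frequencies)."""
    total = sum(counts.values())
    # Spacers with 0 reads receive the pseudocount so that the log is defined
    freq = {s: (n if n > 0 else pseudocount) / total for s, n in counts.items()}
    gm_null = geometric_mean([freq[s] for s in null_ids])
    return {s: math.log2(f / gm_null) for s, f in freq.items()}


def spacer_enrichment(pre, post, null_ids, min_reads=10):
    """Enrichment = normalized log2 abundance (post) minus the same for pre."""
    # Drop spacers with fewer than min_reads in BOTH libraries
    kept = [s for s in pre
            if not (pre[s] < min_reads and post.get(s, 0) < min_reads)]
    pre_k = {s: pre[s] for s in kept}
    post_k = {s: post.get(s, 0) for s in kept}
    nulls = [s for s in null_ids if s in pre_k]
    pre_n = normalized_log2(pre_k, nulls)
    post_n = normalized_log2(post_k, nulls)
    return {s: post_n[s] - pre_n[s] for s in kept}


# Hypothetical read counts
pre = {"null1": 100, "null2": 100, "geneA_1": 100, "geneB_1": 100, "low": 3}
post = {"null1": 200, "null2": 200, "geneA_1": 800, "geneB_1": 50, "low": 2}
enrichment = spacer_enrichment(pre, post, null_ids=["null1", "null2"])
```

In this toy example the null spacers double while geneA_1 increases eightfold, so geneA_1 has an enrichment of 2 and geneB_1 of −2. Because every frequency in a library shares the same total, the total cancels in the ratio to the null geometric mean, so the result does not depend on whether totals are computed before or after filtering.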

To calculate gene enrichment scores, mean enrichment scores for spacers aligned within or directly upstream of each gene were calculated. The number of spacers mapping to each gene varied, which was important for computing gene enrichment P values. The null distribution of enrichment scores for intergenic spacers was randomly sampled to generate pseudogenes with n spacers. This was repeated to generate 100,000 simulated replicates for every relevant n. To assign a P value, each gene enrichment score was compared to a simulated null distribution with the same number of spacers as included for that gene. Significantly enriched or depleted genes were found based on an FDR of 0.1 using the Benjamini–Hochberg method85. In most cases, significant genes were further filtered by significance in multiple replicates. For hipA7, only one replicate of each screen was done. In Fig. 4b, we wanted to highlight top hits, so we used Bonferroni correction to threshold only the most significant hits. In other figures, we used FDR of 0.1 for the hipA7 replicates.
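The pseudogene simulation and FDR step can be sketched as follows. Several details are assumptions rather than statements from the text: null scores are resampled with replacement, the P value is two-sided with a +1 correction, and the example uses 2,000 simulated replicates for speed instead of the 100,000 described above. Function names and the synthetic null scores are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right


def simulate_null_means(null_scores, n_spacers, n_sim, rng):
    """Mean enrichment of n_sim pseudogenes built from n_spacers null spacers."""
    return sorted(
        sum(rng.choices(null_scores, k=n_spacers)) / n_spacers
        for _ in range(n_sim)
    )


def empirical_p(score, sorted_null):
    """Two-sided empirical P value against a sorted simulated null."""
    n = len(sorted_null)
    ge = n - bisect_left(sorted_null, score)    # simulated means >= score
    le = bisect_right(sorted_null, score)       # simulated means <= score
    return min(1.0, 2 * (min(ge, le) + 1) / (n + 1))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):                # from largest P to smallest
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


# Synthetic null enrichment scores (placeholder for intergenic spacers)
rng = random.Random(1)
null_scores = [rng.gauss(0.0, 0.5) for _ in range(500)]
null_means = simulate_null_means(null_scores, n_spacers=4, n_sim=2000, rng=rng)
p_hit = empirical_p(3.0, null_means)     # strongly enriched gene, 4 spacers
p_flat = empirical_p(0.0, null_means)    # gene with no effect
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A separate simulated distribution is needed for each spacer count n, because the spread of a mean shrinks as n grows; genes would then be called at an adjusted P value of at most 0.1.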

For each gene in the CRISPRi screens, enrichment and significance were calculated independently for crRNAs targeting the antisense or sense strand; the strand with strongest effect (by significance then enrichment score) is determined and included in each relevant figure. For Fig. 4b,c, strand is noted in source data. In Fig. 4d and Extended Data Fig. 10, the strand shown is antisense unless otherwise noted. To assess depletion of essential genes (Extended Data Fig. 9f,g), we used a stringent set of genes found to be essential in all of four previous datasets86.

Pathway enrichment with iPAGE

To find pathways significantly correlated with principal component 1 or 2 of the cell atlas (Extended Data Fig. 1o), we divided the principal component loadings into high (greater than 0.025) and low (less than −0.025) groups. We ran iPAGE87 in discrete mode (up, down) with maximum P value of 0.001 and independence = 0. Redundant pathways were filtered manually, and representative ones are shown.

To find genes enriched in either the early lag or the persister cluster (Extended Data Figs. 1p–u and 5k,l), differential expression analysis was performed as described. Using the Benjamini–Hochberg method85, an FDR of 0.01 was applied to select significantly over- and under-represented genes. These significance scores were used as input for iPAGE87, which was run in discrete mode (up, down, neutral) with maximum P value of 0.05 and independence = 0. Pathways were then filtered further by P values indicated in figure legends. Redundant pathways and those indicating enrichment of a single operon were filtered manually; representative terms are shown. To compute mean expression, the AverageExpression function in Seurat was used, and mean gene expression values were averaged for all genes in a given set.

To find genes enriched in each persister type versus early exponential cells (Extended Data Figs. 2 and 6k) differential expression analysis was performed as described. Using the Benjamini–Hochberg method85, an FDR of 0.01 was applied to select significantly over- and under-represented genes. These significance scores were used as input for iPAGE87, which was run in discrete mode (up, down, neutral) with maximum P value of 0.1 and independence = 0. Pathways shown in Extended Data Fig. 2a–c(iv) are the top (P < 0.0005) non-redundant gene sets overexpressed in each persister type; when these pathways are also significant for another persister type, they are also labelled in that panel (*P < 0.05, **P < 0.005, ***P < 0.0005).

To find genes enriched in persister cell groups versus tetracycline-treated cells (Figs. 3c and 4e and Extended Data Figs. 9a,b,e,i–n and 10a,b), differential expression analysis was performed as described. An FDR of 0.05 was applied85 to select over- and under-represented genes. These significance scores were used as input for iPAGE87, which was run in discrete mode (up, down, neutral) with maximum P value of 0.05 and independence = 0. To select pathways to show in Fig. 3c and Extended Data Fig. 9b, loadings of principal component 1 were also used as input for iPAGE (continuous mode, 8 bins, max_p = 0.005, independence = 0). The intersection of gene sets enriched in the top PC1 bin (P < 0.001) and gene sets over-represented in at least one persister type (P < 0.05; based on marker genes versus tetracycline-treated cells) are shown in Fig. 3c and Extended Data Fig. 9b. Redundant pathways were manually filtered.

To find genes enriched after CRISPRi perturbation (Extended Data Fig. 10c–e), an FDR of 0.1 was applied85 using P values (computed as described above from null distribution) to select over- and under-represented genes. These significance scores were used as input for iPAGE87, which was run in discrete mode (up, down, neutral) with maximum P value of 0.05 and independence = 0. Gene sets were subsequently filtered by P value < 0.01. For each gene, antisense- and/or sense-targeting crRNAs can be significant. For this analysis, only one strand was used for input; if one or both were significant in a single direction, the gene was assigned that direction, but if the antisense and sense crRNAs were significant in opposite directions, then the one with the higher enrichment score (absolute value) was used. Pathways significantly enriched in >3 of 5 metG* replicates, >1 of 3 wild-type replicates, or in 1 hipA7 replicate are shown in Extended Data Fig. 10c–e.
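The rule for assigning each gene a single direction from its antisense and sense results can be written as a small function. This is a sketch of the rule as described above; the function name, the tuple layout, and the "up"/"down"/"neutral" labels as return values are hypothetical.

```python
def gene_direction(antisense, sense):
    """Assign 'up', 'down' or 'neutral' to a gene for iPAGE input.

    Each argument is (enrichment_score, is_significant), or None if the
    gene has no crRNAs targeting that strand.
    """
    significant = [s for s in (antisense, sense) if s is not None and s[1]]
    if not significant:
        return "neutral"
    directions = {"up" if score > 0 else "down" for score, _ in significant}
    if len(directions) == 1:
        # One strand significant, or both significant in the same direction
        return directions.pop()
    # Opposite directions: use the strand with the larger absolute score
    strongest = max(significant, key=lambda s: abs(s[0]))
    return "up" if strongest[0] > 0 else "down"


# Hypothetical examples
one_strand = gene_direction((1.2, True), (0.3, False))
conflict = gene_direction((-2.0, True), (1.0, True))
no_signal = gene_direction((0.5, False), None)
```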

To find pathways enriched in proteomics data (Extended Data Fig. 12a–d), differential protein analysis was performed as described with DEP. Fold changes were used as input for iPAGE87, which was run in continuous mode with 5 bins and maximum P value of 0.05.

Identification of E. coli gene homologues

The proteins of E. coli K12 (n = 4,136) were downloaded from Biocyc88 version 25.1. To identify potential homologues, E. coli proteins were searched against genomes of diverse microbial organisms. A total of 2,421 genomes downloaded from JGI IMG were included in the search89. These genomes were selected based on quality (High Quality = “Yes” in IMG portal) and optimization for biodiversity. They represent 39 phyla, 68 classes and 168 orders (Supplementary Table 7) based on GTDB taxonomic classification90. The protein search was done using DIAMOND91 with the following parameters: “blastp -e 1e-10 -k 10000000 --query-cover 66 --subject-cover 50 -b8 -c1”. Protein hits with an e-value of at most 1e-10 were kept as potential homologues for downstream analysis. For each protein, the number of genomes with homologues was counted and converted to frequency, shown in Extended Data Fig. 11q.
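The final counting step can be sketched as below, assuming the search hits have already been parsed into (query protein, subject genome, e-value) tuples and that frequency means the fraction of searched genomes containing at least one hit. The function name, the tuple layout, and the toy hits are hypothetical.

```python
from collections import defaultdict


def homologue_frequency(hits, n_genomes, max_evalue=1e-10):
    """Fraction of genomes with at least one hit per query protein.

    hits: iterable of (query_protein, subject_genome, evalue) tuples.
    """
    genomes_by_protein = defaultdict(set)
    for protein, genome, evalue in hits:
        if evalue <= max_evalue:
            # A set ensures that several hits in one genome count once
            genomes_by_protein[protein].add(genome)
    return {p: len(g) / n_genomes for p, g in genomes_by_protein.items()}


# Hypothetical toy hits across 4 genomes
hits = [
    ("yqgE", "genome1", 1e-30),
    ("yqgE", "genome1", 1e-20),   # second hit in the same genome
    ("yqgE", "genome2", 1e-12),
    ("yqgE", "genome3", 1e-5),    # fails the e-value cutoff
    ("lon", "genome1", 1e-50),
]
freq = homologue_frequency(hits, n_genomes=4)
```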

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.