Abstract
Deuterostomes are one major group of bilaterians composed by hemichordates and echinoderms (collectively called Ambulacraria) and chordates. Comparative studies between these groups can provide valuable insights into the nature of the last common ancestor of deuterostomes and that of bilaterians. Indirect development of hemichordates, with larval phases similar to echinoderms and an adult body plan with an anteroposterior polarity like chordates and other bilaterians, makes them a suitable model for studying the molecular basis of development among deuterostomes. However, a comprehensive, quantitative catalogue of gene expression and chromatin dynamics in hemichordates is still lacking. In this study, we analysed the transcriptomes and chromatin accessibility of multiple developmental stages of the indirect-developing hemichordate Ptychodera flava. We observed that P. flava development is underpinned by a biphasic transcriptional program probably controlled by distinct genetic networks. Comparisons with other bilaterian species revealed similar transcriptional and regulatory dynamics during hemichordate gastrulation, cephalochordate neurulation and elongation stages of annelids. By means of regulatory networks analysis and functional validations by transgenesis experiments in echinoderms, we propose that gastrulation is the stage of highest molecular resemblance in deuterostomes and that much of the molecular basis of deuterostome development was probably present in the bilaterian last common ancestor.
Similar content being viewed by others
Main
Deuterostomes comprise a major group of animals that share several morphological and developmental features, including radial cleavage, deuterostomy, formation of the mesoderm by enterocoely, mesoderm-derived skeletal tissues and the presence of pharyngeal openings or slits1,2,3. Although their monophyly has recently been challenged by phylogenomic studies4,5, current consensus still groups them together as one of two major bilaterian groups alongside protostomes. Recent comparative studies of gene expression dynamics suggests conservation of key developmental mechanisms throughout metazoan life cycles6; therefore, as a major bilaterian group, knowledge of the genetic controls underlying deuterostome development can potentially provide insights into the nature of both the deuterostome and bilaterian last common ancestors (LCA). Finding appropriate deuterostome models is thus key to shedding light into the genetic mechanisms underlying the diversification of body plans and life strategies in this group in relation to the deuterostome and the bilaterian LCAs.
There are three deuterostome groups: chordates, echinoderms and hemichordates. The last two form a group called Ambulacraria based on molecular data, as the sister group to chordates7,8,9. Despite morphological differences of the adult body plans, several hemichordate groups undergo indirect development with free-swimming, planktonic larvae such as those of most echinoderms10,11. The aforementioned molecular similarities are paralleled by similar patterning mechanisms, complemented with conserved expression patterns of key regulatory genes, which have been shown to control the formation of the dorsoventral (DV) axis, mesoderm, nervous system and primordial germ cells in echinoderm and hemichordate embryos and larvae12,13,14,15,16,17. In contrast to pentaradial symmetry in adult echinoderms18,19, hemichordates share bilateral symmetry with chordates, with a distinctive tripartite body plan of proboscis, collar and trunk20,21. They also show conserved structures such as pharyngeal gill slits, which is probably a synapomorphy of deuterostomes22,23, as well as a complex organization of the nervous system far from a simple nerve net24. Importantly, several hemichordate structures exhibit conserved patterning mechanisms during development. These include the overall anteroposterior (AP) axis regionalization25 and the AP patterning of the ectoderm in the proboscis and collar by signalling organizing centres similar to those of the vertebrate central nervous system26 and the formation of coelomic cavities from an endomesoderm region27,28. The patterning mechanism of the DV polarity is also similar, although the DV axis is inverted in chordates17,29. Other functionally analogous structures with uncertain homology include a collar nerve cord which follows a morphogenesis process similar to chordate neurulation30,31 but using different genes for neuronal patterning32. These similarities could potentially extend to the cis-regulatory machinery33. Whether the deuterostome ancestor was a direct or indirect developer has been controversial but, in any case, echinoderm pentaradial symmetry was probably secondarily acquired and indirect-developing hemichordates could help reconstruct the origins and the ancestor of this group and elucidate this question.
However, despite remarkable efforts in elucidating the genomic complexity and patterning mechanisms in hemichordates, a comprehensive understanding of the hemichordate developmental programs and how the genome is temporally regulated is still lacking. This also includes assessing whether a transcriptionally comparable developmental period can be recognized in deuterostomes and to what extent the developmental gene regulatory networks (GRNs) are conserved among different deuterostome groups34. While there have been efforts to begin the functional characterization of cis-regulatory elements in hemichordates such as Saccoglossus kowalevskii33,35, the current lack of cis-regulatory information for indirect-developing hemichordates, as well as cis-regulatory data spanning the whole genome, precludes further comparisons to assess the degree of conservation among different deuterostome lineages.
To address these questions, we characterized developmental transcriptomes and chromatin accessibility of Ptychodera flava, an indirect-developing hemichordate, based on its new chromosomal-level genome assembly36. We discovered that P. flava exhibits a biphasic developmental control with two major transcriptional programs which are regulated by different classes of transcription factors (TFs). Comparative developmental transcriptomics and GRN analyses suggest that gastrulation represents the stage of highest molecular resemblance of deuterostomes. Additionally, we validated our data exploring the cis-regulatory activities of open chromatin regions in the vicinity of the gene encoding FoxA, one of the main TFs with conserved gene expression across similar deuterostome stages and also a driver for the gastrula cis-regulatory landscape based on our analyses. Overall, our results suggest that gastrulation is controlled by conserved gene regulatory circuits in deuterostomes and align with recent efforts in characterizing the dynamics of gene expression across metazoan development and life cycles, providing a global view for understanding the conservation of the molecular basis underlying deuterostome development.
Results
Two transcriptional programs during P. flava development
Indirect-developing hemichordates go through several morphologically distinct developmental stages. After two days of embryogenesis, late gastrulae of P. flava hatch and undergo a series of tornaria larval stages that stay in the water column for months before metamorphosing into a worm-like body plan (Fig. 1a). We analysed gene expression profiles using RNA-seq at 16 stages, covering the entire P. flava life cycle, to understand the dynamics of gene expression during its development (Fig. 1a,b and Supplementary Fig. 1). Overall, we identified 28,413 genes expressed across all stages (Supplementary Data 1). Similarities at the whole-transcriptome level suggest two independent programs of gene expression at early and late development, with late gastrula (LG) and the Muller stage (the newly hatching tornaria) as the transitional point (Fig. 1c and Supplementary Fig. 1g,h). Differential gene expression analysis (DGEA) throughout development identified 23,726 dynamically expressed genes (hereafter referred to as temporal genes), which represent 83% of all genes detected and can be grouped into 22 clusters on the basis of their temporal gene expression profiles (Fig. 1d, Supplementary Fig. 1I and Supplementary Data 1). Several of these gene clusters clearly correspond to their deployment at different developmental phases (for example, clusters 1 and 2 match to maternal factors used in early embryogenesis; clusters 6–8 are activated during embryogenesis; clusters 15–19 are generally deployed during the larval phase; clusters 20–22 are predominantly used during metamorphosis and in adults). We noted that genes expressed at the Muller stage (Fig. 1d) include clusters deployed mainly during gastrulation (clusters 12 and 13) and later larval stages (clusters 15 and 16). This observation reinforces that Muller stage represents a turning point between embryonic and larval transcriptional states, since its transcriptomic profile is essentially using embryonic-specific genes but starting to shift to a ‘larval-like’ transcriptomic program. Gene ontology (GO) enrichment analysis of these clusters revealed major developmental processes taking place at the time of peak expression (Supplementary Data 2). For example, we found GO terms related to RNA processing, transcriptional activation and mitosis during early embryogenesis (for example, clusters 2 and 5–8), organ development during gastrulation and early larval periods (cluster 10), as well as cilia and axon formation at gastrula and larval stages (clusters 13–16) (Supplementary Fig. 2 and Supplementary Data 3). These results corroborate our understanding of the temporal controls of transcriptional programs orchestrating different developmental processes and are in line with our current understanding of metazoan developmental transcriptomics6.
To understand how the transcriptional programs are controlled, we focused on the usage of TFs during P. flava development (Fig. 1e and Extended Data Fig. 1). For an approach to identify the underlying gene expression regulatory machinery, we classified the TFs into major classes (following animalTFDB and automated orthology relationships with vertebrates) and looked at their expression throughout development (Supplementary Data 1 and 4). Chi-squared analysis of TF class proportions in each developmental stage suggests a gradual but biphasic program of TFs, with over-representation of E2F, HMG (including Sox), homoeodomain Pou and T-box TFs that are maternally deposited and used at early stages and later classes such as bZIP, bHLH, DM and RXR (Fig. 1e); some classes such as homoeodomain and bHLH showed a more gradual incorporation to late development. The timing of these TFs generally corresponds to their roles in development in other species. For example, the Pou and T-box (including Brachyury) classes, mediate pluripotency and cell fate specification in early stages and gastrulation37,38,39,40; we also observed TFs such as one steroidogenic factor with an increase in gene expression in larval stages, which suggests steroidogenic signalling and gonad maturation function at later stages41. Additionally, several TF classes exhibit high agreement in co-expression with temporal gene clusters throughout development, supporting that the two distinct developmental programs of embryonic and larval development are controlled by different TF combinations (Fig. 1f). Notably, RXR (class 22) and thyroid hormone (THR) (class 27) pathways are involved in metamorphosis in a variety of animals42,43 and they are co-expressed with clusters 20 and 21, genes predominantly expressed during P. flava metamorphosis (red asterisks in Fig. 1f). This observation suggests a similar role of RXR and THR functions during P. flava metamorphosis. In sum, these results indicate that the two major developmental programs of P. flava are controlled by two sets of TFs to perform temporal processes known in metazoan development, including other deuterostomes.
Gastrulation is the most similar stage in deuterostomes
Next, we sought to understand the transcriptional dynamics of P. flava in the context of deuterostome evolution. For this, we compared the transcriptome of P. flava against those of four other deuterostomes, plus one protostome as an outgroup (Fig. 2; see Methods and Extended Data Fig. 1 for a graphical summary; Extended Data Figs. 2–8, Supplementary Figs. 3 and 4, Supplementary Note 1 and Supplementary Data 5–8). Comparing the transcriptional dynamics of P. flava against other deuterostome species revealed similarities at mid and mostly late development. The most similar stages between hemichordate and echinoderms (the sea urchin Strongylocentrotus purpuratus and the crinoid Anneissia japonica) were larval stages (Fig. 2a), with genes enriched in signal transduction and metabolism (Extended Data Figs. 2 and 3). This suggests that the LCA of Ambulacraria possessed larval stages. We noted that the hemichordate early gastrula transcriptome bears high resemblance to those of the sea urchin mesenchyme blastula and early gastrula stages; the former stage represents the initial gastrulation process in sea urchins when the primary mesenchyme cells ingress into the blastocoel44. Genes in common between these stages were enriched in neurogenesis and germ layer fate commitment (Extended Data Figs. 2b,c and 3b,c). Comparison with cephalochordates (Branchiostoma lanceolatum and Branchiostoma floridae) showed that hemichordate mid development has higher resemblance to cephalochordate neurulation than to mid or late developmental stages of echinoderms, when comparing the three lineages using a set of 3,192 common orthologues predicted to have been present in the deuterostome LCA (Fig. 2a,b, Extended Data Fig. 6 and Supplementary Data 7). This suggests that the set of predicted deuterostome LCA orthologues might have retained their gastrulation transcriptional dynamics in hemichordates and cephalochordates. Genes expressed in common between hemichordate and cephalochordate mid development were enriched in neuron migration and cell fate commitment but also AP patterning (Extended Data Figs. 4 and 5), probably reflecting the relative timing of comparable developmental events such as neurogenesis and mesoderm/endoderm separation. Hemichordate and cephalochordate late development stages also showed similarities and were enriched in genes related to metabolism and signal transduction (Extended Data Figs. 4 and 5).
To better understand the underlying molecular mechanisms driving these similarities in gene expression, we looked at the dynamics of TFs in these species. We noticed a trend of similar expression of Pou, HMG and T-box classes in early and mid development and bZip and Homeobox in late development (Supplementary Fig. 4a,b), roughly matching similar observations outside deuterostomes45. Transcriptional dynamics of pairwise orthologous TFs also showed similarities at late but also early and mid development (Supplementary Fig. 4c,d). Using a set of 234 orthologous TFs predicted to have been in the deuterostome LCA, we detected groups of TFs with distinct patterns of temporal dynamics across deuterostomes (Supplementary Fig. 4e–g, Extended Data Fig. 7 and Supplementary Data 8). The vast majority of these TFs retain their relative timing of expression during development and we noticed a group of TFs with high expression in ambulacraria gastrulation and cephalochordate neurula stages, including foxa, nkx2.1, nkx2.2, brachyury and foxg (Fig. 2c and Extended Data Figs. 7 and 8); some have been deployed differently in different deuterostomes (such as otp in echinoderms). The observation that amphioxus neurula stages are transcriptomically similar to hemichordate gastrula probably reflects the exact timings of comparable developmental events; for example, neurogenesis and separation of mesoderm and endoderm occur at the mid gastrula in P. flava and during neurulation in amphioxus46,47. Transcriptomic similarity between hemichordate gastrula and amphioxus neurula may be attributed to core sets of TFs regulating the conserved developmental events in the two lineages.
Comparing P. flava against the annelid Owenia fusiformis as an outgroup of deuterostomes revealed similarities between hemichordate gastrula stages and the annelid gastrula, elongation and early larva stages (Fig. 2d and Extended Data Fig. 8a), with genes related to skeletal morphogenesis, neuron fate commitment and AP patterning (Extended Data Fig. 8b,c and Supplementary Data 6). Hemichordate larval stages also showed similarity to O. fusiformis mitraria and competent stages, with similar patterns of signal transduction and metabolism (Extended Data Fig. 8d–f). We also observed the maximal similarity of orthologous TF expression around gastrulation and elongation (Extended Data Fig. 8g). The maximal similarity between a deuterostome and a protostome being in gastrulation aligns with our observations that the transcriptomic age indexes (TAI) of gastrula stages in P. flava are relatively older than the other stages (Fig. 2e), supported also by the observation that genes deployed during embryogenesis were enriched in more ancient genes (Fig. 2f). Overall, these results suggest that gastrulation is the stage with the highest degree of molecular resemblance of deuterostomes and this conservation may have predated deuterostome origins.
Cis-regulation analysis during P. flava gastrulation
Owing to the high degree of gene expression resemblance we observed during deuterostome gastrulation, we focused our study on cis-regulation of gene expression in this period of P. flava development. We identified open chromatin regions (OCRs) in the P. flava genome using assay for transposase-accessible chromatin sequencing (ATAC-seq)48 at four time points: early blastula (EB), early, mid and late gastrula (LG) (Fig. 1a, indicated with the red dots). We identified 82,807 OCRs throughout gastrulation, mostly located in ±20 kilobase (kb) vicinity of genes, predominantly within intergenic regions or introns, followed by promoters (Extended Data Fig. 9a,b and Supplementary Data 9 and 10). Binding motifs of trans-regulators such as developmental TFs appear more frequently outside promoters and in distal elements (Supplementary Fig. 5). Nearly 70% of the annotated genes have at least one OCR associated, with ~50% associated with two or more OCRs (Extended Data Fig. 9c). We observed a trend of OCR accessibility in accordance with expression levels of nearby genes (Extended Data Fig. 9d), suggesting that cis-regulation generally occurs as transcriptional activation during P. flava gastrulation. Genes annotated as trans-developmental (with GO terms in transcription or development, as similarly defined in ref. 49; Methods) tend to be associated with cis-regulatory elements (CREs), with more OCRs per gene than housekeeping genes (Extended Data Fig. 9e,f). This is in agreement with previously observed trends towards complexity of cis-regulatory architecture in genes orchestrating metazoan development49,50,51.
We identified 17,488 dynamically accessible OCRs throughout gastrulation, which were categorized into six clusters (Fig. 3a). These six clusters were enriched with specific DNA-binding motifs of major TF classes that matched those controlling gene expression during development in other metazoan lineages (Fig. 3b, Extended Data Fig. 9g and Supplementary Fig. 6). While there is overlap between blastula and gastrula OCRs, a substantial amount of OCRs are found specifically in blastula or during gastrulation. Furthermore, these stage-specific OCRs show enrichment for different motifs. For example, motifs of T-box TFs (including BRACHYURY, EOMES, TBR and TBX) are enriched in the OCRs at the EB and early gastrula stages. On the other hand, Sox-binding motifs are enriched in OCRs after gastrulation occurs. This dynamic is consistent with the transcriptome data, in which T-box and Sox factors are highly expressed during embryogenesis, with T-box predominantly present before gastrulation and Sox more abundant during gastrulation (Fig. 1e). Other examples include blastula binding motifs of E2F and the p53/63/73 family which are related to cell division (the latter being a known germ layer regulator52,53,54) and gastrula motifs of Dlx, Fox, Gsc and Pax which are known to be expressed in different germ layers during P. flava gastrulation (Extended Data Fig. 9)12,17,55,56. These results suggest that the highly expressed TF genes are functionally active in regulating target genes (TGs) via the identified OCRs. We further crossed these data with early temporal genes and also found binding motifs for early TFs, such as T-box factors, E2F and p53/63/73 (Supplementary Fig. 7), aligning with the early factors found on the basis of direct clustering of the ATAC-seq data (Fig. 3b). This concordance between the binding sites of temporal OCRs and the binding sites in the proximity of temporal genes suggests that potential regulatory features of temporal genes can be inferred on the basis of their surrounding cis-regulatory landscape.
Inferred regulatory networks during P. flava gastrulation
Developmental processes are driven by complex interactions between many genes constituting developmental GRNs. To assess putative cis-regulatory interactions in P. flava, we used analysis algorithm for networks specified by enhancers (ANANSE)57, a computational tool that uses cis-regulatory and gene expression data to generate GRN graphs and predict TF–TG interactions, focusing on activating TFs. These graphs are composed of nodes (the TFs and TGs) and edges (the interactions between TFs and TGs, whose score values are calculated using an additive model of gene expression, OCR accessibility, distance between OCR and its corresponding transcription start site (TSS) and motif enrichments within the OCR (Fig. 3c). We generated two independent graphs, one of EB and one of LG (Figs. 1a and 3c and Supplementary Data 11 and 12). The number of genes represented in both graphs is similar, both in number of TFs, number of effector genes and the relative connectivity of genes (Extended Data Fig. 9h–k). These graphs however showed structural differences as reflected by changes in the centrality of TFs (how much a given gene is embedded within each GRN, related to the number of connections to other genes)58 (Fig. 3d,e), in connections per TG, in number of self-regulating TFs and a net gain of trans-developmental centrality in LG (Extended Data Fig. 9l,m). Overall, TF classes showing higher centrality corresponded to the dominantly expressed TFs at the respective stage (Figs. 1e,f and 3e). For example, E2F and Pou classes show high centrality in the EB network and bZIP, ETS, GATA, Homeobox and Forkhead factors show higher centrality in the LG network. This suggests a different set of core TFs constituting GRNs at each stage. Common TFs found in the two networks (64% of all TFs in either graph) are enriched in terms of regulation of gene expression, which is reassuring of our analysis, with overall overlapping GO terms and functional categories in the two stages but higher enrichments for cell migration and organogenesis in LG-specific TFs (Supplementary Fig. 8a–e). The genes that are being controlled in the two networks also show differences. Out of their top TGs (defined as top % highest in degree), they share 615 (Supplementary Fig. 8f), with top targets exclusive of EB enriched in chromatin regulation (Supplementary Fig. 8g) and top targets of LG enriched in developmental and morphogenetic processes (Supplementary Fig. 8I). Interestingly, more than half of the gastrula-specific TGs were detected as postgastrulation temporal genes in our clustering analysis compared to EB-specific TGs (Fig. 1d and Extended Data Fig. 9n). These observations suggest that, in comparison to the EB network, the LG network is orchestrated by a different core of TFs controlling genes related to organogenesis and signalling, including genes that are poised to participate in larval development.
By the time of LG, the three germ layers have segregated into morphologically distinct territories in P. flava. On the basis of the known germ layer-specific expressing genes at the LG stage of P. flava (Supplementary Data 13), we subdivided the LG network into circuits that potentially mediate germ layer development (Extended Data Fig. 10a–c). Within those circuits, we noticed that TF genes showing high centrality tend to express broadly, while low centrality genes often display restricted expression patterns in the respective germ layer (Extended Data Fig. 10d,e). For example, xbp1 (ref. 13), soxb2 and otx are the most connected genes within the ectoderm circuit and are known to express in broad ectodermal domains at the LG stage. Similarly, genes show high centrality in the mesoderm network (for example, foxg and foxf) and endoderm circuit (for example, soxb1a and foxa) are expressed broadly in the mesoderm-derived protocoel and endoderm-derived archenteron of the LG embryos, respectively. Genes that are known to express in restricted regions in the corresponding germ layer, such as gsx (posterior ectoderm59), foxq2 (anterior ectoderm12), msx (dorsal mesoderm13) and tsg (anterior endoderm13), show low centrality in the respective circuit (Extended Data Fig. 10a–e). Other examples of TF genes with low centrality displaying restricted germ layer-specific expression patterns are shown in Extended Data Fig. 10e. The agreement between spatial expression patterns of TF genes and their hierarchical positions in GRNs coincides with the expected roles of TFs in specifying germ layers and the subdomains constituting the specific germ layer, with highly central TFs enriched in morphogenetic and developmental terms compared to low centrality TFs (Extended Data Fig. 10f). This suggests that this metric can help identify potential downstream TFs. We finally used ANANSE influence57 to detect factors responsible for the transition between the network states of EB and LG and found a number of well-known developmental TFs with known roles in germ layer-specific fate induction (Fig. 3f, Extended Data Fig. 10g and Supplementary Data 14). Altogether, the inferred networks based on RNA-seq and ATAC-seq data can potentially serve as useful models for understanding network evolution through cross-species comparison and for future functional validations.
Conservation of cis-regulatory mechanisms in deuterostomes
We reasoned that the transcriptional similarities we observed across species, including the expression dynamics during hemichordate gastrulation, might be attributed to conserved cis-regulatory interactions. We aimed to validate the conservation of deuterostome cis-regulatory interactions of foxa due to the following reasons. We observed that foxa is among the set of TFs with conserved expression during gastrulation and/or neurulation (Fig. 2c), appearing as a contributor to the transcriptomic similarities of deuterostome gastrula/neurula stages (Fig. 4a–d) and with high expression during mid development in the three deuterostome lineages (Fig. 4e). Foxa is also one of the most central TFs in the hemichordate LG network (Fig. 4f) and it is found among the top influential factors in the transition to the LG cis-regulatory landscape according to our ANANSE analysis (Fig. 4g). In addition, in the hemichordate gastrula ANANSE network, we observed putative cis-regulatory interactions between foxa and other orthologous genes that constitute two deuterostome regulatory kernels, the echinoderm endomesoderm kernel and the chordate axial mesoderm kernel60,61,62 (Fig. 4h,i and Supplementary Data 15). To assess the degree of conservation of the cis-regulatory information of foxa, we generated reporter constructs covering one to four potential CREs which we detected in our ATAC-seq data upstream of the P. flava foxa gene (Fig. 4j), the expression of which is initiated at the EB stage in the presumptive endomesoderm17,56. We used sea urchin embryos as a reporter assay system and observed that P. flava foxa CREs 2–4 are able to drive specific expression patterns, with CRE4 being sufficient to drive reporter expression in ectoderm and endoderm, which recapitulates the endogenous expression of sea urchin foxa56,63 (Fig. 4k–m and Supplementary Data 16). On the other hand, CRE1 is probably a general repressor, as the reporter construct CRE1–4 (containing CRE1 and active CRE2-4) lost transcriptional activity (Supplementary Data 16). The repressive role of CRE1 is consistent with the decrease of its openness after the EB stage (Fig. 4j), when the gene starts to express zygotically (Fig. 4e). These results validate the biological activities of the OCRs revealed by ATAC-seq and suggest similar regulatory inputs to the foxa gene during sea urchin and hemichordate gastrulation. This suggests that the conservation of the endomesoderm kernel could be pushed further to the common ancestor of Ambulacraria, reinforcing the notion that part of this kernel already existed in the bilaterian ancestor64. Overall, these results suggest that the transcriptomic similarities during deuterostome gastrulation could be the product of conserved cis-regulatory interactions that can be dated back to the common ancestor of ambulacrarians or deuterostomes.
Discussion
In this study, we generated an extensive catalogue of developmental transcriptomics of the indirect-developing hemichordate P. flava. The developmental processes of P. flava is underlied by transcriptional programs exhibiting a biphasic behaviour, with LG and the Muller tornaria larva as the turning point (Fig. 1). The two major transcriptional programs are controlled by distinct groups of TFs as suggested by our co-expression analysis with temporal genes. Comparative transcriptomics with other deuterostome species and analyses of transcriptome ages suggest that gastrulation represents the stage of highest molecular resemblance within deuterostomes (Fig. 2).
Overall, our comparisons suggest that deuterostome gastrulation and larva stages are underlain by gene expression dynamics that predate the origin of this group, building on previous knowledge from recent and contemporary efforts6,34. Similar transcriptional dynamics between ambulacrarian larval stages indicates that similar transcriptional programs are deployed for constructing the larval forms, supporting a common indirect-developing ambulacrarian ancestor. Similarities between hemichordate gastrulation and cephalochordate neurulation but also annelid elongation stages (Fig. 2b,c; also shown in ref. 6) suggest that the underlying mechanisms around gastrulation predate the origin of deuterostomes. Similarities in ambulacrarian gastrulation seem to be mostly related to cell fate commitment, especially mesoderm and neurogenesis, suggesting that these mechanisms might have not changed during ambulacrarian evolution. Overall, the potential regulators orchestrating these transcriptional programs (the TFs) also appear to be largely conserved in this group, as shown by the early and late deployment of HMG and bZIP factors among deuterostome species as well as numerous orthologous TFs which share their temporal expression profile. Taken together with similarities to protostome late development, differences in TF expression timing (during or after gastrulation) between ambulacraria and cephalochordates might indicate that several processes taking place during postgastrulation stages in chordates and protostomes could have been deployed at an earlier time in development in ambulacraria. This aligns with our current knowledge that some developmental mechanisms occur after gastrulation in chordates, such as neurogenesis and the segregation of mesoderm and endoderm46,47
Overall, our findings suggest that deuterostome gastrulation features conserved mechanisms that date back to the common ancestor of Bilateria and are not a deuterostome-exclusive innovation. Our findings also suggest that deuterostome transcriptional programs are largely similar in composition despite the morphological disparity and differences in their life cycles, with some temporal adjustments probably accounting for the differences in the timing of developmental mechanisms between stages and for the differences in their lifestyles. A more parsimonious interpretation, as interphyla comparison can sometimes be difficult due to the taxonomical definition of phyla and the taxon sampling of choice65,66, is that only mid embryogenesis and larval stages appear more conserved when comparing distantly related phyla due to differences in their lifestyles because of the fundamental mechanisms of development at play during those stages such as epithelial–mesenchymal transition, AP patterning and germ layer specification. More comparative data on hemichordates and other deuterostomes at the tissue or cell-type levels, are required to explore the corresponding cell types that contribute to the overall transcriptomic similarities we observed. These could potentially hint at the degree of conservation of gene programs and the conservation of homologous mechanisms between cell types, germ layers and other structures throughout development across species.
Our comparative observations align with our cis-regulatory analyses. Our network analyses suggest the switch to the gastrulation cis-regulatory program consists of rewiring and addition of TFs to the GRN already at play during blastula, which translate into changes in TF/trans-developmental centrality and an increase in TF self-regulation (Fig. 3). This feedback property is expected to stabilize distinct regulatory states for germ layer specification. The effector genes also reflect the underlying processes at play in each stage, with an embedding of morphogenetic and postgastrulation effector genes in gastrula. Additionally, TF genes exhibiting higher centrality correspond to their broader spatial expression patterns, consistent with their hierarchical roles within the individual germ layer network. These results also reflect our previous observations on transcriptomic similarity across species, with a set of core TFs that have retained their timing of expression in deuterostomes detected as influential in the transition to the gastrulation cis-regulatory landscape. This suggests network inference based on RNA-seq and ATAC-seq data can serve as useful models for understanding network evolution through cross-species comparison and for future functional validations. Together, our observations suggest that the deuterostome LCA had a similar temporal program of TF genes which probably constituted hardwired GRNs, which aligns with findings about conserved GRN mechanisms despite genome rearrangements in echinoderms34. One putative example could be the endomesoderm kernel, for which we predicted potential cis-regulatory interactions. The ability of CREs near the hemichordate foxa gene to drive gene expression in sea urchin embryos suggests conserved regulatory inputs in the two lineages, reinforcing the idea that the endomesoderm kernel emerged before the onset of Ambulacraria. Similarly, the prediction of potential cis-regulatory interactions between genes of the chordate axial mesoderm kernel in P. flava suggests that these interactions might predate the origin of chordates and were present at least in the LCA of deuterostomes, although further functional studies are required to test this hypothesis. Of note is to mention that, since hemichordates lack a homologous structure to the notochord9,67, new questions arise as to which structure might deploy potential cis-regulatory interactions such as those of the axial mesoderm kernel in hemichordates. Potential candidates could be the endodermal region and the mesoderm-derived protocoel, since some of the orthologous genes within the network are known to express in these embryonic territories12,15,17. More validation using in situ hybridization is needed to corroborate the co-expression of genes belonging to this GRN. Likewise, we acknowledge the limitation of our methods as these analyses are based on leveraging gene expression and chromatin accessibility to reconstruct TF–TG interactions and cannot reconstruct potential protein–protein interactions (such as those between ligands and receptors). Likewise, although TF binding motif association is based on sequence homology-informed automated transfer similar to standard practices68, it is done in a lenient fashion, with multiple motifs assigned per TF and the highest scoring one taken into account on a TF/TG case-by-case basis, which may hamper true 1:1 associations such as those derived from ChIP–seq experiments69. Nevertheless, the transcriptomic analysis and the inferred networks provide a global view and working models for future studies. Overall, these comparative analyses depict a more complete picture of the conserved cis-regulatory interactions during gastrulation of deuterostome LCA.
In summary, this study presents a high-throughput, genome-wide, multimodal catalogue of quantitative regulatory genomics in hemichordates. In doing so, it establishes the basis for comparative studies in deuterostome lineages with a global characterization of the embryonic and larval development of an indirect-developing hemichordate. All together our data effectively generate hypotheses for exploring the similarities and divergences of the molecular mechanisms underlying deuterostome and metazoan development.
Methods
Sample collection and animal husbandry
P. flava adults were collected from Penghu Islands, Taiwan. Methods for spawning induction and embryo culture were described previously46. For RNA-seq, total RNA of each developmental stage was isolated from one to three biological replicates using the RNeasy Micro Kit (Qiagen). Library preparation for stranded polyA-based messenger RNA was then performed using the TruSeq Stranded mRNA Library Prep Kit (Illumina). The multiplexed complementary DNA libraries were subjected to sequencing with 151 base pair paired-end read length on a HiSeq 2500 sequencer (Illumina). The average sequencing depth was 24,861,133 paired reads per sample. For ATAC-seq, embryos at different stages were collected to reach 75,000 cells per sample (~700 early blastulae, ~150 early gastrulae, ~100 mid gastrulae and ~50 late gastrulae). The collected embryos were treated with 0.1% N-acetylcysteine for 10 s to remove the jelly coat, followed by incubating with 8.5 mM of dithiothreitol in seawater containing 5.7 mM of NaOH. The cells were then lysed for the transposition reaction and PCR amplification and the purified libraries were sequenced as described49.
In situ hybridizations
In situ hybridizations were performed using embryos of P. flava at the LG stage, as previously described in refs. 59,70,71.
EGFP quantification
Owing to mosaic incorporation of the construct, which commonly happens, we counted an embryo as positive if it showed at least one cell or region of the embryo with fluorescence signal. For each germ layer and CRE construct, the percentage of EGFP embryos was calculated as the ratio of positive embryos for a given germ layer divided by the total number of EGFP embryos observed. EGFP expression of CRE1–4 was inconsistent and yielded only two positive embryos, which we considered inconclusive.
Gene annotation
Generation of protein sequences
We ran TransDecoder (https://bio.tools/TransDecoder) using the set of longest isoforms per gene of P. flava, querying against the SWISSPROT and Pfam databases to gather evidence at the sequence homology and protein domain level. This evidence was subsequently incorporated into TransDecoder and we retrieved the longest predicted protein sequences with the parameter --single-best-only.
Gene ontology annotation
Using the TransDecoder-predicted set of protein sequences, GO terms were generated using a combination of BLAST2GO and eggNOG annotation software, using RefSeq non-redundant proteins database72 and eggNOG metazoa73 as reference databases, respectively.
Gene ontology enrichment analysis
For every cluster, GO enrichment analysis was performed using the topGO R package74,75 using the whole gene set of P. flava as gene universe and the elim method74.
Trans-developmental and housekeeping classification
A gene was classified as trans-developmental if annotated with any of the following GO terms: GO:0044430, GO:0006412, GO:0003735, GO:0005840, GO:0005842, GO:0030529, GO:0006414, GO:0009119, GO:0009062, GO:0044262, GO:0006725; but none of the GO terms for housekeeping genes. A gene was classified as housekeeping if annotated with any of the following GO terms: GO:0009790, GO:0030154, GO:0043565, GO:0007267, GO:0008380, GO:0003700, GO:0140297, GO:0010467, GO:0010468, GO:0016070; but none of the GO terms for trans-developmental genes.
Transcription factor annotation
We first generated groups of orthology using P. flava, purple sea urchin, Japanese feather star, amphioxus and well-annotated vertebrate species (human and zebrafish) using OrthoFinder76,77 (Supplementary Data 5). Using human and zebrafish queries for TFDB Metazoa78, we transferred TF class annotation to their respective orthogroup and thus to every gene of the other species. Second, we also added to this classification all genes identified as a TF homologue by the GimmeMotifs tool motif2factors (briefly, transferring TF annotation based on orthology evidence using OrthoFinder against a standard set of model species; see gimmemotifs documentation), which was required for ANANSE (see later). We selected ‘top’ TF classes as having high coefficient of variation in the expression datasets of P. flava, S. purpuratus and B. lanceolatum. The resulting TF annotation was used for downstream TF analysis.
Time-series RNA-seq analysis
To quantify gene expression, reads were pseudo aligned against the set of longest transcripts per isoform per gene using kallisto79. RNA-seq counts were corrected for batch effect and quantile normalized using edgeR80,81,82 and RUVSeq83 taking the 10,000 bottom-ranked genes in edgeR as in silico empirical, invariable genes. Pairwise Spearman correlation of stages and replicates was done, clustered and visualized in R84 using base and ComplexHeatmap85 packages. We also clustered the developmental transcriptomes in an unsupervised manner using Pearson correlation with bootstrapping (co-occurrency as described in refs. 86,87). DGEA was performed using DESeq2 (ref. 88) using likelihood ratio test (LRT) and RUV-correction as reduced formula. Genes with an adjusted P < 0.05 (negative binomial LRT) were taken as differentially expressed. For every replicate, variance-stabilized counts were averaged to generate a time-series dataset of every differentially expressed gene throughout development. For every gene, expression data were scaled on a z-score (standard deviation over mean). Temporal clusters were then generated using hierarchical clustering on the scaled dataset, with the following parameters: method = “ward.D2”. Data visualization was done using R base (https://www.r-project.org/), ggplot2 (ref. 89), RColorBrewer90, colorspace91, pheatmap (RRID:SCR_016418) and ComplexHeatmap85 packages under R v.4.0.3.
Transcription factor analysis
For a given TF class X in a given species (P. flava, B. lanceolatum and S. purpuratus) at a given developmental stage Y, we computed (1) the sum of genes of TF class X expressed (>1 transcripts per million) at stage Y; and (2) the sum of transcripts per million (using kallisto92, see earlier) for every expressed gene of class X at stage Y. We used the calculations above to perform a Chi-squared test of abundance of every TF class in every developmental stage of P. flava. To measure correlation between classes and temporal gene expression, we first computed the average expression profile (z-scored) of every temporal cluster. Then, for a given TF class X, we computed the number of TFs of class X with a similar expression profile (z-scored) as that of the stage-specific cluster (with a Spearman correlation >0.6), divided by the number of TFs in class X. We did this for every TF class with every temporal cluster. For motif discovery analysis, we first extracted the OCRs (differential or not) associated with genes of each temporal cluster of gene expression (prelarval development). Then, we performed motif enrichment analysis using the program FindMotifsGenome.pl from the Homer tool suite93.
Re-analysis of other species
Re-analysis of sea urchin
We retrieved RNA-seq reads of S. purpuratus from ref. 94 and mapped them against the Spur5.0 (ref. 95) set of longest isoforms per gene using kallisto92. Read counts were imported to R using the tximport package. Counts were transformed to account for batch effect using edgeR and RUVSeq as explained above and we used edgeR to identify the top variable genes in lieu of performing DGEA because of the lack of replicates in the original dataset (the only sample with two replicates, mesenchyme blastula, was kept as separate replicates for all the analyses). We clustered these highly variable genes in R using hierarchical clustering. We also annotated the TF repertoire of sea urchin and measured TF class composition of each transcriptome as well as measured correlation between classes and temporal gene expression, as explained above for P. flava. Code and full documentation is available at the online repository of the project.
Re-analysis of B. lanceolatum
RNA-seq reads of B. lanceolatum from ref. 49 were mapped against the set of longest isoforms per gene of the currently available transcriptome96 using kallisto92. Read counts were imported to R using the tximport package. We used DESeq2 to identify temporally expressed genes using the negative binomial likelihood ratio test as explained above. We clustered the resulting temporally expressed genes in R using hierarchical clustering to retrieve clusters of temporal genes. We also annotated the TF repertoire of B. lanceolatum and measured TF class composition of each transcriptome as well as measured correlation between classes and temporal gene expression, as explained above for P. flava. Code and full documentation are available at the online repository of the project.
Re-analysis of A. japonica
RNA-seq reads of A. japonica from ref. 19 were mapped against the set of longest isoforms per gene of the currently available transcriptome using kallisto92. Read counts were imported to R using the tximport package. We used DESeq2 to identify temporally expressed genes using the negative binomial likelihood ratio test as explained above. We clustered the resulting temporally expressed genes in R using hierarchical clustering to retrieve clusters of temporal genes. Code and full documentation are available at the online repository of the project.
Re-analysis of B. floridae
RNA-seq reads of B. floridae from ref. 97 were mapped against the set of longest isoforms per gene of the currently available transcriptome using kallisto92. Read counts were imported to R using the tximport package. We used DESeq2 to identify temporally expressed genes using the negative binomial likelihood ratio test as explained above. We clustered the resulting temporally expressed genes in R using hierarchical clustering to retrieve clusters of temporal genes. Code and full documentation are available at the online repository of the project.
Re-analysis of O. fusiformis
We retrieved the published expression data and the temporal gene cluster information of O. fusiformis from the original repository of the reference publication6.
Orthology and gene age assignment
We ran OMA standalone98 using the set of predicted protein sequences of P. flava and a database of proteomes from 24 species (Supplementary Data 5), using standard parameters. The resulting orthogroups matrix was used to assign a gene age to each orthogroup (and therefore to every P. flava gene in a given orthogroup) based on Dollo parsimony using the Count software99. We also parsed the .orthoxml output file of hierarchical orthogroups to assign a gene family ID to every P. flava, B. floridae, B. lanceolatum, S. purpuratus, A. japonica and O. fusiformis gene. These gene id–gene family ID association tables were used in our comparative analyses (see later). As an alternative method for orthology inference, we ran OrthoFinder76,77 using standard parameters on the same set of proteomes as we did for OMA. We retrieved one-to-one orthologues from the results and also parsed the Orthogroups.tsv output file to assign a gene family ID to every P. flava, amphioxus and sea urchin gene. In addition, we also ran BLASTp best-reciprocal hits between all species of interest (see section on ‘Comparative analysis’ next, including P. flava and B. floridae) to retrieve one-to-one orthologues.
Comparative analyses
Using our OMA output, we retrieved all one-to-one orthologues between P. flava and S. purpuratus, between P. flava and A. japonica, between P. flava and B. lanceolatum, between P. flava and B. floridae, and between P. flava and O. fusiformis For a given pair of species, we used a set of custom wrappers we called comparABle (https://github.com/apposada/comparABle), which incorporate well-established methods in the literature (such as pairwise transcriptomic comparisons and orthogroup overlap; Extended Data Fig. 1; see later), to perform comparative analyses between these pairs of species. Alternatively, we also used the Orthofinder and BLASTp reciprocal best hits outputs to retrieve one-to-one orthologues between these species and run the comparative analysis. For the unified deuterostome comparisons, we used a list of 3,192 1:1:1 orthologues inferred to have been present in the deuterostome LCA in ref. 36. Code and full documentation are available at the online repository of the project.
Pairwise transcriptomic correlations
Using the whole temporal transcriptome datasets of each species, we subsetted these datasets to retrieve only common one-to-one orthologues and calculated distance and correlation metrics (Pearson correlation, Spearman correlation and Jensen–Shannon divergence) between pairs of transcriptomic stages. We calculated the average Jensen–Shannon divergence using bootstrapping of 1,000 iterations with replacement and also performed co-occurrence analysis86,87 using all genes and bootstrapping with replacement using the concatenate of the two datasets.
Identification of genes driving similarities between stages
Using the most closely matched pairs of stages according to the correlations described above, as well as selecting particularly interesting stage pairs, we identified key genes expressed during specific stage combinations across various species. Briefly, for a pair of stages a.i and b.j, we generated a design matrix to compare a.i and b.j against the rest of stages, performed linear modelling on the concatenate dataset of one-to-one orthologues using limma100 and retrieved the top genes with P < 0.05. Because this method finds genes specifically expressed in these two stages and not the others, it effectively retrieves a subset of the genes that drive similarities between stages a.i and b.j in a conservative manner, at the expense of other genes conspicuously expressed which might also be important for these similarities even if they are not captured by this method.
Gene family usage analysis
We compared all possible combinations of clusters of temporal genes between pairs of species by performing orthogroup overlap analysis6,49. For a given cluster in species ‘a’ a.x and a given cluster in species ‘b’ b.y, we counted the number of overlapped orthogroups between a.x and b.y (the number of genes from a given orthogroup in clusters a.x and b.y) and subjected these to upper-tail hypergeometric test. Using the information of the pairs of clusters sharing the highest amount of orthogroups, we retrieved the genes from P. flava belonging to those orthogroups and performed functional category analysis using the clusters of orthologous groups annotation derived from eggNOG (see earlier).
Comparative analyses using TFs
We generated reciprocal best hits of P. flava versus B. lanceolatum and P. flava versus S. purpuratus using BLASTp with standard parameters, and used these as proxy for one-to-one orthologues as an alternative method to OMA or OrthoFinder. Upon seeing similar trends regardless of using OMA or reciprocal best hits, we used reciprocal best hits to identify conserved TFs between the two pairs of species. We subsetted the expression datasets to keep the one-to-one TFs and performed the same distance metric correlations between pairs of species. Using the information on the most similar stages we proceeded with linear modelling to retrieve the TF genes driving similarities between stages as explained above.
Gene age enrichment and TAI analyses
We performed gene age enrichment analysis using contingency tables for every cluster and gene age. Results were considered significant when post hoc Chi-squared test P value was <0.05. We performed TAI analysis using the R package myTAI and the flatline test.
ATAC-seq analysis
ATAC-seq data analyses were performed as described in ref. 101, using standard pipelines48,49. Briefly, reads were aligned with Bowtie2 using a de novo sequenced, HiC-guided assembly of the genome of P. flava (deposited; ‘Data availability’). We generated a dataset of peaks using Macs2 (ref. 102) for peak calling and Bedtools103 to quantify the accessibility of each chromatin region. Differential analysis was performed using DESeq2 v.1.18.0 in R v.3.4.3 (ref. 88) using LRT. An adjusted P = 0.05 was set as a cutoff for statistical significance of the differential accessibility of the chromatin in ATAC-seq peaks. Motif enrichment was calculated using the program FindMotifsGenome.pl from the Homer tool suite93. The k-means clustering of the ATAC-seq signal was performed using R84 and visualized using ComplexHeatmap85. OCRs were assigned to the closest TSS using bedtools closestbed and standard parameters. Categorization of OCRs into distal, proximal, intronic and so on, as well as de novo motif finding and motif enrichment analysis, was done using the program annotatePeaks from the Homer tool suite93.
Cis-regulatory reporter assays
The ATAC-seq data revealed four potential regulatory elements (REs) locating within 3 kb upstream of the foxa coding sequence. PCR products containing 4, 3, 2 or 1 REs were cloned into the Hugo’s lamprey construct vector35,104 upstream of an egfp coding sequence and an SV40 polyadenylation sequence using In-Fusion Snap Assembly master mixes (Takara). To prepare the injection mixtures, 4 ng of reporter constructs (PCR products), 1.2 μl of 1 M KCl, 154 ng of carrier DNA (HindIII-digested S. purpuratus genomic DNA) and 5 μg of Dextran Alexa 555 were added into nuclease-free water (Invitrogen) to make 10 μl of mixture. About 4 pl of this mix was introduced into S. purpuratus zygotes and the EGFP signal was observed at blastula and gastrula stages.
Cis-regulatory networks
We computed two networks of cis-regulation (EB and LG) using ANANSE57. We initially generated a custom database of TF–domain association using GimmeMotifs motif2factors105, tailored for the genome annotation of P. flava using the proteomes and TF annotation of Homo sapiens and Danio rerio as reference, as well as the proteomes of several deuterostome species (including echinoderm and hemichordate) and the protostome O. fusiformis as outgroup to provide additional phylogenetic signal for the orthology. Next, we recentred the ATAC-seq OCR co-ordinates around the summit of the peak of chromatin accessibility using samtools mpileup to obtain a dataset of summit-centred peaks. For each developmental stage, we first ran ANANSE binding to create a genome-wide profile of TF binding across regions of open chromatin using the .bam alignment of the ATAC-seq and the set of summit-centred peaks and then ran ANANSE network to create a network of TFs and TGs using the binding profile and the RNA-seq expression data. The resulting network was imported to R and analysed using iGraph. We subsetted the network to retrieve edges of the top 5% (ANANSE score value, ‘prob’, above quantile 0.95) and subsequently computed network size, number of connections per TG and per TF class centrality using custom wrappers for iGraph functions in R. Germ layer genes were retrieved from in situ hybridization data on hemichordates in the scientific literature and assigned to the newest annotation using BLAST best-reciprocal hits (Supplementary Data 11). Genes from the endomesoderm/paraxial mesoderm kernels were retrieved from the scientific literature60,61,106,107 and used as query on our OMA orthology run, our Orthofinder orthology run or our set of BLAST reciprocal best hits against the newest annotation of P. flava a BLAST best-reciprocal hit against the newest annotation of P. flava. We then subsetted the postprocessed LG graph (used for all of our analysis, with the top 5% of interactions) querying for the respective orthologues of the aforementioned genes in P. flava. For each of the two queries to the LG graph we also computed a probability distribution of randomly finding as many genes connected as queried. This probability distribution was computed as 100,000 iterations of random subsampling of the postprocessed LG graph, taking into account both TFs and effector genes, as the latter are naturally less connected than TFs because of the directed nature of these graphs. Graphs were pruned for visualization using a custom R iGraph wrapper to keep the top three-to-five interactions per gene and visualized using the iGraph R package.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
P. flava genome assembly used in this work is publicly available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA747109. The version described in this paper is JASXRY010000000. Raw and processed sequencing data were deposited in GEO (GSE207835). All information for Supplementary Data is publicly available via Figshare at https://figshare.com/account/home#/projects/189303 (ref. 108) and http://hdl.handle.net/10261/359692 (ref. 109). Figures are publicly available via GitHub at https://github.com/apposada/ptychodera_cisreg_development (ref. 110).
Code availability
All code is publicly available via GitHub at https://github.com/apposada/ptychodera_cisreg_development (ref. 110).
References
Satoh, N. Chordate Origins and Evolution: The Molecular Evolutionary Road to Vertebrates (Academic, 2016).
Gee, H. Across the Bridge: Understanding the Origin of the Vertebrates (Univ. Chicago Press, 2018).
Nanglu, K., Cole, S. R., Wright, D. F. & Souto, C. Worms and gills, plates and spines: the evolutionary origins and incredible disparity of deuterostomes revealed by fossils, genes, and development. Biol. Rev. Camb. Philos. Soc. 98, 316–351 (2023).
Simakov, O. et al. Deeply conserved synteny and the evolution of metazoan chromosomes. Sci. Adv. 8, eabi5884 (2022).
Kapli, P. et al. Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria. Sci. Adv. 7, eabe2741 (2021).
Martín-Zamora, F. M. et al. Annelid functional genomics reveal the origins of bilaterian life cycles. Nature 615, 105–110 (2023).
Halanych, K. M. et al. Evidence from 18S ribosomal DNA that the lophophorates are protostome animals. Science 267, 1641–1643 (1995).
Cameron, C. B., Garey, J. R. & Swalla, B. J. Evolution of the chordate body plan: new insights from phylogenetic analyses of deuterostome phyla. Proc. Natl Acad. Sci. USA 97, 4469–4474 (2000).
Röttinger, E. & Lowe, C. J. Evolutionary crossroads in developmental biology: hemichordates. Development 139, 2463–2475 (2012).
Lowe, C. J. Molecular insights into deuterostome evolution from hemichordate developmental biology. Curr. Top. Dev. Biol. 141, 75–117 (2021).
McClay, D. R. Evolutionary crossroads in developmental biology: sea urchins. Development 138, 2639–2648 (2011).
Röttinger, E. & Martindale, M. Q. Ventralization of an indirect developing hemichordate by NiCl2 suggests a conserved mechanism of dorso-ventral (D/V) patterning in Ambulacraria (hemichordates and echinoderms). Dev. Biol. 354, 173–190 (2011).
Röttinger, E., DuBuc, T. Q., Amiel, A. R. & Martindale, M. Q. Nodal signaling is required for mesodermal and ventral but not for dorsal fates in the indirect developing hemichordate, Ptychodera flava. Biol. Open 4, 830–842 (2015).
Chang, Y.-C. et al. Regulatory circuit rewiring and functional divergence of the duplicate admp genes in dorsoventral axial patterning. Dev. Biol. 410, 108–118 (2016).
Fan, T.-P., Ting, H.-C., Yu, J.-K. & Su, Y.-H. Reiterative use of FGF signaling in mesoderm development during embryogenesis and metamorphosis in the hemichordate Ptychodera flava. BMC Evol. Biol. 18, 120 (2018).
Lin, C.-Y., Yu, J.-K. & Su, Y.-H. Evidence for BMP-mediated specification of primordial germ cells in an indirect-developing hemichordate. Evol. Dev. 23, 28–45 (2021).
Su, Y.-H. et al. BMP controls dorsoventral and neural patterning in indirect-developing hemichordates providing insight into a possible origin of chordates. Proc. Natl Acad. Sci. USA 116, 12925–12932 (2019).
Formery, L. et al. Molecular evidence of anteroposterior patterning in adult echinoderms. Nature 623, 555–561 (2023).
Li, Y. et al. Genomic insights of body plan transitions from bilateral to pentameral symmetry in Echinoderms. Commun. Biol. 3, 371 (2020).
Urata, M. & Yamaguchi, M. The development of the enteropneust hemichordate Balanoglossus misakiensis KUWANO. Zool. Sci. 21, 533–540 (2004).
Swalla, B. J. in Principles of Developmental Genetics (ed. Moody, S.) 114–128 (Academic, 2007).
Simakov, O. et al. Hemichordate genomes and deuterostome origins. Nature 527, 459–465 (2015).
Lowe, C. J., Clarke, D. N., Medeiros, D. M., Rokhsar, D. S. & Gerhart, J. The deuterostome context of chordate origins. Nature 520, 456–465 (2015).
Andrade López, J. M., Pani, A. M., Wu, M., Gerhart, J. & Lowe, C. J. Molecular characterization of nervous system organization in the hemichordate acorn worm Saccoglossus kowalevskii. PLoS Biol. 21, e3002242 (2023).
Lowe, C. J. et al. Anteroposterior patterning in hemichordates and the origins of the chordate nervous system. Cell 113, 853–865 (2003).
Pani, A. M. et al. Ancient deuterostome origins of vertebrate brain signalling centres. Nature 483, 289–294 (2012).
Green, S. A., Norris, R. P., Terasaki, M. & Lowe, C. J. FGF signaling induces mesoderm in the hemichordate Saccoglossus kowalevskii. Development 140, 1024–1033 (2013).
Kaul-Strehlow, S. & Stach, T. A detailed description of the development of the hemichordate Saccoglossus kowalevskii using SEM, TEM, histology and 3D-reconstructions. Front. Zool. 10, 53 (2013).
Lowe, C. J. et al. Dorsoventral patterning in hemichordates: insights into early chordate evolution. PLoS Biol. 4, e291 (2006).
Nomaksteinsky, M. et al. Centralization of the deuterostome nervous system predates chordates. Curr. Biol. 19, 1264–1269 (2009).
Miyamoto, N. & Wada, H. Hemichordate neurulation and the origin of the neural tube. Nat. Commun. 4, 2713 (2013).
Kaul-Strehlow, S., Urata, M., Praher, D. & Wanninger, A. Neuronal patterning of the tubular collar cord is highly conserved among enteropneusts but dissimilar to the chordate neural tube. Sci. Rep. 7, 7003 (2017).
Yao, Y. et al. Cis-regulatory architecture of a brain signaling center predates the origin of chordates. Nat. Genet. 48, 575–580 (2016).
Marlétaz, F. et al. Analysis of the sea urchin genome highlights contrasting trends of genomic and regulatory evolution in deuterostomes. Cell Genom. 3, 100295 (2023).
Minor, P. J. et al. I-SceI meganuclease-mediated transgenesis in the acorn worm, Saccoglossus kowalevskii. Dev. Biol. 445, 8–15 (2019).
Lin, C.-Y. et al. Chromosome-level genome assemblies of 2 hemichordates provide new insights into deuterostome origin and chromosome evolution. PLoS Biol. 22, e3002661 (2024).
Nichols, J. et al. Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell 95, 379–391 (1998).
Niwa, H., Miyazaki, J. & Smith, A. G. Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat. Genet. 24, 372–376 (2000).
Babaie, Y. et al. Analysis of Oct4-dependent transcriptional networks regulating self-renewal and pluripotency in human embryonic stem cells. Stem Cells 25, 500–510 (2007).
Osorno, R. & Chambers, I. Transcription factor heterogeneity and epiblast pluripotency. Philos. Trans. R. Soc. Lond. B 366, 2230–2237 (2011).
Mangelsdorf, D. J. et al. Characterization of three RXR genes that mediate the action of 9-cis retinoic acid. Genes Dev. 6, 329–344 (1992).
Wang, X., Matsuda, H. & Shi, Y.-B. Developmental regulation and function of thyroid hormone receptors and 9-cis retinoic acid receptors during Xenopus tropicalis metamorphosis. Endocrinology 149, 5610–5618 (2008).
Shao, C. et al. The genome and transcriptome of Japanese flounder provide insights into flatfish asymmetry. Nat. Genet. 49, 119–124 (2017).
Hardin, J. D. & Cheng, L. Y. The mechanisms and mechanics of archenteron elongation during sea urchin gastrulation. Dev. Biol. 115, 490–501 (1986).
Adryan, B. & Teichmann, S. A. The developmental expression dynamics of Drosophila melanogaster transcription factors. Genome Biol. 11, R40 (2010).
Lin, C.-Y., Tung, C.-H., Yu, J.-K. & Su, Y.-H. Reproductive periodicity, spawning induction, and larval metamorphosis of the hemichordate acorn worm Ptychodera flava. J. Exp. Zool. B 326, 47–60 (2016).
Carvalho, J. E. et al. An updated staging system for cephalochordate development: one table suits them all. Front. Cell Dev. Biol. https://doi.org/10.3389/fcell.2021.668006 (2021).
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Marlétaz, F. et al. Amphioxus functional genomics and the origins of vertebrate gene regulation. Nature 564, 64–70 (2018).
Irimia, M. et al. Extensive conservation of ancient microsynteny across metazoans due to cis-regulatory constraints. Genome Res. 22, 2356–2367 (2012).
Bogdanović, O. et al. Dynamics of enhancer chromatin signatures mark the transition from pluripotency to cell specification during embryogenesis. Genome Res. 22, 2043–2053 (2012).
De Rosa, L. et al. p63 Suppresses non-epidermal lineage markers in a bone morphogenetic protein-dependent manner via repression of Smad7. J. Biol. Chem. 284, 30574–30582 (2009).
Yang, A. et al. p63, a p53 homolog at 3q27-29, encodes multiple products with transactivating, death-inducing, and dominant-negative activities. Mol. Cell 2, 305–316 (1998).
Santos-Pereira, J. M., Gallardo-Fuentes, L., Neto, A., Acemel, R. D. & Tena, J. J. Pioneer and repressive functions of p63 during zebrafish embryonic ectoderm specification. Nat. Commun. 10, 3049 (2019).
Harada, Y. et al. Embryonic expression of a hemichordate distal-less. Zool. Sci. 18, 57–61 (2001).
Taguchi, S. et al. Characterization of a hemichordate fork head/HNF-3 gene expression. Dev. Genes Evol. 210, 11–17 (2000).
Xu, Q. et al. ANANSE: an enhancer network-based computational approach for predicting key transcription factors in cell fate determination. Nucleic Acids Res. 49, 7966–7985 (2021).
Borgatti, S. P. Centrality and network flow. Soc. Netw. 27, 55–71 (2005).
Ikuta, T. et al. Identification of an intact ParaHox cluster with temporal colinearity but altered spatial colinearity in the hemichordate Ptychodera flava. BMC Evol. Biol. 13, 129 (2013).
Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the evolution of animal body plans. Science 311, 796–800 (2006).
Hinman, V. F., Nguyen, A. T., Cameron, R. A. & Davidson, E. H. Developmental gene regulatory network architecture across 500 million years of echinoderm evolution. Proc. Natl Acad. Sci. USA 100, 13356–13361 (2003).
Holland, L. Z. in Evolutionary Developmental Biology of Invertebrates (ed. Wanninger, A.) 91–133 (Springer, 2015).
Oliveri, P., Walton, K. D., Davidson, E. H. & McClay, D. R. Repression of mesodermal fate by foxa, a key endoderm regulator of the sea urchin embryo. Development 133, 4173–4181 (2006).
Boyle, M. J., Yamaguchi, E. & Seaver, E. C. Molecular conservation of metazoan gut formation: evidence from expression of endomesoderm genes in Capitella teleta (Annelida). EvoDevo 5, 39 (2014).
Levin, M. et al. The mid-developmental transition and the evolution of animal body plans. Nature 531, 637–641 (2016).
Hejnol, A. & Dunn, C. W. Animal evolution: are phyla real? Curr. Biol. 26, R424–R426 (2016).
Lowe, C. J. in Current Topics in Developmental Biology (ed. Gilbert, S. F.) 75–117 (Academic, 2021).
Lambert, S. A. et al. Similarity regression predicts evolution of transcription factor sequence specificity. Nat. Genet. 51, 981–989 (2019).
Agius, P., Arvey, A., Chang, W., Noble, W. S. & Leslie, C. High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol. 6, e1000916 (2010).
Luo, Y.-J. & Su, Y.-H. Opposing nodal and BMP signals regulate left–right asymmetry in the sea urchin larva. PLoS Biol. 10, e1001402 (2012).
Lu, T.-M., Luo, Y.-J. & Yu, J.-K. BMP and Delta/Notch signaling control the development of amphioxus epidermal sensory neurons: insights into the evolution of the peripheral sensory system. Development 139, 2020–2030 (2012).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
Alexa, A., Rahnenführer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006).
Alexa, A. & Rahnenfuhrer, J. topGO: enrichment analysis for gene ontology. R package version 2.56.0 (2024).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Hu, H. et al. AnimalTFDB 3.0: a comprehensive resource for annotation and prediction of animal transcription factors. Nucleic Acids Res. 47, D33–D38 (2019).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal RNA-seq quantification with kallisto. Nat. Biotechnol. 34, 525–527 (2016).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
McCarthy, D. J., Chen, Y. & Smyth, G. K. Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic Acids Res. 40, 4288–4297 (2012).
Chen, Y., Lun, A. T. L. & Smyth, G. K. From reads to genes to pathways: differential expression analysis of RNA-seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438 (2016).
Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
Ripley, B. D. The R project in statistical computing. MSOR Connections 1, 23–25 (2001).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Levy, S. et al. A stony coral cell atlas illuminates the molecular and cellular basis of coral symbiosis, calcification, and immunity. Cell 184, 2973–2987 (2021).
Álvarez-Campos, P. et al. Annelid adult cell type diversity and their pluripotent cellular origins. Nat. Commun. 15, 3194 (2024).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Wickham, H. Ggplot2. WIREs Comput. Stat. 3, 180–185 (2011).
Neuwirth, E. ColorBrewer palettes. R package RColorBrewer version 1.1-3 (2022).
Zeileis, A. et al. colorspace: a toolbox for manipulating and assessing colors and palettes. J. Stat. Softw. 96, 1–49 (2020).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Tu, Q., Cameron, R. A. & Davidson, E. H. Quantitative developmental transcriptomes of the sea urchin Strongylocentrotus purpuratus. Dev. Biol. 385, 160–167 (2014).
Arshinoff, B. I. et al. Echinobase: leveraging an extant model organism database to build a knowledgebase supporting research on the genomics and biology of echinoderms. Nucleic Acids Res. 50, D970–D979 (2022).
Brasó-Vives, M. et al. Parallel evolution of amphioxus and vertebrate small-scale gene duplications. Genome Biol. 23, 243 (2022).
Hu, H. et al. Constrained vertebrate evolution by pleiotropic genes. Nat. Ecol. Evol. 1, 1722–1730 (2017).
Altenhoff, A. M. et al. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more. Nucleic Acids Res. 49, D373–D379 (2021).
Csurös, M. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26, 1910–1912 (2010).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Gil-Gálvez, A. et al. Gain of gene regulatory network interconnectivity at the origin of vertebrates. Proc. Natl Acad. Sci. USA 119, e2114802119 (2022).
Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Parker, H. J., Bronner, M. E. & Krumlauf, R. A Hox regulatory network of hindbrain segmentation is conserved to the base of vertebrates. Nature 514, 490–493 (2014).
Bruse, N. & van Heeringen, S. J. GimmeMotifs: an analysis framework for transcription factor motif analysis. Preprint at bioRxiv https://doi.org/10.1101/474403 (2018).
Di Gregorio, A. The notochord gene regulatory network in chordate evolution: conservation and divergence from Ciona to vertebrates. Curr. Top. Dev. Biol. 139, 325–374 (2020).
Zhang, T. et al. A single-cell analysis of the molecular lineage of chordate embryogenesis. Sci. Adv. 6, eabc4773 (2020).
Pérez-Posada A. et al. Perez-Posada et al. Supplementary Data August 24. Figshare https://figshare.com/projects/Perez-Posada_et_al_Supplementary_Data_August_24/189303 (2024).
Pérez-Posada, A. et al. Ptychodera flava genomics and transcriptomics data. DIGITAL.CSIC https://doi.org/10.20350/DIGITALCSIC/16340 (2024).
Perez-Posada, A. Apposada/Ptychodera_cisreg_development: V1.0 July 2024 zenodo-ready. Zenodo 10.5281/zenodo.13145821 (2024).
Luttrell, S. M., Su, Y.-H. & Swalla, B. J. Getting a head with Ptychodera flava larval regeneration. Biol. Bull. 234, 152–164 (2018).
Acknowledgements
We thank Y.-C. Chang, Y.-C. Chen, C.-T. Tsai and C. Chou for collecting P. flava samples for RNA-seq and ATAC-seq. We thank the NGS High Throughput Genomics Core, Biodiversity Research Center, Academia Sinica. This study was supported by the Ministry of Science and Technology, Taiwan (grant no. MOST-110-2326-B-001-006 to Y.-H.S. and grant no. MOST-110-2621-B-001-001-MY3 to J.-K.Y.) and Academia Sinica (AS-GC-111-L01 to Y.-H.S. and J.-K.Y.). J.J.T. and J.L.G.S. were supported by the European Research Council (grant agreement no. 740041) and the Spanish Ministerio de Economía y Competitividad (grant no. PID2019-103921GB-I00). J.J.T. was also funded by the Spanish Ministerio de Ciencia e Innovación (grant no. PID2022-141288NB-I00). We thank A. G. Galvez and P. Firbas for advice with the analyses and M. Van der Sande, S. Frolich and S. Van Heeringen, for their advice with ANANSE. A.P.-P. thanks the rest of the authors and all the members of the Skarmeta/Tena laboratory, as well as J. Solana, D. Salamanca-Diaz and M. Rosselló, for their support during the difficult times of this project.
Author information
Authors and Affiliations
Contributions
A.P.-P. performed the RNA-seq, ATAC-seq, TF analyses, network analyses, orthology analyses and comparative analyses and wrote the first and last versions of the manuscript. Che-Yi Lin generated the genome annotation. Ching-Yi Lin and Y.-C.C. performed the in situ hybridizations. T.-P.F. performed the cis-regulatory reporter assays. J.L.G.S., Y.-H.S. and J.-K.Y. conceptualized the project. J.J.T., Y.-H.S. and J.-K.Y. supervised and provided feedback on the analysis and the writing. All authors edited the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Ecology & Evolution thanks José Martín-Durán, Veronica Hinman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Graphical summary of our comparative workflow.
We followed two general approaches to assess gene expression similarity between pairs of species. Approach 1 (A-E) comprises comparing transcriptional dynamics across pairs of stages. Approach 2 (G-I) comprises comparing gene family usage throughout time in development. For Approach 1, we (A) retrieved all pairwise one-to-one orthologous gene expression data of a pair of species. The two expression matrices, with the same number of rows and arranged in the same order, were normalised and concatenated. From this matrix, we performed (B) principal component analysis and (C) clustering of the different stages. D: we retrieved the top similar pairs of stages using several metrics. For each pair of similar pairs of stages, or for a given pair of stages of interest, we retrieved the genes responsible for driving these high similarities. One way is by performing weighted linear regression and retrieval of least-squared distance residuals (E); in this study (F) we took the two pairs of stages of choice as pseudo-replicates in a non-bespoke linear modelling (two-sided) analysis against the rest of stages of both species. Genes detected as “specifically expressed” in the pair of stages (p-value < 0.1 and logFC in top 10% values) were considered as driving similarities, and used for downstream analysis. Alternatively to the pairwise transcriptional dynamics in Approach 1, we used orthogroups as proxy for gene families and tested the degree of overlapping gene families between pairs of temporal clusters in Approach 2 (G). H: for every pair of modules from each species, we measured the overlap in gene families and tested it against a hypergeometric distribution (one-tailed hypergeometric test). The top pairs of modules enriched in overlapping gene families were used for downstream analysis (I). See Methods for more details.
Extended Data Fig. 2 Comparative with S. purpuratus.
A: Jensen-Shannon Distance (left-most), Pearson correlation (left), Spearman correlation (right), and co-occurrence (right-most) of pairwise comparisons of developmental transcriptomes between P. flava (rows) and sea urchin S. purpuratus (columns) using pairwise one-to-one orthologues. B-C: Gene Ontology terms of genes found specifically expressed in the stages of P. flava early gastrula and S. purpuratus mesenchyme blastula. Bar length represents fold change of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). D: Temporal gene clusters detected in the reanalysis of S. purpuratus data. E: Orthology overlap in pairwise comparisons of gene families from temporal clusters. Colour intensity indicates -logpvalue of the one-tailed (upper) hypergeometric test. F: Functional Category Enrichment Analysis between pairs of overlapping developmental clusters of P. flava and S. purpuratus. Asterisks indicate two-tailed Fisher’s Test p < 0.05. G: Gene ontology enriched terms of genes driving similarities between pairs of larval stages in P. flava and S. purpuratus.
Extended Data Fig. 3 Comparative with A. japonica.
A: Jensen-Shannon Distance (left-most), Pearson correlation (left), Spearman correlation (right), and co-occurrence (right-most) of pairwise comparisons of developmental transcriptomes between P. flava (rows) and A. japonica (columns) using pairwise one-to-one orthologues. B-C: Gene Ontology terms of genes found specifically expressed in the stages of P. flava early gastrula and A. japonica hatching and doliolaria stages. Bar length represents fold change of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). D: Temporal gene clusters detected in the reanalysis of A. japonica data. E: Orthology overlap in pairwise comparisons of gene families from temporal clusters. Colour intensity indicates -logpvalue of the one-tailed (upper) hypergeometric test. F: Functional Category Enrichment Analysis between pairs of overlapping developmental clusters of P. flava and A. japonica. Asterisks indicate two-tailed Fisher’s Test p < 0.05.
Extended Data Fig. 4 Comparative with B. lanceolatum.
A: Jensen-Shannon Distance (left-most), Pearson correlation (left), Spearman correlation (right), and co-occurrence (right-most) of pairwise comparisons of developmental transcriptomes between P. flava (rows) and B. lanceolatum (columns) using pairwise one-to-one orthologues. B-C: Gene Ontology terms of genes found specifically expressed in the stages of P. flava early and mid gastrula and B. lanceolatum 18hpf stages. Bar length represents fold change of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). D: Gene Ontology terms of genes found specifically expressed in the stages of P. flava Metchnikoff larva and B. lanceolatum pre-metamorphic stage. Bar length represents foldchange of observed/expected genes and color intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). E: Temporal gene clusters detected in the reanalysis of B. lanceolatum data. F: Orthology overlap in pairwise comparisons of gene families from temporal clusters. Colour intensity indicates -logpvalue of the one-tailed (upper) hypergeometric test. G: Functional Category Enrichment Analysis between pairs of overlapping developmental clusters of P. flava and B. lanceolatum. Asterisks indicate two-tailed Fisher’s Test p < 0.05.
Extended Data Fig. 5 Comparative with B. floridae.
A: Jensen-Shannon Distance (left-most), Pearson correlation (left), Spearman correlation (right), and co-occurrence (right-most) of pairwise comparisons of developmental transcriptomes between P. flava (rows) and B. floridae (columns) using pairwise one-to-one orthologues. B: Gene Ontology terms of genes found specifically expressed in the stages of P. flava mid gastrula and B. floridae neurula 3 stages. Bar length represents fold change of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). C: Gene Ontology terms of genes found specifically expressed in the stages of P. flava Tornaria (Muller) larva and B. floridae larva 2. Bar length represents fold change of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). D: Temporal gene clusters detected in the reanalysis of B. floridae data. E: Orthology overlap in pairwise comparisons of gene families from temporal clusters. Colour intensity indicates -logpvalue of the one-tailed (upper) hypergeometric test. F: Functional Category Enrichment Analysis between pairs of overlapping developmental clusters of P. flava and B. floridae. Asterisks indicate two-tailed Fisher’s Test p < 0.05.
Extended Data Fig. 6 Comparative between three deuterostome species using 3,192 orthologous genes.
A: Heatmap of temporal expression of the 3,192 orthologous genes in the three species. B: Pairwise Jensen-Shannon divergence of the three species using the 3,192 orthologous genes. Colour scale is clipped at 10 to highlight inter-species similarity values. C: Pairwise Spearman correlation of the three species using the 3,192 orthologous genes. Colour scale is clipped at 0.8 to highlight interspecies correlation values.
Extended Data Fig. 7 Temporal expression profile of the 234 1:1:1 orthologous TFs clustered as shown in Supplementary Figure S4E, F.
Only names of TFs with predicted gene symbol (either from eggNOG or our list of in situ hybridisation genes) are shown.
Extended Data Fig. 8 Comparative with O. fusiformis.
A: Jensen-Shannon Distance (left-most), Pearson correlation (left), Spearman correlation (right), and co-occurrence (right-most) of pairwise comparisons of developmental transcriptomes between P. flava (rows) and O. fusiformis (columns) using pairwise one-to-one orthologues. B: Gene Ontology terms of genes found specifically expressed in the stages of P. flava mid gastrula and O. fusiformis early larva. Bar length represents foldchange of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). C: Gene Ontology terms of genes found specifically expressed in the stages of P. flava mid gastrula and O. fusiformis elongation. Bar length represents foldchange of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). D: Orthology overlap in pairwise comparisons of gene families from temporal clusters. Colour intensity indicates -logpvalue of the upper tail hypergeometric test. E: Functional Category Enrichment Analysis between pairs of overlapping developmental clusters of P. flava and B. floridae. Asterisks indicate two-tailed Fisher’s Test p < 0.05. F: Mean Jensen-Shannon Divergence of P. flava and O. fusiformis stages using pairwise orthologous TFs (Bootstrap 1000 iterations). F: Top: Gene Ontology terms of genes from gene families found in common between the temporal clusters of P. flava cluster 03 and O. fusiformis cluster 3. Bottom: Gene Ontology terms of genes from gene families found in common between the temporal clusters of P. flava cluster 04 and O. fusiformis cluster 3. Bar length represents fold change of observed/expected genes and colour intensity represents -log10(qvalue) (one-tailed Fisher’s Exact Test with FDR correction for multiple testing). G: Jensen-Shannon Distance of pairwise comparisons of developmental transcriptomes between P. flava (rows) and O. fusiformis (columns) using pairwise one-to-one orthologous TFs.
Extended Data Fig. 9 Open Chromatin Regions analysis.
A: Cumulative density curve of OCRs relative to the transcription start site of genes. Blue lines indicate quantile thresholds (See legend in B). B: Distribution of open chromatin regions (OCRs) by relative distance to the TSS (left), and by categorization in different genomic features (right). C: Distribution of genes in relation to the number of associated OCRs. D: Boxplots of counts in OCRs of each developmental stage, divided in tiers regarding the level of expression of their associated genes (lo=low expression, med=medium expression, hi=high expression). Centre line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile ranges. A subsample of 1000 data points per boxplot is shown. E: Fraction of trans-developmental genes associated with OCRs. Asterisks indicate Fisher’s test, two-sided. F: Left: boxplot of the number of OCRs associated with housekeeping or trans-developmental genes (N = 400). Centre line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile ranges; points, data points. Right: Cumulative density function of fraction of genes in relation to the number of associated OCRs. X axis: number of associated OCRs; Y axis: fraction of genes in the population. G: Histogram (grey) and density plot (red) of the similarity scores of two hundred motifs inferred de novo from the genome of P. flava, and their most closely similar known vertebrate motifs (from the HOMER databases). H-L: different metrics of each network: (H) number of genes, (I) number of TFs, (J) relative outdegree per TF, (K) in-degree per target gene (two-tailed Wilcoxon’s test), (L) number of self-regulating TFs (TF genes with a binding motif of itself in its proximity). For all boxplots: Centre line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile ranges; points, outliers. M: Scatter plot of the centrality of trans-developmental genes at each stage. X axis: centrality in early blastula; Y axis: centrality in late gastrula. Colour code indicates fold-change from EB to LG. N: Percentage of top target genes from each network that are present in post-gastrulation temporal gene clusters. Y axis: gene percentage (out of the total of top target genes).
Extended Data Fig. 10 Gene network analyses.
A,B,C: Sub-graphs of genes found to be expressed in ectoderm (A), mesoderm (B), and endoderm (C) based on in situ hybridization experiments. Only top incoming and outgoing interactions per gene are shown. D: Relative centrality values for different transcription factors in the network, based on the germ layer where they are expressed. E: In situ hybridizations of TF genes expressed in different germ layers, showing an agreement (inverse relationship) between embryonic territory specificity and centrality in the network. F: (right) beeswarm plot showing density of TFs by relative centrality, binned in three tiers of low, mid, and high centrality based on quantiles. (down) Lists of enriched GO terms for low centrality, low-to-mid centrality, mid-to-high centrality, and high centrality TF genes in the networks. Terms are coloured by tiers of centrality following a colour scheme from beeswarm plot. G: Sub-graph of influential TF genes as calculated by ANANSE influence. Node size represents level of expression, edge thickness represents interaction score. Black gene symbols indicate in-situ hybridisation orthology; grey gene symbols indicate eggNOG-predicted name. Only the top two outgoing and incoming interactions per TF gene are shown.
Supplementary information
Supplementary Information
Extended Data figure legends; Supplementary Data descriptions 1–16, Figs. 1–8 and legends, Note 1, Table 1 and references.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Pérez-Posada, A., Lin, CY., Fan, TP. et al. Hemichordate cis-regulatory genomics and the gene expression dynamics of deuterostomes. Nat Ecol Evol (2024). https://doi.org/10.1038/s41559-024-02562-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41559-024-02562-x