Introduction

Hepatocellular carcinoma is the fourth most common cancer worldwide and the third leading cause of cancer deaths in patients, with a 5-year survival rate of approximately 18%1. Due to the specificity of HCC, metastasis of cancer cells is most often seen in the intrahepatic portal vein metastasis or occurs in lymph node metastasis, which leads to the limited clinical means of treating metastatic HCC, including surgical resection, radiotherapy, and immunotherapy for the immunotherapy for primary tumors2. However, these methods are not a good treatment option for HCC that has developed metastases3. Therefore, early diagnosis of patients with a high risk of hepatocellular carcinoma metastasis is important, and the underlying mechanisms of hepatocellular carcinoma metastasis have not yet been clarified.

Nowadays, although cancer metastasis is increasingly being studied, the multi-omics mechanisms at the cellular and molecular levels remain unelucidated4,5,6. In recent years, the development of single-cell RNA sequencing technology has helped us to study the potential mechanisms and drivers of HCC metastasis7,8,9,10. Meanwhile, the development of immunotherapy technology has also provided new solutions to study the mechanism of11 in tumor metastasis. It has been shown that certain metabolic properties are required for tumor metastasis12. Therefore, we further explored the mechanisms related to liver cancer metastasis from the perspective of cell metabolism and analyzed the metabolic process of cancer cells. By analyzing the single-cell sequence data of tumor metastasis samples, we found that the immune cells in the liver cancer microenvironment had a specific distribution. Based on the phylogenetic tree, we identified two hepatocyte subpopulations: lymph node metastasis-associated hepatocytes (LNNMAHs) and portal vein metastasis-associated hepatocytes (PVMAHs). In clinical applications, the combination of bulk RNA sequencing technology allows for the interpretation of multi-omics features and average gene expression levels in patient tissues13,14. Therefore, combining bulk RNA-seq data with scRNA-seq data in hepatocellular carcinoma can provide a more accurate understanding of the dynamic process of metastasis occurring in hepatocellular carcinoma and identify the drivers of cancer metastasis.

In conclusion, this study combines single-cell and bulk sequencing data to reveal the potential mechanisms and drivers of lymph node metastasis and portal vein metastasis in HCC, to provide a new scheme for predicting early metastasis in patients, and to provide a new research idea for further investigation of immune microenvironmental differences and molecular mechanisms of HCC metastasis.

Results

Single-cell transcriptomic data reveal distinct cellular compositions and immune landscapes between PT and LNM, PVM samples in HCC

The flow chart of this study is shown in Fig. 1A. In order to systematically study and reveal the cellular characteristics of HCC metastasis, we collected 36,900 cells from 13 patients, used the t-SNE dimensionality reduction algorithm15 to classify the cells in the PT, LNM and PVM samples into independent cellular subpopulations, respectively (Supplementary Fig. S1,S2 and S3). Then, the cells in the three types of samples were manually annotated into six major cell types: hepatocytes, Myeloid cells, T/NK cells, B cells, endothelial cells, and fibroblasts, and finally the state of cellular composition in each type of sample was obtained (Fig. 1B). In addition, we further demonstrated the cellular distribution of PT, LNM and PVM samples (Fig. 1C-E). The results showed that there were large differences in cell distribution among the three sample sources and that the proportions of B cells, fibroblasts and endothelial cells differed significantly.

Fig. 1
figure 1

Cellular composition in primary tumor (PT) and lymph node metastasis (LNM), portal vein metastasis (PVM) samples of HCC analyzed by single-cell RNA-seq. (A) Flowchart of this study. (B) Number and percentage of cell types obtained after labeling. (CE) Cell distribution in primary tumor (PT), lymph node metastasis (LNM) and portal vein metastasis (PVM) samples.

To further explain the impact of the activities of these cell subpopulations on HCC metastasis, we performed separate pathway functional analyses for these cell subpopulations. Interestingly, GSVA results showed that the MYC target V1 was significantly activated in LNM derived hepatocytes and T/NK cells (Supplementary Fig. S4A and S5A). Most studies have shown that this target can affect the proliferation and metastasis of HCC16,17,18,19. Abnormally expressed mitosis-associated pathways were found in LNM-derived Myeloid cells, which were also strongly associated with cancer cell proliferation (Supplementary Fig. S5B). We found significant activation of the interferon-a response pathway in B cells (Supplementary Fig. S4B), and it has been shown that the secretion of interferon-a by plasma cells differentiated from B cells promotes the recruitment of myeloid-derived suppressor cells, which enables the tumor cells to evade the tumor killing by CD8 + T cells, ultimately leading to the development of metastasis in HCC20. Differently, MYC target, DNA replication pathway was significantly activated in PVM-derived hepatocytes (Supplementary Fig. S4C). Specifically, we found that PVM derived T/NK cells exhibited multiple aberrantly activated pathways, including MYC targets, DNA replication, E2F targets, PI3K-AKT-mTOR signaling pathway, apoptosis, and p53 signaling pathway (Supplementary Fig. S4D). We speculate that differences in the composition and function of different cell subpopulations in the HCC microenvironment contribute to tumor colonization at different sites. The functions of the other cells are shown in Supplementary Figure. S5.

Cellular origin of tumor metastasis analyzed by copy number variation and clonal evolution

To determine the clonal evolutionary trajectory and cellular origin of metastasis-associated malignant hepatocytes, we used the inferCNV algorithm to infer malignant hepatocytes in PT, LNM, and PVM specimens, respectively. 10,464 PT samples were determined to be malignant compared to the reference cells. Similarly, 1,243 and 2,946 malignant cells were adjudicated from LNM- and PVM-derived hepatocytes, respectively (Fig. 2B and D). The phylogenetic tree of malignant cells indicating the evolution of cancer metastasis was plotted using Uphyloplot2 software. The results showed that an increase in chromosome 10p was found in malignant hepatocytes from all PT samples (Fig. 2A). Interestingly, an increase in 5p, 7q, 8q, 9q and a deletion of 10q were found in all malignant hepatocytes of LNM origin, while a new variant arose: a deletion of 12p (Fig. 2C), whereas an increase in 1q, 7p, 7q, 8q, and 22q was found in all malignant hepatocytes of PVM origin, while three new variants arose: deletions of 1p, 4p, and 11q (Fig. 2E). Upset plots showed that 251 genes involved in CNV events arising in the LNM were shared by four subclonal cell populations in the PT sample (Fig. 2F), and we also found 290 genes in these chromosomal regions that were shared by all subclones in the LNM sample (Fig. 2G). Similarly, 53 genes involved in CNV events occurring in PVM were shared by three subclonal cell populations in PT samples (Fig. 2H), and variants of 387 genes shared by all subclones in PVM samples were identified in these chromosomal regions (Fig. 2I). These results suggest that hepatocytes in LNM versus PVM may be derived from different subclones of the PT samples described above, and that malignant cells further produce new CNV subclones to promote cancer progression and metastasis.

Fig. 2
figure 2

Copy number variation and clonal evolution analysis of primary tumor (PT) and lymph node metastasis (LNM), portal vein metastasis (PVM) hepatocytes. (A) Phylogenetic tree of the PT sample. (B) All CNV events for the LNM sample. (C) Phylogenetic tree of the LNM sample. (D) All CNV events in PVM-derived hepatocytes. (E) Phylogenetic tree of the PVM sample. (FG) UpSet plots showing genes shared by PT and LNM subclones on related chromosomes. (HI) UpSet plots between PT and PVM. (J) Intra-tumor heterogeneity between PT, LNM and PVM. (K, L) PT and LNM, labeling pathways of malignant hepatocytes in PVM samples.

To further assess the differences in intra-tumor tissue heterogeneity in PT versus LNM and PVM samples, we calculated tumor heterogeneity scores for the samples. The results showed that the diversity score of PT was higher than that of LNM and PVM, while the score of LNM was slightly higher than that of PVM (Fig. 2J). GSVA analysis of malignant tumor hepatocytes showed that MYC targets, Wnt/b-Catenin and Notch signaling pathways, and epithelial-mesenchymal transition were significantly activated in malignant tumor hepatocytes from screened LNM and PVM samples, and, in addition, metabolic pathways, such as hypoxia and glycolysis, were significantly expressed in PVM samples (Fig. 2K and L).

Pseudotemporal trajectory analysis of LNM and PVM-associated malignant hepatocyte subpopulations

To further elucidate the mechanism of hepatocyte subpopulation metastasis, we used the t-SNE algorithm to reclassify 14,653 malignant hepatocytes into 11 subpopulations, termed as hepatocyte clusters 0–10 (Fig. 3A and B). We found that cell cluster 1, consisting of 2127 hepatocytes, was composed of cells from the PT and LNM samples, and cell clusters 0 and 4, consisting of 4362 hepatocytes, were composed of cells from the PT and PVM samples. Although cell cluster 10 contained both PT, LNM, and PVM samples, we removed this cell subpopulations in subsequent analyses due to its low cell count and lack of distinct distributional features. The analysis showed that malignant hepatocytes showed branching trajectories in different directions and gave rise to three cell states in one two branches determined by cell fate (Fig. 3C and D). Among them, the LNM cell trajectory showed that the S2 state was mainly enriched with PT cells (92.3%), the S3 state was mainly enriched with LNM cells (70.0%), while the S1 state was enriched with both PT and LNM cells (100%). GSVA analysis showed that there were significant differences in the functions activated in the three states (Fig. 3E), e.g., MYC targets and DNA repair-related pathways were significantly activated in the S3 state, whereas energy metabolism pathways and epithelial-mesenchymal transition pathways were markedly inhibited, suggesting that during the differentiation and development of PT cells into the LNM subpopulation, some of the pre-existing tumor features were suppressed, while new functions were activated. Similarly, trajectory analysis of PVM showed that state S2 was predominantly enriched for PT cells (85.8%), state S3 was predominantly enriched for PVM cells (97.8%), and state S1 was enriched for both PT and PVM cells (100%).The GSVA results showed an expression state similar to that of LNM, as well as significant activation of P53 and metabolism-related pathways (Fig. 3F).Based on the branching points of LNM and PVM, the cells after time-fitting sorting were analyzed using the Branched Expression Analysis Modeling (BEAM) algorithm to identify branching-dependent genes contributing to the three cellular states (Fig. 3G and H). Here we identified 162 LNM-dependent and 99 PVM-dependent genes, respectively. These genes regulate the process of cell differentiation from state S1 (initial branching) to S2 (cell fate 1) and S3 (cell fate 2) states on the cell. In conclusion, since there was a shift from primitive cells to two different cell fates, we hypothesized that this shift may have driven the transfer of HCC from PT to LNM and PVM. Therefore, we defined these cells as cancer metastasis-associated hepatocytes (CMAHs). GSVA analysis showed significant activation of functional pathways such as MYC targets, Wnt/b-Catenin, and others in Lymph node metastasis-associated hepatocytes (LNMAHs) compared to primary malignant hepatocytes (Fig. 3I), and additional activation of metabolism-related pathways in Portal vein metastasis-associated hepatocytes (PVMAHs) (Fig. 3J), which flanked the suggestion that CMAHs possessed a greater metastatic potential.

Fig. 3
figure 3

Pseudotemporal trajectory analysis distinguishes between lymph node metastasis-associated hepatocytes (LNMAHs) and portal vein metastasis-associated hepatocytes (PVMAHs). (A) Results of malignant hepatocyte distribution with different tissue sources color-coded. (B) Distribution of reclustering results. (C, D) Pseudotemporal trajectories (left), pseudotimes (top right) and cell states (bottom right) of malignant hepatocytes in PT and LNM, PVM. (E, F) Marker pathway differences between LNM and PVM. (G, H) Heatmaps of branching-dependent gene mapping of LNM and PVM. (I, J) Significantly different marker pathways between LNMAHs and PVMAHs compared to CMAHs.

Intercellular communication reveals cellular communication and molecular mechanisms present in cancer cell metastasis

To gain insight into the mechanisms of the cellular environment in which malignant hepatocytes reside during metastasis and target colonization, we analyzed the level of intercellular communication in different samples using the R package CellCall. The results showed some similarities between the results of LNM and PVM, that myeloid cells, fibroblasts, and endothelial cells were the main receptors receiving messages from hepatocytes (Fig. 4A and C), and that these intercellular communications may be a prerequisite for the development of metastasis in malignant hepatocytes. Therefore, we further analyzed the pathway activity of these cells, and the results of LNM showed that myeloid cells and endothelial cells significantly activated part of the oncogenic pathway in intercellular communication. Intercellular communication between myeloid, endothelial, and fibroblasts was more active (Fig. 4B), so the ligand-receptor-transcription factor regulatory relationships of these cells were characterized, resulting in 10 transcription factors that were significantly expressed in this process: FOXO3, ATF4, MYC, STAT6, CREB1, EP300, FOS, MEF2C, STAT3 and EGR1 (Fig. 4E). Similarly, analysis of pathway activity in PVM showed that in addition to a similar expression pattern to LNM, some novel cellular communication results in PVM: T/NK-Endothelial, B- Endothelial were also significantly expressed in part of the oncogenic pathway (Fig. 4D). 14 transcription factors were identified as significantly expressed in these cell pairs: ATF4, FOXO3, CTNNB1, ELK4, HEY1, IKBKB, RB1, RBPJ, RELA, RORA, SMAD1-4 (Fig. 4F).

Fig. 4
figure 4

Intercellular communication and molecular mechanisms present during lymph node metastasis (LNM) and portal vein metastasis (PVM). (A) Circle diagram of intercellular communication in LNM. (B) Bubble diagram of signaling pathway activity occurring during cellular communication in LNM. (C) Results of cellular communication in PVM. (D) Bubble plots show the pathway activities of intercellular communication in PVM. (E, F) Sankey plots showing significantly expressed ligand-receptor-transcription factors in LNM and PVM.

Validation of the correlation of CMI with multi-omics data and clinicopathologic features in the TCGA-LIHC cohort

We identified specific marker genes for LNMAHs, PVMAHs and intersected these marker genes with branch-dependent genes (Fig. 5A and B). Finally, eight representative genes for LNMAH (AZGP1, MCM7, LINC01287, SNCG, FABP5, LINC01419, NDRG1, DKK1) and four representative genes for PVMAH (HSPA6, RGS5, ARC, CDC45) were obtained, respectively (Fig. 5A and B). To estimate the relative abundance of CMAHs in the TCGA cohort, the CMI of bulk RNA-seq data from 418 patients was calculated as an indicator of the relative abundance of CMAHs in the cohort. The Kruskal-Wallis test showed that the LNM index correlated well with T stage (p-value = 3.4*10 − 3), survival time (p-value = 3.3*10 − 3), and Stage (p-value = 2.7*10 − 2), whereas the PVM index was significantly correlated only with survival time (p-value = 8*10 − 3). Subsequently, one-way cox regression analysis showed that LNM index (p-value = 4.1*10 − 2) and PVM index (p-value = 1.8*10 − 2) were significant prognostic factors for overall survival (OS) in the LIHC cohort. The results of the Kaplan-Meier survival analysis showed that patients with a high LNM index and a high PVM index had significantly worse OS (p-value = 2.8*10 − 4 and 1.1*10 − 2) (Fig. 5C and D). In addition, to further reveal the differences between patients with high and low CMI at the genomic level, we calculated the Tumor Mutation Burden (TMB) index and Mutant-Allele Tumor Heterogeneity (MATH) index in the LNM versus PVM sample groups by SNP data. Interestingly, the TMB index was significantly higher in the high metastatic risk group in the LNM sample (p-value = 5.7*10 − 3) and not significant in the MATH index (p-value = 0.27) (Fig. 5E). However, the MATH index in the PVM sample was significantly higher in the high metastatic risk group (p value = 9.5*10 − 6), whereas the TMB index was not significant (p value = 0.17) (Fig. 5F). Differently, more subversion was found in the PVM samples with high CMI, whereas this was not found in LNM (Fig. 5G and H). Lollipop plots showed the mutation types and loci of AZGP1, MCM7, and NDRG1 genes between the LNM high and low groups (Supplementary Fig. S6A, B and C). Similarly, mutations occurring in the HSPA6 and RGS5 genes in PVM were also missense mutations (Supplementary Fig. S6D and E).

Fig. 5
figure 5

Validation of the correlation of the Cancer Metastasis Index (CMI) with multi-omics data and clinical information. (A, B) Flowchart of identification of LNMAH and PVMAH marker genes. (C, D) Survival curves of patients with high metastatic index. (E, F) Box plots showing differences in TMB and MATH in patients with different metastatic indices. (G, H) Box line plots indicate the proportion of SNP events and transitions and reversals. Histograms indicate the overall proportion of SNP events.

Differential expression of LNM and PVM in metabolic functions

To further investigate the changes in cellular metabolic functions during malignant hepatocyte metastasis, we utilized filtered single-cell data (Fig. 1C, D and E) as the dynamic process of HCC metastasis from the primary foci to the lymph nodes or portal vein. The single-cell results showed abnormal activation of glycolysis, glucose metabolism, and pyruvate metabolism pathways during LNM (Supplementary Fig. S7), and abnormal activation of pyruvate metabolism, glycolysis, TCA cycle, glucose metabolism, and arachidonic acid metabolism pathways during PVM (Supplementary Fig. S8). Similarly, GSVA results showed consistent results (Supplementary Fig. S9A and S10A). This suggests that malignant cells undergoing metastasis significantly activate some of the metabolic pathways to produce a variety of metabolites, such as pyruvate and lactate, which promotes the invasive and migratory ability of cancer cells. In addition, the results of REACTOME metabolic pathways showed that immunomodulatory pathways between lymphocytes and non-lymphocytes, and chemokine receptor pathways were significantly activated in samples with high LNM index (Supplementary Fig. S9B), whereas chromatin modification, cellular transcription, and mitophagy pathways related to cell proliferation were significantly activated in samples with high PVM index (Supplementary Fig. S10B). In addition, the network diagrams showed the interactions between metabolism-related genes involved in the significantly enriched metabolic pathways (Supplementary Fig. S9C and S10C), as well as the coordinated interactions between these pathways to achieve specific functions (Supplementary Fig. S9D and S10D).

Identification of drugs targeted to treat patients with high CMI

First, we screened highly sensitive drugs targeting subpopulations of LNMAHs and PVMAHs as well as highly sensitive drugs targeting patients with high LNM index and PVM index in order to screen candidates for inhibiting metastasis occurring in HCC (Fig. 6A and B). Here, we identified one LNM-sensitive GDSC-derived drug: Dasatinib and three PVM-sensitive GDSC-derived drugs: Pevonedistat, Pyridostatin, Ulixertinib, and six PVM-sensitive CTRP-derived drugs were finalized: Indisulam, Daporinad, CAY10618, STF-31, Birinapant, Decitabine. these drugs showed similar expression trends in patients with metastasis-associated malignant hepatocytes and high CMI: the IC50 values of the drugs were significantly under-expressed in comparison to MHs and patients with low CMI (Fig. 6C, D, F, and G), and showed a negative correlation (Fig. 6E and H). In addition, Supplementary Figure. S11 and Supplementary Figure. S12 show the expression trends of other PVM-sensitive drugs. For LNM patients, we preferred Dasatinib to study its therapeutic mechanism and performed differential analysis of mRNA expression levels against its drug targets. The results showed that KRAS was significantly overexpressed in LNMAHs compared with MHs and was negatively correlated with the IC50 value (log2FC = 1.20, p-value < 0.0001, Cor = -0.24) (Fig. 6I). Similarly, for PVM patients, we considered Decitabine as their drug candidate, and differential analysis showed that GSTP1 was significantly highly expressed in PVMAHs and negatively correlated with IC50 values (log2FC = 1.41, p-value < 0.0001, Cor = -0.49) (Fig. 6J). Finally, KRAS served as a therapeutic potential target for Dasatinib in the treatment of HCC lymph node metastasis, while GSTP1 served as a therapeutic potential target for Decitabine in the treatment of HCC portal vein metastasis. The potential value of these two drug targets for the treatment of HCC metastasis needs to be validated by further clinical and other studies.

Fig. 6
figure 6

Screening of therapeutic agents with high sensitivity to patients with metastasis-associated hepatocytes and high metastatic index. (A, B) Workflow diagram for identifying LNM- and PVM-sensitive drugs. (C, D) Box plot indicating significant drug sensitivity differences for GDSC-derived Dasatinib. (E) Correlation density plot for Dasatinib. (F, G) Boxplots showing CTRP-derived Decitabine with significant drug sensitivity differences. (H) Densitogram of Decitabine. (I) Densitogram of dasatinib associated with KRAS. (I) Densitogram of Decitabine IC50 associated with GSTP1.

Discussion

In this study, we revealed the intrinsic mechanisms and cellular origins of HCC metastasis in different directions by systematically combining cellular distribution characterization, CNV extrapolation, pseudo-temporal trajectory analysis, and cell-to-cell communication based on single-cell transcriptomic data of HCC as well as a large amount of genomic and transcriptomic data. Here, we identified two metastatic subpopulations of HCC: LNMAHs and PVMAHs. Compared with PT, these two cell subpopulations showed significant changes in TF expression and metabolic pathways, accompanied by lower intra-tumor heterogeneity. In addition, patients with high CMI had a poorer survival prognosis and novel changes in SNPs. While many previous studies have focused on target and mechanism analysis based on mRNA data21,22,23, our results suggest that HCC metastasis is not only determined by target genes, but also accompanied by dynamic changes in the tumor cells’ own immune microenvironment. It has been shown that cancer metastasis involves multiple biological processes and that most of these biological processes are no longer seen as a linear cascade of events but as a series of concurrent and partially overlapping processes24. Cancers develop in complex organizational environments on which they depend for their continued growth, invasion and metastasis. And successfully metastasized cells present new specific phenotypes while discarding old behavioral traits25. In our study it is well illustrated that the accumulation and increase of CNV occurs during HCC metastasis, while the tissue environment in which the tumor cells are located exhibits functional expression that promotes tumor proliferation. In addition, Jawad Fares et al. pointed out that invasive tumor cells undergo inter-cellular interactions during their arrival at the target site, which leads to cellular mobility and alteration of the microenvironment of the colonized area to enhance the invasion and proliferation of tumor cells26. It has been suggested that the invaded organ is not the recipient of tumor cells, but rather, cytokines are applied to the tumor cells before they metastasize, making it a “soil” for tumor colonization27. In our study, we found that in the cellular environment where metastasis occurs, Myeloid cells, fibroblasts and endothelial cells receive signals from hepatocytes and exhibit specific functional responses. Early diagnosis of metastatic risk in patients with HCC is essential for clinical management. Suzanne A Eccles et al. suggested that the majority of cancer deaths are due to the occurrence of metastasis, and therefore the most critical to reduce mortality or recurrence is the early prevention or elimination of such disseminated metastases28. However, in the absence of effective biomarkers, these findings cannot accurately guide clinical treatment. It has been claimed that the Wnt/b-Catenin signaling pathway plays an important role in HCC metastasis, and aberrant activation of this pathway contributes to the proliferation and metastasis of tumor cells29. All of the above studies have tried to find biomarkers or pathways for HCC metastasis, but they have neglected the dynamic intercellular process of HCC itself, and the value of clinical application is not clear enough. Our study screened the metastatic cell subpopulation of HCC and used it to analyze the differences between the metastatic subpopulation and the primary subpopulation, and finally obtained its biomarkers, which made up for the shortcomings of previous studies.

Undeniably, this study has some shortcomings. Although we found the differences in the immune microenvironment and metabolism as well as the process of intercellular communication between LNM, PVM and PT in HCC, it needs to be further clinically verified whether the signaling communication between immune and non-immune cells has an effect on the occurrence of metastasis in HCC, and more in-depth experiments are needed to prove that cellular metabolites are significantly expressed in metastatic samples.

Materials and methods

Data collection and pre-processing

We collected scRNA-seq data GSE14961430 in the Gene Expression Omnibus (GEO) database31. we used the R package Seurat v532 to perform a series of processes on the single-cell data. We collected a set of hepatocellular carcinoma data GSE1452033 from the GEO database as well as downloaded the LIHC cohort data from The Cancer Genome Atlas (TCGA) database. The clinical characteristics of the patients in each data set are shown in Table S1 and Table S2.

Gene set variant analysis (GSVA)

All human Hallmark gene sets (v2023.1) obtained from the MsigDB database34. Using the R-package GSVA35 for analysis. The significance threshold value is considered significant when t > 2, indicating that there are significant differences in the signature pathways.

Copy number variation estimation and clonal evolutionary analysis

For single-cell sequence data and chromosome alignment data, we used the R software package inferCNV to calculate copy number variation scores, which provide an estimate of the degree of malignancy of the cells to distinguish between malignant and non-malignant hepatocytes. Then we used the python2-based software UPhyloplot236 to draw a phylogenetic tree to characterize the chromosomal evolutionary structure of the subclonal copy number variation.

Estimation of intra-tumor heterogeneity based on diversity scores

To estimate the degree of intratumor heterogeneity among PT, LNM and PVM, the gene expression data of all malignant cells were subjected to principal component analysis, and the Euclidean distance of each cell from the centroid of the tissue was calculated as the diversity score of the tissue based on the weight values corresponding to the best principal components selected.

Pseudotemporal trajectories to analyze patterns of cellular dynamics

The gene expression levels of malignant hepatocyte subpopulation were estimated using R package monocle237 to obtain the dynamic patterns of malignant hepatocyte subpopulations in PT, LNM and PVM, respectively. Then branch-dependent regulatory genes were searched for as candidate cancer metastasis trajectory-related genes based on the trajectory branching points using the Branched Expression Analysis Modeling (BEAM) statistical algorithm, and we considered the genes with q-value < 0.001 as significant branch-dependent genes.

Intercellular communication analysis of single-cell data

We analyzed the intracellular as well as intercellular communication levels (including ligand, receptor and transcription factor levels) in the LNM subpopulation and PVM subpopulation using the R package CellCall38.

Cancer metastasis index (CMI) calculation for bulk RNA-seq data

The cancer metastasis index of TCGA-LIHC with GSE14520 was calculated using multifactorial logistic regression analysis and was used to represent cancer metastasis-associated hepatocytes (CMAHs) in bulk RNA-seq data. We use the best threshold value division CMAHs relative abundance. In addition, based on the risk of metastasis, we further investigated the association between CMI in the TCGA-LIHC dataset and genomic data (SNPS) and clinicopathological features, especially survival data.

Identifying LNMAHs, PVMAHs, and targeted therapies for patients with high CMI

Gene expression profiling data of human cancer cell lines were obtained from the Genomics of Drug Sensitivity in Cancer v2 (GDSC v2) database39 and the Cancer Therapeutics Response Portal v2 (CTRP v2) database40, as well as the drug’s cellular half maximal inhibitory concentration (IC50). R package oncoPredict41 was utilized to predict drug sensitivity in the CMAHs subpopulation and the high CMI sample group, and the correlation level between drugs and genes was calculated in order to screen targeted drugs against cancer metastasis.

Statistical analysis

All data processing and statistical analyses in this study were performed using R4.3.2 software, and the phylogenetic tree was completed using Python 2.7 software. Mann-Whitney U test was used for comparison between groups, Kruskal-Wallis test was used for comparison between multiple groups, and independent sample t test was used for continuous variable analysis. A two-tailed P value < 0.05 was considered statistically significant.