Man YN, Zhong LY, Wen YL, He ML. Machine learning identifies complement factor I as a shared mediator of periodontitis and ossification of posterior longitudinal ligament. World J Orthop 2026; 17(3): 115770 [DOI: 10.5312/wjo.v17.i3.115770]
Corresponding Author of This Article
Mao-Lin He, PhD, Division of Spinal Surgery, The First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning 530021, Guangxi Zhuang Autonomous Region, China. hemaolin@stu.gxmu.edu.cn
Research Domain of This Article
Orthopedics
Article-Type of This Article
Meta-Analysis
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Journal Information of This Article
Publication Name
World Journal of Orthopedics
ISSN
2218-5836
Publisher of This Article
Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA
Yu-Nan Man, Lu-Yang Zhong, Yue-Liang Wen, Mao-Lin He, Division of Spinal Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi Zhuang Autonomous Region, China
Author contributions: Man YN and Zhong LY contributed equally to this work; Man YN, Zhong LY, and Wen YL designed the research study; Man YN, Zhong LY, and Wen YL performed the research and analyzed the data; He ML contributed to funding acquisition and supervision; Man YN wrote the manuscript; all authors have read and approved the final manuscript.
Supported by National Natural Science Foundation of China, No. 82160536.
Conflict-of-interest statement: All authors declare no competing interests.
PRISMA 2009 Checklist statement: The authors have read the PRISMA 2009 Checklist, and the manuscript was prepared and revised according to the PRISMA 2009 Checklist.
Received: October 27, 2025 Revised: November 29, 2025 Accepted: December 25, 2025 Published online: March 18, 2026 Processing time: 142 Days and 20.9 Hours
Abstract
BACKGROUND
The clinical co-occurrence of periodontitis and ossification of the posterior longitudinal ligament (OPLL) suggests shared pathophysiological mechanisms, which remain poorly understood.
AIM
To elucidate the key molecular and cellular mechanisms shared by periodontitis and OPLL.
METHODS
Transcriptomic datasets for human periodontitis and OPLL were integrated. We performed differential gene expression analysis, weighted gene co-expression network analysis, and cross-dataset meta-analysis. A comprehensive machine learning framework incorporating 10 algorithms was applied to identify core genes, with model interpretability assessed via SHapley Additive exPlanations. Single-cell RNA sequencing data from periodontitis tissues were used to validate cell-type-specific expression, and functional enrichment analyses were conducted to elucidate relevant pathways.
RESULTS
Multi-step analysis identified complement factor I (CFI) as a principal contributor to both conditions. CFI expression was consistently upregulated and demonstrated strong discriminatory capacity for periodontitis. At single-cell resolution, endothelial cells within the periodontitis microenvironment were observed to express CFI. Functional enrichment analysis indicated that CFI-positive cells were involved in pathways related to cell adhesion (focal adhesion, adherens junctions), inflammatory signaling (PI3K-Akt pathway), and bacterial infection responses.
CONCLUSION
CFI was identified as a pivotal node connecting periodontitis and OPLL, revealing novel mechanisms and suggesting its potential as a biomarker and therapeutic target.
Core Tip: By integrating transcriptomics with machine learning across two distinct conditions, we identified complement factor I (CFI) as a candidate molecular mediator linking periodontitis and ossification of the posterior longitudinal ligament. This study provides the first computational evidence implicating a potential immune-endothelial axis, mediated by CFI, in the shared pathophysiology of these conditions.
In recent years, advances in understanding disease complexity have led to a shift from single-disease models toward studying multimorbidity and its interactions[1]. Comorbidity poses challenges for healthcare systems and significantly influences patient prognosis and quality of life[2]. Disease associations that transcend traditional disciplinary boundaries, particularly those involving chronic inflammatory conditions, are of increasing interest. Chronic diseases frequently cluster. For example, patients with human immunodeficiency virus exhibit higher rates of non-communicable diseases, such as hypertension and anemia, and those with tuberculosis often co-present with diabetes[1,3]. These patterns suggest that chronic inflammation and immune dysregulation serve as critical pathways linking seemingly distinct diseases[4].
In 2022, a case of coexisting periodontitis and ossification of the posterior longitudinal ligament (OPLL) was reported, proposing metabolic and immunological links between periodontal inflammation and ligament ossification. Zinc was identified as a key trace element regulating both inflammatory responses and tissue mineralization, highlighting previously unrecognized inter-organ pathophysiological connections[5]. OPLL is characterized by sclerosis and calcification of ligament tissue, which can compress the spinal cord and nerve roots, resulting in motor and sensory dysfunction[6]. Its pathogenesis involves genetic, metabolic, and environmental factors, but molecular regulatory mechanisms remain incompletely defined. Periodontal disease, a common chronic inflammatory condition causing inflammation and destruction of periodontal tissues, is associated with various systemic diseases, including cardiovascular disease, diabetes, and rheumatoid arthritis. Emerging evidence indicates that periodontal inflammation may influence bone metabolism and pathology in distant ligamentous tissues through systemic inflammatory responses[7-9]. Although anatomically distinct, periodontitis and OPLL appear to share pathophysiological links mediated by systemic inflammation and immune dysregulation[10]. First, systemic inflammation may be central to their comorbidity. Chronic oral inflammation in periodontitis leads to the release of cytokines [e.g., interleukin (IL)-1β, tumor necrosis factor-alpha, and IL-6] that circulate systemically, triggering inflammatory responses elsewhere[11].
Systemic inflammation directly participates in OPLL pathogenesis and also activates the Smad signaling pathway, inducing fibroblasts to secrete osteogenic factors (e.g., transforming growth factor-beta and bone morphogenetic proteins), upregulating the expression of osteogenesis-related genes (e.g., RUNX2 and OSX), and driving fibroblast differentiation toward osteoblast-like cells[12-14]. Furthermore, periodontitis-associated inflammatory mediators activate hematopoietic stem/progenitor cells in bone marrow, promoting differentiation toward myeloid cells, a process termed “trained myelopoiesis”[15]. This adaptive change may exacerbate systemic inflammation, further promoting OPLL progression through positive feedback loops. Second, immune dysregulation plays a pivotal role in their comorbidity. Oral microbiota dysbiosis in periodontitis triggers local and systemic immune responses, including T helper 17 activation and IL-17 overexpression[16,17]. IL-17 mediates bone resorption in periodontitis and is implicated in OPLL pathogenesis[18,19]. Periodontitis-induced immune dysregulation may further promote OPLL via complement system activation, as excessive complement activation is linked to ossification[20-22]. Despite these insights, systematic exploration of co-pathogenic mechanisms linking periodontitis and OPLL is limited, particularly studies integrating multiomics data to identify shared regulatory factors.
Machine learning (ML) has increasingly been applied to screen key genes in comorbidity analysis, offering tools to uncover shared molecular mechanisms. ML algorithms, such as LASSO regression, random forests, and support vector machines, efficiently process complex gene expression data to identify biomarkers associated with comorbidity, supporting early diagnosis and targeted treatment strategies. For example, a study evaluating diabetes-kidney stone comorbidity identified key genes (S100A4, ARPC1B, and CEBPD) via transcriptomic integration with multiple ML models, highlighting central pathways and high diagnostic value[23]. Similarly, a study combining ML methods with weighted gene co-expression network analysis (WGCNA) identified shared biomarkers (MMP2, COL1A2, STAT1, and CXCL1) in liver fibrosis and inflammatory bowel disease, demonstrating associations with immune infiltration and excellent diagnostic performance [area under the curve (AUC) > 0.85][24]. These studies underscore the broad applicability of ML in comorbidity gene screening and the pivotal role of ML in advancing precision medicine[25]. Herein we systematically identified consistently upregulated genes across periodontitis and OPLL via differential gene expression analysis, WGCNA, and standardized mean difference (SMD) calculation. We then applied an integrated ML framework of 10 algorithms, interpreting feature importance using SHapley Additive exPlanations (SHAP) analysis. Key genes were validated at single-cell resolution, and enriched pathways were systematically explored, revealing cell-type-specific functional roles and providing molecular insights into the comorbidity of these conditions.
MATERIALS AND METHODS
Periodontitis and OPLL datasets
High-throughput transcriptomic datasets for periodontitis and OPLL were identified through a systematic search of the Gene Expression Omnibus and ArrayExpress databases from inception until July 1, 2025 (Figure 1). The search strategy for periodontitis included synonyms and abbreviations, such as [“periodontitis” (MeSH) OR “chronic periodontitis” OR “aggressive periodontitis”] AND {“Homo sapiens” (Organism) AND [“Expression profiling by array” (Filter) OR “Expression profiling by high throughput sequencing” (Filter)]}. A parallel strategy was applied for OPLL using the terms (“ossification of the posterior longitudinal ligament” OR “OPLL” OR “spinal ligament ossification”). Study screening and data extraction were independently performed by two reviewers (Man YN and Zhong LY) to minimize bias. Discrepancies were resolved through consensus or consultation with a senior investigator. Duplicate records were removed, and studies lacking raw data or appropriate controls were excluded. Six independent datasets met our inclusion criteria: Four bulk transcriptomics datasets for periodontitis (GSE10334, GSE106090, GSE173078, and GSE223924), one single-cell periodontitis dataset (GSE164241), and one OPLL dataset (GSE69787). All datasets included matched controls. For single-cell RNA sequencing analysis, PD16, PD16b, and PD16c samples (representing lesion sites from the same patient) from the GSE164241 dataset were selected to minimize inter-individual variability and maximize statistical power for clustering and differential expression.
Figure 1 PRISMA flow diagram for dataset identification and selection.
The diagram illustrates the systematic process of dataset identification, screening, eligibility assessment, and final inclusion in the integrated analysis, following PRISMA guidelines. OPLL: Ossification of the posterior longitudinal ligament.
Gene expression profile processing
To identify differentially expressed genes in OPLL samples, we performed statistical analysis on preprocessed gene expression count data using the DESeq2 software package. The workflow included the following steps: First, DESeq2 estimated size factors for raw counts to normalize for library size variation and estimated gene-wise dispersion. Next, a negative binomial generalized linear model was fitted for each gene to determine whether expression levels significantly differed between disease and control groups[26]. Genes with an adjusted P value < 0.05 and an absolute log2 fold change (|log2 fold change|) ≥ 1 were considered significantly differentially expressed. To systematically identify co-expression gene modules associated with OPLL, we applied WGCNA to the filtered expression matrix. A scale-free topological network was constructed using expression data from all samples. Based on topological overlap measures, a dynamic tree-cutting algorithm clustered genes with highly correlated expression patterns into distinct modules, each assigned a unique color label. Correlations between module eigengenes and disease phenotypes were then calculated to identify key modules significantly associated with target phenotypes. Analysis of module membership and gene significance within these modules enabled the identification of hub genes occupying central network positions[27]. To further assess the relevance of these hub genes to periodontitis pathogenesis, we conducted a meta-analysis that integrated four transcriptomic datasets. Pooled effect sizes were estimated as Hedges' g under a random-effects model (restricted maximum likelihood estimator of τ²) to account for between-study heterogeneity[28], enabling identification of a robust set of key genes consistently upregulated in periodontitis as candidate biomarkers.
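As an illustration of the pooled effect size described above, the following is a minimal sketch in Python rather than the R workflow used in the study. It computes Hedges' g from group summary statistics and pools studies under a random-effects model. Note that it swaps in the closed-form DerSimonian-Laird estimator of τ² in place of restricted maximum likelihood, which needs iterative optimization. The function names `hedges_g` and `random_effects_pool` are hypothetical.

```python
import math


def hedges_g(mean_case, sd_case, n_case, mean_ctrl, sd_ctrl, n_ctrl):
    """Return Hedges' g and its approximate sampling variance."""
    df = n_case + n_ctrl - 2
    pooled_sd = math.sqrt(
        ((n_case - 1) * sd_case ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    d = (mean_case - mean_ctrl) / pooled_sd   # Cohen's d
    j = 1 - 3 / (4 * df - 1)                  # small-sample correction factor
    var_d = (n_case + n_ctrl) / (n_case * n_ctrl) + d ** 2 / (2 * (n_case + n_ctrl))
    return j * d, j ** 2 * var_d


def random_effects_pool(effects, variances):
    """Pool study effects; tau^2 uses DerSimonian-Laird (assumption: stand-in for REML)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * gi for wi, gi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

Each dataset would contribute one (g, variance) pair per gene. Because the τ² estimator differs, the pooled values from this sketch would not be expected to match REML-based results exactly.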
Processing and analysis of single-cell transcriptomic data were performed using the Seurat software package[29]. Following quality control criteria reported in the literature, we retained high-quality cells with detected gene counts (nFeature_RNA) between 200 and 4000 and mitochondrial gene percentage (percent.mt) below 20%[30]. Data were normalized using the SCTransform method, which simultaneously corrected for technical variation introduced by mitochondrial gene content and identified highly variable genes for downstream analysis. To integrate samples and eliminate batch effects, the Harmony algorithm was applied using the top 30 principal components, achieving effective alignment of cells from diverse sources in a low-dimensional space[31]. From the principal component analysis-reduced data, a uniform manifold approximation and projection (UMAP) embedding was constructed for nonlinear dimensionality reduction (using the top 30 principal components). Unsupervised cell clustering was performed using FindClusters (resolution = 0.5), enabling identification of distinct cell subpopulations within the UMAP visualization. Each cluster was manually annotated based on expression of known classical marker genes and evidence from the literature. Clustering results and gene expression patterns were visualized using the single-cell pipeline R package[32], while data from pseudotemporal cell trajectory analysis were visualized using the Slingshot R package[33].
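The cell-level quality control thresholds stated above can be sketched as a simple filter. This Python example is illustrative only and is not the Seurat code used in the study. The dictionary layout, the `barcode` key, and the function names are hypothetical, and treating the 200 and 4000 bounds as inclusive is an assumption because the text does not specify it.

```python
def passes_qc(n_features, percent_mt,
              min_features=200, max_features=4000, max_percent_mt=20.0):
    """Return True if a cell meets the gene-count and mitochondrial thresholds.

    Assumption: the gene-count bounds are inclusive; the mitochondrial
    percentage must be strictly below the cutoff ("below 20%").
    """
    return min_features <= n_features <= max_features and percent_mt < max_percent_mt


def filter_cells(cells):
    """Keep only the cells (dicts with Seurat-style metadata keys) that pass QC."""
    return [
        cell for cell in cells
        if passes_qc(cell["nFeature_RNA"], cell["percent.mt"])
    ]


# Hypothetical toy metadata, not taken from GSE164241.
example_cells = [
    {"barcode": "cell_1", "nFeature_RNA": 1500, "percent.mt": 5.2},
    {"barcode": "cell_2", "nFeature_RNA": 150, "percent.mt": 3.0},    # too few genes
    {"barcode": "cell_3", "nFeature_RNA": 5200, "percent.mt": 4.1},   # possible doublet
    {"barcode": "cell_4", "nFeature_RNA": 2100, "percent.mt": 27.5},  # high mitochondrial content
]
kept_barcodes = [cell["barcode"] for cell in filter_cells(example_cells)]
```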
ML for key variable selection
We constructed a comprehensive ML framework to develop and validate predictive models for periodontitis. The dataset was split into a dedicated training set for model development and hyperparameter tuning and an independent test set for assessing generalization capability. To ensure robust and reproducible results, all analyses were performed using the R package “caret”. Ten distinct ML algorithms were evaluated: RandomForest, GradientBoosting, LogisticModel, NeighborMethod, PLSModel, BoostingMethod, NeuralNet, BayesMethod, DiscriminantModel, and Lasso. To prevent overfitting and data leakage, we implemented nested resampling: An outer loop of 5-fold cross-validation with three repeats for model validation and an inner loop for hyperparameter optimization. The training set GSE10334 comprised 183 periodontitis samples and 64 healthy controls, while the test set GSE173078 included 12 periodontitis samples and 12 healthy controls. Model performance was assessed using both discrimination (sensitivity, specificity, accuracy, balanced accuracy, and F1-score, derived from confusion matrices) and calibration (Brier score, reflecting the accuracy of probabilistic predictions). Receiver operating characteristic (ROC) curves were generated using the “pROC” package, and model performance between the training and test sets was visualized via forest plots. The optimal model was selected based on balanced performance across key metrics. SHAP analysis was applied to interpret feature contributions. Clinical utility was evaluated using decision curve analysis across probability thresholds ranging from 0.01 to 0.99. All reporting adhered to TRIPOD + AI guidelines.
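The discrimination and calibration metrics listed above can be made concrete with a short sketch. It is written in Python for illustration, whereas the study used the R package “caret”. The function name `classification_metrics` and the default 0.5 classification threshold are assumptions not stated in the text.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute confusion-matrix metrics and the Brier score.

    y_true: 0/1 labels (1 = periodontitis); y_prob: predicted probabilities.
    The 0.5 threshold is an assumption for illustration.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(y_true)),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        # Brier score: mean squared error of the probabilistic predictions.
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
    }
```

Balanced accuracy is useful for a training set such as this one because the classes are unequal in size (183 cases and 64 controls), so plain accuracy can be inflated by the majority class.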
Identification of key molecular clinical significance
This analysis assessed discriminatory performance within and across datasets rather than formal diagnostic test accuracy, considering the observational nature of transcriptomic studies. To systematically evaluate the discriminatory capability of key genes, we first constructed ROC curves for each dataset and then calculated AUC values using “pROC”. A higher AUC value (closer to 1) indicated superior discriminatory performance for a given gene. To integrate results across multiple datasets, we plotted a summary ROC curve to assess the overall discriminative capability[34,35]. Recognizing that strong discriminatory accuracy alone does not ensure clinical utility, we further performed decision curve analysis (DCA). DCA curves were plotted, and the net clinical benefit was quantified to evaluate the practical value of using these key genes as biomarkers[36].
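Two quantities in this paragraph, the AUC and the net benefit used in DCA, can be illustrated with a minimal Python sketch. It is not the “pROC” or “rmda” implementation. The AUC is computed through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control), and the net benefit uses the standard decision-curve formula. The function names are hypothetical.

```python
def auc(scores_case, scores_ctrl):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in scores_case
        for k in scores_ctrl
    )
    return wins / (len(scores_case) * len(scores_ctrl))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one probability threshold.

    net benefit = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def decision_curve(y_true, y_prob):
    """Net benefit across thresholds 0.01 to 0.99, the range given in the Methods."""
    thresholds = [i / 100 for i in range(1, 100)]
    return [(t, net_benefit(y_true, y_prob, t)) for t in thresholds]
```

A decision curve is usually read against two reference strategies, treating all samples as positive and treating none as positive. A marker adds value only at thresholds where its net benefit exceeds both.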
Functional annotation and pathway enrichment analysis
Protein expression patterns were validated using the Human Protein Atlas. Conserved domains and motifs were analyzed via GenDoma (https://ai.citexs.com/homePath). Regulatory networks, including transcription factors, miRNAs, lncRNAs, and downstream associations (compounds and drugs), were constructed. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed using the R package “clusterProfiler” to identify significantly enriched signaling pathways and biological processes.
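Pathway over-representation analysis of the kind performed by “clusterProfiler” rests on a hypergeometric test. The sketch below shows that calculation in Python for a single pathway. It is an illustration only: the function name `enrichment_p_value` and all counts are hypothetical, and multiple-testing correction across pathways is not shown.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_query, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background universe
    n_pathway:    background genes annotated to the pathway
    n_query:      genes in the query list (e.g. differentially expressed genes)
    n_overlap:    query genes annotated to the pathway
    """
    upper = min(n_pathway, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200-gene pathway, 500 query genes, 20 in the pathway.
p_example = enrichment_p_value(
    n_background=20000, n_pathway=200, n_query=500, n_overlap=20
)
```

With these toy counts the expected overlap by chance is 5 genes (500 × 200 / 20000), so an observed overlap of 20 yields a very small P value.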
Statistical analysis
Data visualizations were generated in R v4.4.1 using ggplot2, pROC, rmda, and shapviz. Two-group comparisons employed the Wilcoxon test; multiple-group comparisons used the Kruskal-Wallis test. Associations were assessed via Pearson correlation. P < 0.05 was considered statistically significant.
RESULTS
Common core genes between OPLL and periodontitis
The details of inclusion and exclusion in this study are shown in Figure 1. Figure 2A depicts the overall technical approach of this study. In total, 4437 genes were differentially expressed in OPLL (Figure 2B). WGCNA was performed to further analyze their co-expression patterns and identify key gene modules most strongly associated with the clinical features of OPLL. The selection criteria included |module membership| > 0.85 and |gene significance| > 0.9. The “lightpink4” module exhibited the strongest positive correlation with the OPLL phenotype (correlation = 0.86), suggesting that its core gene cluster plays a central role in OPLL pathogenesis (Figure 2C-E, Supplementary Figure 1A and B). To investigate the expression patterns of these core genes in periodontitis, we analyzed SMD across four transcriptomic datasets. Basic information for each dataset is shown in Figure 2F. Ultimately, we identified 45 co-expressed genes showing consistent expression alterations in periodontitis, with their SMD and confidence intervals presented in Figure 2G.
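The hub-gene selection criteria stated above (|module membership| > 0.85 and |gene significance| > 0.9) amount to a two-threshold filter, sketched here in Python for illustration. The function name `select_hub_genes` and the gene labels are hypothetical placeholders and do not correspond to genes in the “lightpink4” module.

```python
def select_hub_genes(module_stats, mm_cutoff=0.85, gs_cutoff=0.9):
    """Return genes whose |module membership| and |gene significance| both exceed the cutoffs.

    module_stats maps a gene name to a (module_membership, gene_significance) pair.
    """
    return sorted(
        gene
        for gene, (mm, gs) in module_stats.items()
        if abs(mm) > mm_cutoff and abs(gs) > gs_cutoff
    )


# Hypothetical values for four placeholder genes.
toy_module = {
    "GENE_A": (0.91, 0.95),    # passes both cutoffs
    "GENE_B": (0.80, 0.97),    # module membership too low
    "GENE_C": (-0.90, -0.93),  # passes: absolute values are used
    "GENE_D": (0.95, 0.50),    # gene significance too low
}
hub_genes = select_hub_genes(toy_module)
```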
Figure 2 Identification of shared core genes between ossification of the posterior longitudinal ligament and periodontitis via weighted gene co-expression network analysis and meta-analysis.
A: Schematic workflow of the study; B: Differential gene expression analysis in ossification of the posterior longitudinal ligament (OPLL); C-E: Identification of key modules by weighted gene co-expression network analysis; Gene clustering dendrogram (C); Module-trait relationships showing the “lightpink4” module with the strongest association to OPLL (D); Characteristic scatter plot of the “lightpink4” module (E); F: Information of the periodontitis transcriptomic datasets; G: Standardized mean difference analysis of 45 co-expressed genes demonstrating consistent differential expression in periodontitis. SMD: Standardized mean difference.
Feature variable selection
We constructed a discrimination model using periodontitis transcriptomic data from Gene Expression Omnibus. The training set GSE10334 comprised 183 periodontitis samples and 64 healthy controls, while the preliminary validation set GSE173078 included 12 periodontitis samples and 12 healthy controls. Ten ML algorithms were systematically applied, and model parameters were optimized using 5-fold cross-validation with three repeats within a standardized preprocessing and data-partitioning workflow designed to prevent data leakage. Model performance was comprehensively evaluated using both discrimination metrics (AUC, sensitivity, specificity, accuracy) and calibration metrics (Brier score) (Figure 3A). Model performance across evaluation dimensions is summarized in Figure 3B. Using the forest plot (Figure 3C) and ROC curves (Figure 3D and E), we evaluated the ability of all 10 ML algorithms to discriminate periodontitis from healthy controls. In the training set, BayesMethod achieved the highest balanced accuracy (0.884), followed closely by RandomForest (0.883) and then NeuralNet (0.844), with all models exhibiting balanced accuracies well above random expectation (0.785-0.884). These results highlight the strong discriminatory performance of the models under training conditions. In the test set, despite the small validation sample, the models retained some discriminatory capability, with balanced accuracies ranging from 0.5 (chance level) to 0.75. The Lasso model performed the best (0.75), followed by BayesMethod (0.708) and PLSModel (0.667). The discriminatory capability of the selected model was further supported by DCA (Figure 3F) and confusion matrix visualization (Figure 3G), with 91.5% accuracy in the training set and 75% in the validation set. SHAP analysis identified complement factor I (CFI) as the top-ranking contributor (Figure 3H).
Altogether, these results indicated that, despite the limited validation sample size, the integrated ML framework was able to identify a generalizable discriminative model and consistently highlighted CFI as a key discriminatory gene for periodontitis.
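The SHAP ranking referred to above is based on Shapley values, which divide a model's prediction for one sample among its input features. The Python sketch below computes exact Shapley values by enumerating feature orderings, which is feasible only for a handful of features. SHAP software relies on faster approximations or model-specific algorithms, so this is a conceptual illustration rather than the method used in the study. The function name, the toy linear model, and all numbers are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample.

    predict:    function mapping a feature list to a number
    x:          feature values of the sample to explain
    background: reference samples; features outside a coalition take
                background values, and predictions are averaged over them
    """
    n = len(x)

    def coalition_value(present):
        total = 0.0
        for ref in background:
            mixed = [x[i] if i in present else ref[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        present = set()
        previous = coalition_value(present)
        for i in order:
            present.add(i)
            current = coalition_value(present)
            phi[i] += current - previous  # marginal contribution of feature i
            previous = current
    return [value / len(orderings) for value in phi]


def toy_model(features):
    # Hypothetical linear score: the third feature has no effect.
    return 2.0 * features[0] + 1.0 * features[1] + 0.0 * features[2]


phi = shapley_values(toy_model, x=[3.0, 1.0, 5.0],
                     background=[[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]])
```

For a linear model, each value reduces to the coefficient multiplied by the difference between the sample's feature value and the background mean. The values also sum to the prediction minus the average background prediction, which is the additivity property that gives SHAP its name.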
Figure 3 Feature variable selection via integrated machine learning algorithms and SHapley Additive exPlanations explainability analysis for disease discrimination.
A: Summary of performance metrics (e.g., accuracy, sensitivity, specificity) for the 10 machine learning models (the training set is on the top and the validation set is on the bottom); B: Line charts depicting model performance across different evaluation dimensions; C and D: Forest plot displaying area under the curve values and their 95% confidence intervals for all models; E: Receiver operating characteristic curves illustrating the predictive performance of the models; F: Decision curve analysis evaluating the clinical utility of the optimal model; G: Confusion matrix visualization for the selected model; H: SHapley Additive exPlanations analysis identifying complement factor I as the top contributing feature in the optimal model.
Clinical significance and biological mechanisms of CFI
To assess the clinical significance of CFI in periodontitis, we analyzed expression correlations between CFI and the 45 co-expressed genes across four transcriptomic datasets. CFI exhibited strong positive correlations with multiple genes (Supplementary Figure 1C), suggesting that it functions as a central regulatory node in periodontitis. Expression analyses confirmed that CFI was consistently expressed and that its expression was significantly upregulated across all four datasets (Figure 4A). Meta-analysis under the random-effects model revealed a large pooled effect size (Hedges’ g = 1.61, 95%CI: 0.81-2.42), accompanied by substantial heterogeneity (I² = 72.0%, τ² = 0.451) (Figure 4B). The result remained stable in leave-one-out sensitivity analysis, and no significant publication bias was detected (Egger’s test, P = 0.217), supporting the robustness and reproducibility of CFI overexpression across datasets (Figure 4C). ROC and summary ROC analyses further confirmed the strong discriminatory performance of CFI, yielding an overall AUC of 0.86 (Figure 4D and E). To elucidate potential mechanisms, we constructed a CFI-centered multi-level regulatory network encompassing miRNA-gene, transcription factor-gene, lncRNA-gene, and compound-gene interactions (Figure 4F). Based on protein localization data from the Human Protein Atlas, CFI was identified to be predominantly expressed in endothelial cells, providing histological evidence for its potential role in inflammatory and immune microenvironments (Figure 4G).
Figure 4 Exploration of clinical significance and biological mechanisms of complement factor I.
A: Validation of complement factor I (CFI) upregulation in four independent periodontitis datasets; B and C: The analysis results indicate a large pooled effect size, with a Hedges’ g value of 1.61 (95%CI: 0.81-2.42). Heterogeneity tests indicated substantial heterogeneity among the studies (I² = 72.0%, τ² = 0.451). The leave-one-out sensitivity analysis confirmed the robustness of this pooled result. No significant publication bias was detected by Egger’s linear regression test (P = 0.217); D: Receiver operating characteristic (ROC) curve evaluating the discriminatory capability of CFI for periodontitis; E: Summary ROC (sROC) curve with a composite area under the curve of 0.86; F: CFI-centered multi-level regulatory network showing interactions with miRNAs, transcription factors, lncRNAs, and compounds; G: Human Protein Atlas validation of CFI protein expression localized primarily in endothelial cells.
CFI expression upregulation in endothelial cells during periodontitis
While CFI expression upregulation has been previously reported in patients with periodontitis[37], its single-cell expression landscape remains poorly characterized. Therefore, using single-cell RNA sequencing data, we analyzed CFI expression across annotated cell types. CFI was primarily expressed in endothelial cells (Figure 5A-E). To further identify the primary source of CFI expression, cells were classified into CFI-positive and CFI-negative groups based on expression levels, and the proportional distribution of each cell subset was compared between the groups. Endothelial cells constituted a significantly higher proportion in the CFI-positive group, suggesting that endothelial cells are the primary source of CFI expression (Figure 5F and G). In addition, cell trajectory analysis indicated that CFI expression progressively increased during endothelial cell differentiation, implying a potential role in cellular maturation or state transitions (Figure 5H).
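The comparison of cell-type composition between CFI-positive and CFI-negative groups can be sketched as follows. This Python example is illustrative only. The text states that cells were grouped "based on expression levels" without giving a cutoff, so defining CFI-positive as expression above zero is an assumption, and the data layout, function name, and toy cells are hypothetical.

```python
from collections import Counter


def cell_type_proportions(cells, gene="CFI", threshold=0.0):
    """Split cells by expression of `gene` and return cell-type proportions per group.

    Assumption: a cell is "positive" when its expression exceeds `threshold`.
    Each cell is a dict with a "cell_type" label and an "expression" mapping.
    """
    groups = {"positive": Counter(), "negative": Counter()}
    for cell in cells:
        level = cell["expression"].get(gene, 0.0)
        group = "positive" if level > threshold else "negative"
        groups[group][cell["cell_type"]] += 1
    return {
        group: {
            cell_type: count / sum(counts.values())
            for cell_type, count in counts.items()
        }
        for group, counts in groups.items()
    }


# Hypothetical toy cells, not drawn from GSE164241.
toy_cells = [
    {"cell_type": "Endothelial", "expression": {"CFI": 2.0}},
    {"cell_type": "Endothelial", "expression": {"CFI": 1.1}},
    {"cell_type": "Fibroblast", "expression": {"CFI": 0.4}},
    {"cell_type": "T cell", "expression": {"CFI": 0.0}},
    {"cell_type": "Fibroblast", "expression": {}},
]
proportions = cell_type_proportions(toy_cells)
```

Comparing the endothelial share in the two groups mirrors the comparison shown in Figure 5F and G. A formal test of the difference would still be needed and is not included here.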
Figure 5 Single-cell characterization of complement factor I expression and functional enrichment in periodontitis.
A and B: Quality control of single cell data; C-E: Expression distribution of complement factor I (CFI) across different cell types; F and G: Identification of the primary cellular source of CFI; F: Proportion of each cell subset in CFI-positive and CFI-negative groups; G: Comparative analysis demonstrating endothelial cells as the dominant source of CFI expression; H: Cell trajectory analysis showing progressive increase of CFI expression during endothelial cell differentiation; I: Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of differentially expressed genes in CFI-positive vs CFI-negative groups, highlighting significant enrichment in pathways related to cell adhesion, bacterial infection, and intracellular signaling.
To elucidate the biological processes associated with high CFI expression, differential expression and KEGG pathway enrichment analyses were performed with the CFI-positive and CFI-negative groups. Genes in the CFI-positive group were significantly enriched in multiple pathways related to cell adhesion, bacterial infection, and intracellular signaling, including focal adhesions, Salmonella infection, bacterial invasion of epithelial cells, PI3K-Akt signaling, and adherens junctions (Figure 5I).
DISCUSSION
Periodontitis is a chronic inflammatory disease caused by dental plaque and characterized by destruction of periodontal tissues and alveolar bone resorption[38]. Chen et al[7] demonstrated that within the periodontal inflammatory microenvironment, periodontal ligament stem cells undergo caspase-4/gasdermin D-mediated pyroptosis, releasing substantial amounts of the proinflammatory cytokine IL-1β. This process not only exacerbates local inflammation and tissue damage but also promotes alveolar bone loss by enhancing osteoclastogenesis and inhibiting osteoblast differentiation, thereby disrupting bone homeostasis. This pyroptosis-triggered chronic inflammatory milieu may further influence distal spinal ligament tissues by inducing systemic inflammatory responses, potentially initiating or accelerating the pathological process of OPLL. However, comprehensive investigations identifying the key molecular mediators and shared pathogenic genes bridging these conditions remain limited.
In this study, we integrated transcriptomic datasets for OPLL and periodontitis to systematically identify key hub genes driving common pathological processes. WGCNA was employed to identify critical co-expression modules. Subsequently, robust gene sets with consistent expression across datasets were identified using SMD calculations. To evaluate discriminatory potential and identify core drivers, multiple ML models were constructed and interpreted using SHAP analysis. This approach identified CFI as a key contributing gene. CFI functions as a negative regulator within the complement system, and its expression is modulated by inflammatory stimuli and immune microenvironment changes. It inhibits the complement cascade by degrading complement component 3b, thereby attenuating inflammatory processes[39,40]. Within the immune microenvironment, CFI orchestrates complex regulatory networks, and upregulation of its expression is correlated with the activation of multiple immune cell types[37]. Furthermore, CFI modulates signaling pathways such as Wnt/β-catenin[41] and promotes tissue remodeling by upregulating the expression of MMPs[42]. To further investigate the potential role of CFI in disease pathogenesis, we performed pathway enrichment analysis. The CFI-positive phenotype was significantly associated with pathways related to bone metabolism, inflammatory responses, and cell adhesion, including focal adhesion, the PI3K-Akt signaling pathway, and adherens junctions. The PI3K-Akt signaling pathway not only regulates cell proliferation and survival but also contributes to various types of pathological ossification and heterotopic bone formation[43,44]. In diverse ossification models, including those involving nerve injury and hemorrhage, sustained PI3K-Akt activation promotes osteogenic gene expression and differentiation of progenitor cells, while inhibition of this pathway mitigates heterotopic ossification[44,45].
Notably, Salmonella infection involves the synergistic activation of inflammatory and nuclear factor-kappaB signaling, which interact extensively with PI3K-Akt signaling and together drive osteogenic cell transformation[46,47]. Our findings suggest that CFI functions as a multifunctional molecular node linking periodontitis and OPLL, potentially contributing to both conditions by coordinating immune microenvironment regulation, cellular metabolism, and tissue remodeling. Single-cell transcriptomic analysis further revealed that CFI is specifically overexpressed in endothelial cells in periodontitis tissues. Integrating this observation with pathway enrichment analysis, we hypothesize that CFI modulates immune cell infiltration by regulating endothelial adhesion and signaling, thus shaping the destructive inflammatory microenvironment characteristic of periodontitis. Endothelial cells act as active “immune sentinels” during inflammation, facilitating immune cell recruitment through the expression of adhesion molecules (VCAM-1 and ICAM-1) and chemokines[48]. In periodontitis, endothelial cells have been shown to interact abnormally with macrophages via chemokine signaling axes, such as CXCL12-CXCR4, promoting inflammatory progression[49].
Importantly, complement system regulation has emerged as a potential strategy to modulate endothelial function and improve infection outcomes, supporting a mechanistic link between CFI and local immune regulation via a “complement-endothelial cell” interaction axis[50,51]. Accordingly, we propose that within the periodontal microenvironment, CFI may influence local immune response intensity by altering endothelial function, vascular barrier integrity, and immune cell recruitment. This endothelial cell-mediated pattern of dysregulated immune signaling may also underlie the shared pathological mechanisms of periodontitis and OPLL, which could explain why CFI emerges as a common molecular hub linking these conditions. This hypothesis provides a novel theoretical framework for understanding the comorbidity mechanism between periodontitis and OPLL. Notably, CFI expression was significantly elevated in both OPLL and periodontitis, suggesting a shared molecular feature indicative of immune homeostasis imbalance. However, these two conditions do not necessarily co-occur clinically, which may reflect disease-specific upstream regulatory mechanisms. In OPLL, CFI expression may be primarily induced by mechanical stimuli and osteogenic differentiation signals[52-54], whereas in periodontitis, it may be more strongly activated by microbe-driven inflammatory pathways. Thus, while CFI acts as a shared molecular risk node, its pathological effects are context-dependent, shaped by tissue-specific signaling and regulatory networks. Elevated CFI expression should therefore be interpreted as a potential risk marker rather than a deterministic cause, with comorbidity outcomes influenced by individual genetic background, microenvironmental conditions, and compensatory mechanisms.
Prospective validation and translational outlook
While our integrated bioinformatics and ML approach robustly identifies CFI as a shared key molecular player, it is crucial to emphasize that these findings are hypothesis-generating. To advance these insights toward clinical application, particularly in the context of OPLL, prospective and orthogonal validation is required. Future studies should prioritize the following directions: (1) Protein-level validation: Quantification of CFI protein expression and activity in serum/plasma and surgically obtained ligament tissues from well-characterized OPLL patients and matched controls using assays such as ELISA or Western blotting. These analyses will determine whether transcript-level upregulation is reflected at the protein level; (2) Orthopedics-specific clinical cohorts: Establishment of large, prospective, multi-center OPLL cohorts with standardized documentation of periodontal status. Such cohorts will enable the assessment of clinical co-occurrence patterns and evaluation of circulating CFI as a serological biomarker for comorbidity risk stratification; and (3) Functional studies in OPLL models: Experimental studies using in vitro systems (e.g., primary posterior longitudinal ligament cells) and in vivo models to investigate the mechanistic role of CFI in OPLL pathogenesis. Such studies should assess whether modulation of CFI influences osteogenic signaling, inflammation, or ossification dynamics. Only upon rigorous validation through these disease-specific experimental and clinical approaches can the true translational potential of CFI as a biomarker or therapeutic target be fully determined.
This study has some limitations. First, the conclusions are constrained by the limited availability of pathological tissues from patients with OPLL and periodontitis. Moreover, the number and relative homogeneity of the included transcriptomic datasets precluded planned subgroup analyses or meta-regression. Future studies involving larger, more diverse cohorts are warranted to validate our results and explore the influence of technical and tissue-specific factors.
CONCLUSION
By integrating transcriptomic, single-cell, and ML analyses, we identified CFI as a key molecular node potentially linking periodontitis and OPLL. CFI expression was significantly upregulated in both conditions and was predominantly enriched in endothelial cells during periodontitis. These findings suggest that CFI contributes to disease progression by modulating cell adhesion, PI3K-Akt signaling, and inflammatory immune responses. Our results highlight the potential role of CFI in immune microenvironment regulation and tissue remodeling, providing new mechanistic insights into the comorbidity between periodontitis and OPLL. Prospective validation, particularly in orthopedics-specific clinical cohorts and at the protein level, is warranted to confirm the biological and translational relevance of our results.
ACKNOWLEDGEMENTS
We thank the maintainers of the public databases for making their data freely available.
Li J, Hao Y, Wu L, Liang H, Ni L, Wang F, Wang S, Duan Y, Xu Q, Xiao J, Yang D, Gao G, Ding Y, Gao C, Xiao J, Zhao H. Exploration of common pathogenesis and candidate hub genes between HIV and monkeypox co-infection using bioinformatics and machine learning. Sci Rep. 2024;14:26701.
Shi Z, Jia L, Wang B, Wang S, He L, Li Y, Wang G, Song W, He X, Liu Z, Shi C, Tian Y, Zhu K. Integration of Single-Cell and Bulk Transcriptomes to Identify a Poor Prognostic Tumor Subgroup to Predict the Prognosis of Patients with Early-stage Lung Adenocarcinoma. J Cancer. 2025;16:1397-1412.