Retrospective Study Open Access
Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.
World J Stem Cells. Jun 26, 2026; 18(6): 119550
Published online Jun 26, 2026. doi: 10.4252/wjsc.119550
Cancer stem cell-associated markers and their prognostic value in non-small cell lung cancer
Lin-Lin Luo, Si-Cong Jiang, You-Dan Guo, The Second Department of Respiratory Disease, Jiangxi Provincial People’s Hospital (The First Affiliated Hospital of Nanchang Medical College), Nanchang 330006, Jiangxi Province, China
Shuo Li, Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan 250014, Shandong Province, China
Guang-Yi Zhang, Jian-Jun Tang, Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330209, Jiangxi Province, China
Guang-Yi Zhang, Jian-Jun Tang, National Regional Center for Respiratory Medicine, China-Japan Friendship Jiangxi Hospital, Nanchang 330200, Jiangxi Province, China
ORCID number: Lin-Lin Luo (0009-0004-9069-9990); You-Dan Guo (0009-0004-6984-2377).
Co-first authors: Lin-Lin Luo and Si-Cong Jiang.
Author contributions: Luo LL and Jiang SC contributed equally to this work and are co-first authors. Luo LL and Jiang SC conceived and designed the study, collected and analyzed the clinical and pathological data, performed the immunohistochemical experiments, conducted the statistical analyses, and drafted the manuscript; Li S participated in data acquisition, follow-up organization, and interpretation of survival data; Zhang GY and Tang JJ were involved in pathological evaluation, immunohistochemical assessment, and validation of scoring consistency; Guo YD supervised the entire study, provided critical methodological guidance, revised the manuscript for important intellectual content, and approved the final version for publication.
AI contribution statement: All scientific content, data analysis, explanations and conclusions were completed and verified by the author. All authors are fully responsible for the final paper.
Institutional review board statement: This study has been reviewed and approved by the Medical Ethics Committee of Jiangxi Provincial People’s Hospital, Approval No. 2026-033 IIT.
Informed consent statement: Given the retrospective nature of this study and the use of anonymized clinical data, the requirement for written informed consent was waived by the ethics committee.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
Corresponding author: You-Dan Guo, MD, The Second Department of Respiratory, Jiangxi Provincial People’s Hospital (The First Affiliated Hospital of Nanchang Medical College), No 152 Aiguo Road, Nanchang 330006, Jiangxi Province, China. 13970945069@163.com
Received: March 25, 2026
Revised: April 24, 2026
Accepted: May 27, 2026
Published online: June 26, 2026
Processing time: 91 Days and 23.4 Hours

Abstract
BACKGROUND

An urgent clinical need exists to stratify postoperative prognosis in patients with non-small cell lung cancer (NSCLC). However, the prognostic value of the cancer stem cell (CSC) markers CD133, aldehyde dehydrogenase 1A1 (ALDH1A1), and SRY-box transcription factor 2 (SOX2) remains incompletely characterized. These markers are better viewed as complementary biomarkers for postoperative risk enrichment than as replacements for tumor-node-metastasis (TNM) staging or molecular classification.

AIM

To investigate the expression of the CSC-related markers CD133, ALDH1A1, and SOX2 in NSCLC tissues and their association with postoperative survival and prognosis.

METHODS

A total of 200 patients with pathologically confirmed NSCLC who underwent radical resection at Jiangxi Provincial People’s Hospital (The First Affiliated Hospital of Nanchang Medical College) between January 2023 and December 2025 were retrospectively included. Expression of CD133, ALDH1A1, and SOX2 in tumor tissues was detected by immunohistochemistry, and tumors were divided into high- and low-expression groups according to the immunoreactive score. All patients were followed up until December 2025 to record overall survival (OS) and disease-free survival (DFS). The t-test or χ² test was used for comparisons between groups. The Kaplan-Meier method and log-rank test were used for survival analysis. Univariate and multivariate analyses were performed using a Cox proportional hazards model, and a prognostic model including CSC markers was constructed and evaluated using the C-index and the area under the receiver operating characteristic curve for 2-year OS. The immunoreactive score cut-off was selected based on prior literature and the cohort distribution rather than derived from receiver operating characteristic analysis. DFS was analyzed as a conventional composite endpoint, and model discrimination was internally corrected using bootstrap resampling.

RESULTS

Among the 200 patients, the high-expression rates of CD133, ALDH1A1, and SOX2 were 44.00% (88/200), 53.00% (106/200), and 40.00% (80/200), respectively. The median follow-up period was 23.50 months (interquartile range: 14.20-31.60 months), during which 54 deaths (27.00%) and 70 recurrences/metastases (35.00%) occurred. After adjusting for age, gender, smoking history, histological type, differentiation degree, TNM stage, and adjuvant therapy, high expression of CD133 [hazard ratio (HR) = 1.450, 95% confidence interval (CI): 1.033-2.037, P = 0.032], high expression of ALDH1A1 (HR = 1.380, 95%CI: 1.001-1.902, P = 0.049), and TNM stage III (HR = 1.980, 95%CI: 1.235-3.173, P = 0.004 compared to stage I) were independent adverse prognostic factors for OS. Patients with a CSC score ≥ 2 had significantly shorter OS (P = 0.004), and this association remained significant in the multivariate model (HR = 1.550, 95%CI: 1.078-2.228, P = 0.018). After adding the CSC score, the area under the curve for predicting 2-year OS increased from 0.675 to 0.785.

CONCLUSION

High expression of CD133 and ALDH1A1 in NSCLC suggests a worse survival outcome. Nonetheless, the CSC score should be interpreted together with the TNM stage, histological background, and molecular features rather than used in isolation for clinical decision-making.

Key Words: Non-small cell lung cancer; Cancer stem cells; CD133; Aldehyde dehydrogenase 1A1; SRY-box transcription factor 2; Prognosis

Core Tip: Cancer stem cells (CSCs) are key drivers of tumor recurrence, metastasis, and therapeutic resistance in non-small cell lung cancer. This retrospective study systematically evaluated the prognostic significance of the CSC-related markers CD133, aldehyde dehydrogenase 1A1, and SRY-box transcription factor 2 using immunohistochemistry in surgically resected non-small cell lung cancer tissues. We further developed a composite CSC score integrating multiple stemness markers, which significantly improved the prognostic stratification beyond conventional clinicopathological factors. Its potential clinical value lies in complementary postoperative risk enrichment, closer surveillance planning, and future integration with molecular subtyping, rather than the replacement of existing decision frameworks.



INTRODUCTION

Lung cancer has long been one of the leading causes of death due to malignant tumors, and non-small cell lung cancer (NSCLC) accounts for more than 80% of all lung cancers[1]. Despite continuous improvements in imaging screening, surgical resection, and targeted/immunotherapy, a considerable proportion of patients still experience recurrence and metastasis after radical treatment, suggesting that traditional tumor-node-metastasis (TNM) staging and conventional pathological indicators are insufficient for individual risk assessment[2]. According to the cancer stem cell (CSC) theory, a small subset of cells with self-renewal and multi-directional differentiation potential exists inside the tumor; these cells play a key role in tumor initiation, invasion and metastasis, and therapeutic resistance, and may explain late recurrence attributed to minimal residual disease[3,4]. Currently, the identification of CSCs in lung cancer often relies on surface or functional markers and stemness-related transcription factors, including CD133, aldehyde dehydrogenase 1A1 (ALDH1A1), CD44, epithelial cell adhesion molecule, SRY-box transcription factor 2 (SOX2), octamer binding transcription factor-4, and NANOG[5,6]. Previous studies have suggested that the expression levels of ALDH1A1 and CD133 in early NSCLC, and their co-expression, were significantly related to postoperative recurrence and survival and independently indicated a poor prognosis in a multivariate model[7]. For example, Alamgeer et al[8] performed immunohistochemistry (IHC) in NSCLC and found that high expression of either ALDH1A1 or CD133 was related to poor overall survival (OS), and that the group co-expressing both markers had the worst prognosis. Relevant systematic reviews have also indicated that positive or high expression of CD133 is associated with poor survival and unfavorable clinicopathological features in patients with NSCLC.
At the mechanistic level, ALDH1A1 is involved in oxidative stress and metabolic reprogramming; CD133 is related to cell adhesion and signal transduction; and SOX2, an important transcription factor that maintains stemness, can promote tumor progression by regulating the cell cycle, epithelial-mesenchymal transition, and immune escape[9]. However, existing evidence shows great differences in marker thresholds, detection methods, and population structures, and most studies have focused on a single marker. The gain of combined multi-marker stratification (such as the CSC score) still requires further verification using real-world clinical data[10]. Therefore, we retrospectively analyzed the expression of CD133, ALDH1A1, and SOX2 in the tumor tissues of patients with NSCLC; systematically evaluated their relationships with clinicopathological characteristics, postoperative OS, and disease-free survival (DFS); and attempted to construct a CSC score to optimize prognostic stratification. Accordingly, the CSC score is intended to supplement rather than replace the standard clinicopathological assessment and molecular risk frameworks, with potential value in refining surveillance intensity and postoperative management discussions.

MATERIALS AND METHODS
Research subjects

This was a single-center, retrospective study. Patients with pathologically confirmed NSCLC who underwent radical resection at Jiangxi Provincial People’s Hospital (The First Affiliated Hospital of Nanchang Medical College) between January 2023 and December 2025 were included. The inclusion criteria were as follows: (1) Postoperative pathological diagnosis of lung adenocarcinoma or lung squamous cell carcinoma; (2) No neoadjuvant radiotherapy/chemotherapy/immunotherapy before surgery; (3) Complete clinical and pathological data and follow-up information; and (4) Paraffin-embedded tumor tissues available for IHC. The exclusion criteria were as follows: (1) Concurrent other primary malignant tumors; (2) Death in the perioperative period; and (3) Loss to follow-up or missing key variables. Finally, 200 patients were included in this study. This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Jiangxi Provincial People’s Hospital (The First Affiliated Hospital of Nanchang Medical College). Given the retrospective nature of the study and the use of anonymized clinical and pathological data, the requirement for written informed consent was waived by the Ethics Committee. All patient data were de-identified prior to the analysis, and no information that could potentially identify individual participants was disclosed.

Data collection and variable definition

Data were obtained from electronic medical record systems, surgical anesthesia records, imaging reports, pathological reports, and discharge follow-up databases. Baseline demographic and clinical information was collected, including age, sex, history of smoking (never/previous or current), major comorbidities (hypertension, diabetes, chronic obstructive pulmonary disease, etc.), and perioperative information (date of surgery, method of surgery, postoperative adjuvant therapy, etc.). Tumor-related variables included maximum tumor diameter, histological type (adenocarcinoma/squamous cell carcinoma), degree of differentiation (poorly differentiated/moderately to well differentiated), lymph node metastasis status, resection margin, and TNM staging. TNM staging was determined according to the American Joint Committee on Cancer 8th edition staging system, in combination with postoperative pathological and imaging data. The maximum tumor diameter was measured from the resected specimen in the pathology report. Clinical variables were independently extracted and cross-checked by two researchers to reduce information bias. Ambiguous records were confirmed by reviewing the original examination records and the original pathology report.

Tissue specimen processing and IHC detection

Paraffin-embedded specimens of surgically resected tumor tissue were obtained from all patients. The tissues were routinely fixed in 10% neutral formalin and embedded in paraffin, and serial sections approximately 4 μm thick were cut. IHC was performed using the routine two-step method (EnVision or an equivalent method): The sections were deparaffinized and rehydrated, followed by antigen retrieval (citrate buffer at pH 6.0 or EDTA buffer at pH 9.0, selected according to the antibody instructions); endogenous peroxidase was blocked with 3% hydrogen peroxide; and the sections were then incubated with the primary antibodies (CD133, ALDH1A1, and SOX2). The polymer-conjugated secondary antibody was then added, followed by DAB color development, hematoxylin counterstaining, dehydration through graded alcohols, clearing, and mounting. A positive control (known positive tissue) and a negative control (phosphate-buffered saline instead of the primary antibody) were included in each staining batch, and specimens were stained in the same batch or under the same conditions wherever possible to reduce inter-batch variation. The staining results were independently interpreted by two pathologists blinded to the clinical outcomes; disagreements, if any, were resolved through joint review.

Construction of immunoreactive score and CSC score

The IHC results were assessed semi-quantitatively using the immunoreactive score (IRS). Staining intensity was scored from 0 to 3 points (0 = no staining, 1 = pale yellow, 2 = brownish yellow, 3 = dark brown), and the proportion of positive cells was scored from 0 to 4 points (0 = < 5%, 1 = 5%-25%, 2 = 26%-50%, 3 = 51%-75%, and 4 = > 75%). The IRS is the product of the two scores (0-12 points). IRS ≥ 6 was defined as high expression and IRS < 6 as low expression. This threshold was selected with reference to previous studies and the distribution of this cohort rather than derived from receiver operating characteristic (ROC) analysis or the maximum Youden index; therefore, its cross-cohort reproducibility should be interpreted with caution. On this basis, each of CD133, ALDH1A1, and SOX2 was assigned 1 point for high expression and 0 points for low expression. The CSC score was obtained by summing the points for the three markers (0-3 points). To facilitate clinical stratification, a score of 0-1 was defined as the low CSC group and a score of 2-3 as the high CSC group.
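The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the authors' software: the function names are hypothetical, and the handling of non-integer percentages between the published bins (for example 25.5%) is an assumption.

```python
def proportion_points(percent_positive: float) -> int:
    """Map the percentage of positive cells to 0-4 points.

    Assumption: values falling between the published integer bins
    (e.g. 25.5%) are assigned to the lower bin's upper bound rule.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 5:
        return 0
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def irs(intensity: int, percent_positive: float) -> int:
    """Immunoreactive score: intensity (0-3) times proportion points (0-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity * proportion_points(percent_positive)


def is_high_expression(irs_value: int, cutoff: int = 6) -> bool:
    """High expression is IRS >= 6, the threshold stated in the text."""
    return irs_value >= cutoff


def csc_score(cd133_irs: int, aldh1a1_irs: int, sox2_irs: int) -> int:
    """One point per marker with high expression (0-3 points)."""
    return sum(is_high_expression(v) for v in (cd133_irs, aldh1a1_irs, sox2_irs))


def csc_group(score: int) -> str:
    """Scores of 0-1 form the low CSC group; scores of 2-3 the high CSC group."""
    return "high" if score >= 2 else "low"


# Hypothetical tumor: CD133 dark brown in 60% of cells, ALDH1A1 brownish
# yellow in 80%, SOX2 brownish yellow in 30%.
example = csc_score(irs(3, 60), irs(2, 80), irs(2, 30))
```

For this hypothetical tumor the three IRS values are 9, 8, and 4, so two markers count as high and the tumor falls in the high CSC group.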

Follow-up and outcome indicators

Follow-up started on the date of surgery and was conducted through outpatient visits, telephone contact, and review of inpatient and outpatient records. Patients were followed until December 31, 2025, or death. Follow-up visits were recommended every 3 months for the first 2 years after surgery and every 6 months thereafter, and the analysis was based on the visits actually completed. The primary outcome was OS, defined as the time from the date of surgery to death. Patients who were still alive at the end of follow-up were censored at the date of the last follow-up. The secondary outcome was DFS, defined as the time from the date of surgery to the first imaging- or pathology-confirmed recurrence/metastasis or death (whichever occurred first); patients without any of these events were also censored at the last follow-up. DFS was analyzed as a conventional composite postoperative endpoint. Since non-cancer deaths could not be fully adjudicated as competing events in all retrospective records, formal competing risk modeling was not performed, and the DFS results should be interpreted in this context.
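As a minimal sketch of how these endpoint definitions translate into time-to-event data, the Python helpers below derive a (time in months, event indicator) pair for OS and DFS from dates. The function names and the days-per-month conversion factor are assumptions made for illustration; the study does not state how days were converted to months.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumed conversion factor (365.25 / 12)


def months_between(start: date, end: date) -> float:
    """Elapsed time in months between two dates."""
    return (end - start).days / DAYS_PER_MONTH


def os_endpoint(surgery: date, last_follow_up: date,
                death: Optional[date] = None) -> Tuple[float, int]:
    """OS: time from surgery to death; censored at last follow-up if alive."""
    if death is not None:
        return months_between(surgery, death), 1
    return months_between(surgery, last_follow_up), 0


def dfs_endpoint(surgery: date, last_follow_up: date,
                 recurrence: Optional[date] = None,
                 death: Optional[date] = None) -> Tuple[float, int]:
    """DFS: time to first recurrence/metastasis or death, whichever is first."""
    event_dates = [d for d in (recurrence, death) if d is not None]
    if event_dates:
        return months_between(surgery, min(event_dates)), 1
    return months_between(surgery, last_follow_up), 0


# Hypothetical patient: surgery in January 2023, recurrence one year later,
# death in June 2024.
surgery = date(2023, 1, 1)
cutoff = date(2025, 12, 31)
os_time, os_event = os_endpoint(surgery, cutoff, death=date(2024, 6, 1))
dfs_time, dfs_event = dfs_endpoint(surgery, cutoff,
                                   recurrence=date(2024, 1, 1),
                                   death=date(2024, 6, 1))
```

For this patient both endpoints are events, and the DFS time is shorter than the OS time because recurrence precedes death. A patient with no recorded recurrence or death would receive an event indicator of 0 and a time equal to the interval to the last follow-up.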

Statistical analysis

SPSS, R, or equivalent statistical software was used for the analysis. Approximately normally distributed continuous data were expressed as mean ± SD and compared between groups using the independent-samples t test. Non-normally distributed data were expressed as median (interquartile range) and compared between groups using the Mann-Whitney U test. Categorical data were expressed as n (%), and intergroup comparisons were performed using the χ² test or Fisher’s exact test. Survival curves were drawn using the Kaplan-Meier method, and differences between groups were examined using the log-rank test. The Cox proportional hazards model was used to assess the factors influencing OS and DFS. Univariate screening was performed first, and variables with P < 0.10 or with clear clinical significance were entered into the multivariate model. Key clinicopathological variables and the CSC markers or CSC score were included simultaneously in the multivariate model. The proportional hazards assumption was tested using Schoenfeld residuals or log-log survival curves. Model discrimination was assessed using Harrell’s C-index. Discrimination for 2-year OS was evaluated using the area under the curve (AUC) of the time-dependent ROC curve, with internal correction by bootstrap resampling. To minimize bias in this retrospective study, eligible cases were consecutively included, variables were independently extracted and cross-checked by two researchers, and the main clinicopathological confounders were incorporated into multivariable models. However, no formal propensity score analysis or additional sensitivity analysis was performed. All tests were two-sided, and P < 0.05 was considered statistically significant.
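To make the Kaplan-Meier step concrete, the following Python sketch implements the product-limit estimator on a small invented data set. It is a teaching illustration under stated assumptions (hypothetical times and event indicators, events treated as preceding censoring at tied times), not the software or data used in the study.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) occurred, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times in months for six patients.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

In this toy example the curve steps down at 5, 8, and 12 months to 5/6, 4/6, and 4/9, respectively. Comparing two such curves between expression groups is what the log-rank test formalizes.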

RESULTS
General information and expression of CSC markers

A total of 200 patients with NSCLC [mean age 63.10 ± 9.02 years; 126 males (63.00%)] were included in the study. Histological types included adenocarcinoma (128 cases, 64.00%) and squamous cell carcinoma (72 cases, 36.00%). Seventy-eight patients (39.00%) had stage I disease, 64 (32.00%) had stage II disease, and 58 (29.00%) had stage III disease. The high-expression rates of CD133, ALDH1A1, and SOX2 were 44.00%, 53.00%, and 40.00%, respectively. For clinical stratification, patients scoring 0-1 points were categorized as the low CSC group (85 cases, 42.50%), while those scoring 2-3 points were classified as the high CSC group (115 cases, 57.50%). The general information of the two groups is compared in Table 1.

Table 1 Comparison of general information in different cancer stem cell groups, n (%)/mean ± SD.

| Variables | Low CSC group (n = 85) | High CSC group (n = 115) | Statistical values | P value |
| --- | --- | --- | --- | --- |
| Age (years) | 62.05 ± 8.85 | 64.15 ± 9.10 | t = 1.632 | 0.104 |
| Gender | | | | |
| Male | 58 (68.24) | 68 (59.13) | χ² = 1.738 | 0.187 |
| Female | 27 (31.76) | 47 (40.87) | | |
| History of smoking | | | | |
| Yes | 48 (56.47) | 70 (60.87) | χ² = 0.391 | 0.532 |
| No | 37 (43.53) | 45 (39.13) | | |
| Maximum tumor diameter (cm) | 3.15 ± 1.20 | 3.75 ± 1.35 | t = 3.255 | 0.001 |
| Histological type | | | | |
| Adenocarcinoma | 70 (82.35) | 58 (50.43) | χ² = 21.611 | < 0.001 |
| Non-adenocarcinoma | 15 (17.65) | 57 (49.57) | | |
| Differentiation degree | | | | |
| Poorly | 24 (28.24) | 48 (41.74) | χ² = 3.868 | 0.049 |
| Moderately to well | 61 (71.76) | 67 (58.26) | | |
| Lymph node metastasis | | | | |
| Positive | 28 (32.94) | 56 (48.70) | χ² = 4.980 | 0.026 |
| Negative | 57 (67.06) | 59 (51.30) | | |
| TNM staging | | | | |
| Stage I-II | 67 (78.82) | 75 (65.22) | χ² = 4.394 | 0.036 |
| Stage III | 18 (21.18) | 40 (34.78) | | |
| Postoperative adjuvant therapy | | | | |
| Yes | 38 (44.71) | 48 (41.74) | χ² = 0.176 | 0.675 |
| No | 47 (55.29) | 67 (58.26) | | |

CSC: Cancer stem cell; TNM: Tumor-node-metastasis.

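The χ² statistics in Table 1 can be reproduced from the reported counts. The sketch below assumes the uncorrected Pearson χ² test for a 2 × 2 table (no continuity correction), which matches the published values for the rows checked here; the function names are hypothetical.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_p_value_df1(chi2: float) -> float:
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Histological type: adenocarcinoma 70 vs 58, non-adenocarcinoma 15 vs 57
# (low CSC group vs high CSC group).
chi2_histology = pearson_chi2_2x2(70, 58, 15, 57)

# Gender: male 58 vs 68, female 27 vs 47.
chi2_gender = pearson_chi2_2x2(58, 68, 27, 47)
p_gender = chi2_p_value_df1(chi2_gender)
```

Rounded to three decimals, these counts give χ² = 21.611 for histological type and χ² = 1.738 (P ≈ 0.187) for gender, in agreement with the table.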
Relationship between expression of CSC markers and clinical pathological characteristics

Compared with low expression, high CD133 expression was associated with a higher proportion of poorly differentiated tumors, more frequent lymph node metastasis, and more stage III disease (all P < 0.05). High ALDH1A1 expression was associated with poor differentiation and lymph node metastasis, whereas its association with TNM stage did not reach statistical significance (P = 0.101). High SOX2 expression showed only a non-significant trend toward more frequent lymph node metastasis (P = 0.061) and no significant association with differentiation or TNM stage, but its distribution differed significantly between histological types (Table 2). Notably, SOX2 high expression was more frequent in squamous cell carcinoma than in adenocarcinoma (55.56% vs 31.25%, P = 0.001), supporting the possibility that its clinicopathological significance is histologically dependent rather than uniformly prognostic across all NSCLC subtypes.

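Table 2 Correlation between expression of cancer stem cell markers and clinical pathological indices, n (%).

| Indices | Category | CD133 high expression | CD133 low expression | χ² | P value | ALDH1A1 high expression | ALDH1A1 low expression | χ² | P value | SOX2 high expression | SOX2 low expression | χ² | P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Differentiation | Poorly | 40 (55.56) | 32 (44.44) | 6.097 | 0.014 | 46 (63.89) | 26 (36.11) | 5.355 | 0.021 | 30 (41.67) | 42 (58.33) | 0.130 | 0.718 |
| | Moderately-well | 48 (37.50) | 80 (62.50) | | | 60 (46.88) | 68 (53.12) | | | 50 (39.06) | 78 (60.94) | | |
| Lymph node metastasis | Positive | 46 (54.76) | 38 (45.24) | 6.808 | 0.009 | 56 (66.67) | 28 (33.33) | 10.859 | 0.001 | 40 (47.62) | 44 (52.38) | 3.503 | 0.061 |
| | Negative | 42 (36.21) | 74 (63.79) | | | 50 (43.10) | 66 (56.90) | | | 40 (34.48) | 76 (65.52) | | |
| TNM stage | Stage I-II | 56 (39.44) | 86 (60.56) | | | 70 (49.30) | 72 (50.70) | | | 54 (38.03) | 88 (61.97) | | |
| | Stage III | 32 (55.17) | 26 (44.83) | 4.138 | 0.042 | 36 (62.07) | 22 (37.93) | 2.697 | 0.101 | 26 (44.83) | 32 (55.17) | 0.793 | 0.373 |
| Histological type | Squamous cell carcinoma | 34 (47.22) | 38 (52.78) | 0.474 | 0.491 | 32 (44.44) | 40 (55.56) | 3.306 | 0.069 | 40 (55.56) | 32 (44.44) | 11.343 | 0.001 |
| | Adenocarcinoma | 54 (42.19) | 74 (57.81) | | | 74 (57.81) | 54 (42.19) | | | 40 (31.25) | 88 (68.75) | | |

ALDH1A1: Aldehyde dehydrogenase 1A1; SOX2: SRY-box transcription factor 2; TNM: Tumor-node-metastasis.
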
Survival outcomes and Kaplan-Meier analysis

The median follow-up period for the 200 patients was 23.50 months (interquartile range: 14.20-31.60 months). At the end of the follow-up, 54 deaths (27.00%) and 70 recurrences/metastases (35.00%) had occurred. The 1- and 2-year OS rates for the entire cohort were 92.00% and 76.00%, respectively, and the corresponding DFS rates were 84.00% and 68.00%, respectively. The 3-year OS and DFS rates at the Kaplan-Meier tail were estimated to be approximately 65.00% and 58.00%, respectively. Kaplan-Meier analysis showed that both OS and DFS were significantly shortened in the CD133 high-expression group (log-rank: P = 0.021 and P = 0.036, respectively), and OS and DFS were also worse in the ALDH1A1 high-expression group (P = 0.028 and P = 0.022, respectively). There were no significant differences in OS or DFS between the high and low SOX2 expression groups (P = 0.118 and P = 0.094, respectively). According to CSC score stratification, patients in the high CSC group (2-3 points) had significantly shorter OS and DFS than those in the low CSC group (0-1 point) (P < 0.05), as shown in Figure 1.

Figure 1 Kaplan-Meier survival curves stratified by cancer stem cell markers and cancer stem cell score. A: The overall survival (OS) curve stratified by CD133 expression level; B: The disease-free survival (DFS) curve stratified by CD133 expression level; C: The OS curve stratified by aldehyde dehydrogenase 1A1 expression level; D: The DFS curve stratified by aldehyde dehydrogenase 1A1 expression level; E: The OS curve stratified by SRY-box transcription factor 2 expression level; F: The DFS curve stratified by SRY-box transcription factor 2 expression level; G: The OS curve stratified by cancer stem cell score; H: The DFS curve stratified by cancer stem cell score. ALDH1A1: Aldehyde dehydrogenase 1A1; SOX2: SRY-box transcription factor 2; CSC: Cancer stem cell.
Cox regression analysis of factors affecting the prognosis of patients with NSCLC

Univariate Cox regression analysis showed that lymph node metastasis, TNM stage III, poor differentiation, high CD133 expression, high ALDH1A1 expression, and a high CSC score were associated with poor OS (P < 0.05), whereas high SOX2 expression was not (P = 0.120). In the multivariate model including age, sex, smoking history, histological type, degree of differentiation, TNM stage, adjuvant therapy, and the CSC marker variables, high CD133 expression, high ALDH1A1 expression, and stage III disease remained independent risk factors for OS (Table 3). High ALDH1A1 expression and a high CSC score were independent risk factors for DFS (Table 4).

Table 3 Cox regression analysis for influence on overall survival.

| Variables | Univariate HR (95%CI) | P value | Multivariate HR (95%CI) | P value |
| --- | --- | --- | --- | --- |
| Age (10 years per increase) | 1.200 (0.950-1.520) | 0.120 | 1.160 (0.910-1.490) | 0.220 |
| Male | 1.120 (0.740-1.700) | 0.590 | 1.080 (0.700-1.670) | 0.730 |
| Smoking history | 1.300 (0.910-1.860) | 0.150 | 1.180 (0.830-1.690) | 0.350 |
| Poorly differentiated | 1.550 (1.060-2.260) | 0.024 | 1.220 (0.850-1.760) | 0.280 |
| TNM stage II (vs stage I) | 1.400 (0.900-2.180) | 0.130 | 1.220 (0.770-1.930) | 0.390 |
| TNM stage III (vs stage I) | 2.400 (1.520-3.780) | < 0.001 | 1.980 (1.235-3.173) | 0.004 |
| Postoperative adjuvant therapy | 0.850 (0.560-1.300) | 0.460 | 0.820 (0.530-1.270) | 0.370 |
| CD133 high expression | 1.520 (1.070-2.170) | 0.018 | 1.450 (1.033-2.037) | 0.032 |
| ALDH1A1 high expression | 1.440 (1.050-1.980) | 0.026 | 1.380 (1.001-1.902) | 0.049 |
| SOX2 high expression | 1.280 (0.930-1.760) | 0.120 | 1.150 (0.850-1.560) | 0.360 |
| CSC score (1 point for each increase) | 1.360 (1.120-1.650) | 0.002 | 1.250 (1.050-1.490) | 0.012 |
| High CSC group (2-3 points) | 1.780 (1.200-2.640) | 0.004 | 1.550 (1.078-2.228) | 0.018 |

HR: Hazard ratio; CI: Confidence interval; TNM: Tumor-node-metastasis; ALDH1A1: Aldehyde dehydrogenase 1A1; SOX2: SRY-box transcription factor 2; CSC: Cancer stem cell.

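A reader can check that a reported hazard ratio, its 95%CI, and its P value are mutually consistent. The sketch below assumes a Wald test on the log hazard ratio with a symmetric 95%CI on the log scale, which is the usual Cox model output but is not stated explicitly in the article; the function name is hypothetical.

```python
import math
from typing import Tuple


def wald_p_from_hr_ci(hr: float, lower: float, upper: float,
                      z_crit: float = 1.959964) -> Tuple[float, float]:
    """Back-calculate the Wald z statistic and two-sided P value.

    Assumes the CI is exp(log(HR) +/- z_crit * SE), so the standard error
    of log(HR) is the log-scale CI width divided by 2 * z_crit.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


# Multivariate OS estimates reported in Table 3.
z_cd133, p_cd133 = wald_p_from_hr_ci(1.450, 1.033, 2.037)
z_aldh, p_aldh = wald_p_from_hr_ci(1.380, 1.001, 1.902)
```

Under these assumptions the back-calculated P values are approximately 0.032 for CD133 and 0.049 for ALDH1A1, matching the table. Small discrepancies for other rows would be expected from rounding of the published interval limits.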
Table 4 Cox regression analysis for influence on disease-free survival.

| Variables | Univariate HR (95%CI) | P value | Multivariate HR (95%CI) | P value |
| --- | --- | --- | --- | --- |
| TNM stage II (vs stage I) | 1.320 (0.900-1.930) | 0.150 | 1.180 (0.830-1.680) | 0.350 |
| TNM stage III (vs stage I) | 2.100 (1.450-3.040) | < 0.001 | 1.740 (1.210-2.510) | 0.003 |
| Poorly differentiated | 1.400 (1.010-1.950) | 0.043 | 1.150 (0.810-1.630) | 0.430 |
| CD133 high expression | 1.280 (0.950-1.730) | 0.100 | 1.120 (0.840-1.500) | 0.410 |
| ALDH1A1 high expression | 1.460 (1.060-2.010) | 0.020 | 1.350 (1.000-1.820) | 0.048 |
| SOX2 high expression | 1.200 (0.920-1.580) | 0.190 | 1.050 (0.800-1.380) | 0.780 |
| High CSC group (2-3 points) | 1.600 (1.150-2.230) | 0.006 | 1.420 (1.040-1.930) | 0.028 |

HR: Hazard ratio; CI: Confidence interval; TNM: Tumor-node-metastasis; ALDH1A1: Aldehyde dehydrogenase 1A1; SOX2: SRY-box transcription factor 2; CSC: Cancer stem cell.

Construction of risk prediction model and prediction performance analysis

The CSC score was the sum of the high-expression points of the three markers (0-3 points). Patients with a score ≥ 2 had significantly shorter OS (P = 0.004), and this association remained independent in the multivariate model [hazard ratio (HR) = 1.550, 95% confidence interval (CI): 1.078-2.228, P = 0.018]. The AUC of the baseline model including clinical variables (age, sex, smoking history, histological type, degree of differentiation, TNM stage, and adjuvant therapy) for predicting 2-year OS was 0.675, and the AUC increased to 0.785 after adding the CSC score (Figure 2). Therefore, the CSC score provides incremental discriminatory value, but it should not be regarded as an autonomous treatment selection tool. Its more realistic clinical role is to refine postoperative risk enrichment among patients with otherwise similar conventional risk profiles and support closer follow-up planning or multidisciplinary review when needed.
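Discrimination measures such as those reported here quantify how often a model ranks the patient with the earlier event as higher risk. As a rough illustration, the Python sketch below computes Harrell's C-index, the concordance measure named in the statistical analysis, on invented data. It is not the time-dependent AUC used for the 2-year OS figures, and the toy times, event indicators, and risk scores are hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when patient i had the event and did so
    before patient j's follow-up time. The pair is concordant when the
    patient with the earlier event has the higher risk score; ties in the
    risk score count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical patients: follow-up in months, death indicator, CSC score
# used directly as the risk score.
toy_times = [5, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [3, 1, 2, 2, 0]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

In this toy data set 6 of the 8 comparable pairs are concordant, giving C = 0.75. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported increase from 0.675 to 0.785 should be read.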

Figure 2 Model prediction performance analysis. AUC: Area under the curve; CSC: Cancer stem cell.
DISCUSSION

In this study, the expression of the CSC-related markers CD133, ALDH1A1, and SOX2 in NSCLC was systematically evaluated. The main findings included the following: (1) High expression of these three markers was associated with more aggressive clinicopathological features, particularly lymph node metastasis, advanced TNM stage, and poor differentiation; (2) After adjusting for major clinical factors, CD133 and ALDH1A1 remained independently associated with worse OS; and (3) The composite CSC score improved the discriminatory performance of the clinical model for 2-year OS. Importantly, the magnitude of the HRs in the present study was moderate rather than extreme. Therefore, the CSC score should not be interpreted as a stand-alone trigger for treatment escalation, but rather as a clinically accessible tool for biological risk enrichment within the same stage-defined postoperative population. Practically, its potential value lies in identifying patients who may warrant closer imaging surveillance, more careful recurrence monitoring, and earlier multidisciplinary discussions, especially when conventional clinicopathological features alone do not fully explain the heterogeneity of recurrence.

CD133 is a membrane protein widely used to identify stem-like tumor cell populations. CD133-positive lung cancer cells exhibit sphere-forming ability, tumorigenicity, and differentiation potential[11,12]. Previous systematic reviews have mainly focused on the association between high CD133 expression and adverse outcomes[13-15]. Consistent with this literature, CD133 high expression in our cohort was associated with shorter OS and DFS in the Kaplan-Meier analysis and remained independently significant for OS in the multivariable model. These findings support the view that CD133 represents a persistent biological subpopulation associated with postoperative recurrence. However, the moderate effect size observed in our study suggests that CD133 is better incorporated as a component of composite risk assessment than used alone to guide management decisions. Differences in antibody clones, recognized epitopes, and scoring methods may partly explain the variability across published studies[16,17], underscoring the need for standardized staining and scoring procedures.

As a member of the aldehyde dehydrogenase family, ALDH1A1 exhibits stemness-related metabolic characteristics and is linked to oxidative stress tolerance, drug resistance, and DNA damage repair[18,19]. In our cohort, high ALDH1A1 expression was more frequent in poorly differentiated tumors and in node-positive diseases, providing direct clinicopathological support for its association with a more invasive phenotype. The independent association of ALDH1A1 with both OS and DFS further suggests that this marker may identify a subgroup with enhanced residual disease potential after surgery. Because detailed data on local-vs-distant recurrence patterns were not sufficiently granular in this retrospective dataset, we could not determine whether ALDH1A1 was linked to a specific recurrence route[20-22]. This issue should be addressed in future studies.

SOX2 is an important transcription factor involved in maintaining stemness and lineage specification[23-25]. In the present study, SOX2 was associated with adverse clinicopathological features but did not retain independent prognostic significance after multivariate adjustment. This pattern is more consistent with the histology-dependent biological behavior than with a simple collinearity explanation. In our cohort, SOX2 expression was enriched in squamous cell carcinoma, suggesting a lineage-related role in the selected pathological contexts. Therefore, SOX2 may contribute to aggressive phenotypes in particular molecular or histological backgrounds without functioning as a stable pan-NSCLC predictor of postoperative outcomes.

The phenotype of CSCs is not determined by a single molecule, but by coordinated functional traits such as self-renewal, therapeutic resistance, and immune escape, together with multiple signaling networks[26-28]. This is the rationale for using a composite score rather than relying on any single marker. In our study, the addition of the CSC score increased the AUC of the clinical model from 0.675 to 0.785. This suggests that the CSC score may help distinguish patients with similar TNM stages but different biological recurrence tendencies. In clinical practice, such information may support closer follow-up planning, prioritization of postoperative surveillance, and earlier multidisciplinary review of borderline-risk cases[29-32]. Nevertheless, the score should not replace guideline-based adjuvant treatment selection and should be regarded as a supplementary layer of risk interpretation. Its independent clinical value ultimately depends on whether it adds discriminatory information when interpreted alongside established molecular and immune markers[33-35].

The present study has certain limitations. First, selection bias was unavoidable owing to the single-center retrospective design. Although consecutive case inclusion, double data extraction, multivariate adjustment, and bootstrap internal correction were used to improve robustness, no formal propensity score analysis or additional sensitivity analysis was performed. Second, the IRS ≥ 6 threshold was literature-informed and cohort-adapted rather than ROC-derived, which may limit cross-cohort reproducibility. Third, DFS was defined as a composite endpoint of recurrence, metastasis, or death, and competing risk modelling for non-cancer death was not available in the present dataset. Fourth, detailed recurrence-pattern data and molecular information, such as epidermal growth factor receptor/anaplastic lymphoma kinase status or programmed death-ligand 1 expression, were not fully integrated, limiting both the mechanistic interpretation and the embedding of the CSC score into the current molecular decision framework. Fifth, this study lacked an external validation cohort. Therefore, future research should focus on multicenter prospective validation, standardized IHC scoring, and the integration of CSC markers with molecular subtype information, immune biomarkers, and other postoperative residual disease assessment tools.
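The difference between a fixed threshold and an ROC-derived threshold can be made concrete with a short sketch. The Python example below selects the cutoff that maximizes the Youden index (sensitivity + specificity - 1) and compares it with a fixed cutoff of 6. It assumes that IRS is an integer score on a 0-12 scale and that "high expression" means IRS at or above the cutoff; the data are synthetic, and the function names `youden_j` and `best_cutoff` are hypothetical. This is not the procedure used in the present study.

```python
from typing import Sequence, Tuple


def youden_j(irs: Sequence[int], events: Sequence[int], cutoff: int) -> float:
    """Youden index for the rule 'high expression if IRS >= cutoff'."""
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    # True positives: event cases classified as high expression.
    tp = sum(1 for v, e in zip(irs, events) if e == 1 and v >= cutoff)
    # True negatives: non-event cases classified as low expression.
    tn = sum(1 for v, e in zip(irs, events) if e == 0 and v < cutoff)
    return tp / n_pos + tn / n_neg - 1.0


def best_cutoff(irs: Sequence[int], events: Sequence[int]) -> Tuple[int, float]:
    """Scan every observed IRS value and return (cutoff, J) with maximal J.
    On ties, the lowest cutoff is kept."""
    best_c, best_j = None, float("-inf")
    for c in sorted(set(irs)):
        j = youden_j(irs, events, c)
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Synthetic toy data (NOT taken from the study cohort).
irs_values = [0, 2, 3, 4, 4, 6, 8, 9, 12, 12]
event = [0, 0, 0, 1, 1, 1, 0, 1, 1, 1]

data_driven_cutoff, data_driven_j = best_cutoff(irs_values, event)
fixed_j = youden_j(irs_values, event, 6)
print(data_driven_cutoff, round(data_driven_j, 3), round(fixed_j, 3))
```

A data-driven cutoff of this kind is itself cohort-dependent and tends to be optimistic, which is why external validation of any threshold remains necessary.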

CONCLUSION

In summary, high expression of CD133 and ALDH1A1 in NSCLC tumor tissue was significantly associated with poor OS and DFS, and both markers remained independent adverse prognostic factors after multivariable adjustment. The CSC score constructed from multiple markers improved prognostic stratification and prediction of 2-year OS beyond conventional clinicopathological factors. Clinically, it may assist in postoperative risk enrichment, follow-up planning, and multidisciplinary interpretation; however, it should ideally be interpreted together with the TNM stage, histological background, and molecular marker information.

ACKNOWLEDGEMENTS

We thank all clinicians and staff who contributed to case screening, data collection, follow-up organization, and immunohistochemical assessment for this study.

References
1. Guo L, Mohanty A, Singhal S, Srivastava S, Nam A, Warden C, Ramisetty S, Yuan YC, Cho H, Wu X, Li A, Vohra M, Saladi SV, Wheeler D, Arvanitis L, Massarelli E, Kulkarni P, Zeng Y, Salgia R. Targeting ITGB4/SOX2-driven lung cancer stem cells using proteasome inhibitors. iScience. 2023;26:107302.
2. Huang X, Zhang S, Tang J, Tian T, Pan Y, Wu L, Zhang J, Liu Y, Huang J, Dai H, Xu W, Zhang Y, Chen J, Cao M, Zhang L, Qiu X. A Self-Propagating c-Met-SOX2 Axis Drives Cancer-Derived IgG Signaling That Promotes Lung Cancer Cell Stemness. Cancer Res. 2023;83:1866-1882.
3. Li D, Cao Y, Luo CW, Zhang LP, Zou YB. The Clinical Significance and Prognostic Value of ALDH1 Expression in Non-small Cell Lung Cancer: A Systematic Review and Meta-analysis. Recent Pat Anticancer Drug Discov. 2024;19:599-609.
4. Guan S, Huangfu J, Zhu X, Ge Y, Ding Y, Chen T, Zhang Y, Yang T, Liu H, Zhang L, Chen X, Zhou J. Combined Detection of Tumor Stem Cell Markers CD133 and OCT4 in Early Non-Small Cell Lung Cancer Screening and Prognostic Evaluation. Cancer Manag Res. 2025;17:2077-2087.
5. Li W, Zhao J, Lan W, Ye X, Ying K. Depleting CBR1 increases chemosensitivity by reducing stemness and quiescence traits in non-small cell lung cancer. J Zhejiang Univ Sci B. 2025;26:1216-1232.
6. Xiao Z, Ding L, Yu Y, Ma C, Lei C, Liu Y, Chang X, Chen Y, He Y, Zhu Y, Zhang H. Tanreqing injection inhibits stemness and enhances sensitivity of non-small cell lung cancer models to gefitinib through ROS/STAT3 signaling pathway. J Cancer. 2024;15:4259-4274.
7. Zhao Z, Feng X, Wu H, Chen S, Ma C, Guan Z, Lei L, Tang K, Chen X, Dong Y, Tang Y. Construction of a lung cancer 3D culture model based on alginate/gelatin micro-beads for drug evaluation. Transl Lung Cancer Res. 2024;13:2698-2712.
8. Alamgeer M, Ganju V, Szczepny A, Russell PA, Prodanovic Z, Kumar B, Wainer Z, Brown T, Schneider-Kolsky M, Conron M, Wright G, Watkins DN. The prognostic significance of aldehyde dehydrogenase 1A1 (ALDH1A1) and CD133 expression in early stage non-small cell lung cancer. Thorax. 2013;68:1095-1104.
9. Grossi A, Fulghieri P, Aduvaliev A, Soffiantini K, Oldrati I, Cavallo M, Biggiogera M, Pellavio G, Laforenza U, Savio M, Sottile V. Differentiation Treatment Applied to Lung Cancer Model Reduces Pathogenic Traits in Vitro. Adv Biol (Weinh). 2026;10:e00371.
10. Lei J, Ji H, Guo J, Liu M, Su D, Zheng Y, Xu L, Cao Q, Ren T, Gui J, Wen Z. Cancer Stem Cells Shift Metabolite Acetyl-Coenzyme A to Abrogate the Differentiation of CD103(+) T Cells. Adv Sci (Weinh). 2026;13:e13535.
11. Gao L, Xie Z, Lin S, Lv Z, Zhou W, Chen J, Zhu L, Zhang L, Zeng P, Huang X, Yan W, Chen Y, Lu D, Zhang S, Guo W, Li P, Zhang X. [SWI/SNF Complex Gene Mutations Promote the Liver Metastasis of Non-small Cell Lung Cancer Cells in NSI Mice]. Zhongguo Fei Ai Za Zhi. 2023;26:753-764.
12. Niharika, Roy A, Sadhukhan R, Patra SK. Screening and identification of gene expression in large cohorts of clinical tissue samples unveils the major involvement of EZH2 and SOX2 in lung cancer. Cancer Genet. 2025;290-291:16-35.
13. Huo H, Zhang X, Zhang Q, Lv Z, Xie P, Zhang K, Zhang W, Mao Y. Diagnostic and Predictive Value of CD133-Positive Circulating Tumor Cells as an Indicator of Pathological High-Risk Factors for Stage I Non-Small Cell Lung Cancer. Cancer Med. 2025;14:e71303.
14. Wu H, Qi XW, Yan GN, Zhang QB, Xu C, Bian XW. Is CD133 expression a prognostic biomarker of non-small-cell lung cancer? A systematic review and meta-analysis. PLoS One. 2014;9:e100168.
15. Qu H, Li R, Liu Z, Zhang J, Luo R. Prognostic value of cancer stem cell marker CD133 expression in non-small cell lung cancer: a systematic review. Int J Clin Exp Pathol. 2013;6:2644-2650.
16. He X, Ma Y, Wen Y, Zhang R, Zhao D, Wang G, Wang W, Huang Z, Guo G, Zhang X, Lin H, Zhang L. Tumor-derived apoptotic extracellular vesicle-mediated intercellular communication promotes metastasis and stemness of lung adenocarcinoma. Bioact Mater. 2024;36:238-255.
17. Bae SH, Lee KY, Han S, Yun CW, Park C, Jang H. SOX2 Expression Does Not Guarantee Cancer Stem Cell-like Characteristics in Lung Adenocarcinoma. Cells. 2024;13:216.
18. Kang G, Song H, Bo L, Liu Q, Li Q, Li J, Pan P, Wang J, Jia Y, Sun H, Ma X. Nicotine promotes M2 macrophage polarization through α5-nAChR/SOX2/CSF-1 axis in lung adenocarcinoma. Cancer Immunol Immunother. 2024;74:11.
19. Jin W, Sun Y, Wang J, Wang Y, Chen D, Fang M, He J, Zhong L, Ren H, Zhang Y, Yin H, Wu S, Chen R, Yan W. Arsenic trioxide suppresses lung adenocarcinoma stem cell stemness by inhibiting m6A modification to promote ferroptosis. Am J Cancer Res. 2024;14:507-525.
20. Livraghi V, Grossi A, Scopelliti A, Senise G, Gamboa LA, Solito S, Stivala LA, Sottile V, Savio M. Stilbene Treatment Reduces Stemness Features in Human Lung Adenocarcinoma Model. Int J Mol Sci. 2024;25:10390.
21. Wei D, Peng JJ, Gao H, Zhang T, Tan Y, Hu YH. ALDH1 Expression and the Prognosis of Lung Cancer: A Systematic Review and Meta-Analysis. Heart Lung Circ. 2015;24:780-788.
22. Huo W, Du M, Pan X, Zhu X, Li Z. Prognostic value of ALDH1 expression in lung cancer: a meta-analysis. Int J Clin Exp Med. 2015;8:2045-2051.
23. Wu JL, Xu CF, Yang XH, Wang MS. Fibronectin promotes tumor progression through integrin αvβ3/PI3K/AKT/SOX2 signaling in non-small cell lung cancer. Heliyon. 2023;9:e20185.
24. Hao L, Chen H, Wang L, Zhou H, Zhang Z, Han J, Hou J, Zhu Y, Zhang H, Wang Q. Transformation or tumor heterogeneity: Mutations in EGFR, SOX2, TP53, and RB1 persist in the histological rapid conversion from lung adenocarcinoma to small-cell lung cancer. Thorac Cancer. 2023;14:1036-1041.
25. Gao H, Li C, Sun J, Deng L, Li J, Wu Z, Chen H. SOX2 transactivates NRF2 to promote carboplatin resistance in lung squamous cell carcinoma. Acta Biochim Biophys Sin (Shanghai). 2025;58:681-690.
26. Yu S, Qian L, Xu L, Ma J. Pan-cancer analysis of SOX2: Prognostic implications and potential as a therapeutic target in immune checkpoint modulation. Heliyon. 2025;11:e42200.
27. Zang K, Yu ZH, Wang M, Huang Y, Zhu XX, Yao B. [SOX2 as a possible prognostic biomarker and molecular target in lung cancer: a meta-analysis]. Rev Clin Esp (Barc). 2022;222:584-592.
28. Wan X, Ma D, Song G, Tang L, Jiang X, Tian Y, Yi Z, Jiang C, Jin Y, Hu A, Bai Y. The SOX2/PDIA6 axis mediates aerobic glycolysis to promote stemness in non-small cell lung cancer cells. J Bioenerg Biomembr. 2024;56:323-332.
29. Acero-Riaguas L, Griso-Acevedo AB, SanLorenzo-Vaquero A, Ibáñez-Herrera B, Fernandez-Diaz SM, Mascaraque M, Sánchez-Siles R, López-García I, Benítez-Buelga C, Bravo-Burguillos ER, Castelo B, Cebrián-Carretero JL, Perona R, Sastre L, Sastre-Perona A. DUSP1 and SOX2 expression determine squamous cell carcinoma of the salivary gland progression. Sci Rep. 2024;14:15007.
30. Centeno PP, Chester C, Kanellos G, Ford CA, Cammareri P, Inman GJ, Jamieson T, Ridgway RA, Marais R, Campbell AD, Sansom OJ. SOX2 confers tumour permissiveness in a specific skin progenitor population. Nat Commun. 2026;17:304.
31. Philipp LM, Hoffmann P, Hattingen L, Modi A, Sebens S. Nestin and SOX2 Maintain self-renewal Abilities of Different Pancreatic Cancer Stem Cell Populations. Stem Cell Rev Rep. 2026;22:620-635.
32. Mauro-Lizcano M, Sotgia F, Lisanti MP. SOX2-high cancer cells exhibit an aggressive phenotype, with increases in stemness, proliferation and invasion, as well as higher metabolic activity and ATP production. Aging (Albany NY). 2022;14:9877-9889.
33. Zeng Z, Fu M, Hu Y, Wei Y, Wei X, Luo M. Regulation and signaling pathways in cancer stem cells: implications for targeted therapy for cancer. Mol Cancer. 2023;22:172.
34. Hoseinian SN, Saeedi M, Saravani ME, Zenoozi S, Mehranfar F, Pouyan A. Navigating the Molecular Signaling: Deciphering Cancer Stem Cell Self-Renewal Pathways. Int J Mol Cell Med. 2025;14:735-776.
35. Lin T, Jiang SC, He XM, Xu WZ, Jin CJ, Guo YD. Expression of cancer stem cell markers and their prognostic significance in stage IIIA non-small cell lung cancer. World J Stem Cells. 2025;17:106381.
Footnotes

Peer review: Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Cell and tissue engineering

Country of origin: China

Peer-review report’s classification

Scientific quality: Grade B, Grade C

Novelty: Grade B, Grade B

Creativity or innovation: Grade B, Grade C

Scientific significance: Grade C, Grade C

P-Reviewer: Fernandes MR, PhD, Brazil; Patrice N, PhD, France

S-Editor: Wang JJ

L-Editor: A

P-Editor: Liu JH