INTRODUCTION
There has been tremendous growth in the use of artificial intelligence as a clinical and translational modality in medicine[1]. Used in conjunction with traditional statistical analytics, artificial intelligence can amplify the identification of trends, patterns, and relationships within data[2]. In nephrology, artificial intelligence provides clinicians with a powerful tool to aid clinical decision-making across a breadth of pathology, including hemodialysis management and transplantation medicine[3-7]. This supports the shift toward precision-based healthcare for this patient population, given the need for more personalized management of renal function and the public health burden of kidney disease. However, as this shift continues, there remains a need to evaluate modalities for designing appropriate research investigations in nephrology and hypertension. The aim of this article is to describe synthetic healthcare data as an emerging research tool for improving precision-based healthcare and to develop a framework for its application in nephrology and hypertension research.
SNAPSHOT OF HEALTHCARE DATASETS AND SYNTHETIC DATA
Predictive modeling of renal disease has historically drawn on clinician experience and statistical analysis[8-10]. While these statistical models have been the foundation for numerous guideline-based interventions and management strategies, obstacles remain. Examples include nonresponse during data collection and nonadherence to assigned treatments in clinical trials[11,12]. Statistical analyses must then be adjusted to account for such obstacles, which limits the strength of a study’s conclusions. As global integration grows, however, collaboration among clinicians and researchers using large-scale datasets, or “Big Data”, helps reduce these obstacles and improves epidemiological surveillance and predictive analytics as well as genomic and translational research[13].
However, the development and maintenance of these large-scale datasets can require considerable financial and time investment[14,15]. Moreover, the availability of these resources for clinical investigation in nephrology and hypertension lags behind other clinical interests[16,17]. Similarly, in designing statistical models in nephrology, a common issue is the availability of data that protects patient privacy and can be kept consistently de-identified at the individual level. This can restrict the ability of users to share data and collaborate with outside parties, and therefore lengthen the timeline toward potential research breakthroughs.
One potential tool for overcoming such obstacles, supported by growing clinical evidence, is the use of synthetic data in research investigation. Briefly, this type of data is developed by applying statistical and algorithmic modeling to real-world healthcare data[18,19]. The real data is used to train artificial intelligence and deep learning models, which then generate a new dataset (i.e., synthetic data) that aims to preserve the patterns and structures found within the original dataset[18-20]. In essence, synthetic data offers a way to move beyond the patient de-identification process: because synthetic records no longer correspond directly to an individual patient’s data, privacy is better protected. Likewise, since the barrier of potential patient reidentification is somewhat alleviated in the synthetic dataset, clinicians and researchers have more room to collaborate with outside parties and improve research productivity.
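To make the idea of "fit to real data, then generate new records" concrete, the following is a minimal sketch in Python. It is not the method used in the cited studies: instead of a deep learning generator it fits a simple two-variable Gaussian model, and the "real" table is itself simulated from arbitrary numbers with no clinical meaning. The function names (`fit_gaussian`, `sample_synthetic`, `correlation`) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate the means and the 2x2 covariance of two-column data."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sample_synthetic(params, n, rng):
    """Draw n new rows from the fitted Gaussian using its Cholesky factor."""
    (mx, my), (sxx, sxy, syy) = params
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return rows


def correlation(rows):
    """Pearson correlation between the two columns."""
    _, (sxx, sxy, syy) = fit_gaussian(rows)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)

# ASSUMPTION: simulated stand-in for a real dataset (two correlated numeric
# columns). The numbers are arbitrary and carry no clinical meaning.
real = []
for _ in range(500):
    a = rng.gauss(135.0, 15.0)
    b = 120.0 - 0.4 * a + rng.gauss(0.0, 8.0)
    real.append((a, b))

params = fit_gaussian(real)                     # "train" on the real table
synthetic = sample_synthetic(params, 500, rng)  # generate new records

# The synthetic rows are new draws, yet the column relationship is similar.
print(round(correlation(real), 2), round(correlation(synthetic), 2))
```

In this sketch no synthetic row is copied from the source table, while the correlation between the two columns is approximately retained. A parametric fit of this kind does not by itself guarantee privacy, and real generators are considerably more elaborate.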
DISCUSSION
The application of synthetic data in nephrology and hypertension research may present an advantageous profile for clinicians and researchers to consider. However, the current body of literature that applies synthetic data in nephrology and hypertension is limited compared to other areas of internal medicine. Given its emergence, this paper proposes a framework across several research interests.
Renal cell carcinoma
There are over 400000 newly diagnosed cases of renal cell carcinoma (RCC) annually[21]. A bibliometric analysis of RCC has suggested that some of the most in-demand topics within RCC include drug-related clinical trials and immunotherapy[22]. Given that clinical trials are generally susceptible to nonadherence to assigned treatments, synthetic data could be generated to help support, or at least serve as a comparison for, the findings of clinical trials. Moreover, a study by Sabharwal 2023[23] described the development of a synthetic image generation tool, trained on pathological slides from surgical resections, that can aid in the detection of RCC; this adds to the current literature on artificial intelligence in renal histopathology[24,25]. Given this current evidence, future studies could consider designing models that compare synthetic histological data of RCC with clinical data and clinical trial data to further justify its utility.
Chronic kidney disease
Chronic kidney disease affects approximately 1 in every 7 individuals in the United States[26]. A bibliometric analysis of chronic kidney disease from 2011 to 2020 suggests that modifiable risk factors, including diet management and obesity, have been areas of clinical investigation[27]. Because these clinical studies tend to use electronic health records data, there is heightened awareness of the need to protect patient privacy and ensure compliant de-identification. This provides an opportunity to use synthetic data as a research tool. Moreover, a recent study that evaluated synthetically generated data against real patient data using multiple supervised machine learning algorithms reported impressive model accuracy[28]. Given this current evidence, the framework for future clinical studies ought to consider using synthetically generated data to evaluate currently established trends in the literature. This is imperative given that the use of synthetically generated datasets in nephrology research continues to emerge across related areas, including dialysis and kidney transplantation[29,30].
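One common way to compare synthetic and real data with a supervised model is to train the same model once on real records and once on synthetic records, then score both on held-out real records. The sketch below illustrates that design under loud assumptions: it is not the cited study's pipeline, the "real" records are simulated from arbitrary numbers with no clinical meaning, the synthetic generator is a simple per-class Gaussian fit, and a single one-feature nearest-centroid rule stands in for the multiple algorithms mentioned above. All function names are hypothetical.

```python
import random
import statistics

rng = random.Random(1)


def simulate_real(n):
    """ASSUMPTION: stand-in for real records as (feature, label) pairs."""
    rows = []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        mean = 60.0 if label else 40.0  # arbitrary values, not clinical
        rows.append((rng.gauss(mean, 8.0), label))
    return rows


def synthesize(rows, n):
    """Fit one Gaussian per class to the rows, then sample n new rows."""
    by_class = {0: [], 1: []}
    for x, y in rows:
        by_class[y].append(x)
    fitted = {y: (statistics.fmean(v), statistics.stdev(v))
              for y, v in by_class.items()}
    p1 = len(by_class[1]) / len(rows)
    out = []
    for _ in range(n):
        y = int(rng.random() < p1)
        mu, sd = fitted[y]
        out.append((rng.gauss(mu, sd), y))
    return out


def fit_threshold(rows):
    """Nearest-centroid rule on one feature: cut at the midpoint of class means."""
    m0 = statistics.fmean([x for x, y in rows if y == 0])
    m1 = statistics.fmean([x for x, y in rows if y == 1])
    return (m0 + m1) / 2, m1 > m0


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    cut, positive_above = model
    hits = sum(((x > cut) == positive_above) == bool(y) for x, y in rows)
    return hits / len(rows)


train_real = simulate_real(400)
test_real = simulate_real(400)           # held-out real records
train_syn = synthesize(train_real, 400)  # synthetic records derived from training data

acc_real = accuracy(fit_threshold(train_real), test_real)
acc_syn = accuracy(fit_threshold(train_syn), test_real)
print(f"trained on real: {acc_real:.3f}  trained on synthetic: {acc_syn:.3f}")
```

A small gap between the two accuracies suggests that, for this task, the synthetic records carry much of the signal present in the real records. It says nothing about privacy protection, which would need to be assessed separately.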
Blood pressure and hypertension
Elevated blood pressure and hypertension are well established in the clinical literature, affecting over 1 billion individuals worldwide[31]. Bibliometric analytics show over a 40% increase in published research articles related to hypertension in the previous 2 decades[32]. Researchers in the United States have contributed a large share of this body of research compared with other countries. Additionally, given that the epidemiological contributors to hypertension (e.g., modifiable risk factors and socioeconomics) can vary across countries, a large-scale dataset from one country may not be as clinically applicable to populations in other countries[33,34]. Moreover, current evidence suggests that synthetic data can be comparably accurate in blood pressure monitoring and prediction, but it requires further evaluation before greater clinical correlation or application across multiple study populations[33-35]. Nevertheless, synthetic data can assist in further refining and identifying these contributors across countries where modifiable risk factors and socioeconomics differ (i.e., developing vs. developed countries). This is possible because artificial intelligence can be used to generate large-scale datasets that may not otherwise be freely accessible across countries. Finally, synthetic data is primarily a research tool at this time, as using these datasets to guide clinical decision-making would be premature without effective clinician involvement.
CONCLUSION
The application of artificial intelligence, machine learning, and deep learning continues to emerge in nephrology and hypertension research. Synthetic data can serve as an appropriate research tool to further enrich this body of literature from histopathology to population health. Specifically, the framework proposed here has focused on renal oncology, chronic kidney disease, and blood pressure and hypertension. However, there are other avenues for strong implementation of synthetic data in nephrology and hypertension research, such as autoimmune kidney diseases, dialysis, or renal-associated syndromes, which were not discussed due to a relative paucity of literature on synthetic data applications in these areas. This may create ample opportunity to accelerate research development.
Provenance and peer review: Unsolicited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Medical laboratory technology
Country of origin: United States
Peer-review report’s classification
Scientific Quality: Grade B, Grade C
Novelty: Grade B, Grade B
Creativity or Innovation: Grade B, Grade B
Scientific Significance: Grade B, Grade B
P-Reviewer: Watanabe T; S-Editor: Qu XL; L-Editor: A; P-Editor: Zheng XM