1. Barua B, Chyrmang G, Bora K, Saikia MJ. Optimizing colorectal cancer segmentation with MobileViT-UNet and multi-criteria decision analysis. PeerJ Comput Sci 2024; 10:e2633. [PMID: 39896394] [PMCID: PMC11784762] [DOI: 10.7717/peerj-cs.2633]
Abstract
Colorectal cancer represents a significant health challenge as one of the deadliest forms of malignancy. Manual examination methods are subjective, leading to inconsistent interpretations among different examiners and compromising reliability. Additionally, the process is time-consuming and labor-intensive, necessitating the development of computer-aided diagnostic systems. This study investigates the segmentation of colorectal histopathology images into regions of normal tissue, polyps, high-grade intraepithelial neoplasia, low-grade intraepithelial neoplasia, adenocarcinoma, and serrated adenoma, using the proposed segmentation models VGG16-UNet, ResNet50-UNet, MobileNet-UNet, and MobileViT-UNet. This is the first study to integrate MobileViT as a UNet encoder. Each model was trained with two distinct loss functions, binary cross-entropy and Dice loss, and evaluated using metrics including the Dice ratio, Jaccard index, precision, and recall. MobileViT-UNet with Dice loss emerged as the leading model in colorectal histopathology segmentation, consistently achieving high scores across all evaluation metrics. Specifically, it achieved a Dice ratio of 0.944 ± 0.030 and a Jaccard index of 0.897 ± 0.049, with precision of 0.955 ± 0.046 and recall of 0.939 ± 0.038 across all classes. To identify the best-performing model, we employed multi-criteria decision analysis (MCDA) using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). This analysis revealed that the MobileViT-UNet+Dice model achieved the highest TOPSIS score of 1, attaining the highest ranking among all models. Our comparative analysis includes benchmarking against existing works; the results highlight that our best-performing model (MobileViT-UNet+Dice) significantly outperforms existing models, showcasing its potential to enhance the accuracy and efficiency of colorectal cancer segmentation.
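The TOPSIS ranking step described above can be sketched in pure Python. This is a generic illustration of the algorithm, not the paper's code, and the model scores below are hypothetical placeholders (the first row is constructed to dominate every criterion, so it receives a closeness score of exactly 1, mirroring the ranking behaviour the abstract reports):

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS.

    matrix  : rows = alternatives, columns = criterion scores
    weights : importance of each criterion (summing to 1)
    benefit : True where higher is better for that criterion
    """
    n_cols = len(matrix[0])
    # 1. Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # 2. Ideal-best and ideal-worst value per criterion.
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(zip(*v))]
    # 3. Euclidean distance to both ideals -> closeness score in [0, 1].
    scores = []
    for row in v:
        d_best = math.sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_worst = math.sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical models scored on Dice, Jaccard, precision, recall.
models = [
    [0.944, 0.897, 0.955, 0.939],  # dominates on every criterion
    [0.910, 0.850, 0.920, 0.905],
    [0.890, 0.820, 0.900, 0.880],
]
scores = topsis(models, [0.25, 0.25, 0.25, 0.25], [True, True, True, True])
```

Because the first alternative is the ideal-best on every criterion, its distance to the ideal solution is zero and its TOPSIS score is exactly 1.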
Affiliation(s)
- Barun Barua
- Department of Computer Science and Information Technology, Cotton University, Guwahati, Assam, India
- Genevieve Chyrmang
- Department of Computer Science and Information Technology, Cotton University, Guwahati, Assam, India
- Kangkana Bora
- Department of Computer Science and Information Technology, Cotton University, Guwahati, Assam, India
- Manob Jyoti Saikia
- Electrical and Computer Engineering Department, University of Memphis, Memphis, TN, United States of America
- Biomedical Sensors & Systems Lab, University of Memphis, Memphis, TN, United States of America
2. Azad R, Aghdam EK, Rauland A, Jia Y, Avval AH, Bozorgpour A, Karimijafarbigloo S, Cohen JP, Adeli E, Merhof D. Medical Image Segmentation Review: The Success of U-Net. IEEE Trans Pattern Anal Mach Intell 2024; 46:10076-10095. [PMID: 39167505] [DOI: 10.1109/tpami.2024.3435571]
Abstract
Automatic medical image segmentation is a crucial topic in the medical domain and a critical counterpart in the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success across all medical image modalities. Over the years, the U-Net model has received tremendous attention from academic and industrial researchers, who have extended it to address the scale and complexity created by medical tasks. These extensions commonly enhance the U-Net's backbone, bottleneck, or skip connections, include representation learning, combine it with a Transformer architecture, or address probabilistic prediction of the segmentation map. Having a compendium of previously proposed U-Net variants makes it easier for machine learning researchers to identify relevant research questions and understand the challenges of the biological tasks the model must address. In this work, we discuss the practical aspects of the U-Net model and organize each variant into a taxonomy. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and well-known designs on well-known datasets. Furthermore, we provide a comprehensive implementation library with trained models. In addition, for ease of future studies, we created an online list of U-Net papers with their possible official implementations.
3. Dong S, Feng J. SGDBNet: A scene-class guided dual branch network for port UAV images oil spill detection. Mar Pollut Bull 2024; 208:117019. [PMID: 39326329] [DOI: 10.1016/j.marpolbul.2024.117019]
Abstract
Unmanned aerial vehicles (UAVs) are flexible, typically fly at low altitude below the influence of clouds and severe weather, and are widely used for port oil spill detection (OSD). However, port backgrounds are usually complex, oil spills in UAV images are usually small and irregular, and oil boundaries are fuzzy, which causes existing methods to fail to detect port oil spills accurately. Here, we propose a scene-class guided dual branch network for port OSD based on UAV images, which can locate oil spill areas of different sizes and suppress the influence of complex backgrounds. Specifically, the dual-branch network consists of semantic segmentation and image classification branches. The image classification branch uses the scene class as its label and extracts feature attention, which guides the semantic segmentation branch to learn key area features. Second, we propose a multi-scale arbitrary shape convolution module, which addresses the challenges caused by fuzzy oil boundaries and irregular small objects. Finally, because oil spill pixels are heavily outnumbered by other pixels, we design a joint loss to optimize the network. We evaluate our proposed method on a public UAV OSD dataset. The results show that our method is superior to the state-of-the-art methods, achieving mIoU of 90.22%, A of 96.03%, P of 91.99%, R of 92.56%, and F1 of 92.28%, demonstrating the feasibility of our method for port OSD and its potential to save considerable manpower and material resources. An ablation experiment further demonstrates the effectiveness of each designed component.
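The mean IoU figure quoted above is conventionally computed from a pixel-level, per-class confusion matrix. A minimal pure-Python sketch of that standard computation (illustrative only, not the paper's implementation):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Pixel-level confusion matrix: m[i][j] = pixels of true class i predicted as j."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def mean_iou(y_true, y_pred, n_classes):
    """Mean intersection-over-union across classes (segmentation mIoU).

    Classes absent from both prediction and ground truth are skipped.
    """
    m = confusion_matrix(y_true, y_pred, n_classes)
    ious = []
    for c in range(n_classes):
        tp = m[c][c]
        fp = sum(m[r][c] for r in range(n_classes)) - tp  # predicted c, truly other
        fn = sum(m[c]) - tp                               # truly c, predicted other
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For example, with flattened labels `[0, 0, 1, 1]` and predictions `[0, 0, 1, 0]`, class 0 has IoU 2/3 and class 1 has IoU 1/2, giving mIoU 7/12.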
Affiliation(s)
- Shaokang Dong
- School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Jiangfan Feng
- School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism (TMDPD, MCT), Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
4. Karn PK, Abdulla WH. Precision Segmentation of Subretinal Fluids in OCT Using Multiscale Attention-Based U-Net Architecture. Bioengineering (Basel) 2024; 11:1032. [PMID: 39451407] [PMCID: PMC11504175] [DOI: 10.3390/bioengineering11101032]
Abstract
This paper presents a deep-learning architecture for segmenting retinal fluids in patients with Diabetic Macular Oedema (DME) and Age-related Macular Degeneration (AMD). Accurate segmentation of multiple fluid types is critical for diagnosis and treatment planning, but existing techniques often struggle with precision. We propose an encoder-decoder network inspired by U-Net, processing enhanced OCT images and their edge maps. The encoder incorporates Residual and Inception modules with an autoencoder-based multiscale attention mechanism to extract detailed features. Our method shows superior performance across several datasets. On the RETOUCH dataset, the network achieved F1 Scores of 0.82 for intraretinal fluid (IRF), 0.93 for subretinal fluid (SRF), and 0.94 for pigment epithelial detachment (PED). The model also performed well on the OPTIMA and DUKE datasets, demonstrating high precision, recall, and F1 Scores. This architecture significantly enhances segmentation accuracy and edge precision, offering a valuable tool for diagnosing and managing retinal diseases. Its integration of dual-input processing, multiscale attention, and advanced encoder modules highlights its potential to improve clinical outcomes and advance retinal disease treatment.
Affiliation(s)
- Prakash Kumar Karn
- Department of Electrical, Computer, and Software Engineering, The University of Auckland, Auckland 1010, New Zealand
- Waleed H. Abdulla
- Department of Electrical, Computer, and Software Engineering, The University of Auckland, Auckland 1010, New Zealand
5. Henson WH, Li X, Lin Z, Guo L, Mazzà C, Dall’Ara E. Automatic segmentation of lower limb muscles from MR images of post-menopausal women based on deep learning and data augmentation. PLoS One 2024; 19:e0299099. [PMID: 38564618] [PMCID: PMC10986986] [DOI: 10.1371/journal.pone.0299099]
Abstract
Individual muscle segmentation is the process of partitioning medical images into regions representing each muscle. It can be used to isolate spatially structured quantitative muscle characteristics, such as volume, geometry, and the level of fat infiltration. These features are pivotal to measuring the state of muscle functional health and to tracking the body's response to musculoskeletal and neuromusculoskeletal disorders. The gold-standard approach to muscle segmentation requires manual processing of large numbers of images and is associated with significant operator repeatability issues and high time requirements. Deep learning-based techniques have recently been suggested as capable of automating the process, which would catalyse research into the effects of musculoskeletal disorders on the muscular system. In this study, three convolutional neural networks were explored for their capacity to automatically segment twenty-three lower limb muscles of the hips, thighs, and calves from magnetic resonance images. The three neural networks (UNet, Attention UNet, and a novel Spatial Channel UNet) were trained independently with augmented images to segment 6 subjects and segmented the muscles with an average Relative Volume Error (RVE) between -8.6% and 2.9%, average Dice Similarity Coefficient (DSC) between 0.70 and 0.84, and average Hausdorff Distance (HD) between 12.2 and 46.5 mm, with performance dependent on both the subject and the network used. The trained convolutional neural networks and the data used in this study are openly available, either for re-training on other medical images or for direct application to automatically segment new T1-weighted lower limb magnetic resonance images captured with similar acquisition parameters.
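The RVE and Hausdorff Distance metrics reported above can be sketched in pure Python. This is a simplified illustration under stated assumptions (flat binary masks and small 2-D point sets; real pipelines compute these over 3-D voxel volumes and boundary surfaces):

```python
import math

def relative_volume_error(pred_mask, true_mask):
    """Signed RVE (%) between flat binary masks; negative means under-segmentation."""
    vol_pred, vol_true = sum(pred_mask), sum(true_mask)
    return 100.0 * (vol_pred - vol_true) / vol_true

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets (e.g. boundary pixels):
    the largest distance from any point in one set to its nearest neighbour in the other."""
    def directed(u, v):
        return max(min(math.dist(p, q) for q in v) for p in u)
    return max(directed(points_a, points_b), directed(points_b, points_a))
```

For instance, a prediction covering 3 of 4 true voxels has an RVE of -25%, and the Hausdorff distance between the point sets {(0, 0)} and {(3, 4)} is 5.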
Affiliation(s)
- William H. Henson
- Department of Mechanical Engineering, The University of Sheffield, Sheffield, United Kingdom
- INSIGNEO Institute for in silico Medicine, The University of Sheffield, Sheffield, United Kingdom
- Xinshan Li
- Department of Mechanical Engineering, The University of Sheffield, Sheffield, United Kingdom
- INSIGNEO Institute for in silico Medicine, The University of Sheffield, Sheffield, United Kingdom
- Zhicheng Lin
- INSIGNEO Institute for in silico Medicine, The University of Sheffield, Sheffield, United Kingdom
- Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield, United Kingdom
- Lingzhong Guo
- INSIGNEO Institute for in silico Medicine, The University of Sheffield, Sheffield, United Kingdom
- Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield, United Kingdom
- Claudia Mazzà
- Department of Mechanical Engineering, The University of Sheffield, Sheffield, United Kingdom
- INSIGNEO Institute for in silico Medicine, The University of Sheffield, Sheffield, United Kingdom
- Enrico Dall’Ara
- INSIGNEO Institute for in silico Medicine, The University of Sheffield, Sheffield, United Kingdom
- Division of Clinical Medicine, The University of Sheffield, Sheffield, United Kingdom
6. Zhao J, Sun L, Sun Z, Zhou X, Si H, Zhang D. MSEF-Net: Multi-scale edge fusion network for lumbosacral plexus segmentation with MR image. Artif Intell Med 2024; 148:102771. [PMID: 38325928] [DOI: 10.1016/j.artmed.2024.102771]
Abstract
Nerve damage in the spinal region is a common cause of disability and paralysis. Lumbosacral plexus segmentation from magnetic resonance imaging (MRI) scans plays an important role in computer-aided diagnosis and surgery of spinal nerve lesions. Due to the complex structure and low contrast of the lumbosacral plexus, it is difficult to delineate edge regions accurately. To address this issue, we propose a Multi-Scale Edge Fusion Network (MSEF-Net) that fully enhances edge features in the encoder and adaptively fuses multi-scale features in the decoder. Specifically, to highlight edge structure features, we propose an edge feature fusion module (EFFM) that combines Sobel-operator edge detection with an edge-guided attention module (EAM). To adaptively fuse multi-scale feature maps in the decoder, we introduce an adaptive multi-scale fusion module (AMSF). The proposed MSEF-Net was evaluated on a collected spinal MRI dataset of 89 patients (2848 MR images in total). Experimental results demonstrate that MSEF-Net is effective for lumbosacral plexus segmentation in MR images when compared with several state-of-the-art segmentation methods.
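The Sobel edge detection feeding the EFFM described above is a fixed pair of 3×3 gradient kernels. A generic pure-Python sketch of the operator (an illustration of the standard technique, not the paper's code):

```python
def sobel_magnitude(img):
    """Gradient magnitude via the 3x3 Sobel kernels.

    img is a 2-D list of intensities; border pixels are left at 0
    (only the 'valid' interior is convolved).
    """
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[i][j] * img[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(gy_k[i][j] * img[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

On a vertical step edge the response peaks along the step, while a constant image yields zero everywhere, which is why the operator is a cheap way to emphasize boundary structure.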
Affiliation(s)
- Junyong Zhao
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, the Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing 211106, China
- Liang Sun
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, the Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing 211106, China; Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute, Shenzhen 518063, China.
- Zhi Sun
- Department of Medical Imaging, Shandong Provincial Hospital, Jinan 250021, China
- Xin Zhou
- Department of Orthopedics, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Haipeng Si
- Department of Orthopedics, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250012, China.
- Daoqiang Zhang
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, the Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing 211106, China; Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute, Shenzhen 518063, China.
7. Das N, Das S. Attention-UNet architectures with pretrained backbones for multi-class cardiac MR image segmentation. Curr Probl Cardiol 2024; 49:102129. [PMID: 37866419] [DOI: 10.1016/j.cpcardiol.2023.102129]
Abstract
Segmentation architectures based on deep learning have achieved extraordinary results in medical imaging. The use of Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) in diagnosis and treatment is increasing, and automated methods significantly support the diagnostic process by removing the bottlenecks of manual segmentation. Cardiac Magnetic Resonance Imaging (CMRI) is a state-of-the-art imaging technique used to acquire vital heart measurements and has received extensive attention from researchers working on automatic segmentation. Deep learning methods offer high-precision segmentation but still face several difficulties, such as pixel homogeneity across nearby organs. This study applies the attention-mechanism approach to automated medical image segmentation and examines the impact of the attention mechanism on the UNet model with and without pretrained backbone networks. Three networks are considered: Attention-UNet, Attention-UNet with a ResNet50 pretrained backbone, and Attention-UNet with a DenseNet121 pretrained backbone. The experiments are performed on the ACDC Challenge 2017 dataset. Performance is evaluated through a comparative analysis based on the Dice coefficient, IoU coefficient, and cross-entropy loss. The three networks obtained Dice coefficients of 0.9889, 0.9720, and 0.9801, respectively, with corresponding IoU scores of 0.9781, 0.9457, and 0.9612. Comparison with state-of-the-art methods indicates that these networks are on par with, or even superior to, existing approaches in terms of both the Dice coefficient and intersection-over-union.
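The Dice and IoU coefficients reported above are deterministically related for binary masks (IoU = Dice / (2 − Dice)). A minimal pure-Python sketch of both metrics (a generic illustration, not this paper's evaluation code):

```python
def dice(pred, target):
    """Dice coefficient (equivalent to F1) for two flat binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 2.0 * inter / (sum(pred) + sum(target))

def iou(pred, target):
    """Intersection-over-union (Jaccard index) for two flat binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return inter / union
```

With `pred = [1, 1, 0, 0]` and `target = [1, 0, 1, 0]` the intersection is 1 pixel, so Dice is 0.5 and IoU is 1/3, consistent with the identity above.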
Affiliation(s)
- Niharika Das
- Department of Mathematics & Computer Application, Maulana Azad National Institute of Technology, India.
- Sujoy Das
- Department of Mathematics & Computer Application, Maulana Azad National Institute of Technology, India
8. Hettihewa K, Kobchaisawat T, Tanpowpong N, Chalidabhongse TH. MANet: a multi-attention network for automatic liver tumor segmentation in computed tomography (CT) imaging. Sci Rep 2023; 13:20098. [PMID: 37973987] [PMCID: PMC10654423] [DOI: 10.1038/s41598-023-46580-4]
Abstract
Automatic liver tumor segmentation is a paramount application for liver tumor diagnosis and treatment planning, but it has become a highly challenging task due to the heterogeneity of tumor shape and intensity variation. Automatic liver tumor segmentation can establish a diagnostic standard that provides relevant radiological information to all levels of expertise. Recently, deep convolutional neural networks have demonstrated superiority in feature extraction and learning for medical image segmentation. However, multi-layer dense feature stacks make such models inconsistent in imitating the visual attention and awareness that radiological expertise brings to tumor recognition and segmentation. To bridge that gap in visual attention capability, attention mechanisms have been developed for better feature selection. In this paper, we propose a novel network named Multi-Attention Network (MANet), a fusion of attention mechanisms that learns to highlight important features while suppressing irrelevant ones for the tumor segmentation task. The proposed network follows U-Net as its basic architecture, and a residual mechanism is implemented in the encoder. The convolutional block attention module is split into channel attention and spatial attention modules, implemented in the encoder and decoder of the proposed architecture. The attention mechanism of Attention U-Net is integrated to extract low-level features and combine them with high-level ones. The architecture is trained and evaluated on the publicly available MICCAI 2017 Liver Tumor Segmentation dataset and the 3DIRCADb dataset under various evaluation metrics. MANet demonstrated promising results compared to state-of-the-art methods with comparatively small parameter overhead.
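The channel-attention half of a convolutional block attention module works by squeezing each channel to one descriptor and using it to gate that channel. A deliberately simplified pure-Python sketch (the learned MLP that real CBAM/SE blocks apply to the pooled descriptor is omitted here; the gate is just a sigmoid of the channel mean, and the layout is a nested `[C][H][W]` list rather than a tensor):

```python
import math

def channel_attention(feature_map):
    """Toy squeeze-and-excitation-style channel gating on a [C][H][W] nested list:
    global-average-pool each channel, squash to a (0, 1) gate, rescale the channel."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    gates = []
    for channel in feature_map:
        pooled = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        gates.append(sigmoid(pooled))  # real CBAM would pass `pooled` through an MLP
    rescaled = [
        [[v * g for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]
    return rescaled, gates
```

Channels with stronger average activation receive gates closer to 1 and are emphasized relative to weaker channels, which is the core idea the abstract describes.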
Affiliation(s)
- Kasun Hettihewa
- Perceptual Intelligent Computing Laboratory, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Natthaporn Tanpowpong
- Department of Radiology, Faculty of Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
- Thanarat H Chalidabhongse
- Perceptual Intelligent Computing Laboratory, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand.
- Applied Digital Technology in Medicine (ATM) Research Group, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand.
9. AL Qurri A, Almekkawy M. Improved UNet with Attention for Medical Image Segmentation. Sensors (Basel) 2023; 23:8589. [PMID: 37896682] [PMCID: PMC10611347] [DOI: 10.3390/s23208589]
Abstract
Medical image segmentation is crucial for medical image processing and the development of computer-aided diagnostics. In recent years, deep Convolutional Neural Networks (CNNs) have been widely adopted for medical image segmentation and have achieved significant success. UNet, which is based on CNNs, is the mainstream method used for medical image segmentation. However, its performance suffers owing to its inability to capture long-range dependencies. Transformers, initially designed for Natural Language Processing (NLP) and sequence-to-sequence applications, have demonstrated the ability to capture long-range dependencies, but their ability to acquire local information is limited. Hybrid architectures of CNNs and Transformers, such as TransUNet, have been proposed to benefit from the Transformer's long-range dependencies and the CNN's low-level details. Nevertheless, automatic medical image segmentation remains a challenging task due to factors such as blurred boundaries, low-contrast tissue environments, and, in the context of ultrasound, issues like speckle noise and attenuation. In this paper, we propose a new model that combines the strengths of both CNNs and Transformers, with architectural improvements designed to enrich the feature representation captured by the skip connections and the decoder. To this end, we devised a new attention module called Three-Level Attention (TLA), composed of an Attention Gate (AG), channel attention, and a spatial normalization mechanism. The AG preserves structural information, whereas channel attention helps to model the interdependencies between channels. Spatial normalization employs the spatial coefficient of the Transformer to improve spatial attention, akin to TransNorm. To further improve the skip connections and reduce the semantic gap, the skip connections between the encoder and decoder were redesigned in a manner similar to the UNet++ dense connections.
Moreover, deep supervision using a side-output channel was introduced, analogous to BASNet, which was originally used for saliency predictions. Two datasets from different modalities, a CT scan dataset and an ultrasound dataset, were used to evaluate the proposed UNet architecture. The experimental results showed that our model consistently improved the prediction performance of the UNet across different datasets.
10. Das R, Bose S, Chowdhury RS, Maulik U. Dense Dilated Multi-Scale Supervised Attention-Guided Network for histopathology image segmentation. Comput Biol Med 2023; 163:107182. [PMID: 37379615] [DOI: 10.1016/j.compbiomed.2023.107182]
Abstract
Over the last couple of decades, the introduction and proliferation of whole-slide scanners has led to increasing interest in digital pathology research. Although manual analysis of histopathological images is still the gold standard, the process is often tedious and time-consuming, and it suffers from intra- and inter-observer variability. Separating structures or grading morphological changes can be difficult due to the architectural variability of these images. Deep learning techniques have shown great potential in histopathology image segmentation, drastically reducing the time needed for downstream analysis tasks and helping to provide accurate diagnoses. However, few algorithms have clinical implementations. In this paper, we propose a new deep learning model, the Dense Dilated Multiscale Supervised Attention-Guided (D2MSA) Network, for histopathology image segmentation that makes use of deep supervision coupled with a hierarchical system of novel attention mechanisms. The proposed model surpasses state-of-the-art performance while using similar computational resources. The model's performance has been evaluated on gland segmentation and nuclei instance segmentation, both clinically relevant tasks for assessing the state and progress of malignancy, using histopathology image datasets for three different types of cancer. We have also performed extensive ablation tests and hyperparameter tuning to ensure the validity and reproducibility of the model's performance. The proposed model is available at www.github.com/shirshabose/D2MSA-Net.
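The "dense dilated" idea above builds on dilated convolutions, which insert gaps between kernel taps to enlarge the receptive field without adding parameters. A minimal 1-D pure-Python sketch of the mechanism (a generic illustration, unrelated to the actual D2MSA implementation):

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D convolution with gaps of (dilation - 1) samples between
    kernel taps, so the receptive field grows while the tap count stays fixed."""
    span = (len(kernel) - 1) * dilation  # receptive-field width minus one
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]
```

With a two-tap kernel `[1, 1]`, dilation 1 sums adjacent samples, while dilation 2 sums samples two positions apart, doubling the receptive field at no extra parameter cost.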
Affiliation(s)
- Rangan Das
- Department of Computer Science Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India.
- Shirsha Bose
- Department of Informatics, Technical University of Munich, Munich, Bavaria 85748, Germany.
- Ritesh Sur Chowdhury
- Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India.
- Ujjwal Maulik
- Department of Computer Science Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India.
11. Shi H, Chen H, Zhang Y, Wang Z. A lightweight pathological image segmentation framework based on heterogeneous cross-layer sampling. 2023 IEEE Smart World Congress (SWC) 2023:1-8. [DOI: 10.1109/swc57546.2023.10448845]
Affiliation(s)
- Hai Shi
- Hainan University, School of Computer Science and Technology, Haikou, China
- Hailong Chen
- Hainan University, School of Computer Science and Technology, Haikou, China
- Yangyang Zhang
- Hainan University, School of Computer Science and Technology, Haikou, China
- Zhengxia Wang
- Hainan University, School of Computer Science and Technology, Haikou, China
12. Islam Sumon R, Bhattacharjee S, Hwang YB, Rahman H, Kim HC, Ryu WS, Kim DM, Cho NH, Choi HK. Densely Convolutional Spatial Attention Network for nuclei segmentation of histological images for computational pathology. Front Oncol 2023; 13:1009681. [PMID: 37305563] [PMCID: PMC10248729] [DOI: 10.3389/fonc.2023.1009681]
Abstract
Introduction: Automatic nuclear segmentation in digital microscopic tissue images can help pathologists extract high-quality features for nuclear morphometrics and other analyses. However, image segmentation is a challenging task in medical image processing and analysis. This study aimed to develop a deep learning-based method for nuclei segmentation of histological images for computational pathology.
Methods: The original U-Net model sometimes has limitations in exploring significant features. Herein, we present the Densely Convolutional Spatial Attention Network (DCSA-Net), a U-Net-based model built around the DCSA module, an attention mechanism for capturing useful information from raw images. Developing deep learning algorithms that segment nuclei well requires a large quantity of data, which is expensive and often infeasible to obtain. We collected hematoxylin and eosin-stained image datasets from two hospitals to train the model on a variety of nuclear appearances, and, because of the limited number of annotated pathology images, we introduce a small publicly accessible dataset of prostate cancer (PCa) with more than 16,000 labeled nuclei. The developed model was also tested on an external multi-tissue dataset, MoNuSeg, and compared against several other artificial intelligence-based segmentation methods and tools.
Results: To assess nuclei segmentation performance, we evaluated the model's outputs using accuracy, the Dice coefficient (DC), and the Jaccard coefficient (JC). The proposed technique outperformed the other methods, achieving superior nuclei segmentation with accuracy, DC, and JC of 96.4% (95% confidence interval [CI]: 96.2-96.6), 81.8 (95% CI: 80.8-83.0), and 69.3 (95% CI: 68.2-70.0), respectively, on the internal test dataset.
Conclusion: Our proposed method demonstrates superior performance in segmenting cell nuclei in histological images from internal and external datasets, outperforming many standard segmentation algorithms used for comparative analysis.
Affiliation(s)
- Rashadul Islam Sumon
- Department of Digital Anti-Aging Healthcare, Ubiquitous-Anti-aging-Healthcare Research Center (u-AHRC), Inje University, Gimhae, Republic of Korea
- Subrata Bhattacharjee
- Department of Computer Engineering, Ubiquitous-Anti-aging-Healthcare Research Center (u-AHRC), Inje University, Gimhae, Republic of Korea
- Yeong-Byn Hwang
- Department of Digital Anti-Aging Healthcare, Ubiquitous-Anti-aging-Healthcare Research Center (u-AHRC), Inje University, Gimhae, Republic of Korea
- Hafizur Rahman
- Department of Digital Anti-Aging Healthcare, Ubiquitous-Anti-aging-Healthcare Research Center (u-AHRC), Inje University, Gimhae, Republic of Korea
- Hee-Cheol Kim
- Department of Digital Anti-Aging Healthcare, Ubiquitous-Anti-aging-Healthcare Research Center (u-AHRC), Inje University, Gimhae, Republic of Korea
- Wi-Sun Ryu
- Artificial Intelligence R&D Center, JLK Inc., Seoul, Republic of Korea
- Dong Min Kim
- Artificial Intelligence R&D Center, JLK Inc., Seoul, Republic of Korea
- Nam-Hoon Cho
- Department of Pathology, Yonsei University Hospital, Seoul, Republic of Korea
- Heung-Kook Choi
- Department of Computer Engineering, Ubiquitous-Anti-aging-Healthcare Research Center (u-AHRC), Inje University, Gimhae, Republic of Korea
- Artificial Intelligence R&D Center, JLK Inc., Seoul, Republic of Korea
13
Ahmed MR, Ashrafi AF, Ahmed RU, Shatabda S, Islam AKMM, Islam S. DoubleU-NetPlus: a novel attention and context-guided dual U-Net with multi-scale residual feature fusion network for semantic segmentation of medical images. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08493-1]
14
Lee KG, Song SJ, Lee S, Yu HG, Kim DI, Lee KM. A deep learning-based framework for retinal fundus image enhancement. PLoS One 2023; 18:e0282416. [PMID: 36928209 PMCID: PMC10019688 DOI: 10.1371/journal.pone.0282416]
Abstract
PROBLEM Low-quality fundus images with complex degradation can cause costly re-examinations of patients or inaccurate clinical diagnoses. AIM This study aims to create an automatic fundus macular image enhancement framework that improves low-quality fundus images and removes complex image degradation. METHOD We propose a new deep learning-based model that automatically enhances low-quality retinal fundus images suffering from complex degradation. We collected a dataset comprising 1068 pairs of high-quality (HQ) and low-quality (LQ) fundus images from the Kangbuk Samsung Hospital's health screening program and ophthalmology department from 2017 to 2019. We then used this dataset to develop data augmentation methods that simulate major aspects of retinal image degradation and to propose a customized convolutional neural network (CNN) architecture that enhances LQ images depending on the nature of the degradation. Peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), r-value (linear index of fuzziness), and the proportion of ungradable fundus photographs before and after the enhancement process were calculated to assess the performance of the proposed model. A comparative evaluation was conducted on an external database and four different open-source databases. RESULTS The evaluation on the external test dataset showed a significant increase in PSNR and SSIM compared with the original LQ images. Moreover, PSNR and SSIM increased by over 4 dB and 0.04, respectively, compared with the previous state-of-the-art methods (P < 0.05). The proportion of ungradable fundus photographs decreased from 42.6% to 26.4% (P = 0.012). CONCLUSION Our enhancement process significantly improves LQ fundus images that suffer from complex degradation, and our customized CNN achieves improved performance over existing state-of-the-art methods.
Overall, our framework can have a clinical impact on reducing re-examinations and improving the accuracy of diagnosis.
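Among the metrics above, PSNR has a simple closed form: 10·log10(MAX²/MSE) in decibels. A minimal sketch of that computation (an illustration, not the authors' implementation; SSIM and the fuzziness index require additional machinery, and the toy images are assumptions):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally shaped images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Toy example: a uniform image with a constant offset of 10 gray levels
ref = np.full((8, 8), 100.0)
noisy = ref + 10.0
val = psnr(ref, noisy)  # MSE = 100, so PSNR = 10*log10(65025/100) ≈ 28.1 dB
```

A gain of "over 4 dB", as reported above, corresponds to the MSE dropping by more than a factor of 2.5.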
Affiliation(s)
- Kang Geon Lee
- Department of Electrical and Computer Engineering, ASRI, Seoul National University, Seoul, South Korea
- Su Jeong Song
- Department of Ophthalmology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Biomedical Institute for Convergence (BICS), Sungkyunkwan University, Suwon, South Korea
- Soochahn Lee
- School of Electrical Engineering, Kookmin University, Seoul, South Korea
- Kyoung Mu Lee
- Department of Electrical and Computer Engineering, ASRI, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea
15
Chen Y, Tang Y, Huang J, Xiong S. Multi-scale Triplet Hashing for Medical Image Retrieval. Comput Biol Med 2023; 155:106633. [PMID: 36827786 DOI: 10.1016/j.compbiomed.2023.106633]
Abstract
For the medical image retrieval task, deep hashing algorithms are widely applied to large-scale datasets for auxiliary diagnosis because of the retrieval efficiency of hash codes. Most of these algorithms focus on feature learning while neglecting the discriminative areas of medical images and the hierarchical similarity between deep features and hash codes. In this paper, we tackle these dilemmas with a new Multi-scale Triplet Hashing (MTH) algorithm, which leverages multi-scale information, convolutional self-attention, and hierarchical similarity to learn effective hash codes simultaneously. The MTH algorithm first designs a multi-scale DenseBlock module to learn multi-scale information from medical images. Meanwhile, a convolutional self-attention mechanism is developed to perform information interaction in the channel domain, which captures the discriminative areas of medical images effectively. On top of the two paths, a novel loss function is proposed that not only conserves the category-level information of deep features and the semantic information of hash codes during learning, but also captures the hierarchical similarity between deep features and hash codes. Extensive experiments on the Curated X-ray Dataset, Skin Cancer MNIST Dataset, and COVID-19 Radiography Dataset illustrate that the MTH algorithm further enhances medical image retrieval compared to other state-of-the-art medical image retrieval algorithms.
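At retrieval time, deep hashing methods of this kind rank database images by the Hamming distance between binary hash codes, which is why they scale to large datasets. A minimal, hypothetical sketch of that ranking step (not the MTH code itself; the 8-bit codes are invented for illustration):

```python
import numpy as np

def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray):
    """Rank database items by Hamming distance to a query binary hash code."""
    dists = (db_codes != query_code).sum(axis=1)     # Hamming distance per row
    order = np.argsort(dists, kind="stable")         # nearest codes first
    return order, dists

query = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
db = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],   # identical -> distance 0
    [1, 0, 1, 1, 0, 1, 1, 0],   # one bit flipped -> distance 1
    [0, 1, 0, 0, 1, 1, 0, 1],   # all bits flipped -> distance 8
], dtype=np.uint8)
order, dists = hamming_rank(query, db)
```

Because distances are small integers, the comparison reduces to bitwise operations, which is the efficiency advantage the abstract refers to.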
Affiliation(s)
- Yaxiong Chen
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China; Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572000, China; Wuhan University of Technology Chongqing Research Institute, Chongqing 401120, China
- Yibo Tang
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
- Jinghao Huang
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China; Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572000, China
- Shengwu Xiong
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China; Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572000, China
16
Dabass M, Dabass J. An Atrous Convolved Hybrid Seg-Net Model with residual and attention mechanism for gland detection and segmentation in histopathological images. Comput Biol Med 2023; 155:106690. [PMID: 36827788 DOI: 10.1016/j.compbiomed.2023.106690]
Abstract
PURPOSE A clinically compatible computerized segmentation model is presented here that aspires to supply clinically informative gland details by seizing every small and intricate variation in medical images, integrating second opinions, and reducing human error. APPROACH It comprises an enhanced learning capability that extracts denser multi-scale gland-specific features, recovers the semantic gap during concatenation, and effectively handles resolution-degradation and vanishing-gradient problems. It has three proposed modules, namely an Atrous Convolved Residual Learning Module in the encoder and decoder, a Residual Attention Module in the skip-connection paths, and an Atrous Convolved Transitional Module as the transitional and output layer. Pre-processing techniques such as patch-sampling, stain-normalization, and augmentation are employed to develop its generalization capability. To verify its robustness and confirm network invariance against digital variability, extensive experiments are carried out on three public datasets, i.e., GlaS (Gland Segmentation Challenge), CRAG (Colorectal Adenocarcinoma Gland), and LC-25000 (Lung Colon-25000), and a private HosC (Hospital Colon) dataset. RESULTS The presented model accomplished competitive gland detection outcomes with F1-scores (GlaS (Test A (0.957), Test B (0.926)), CRAG (0.935), LC-25000 (0.922), HosC (0.963)); and gland segmentation results with Object-Dice Index (GlaS (Test A (0.961), Test B (0.933)), CRAG (0.961), LC-25000 (0.940), HosC (0.929)) and Object-Hausdorff Distance (GlaS (Test A (21.77), Test B (69.74)), CRAG (87.63), LC-25000 (95.85), HosC (83.29)). In addition, validation scores (GlaS (Test A (0.945), Test B (0.937)), CRAG (0.934), LC-25000 (0.911), HosC (0.928)) supplied by proficient pathologists are integrated for the final segmentation results to corroborate applicability and appropriateness for assistance in clinical-level applications.
CONCLUSION The proposed system will assist pathologists in devising precise diagnoses by offering a referential perspective during morphology assessment of colon histopathology images.
Affiliation(s)
- Manju Dabass
- EECE Department, The NorthCap University, Gurugram, India
- Jyoti Dabass
- DBT Centre of Excellence Biopharmaceutical Technology, IIT, Delhi, India
17
Boundary Aware U-Net for Medical Image Segmentation. Arabian Journal for Science and Engineering 2022. [DOI: 10.1007/s13369-022-07431-y]
18
Dabass M, Vashisth S, Vig R. MTU: A multi-tasking U-net with hybrid convolutional learning and attention modules for cancer classification and gland Segmentation in Colon Histopathological Images. Comput Biol Med 2022; 150:106095. [PMID: 36179516 DOI: 10.1016/j.compbiomed.2022.106095]
Abstract
A clinically comparable multi-tasking computerized deep U-Net-based model is demonstrated in this paper. It intends to offer clinical gland morphometric information and cancer grade classification as referential opinions for pathologists in order to abate human error. It embraces an enhanced feature learning capability that aids extraction of potent multi-scale features, efficacious semantic-gap recovery during feature concatenation, and successful interception of resolution-degradation and vanishing-gradient problems while performing moderate computations. It integrates three novel structural components into the traditional U-Net architecture, namely Hybrid Convolutional Learning Units in the encoder and decoder, Attention Learning Units in the skip connections, and a Multi-Scalar Dilated Transitional Unit as the transitional layer. These units amalgamate multi-level convolutional learning through conventional, atrous, residual, depth-wise, and point-wise convolutions, further incorporated with target-specific attention learning and an enlarged effective receptive field. Pre-processing techniques of patch-sampling, augmentation (color and morphological), stain-normalization, etc. are employed to improve its generalizability. To build network invariance towards digital variability, exhaustive experiments are conducted using three public datasets (the Colorectal Adenocarcinoma Gland (CRAG), Gland Segmentation (GlaS) challenge, and Lung Colon-25000 (LC-25K) datasets), and its robustness is then verified on an in-house private Hospital Colon (HosC) dataset. For cancer classification, the proposed model achieved Accuracy (CRAG (95%), GlaS (97.5%), LC-25K (99.97%), HosC (99.45%)), Precision (CRAG (0.9678), GlaS (0.9768), LC-25K (1), HosC (1)), F1-score (CRAG (0.968), GlaS (0.977), LC-25K (0.9997), HosC (0.9965)), and Recall (CRAG (0.9677), GlaS (0.9767), LC-25K (0.9994), HosC (0.9931)).
For gland detection and segmentation, the proposed model achieved competitive results: F1-score (CRAG (0.924), GlaS (Test A (0.949), Test B (0.918)), LC-25K (0.916), HosC (0.959)); Object-Dice Index (CRAG (0.959), GlaS (Test A (0.956), Test B (0.909)), LC-25K (0.929), HosC (0.922)); and Object-Hausdorff Distance (CRAG (90.47), GlaS (Test A (23.17), Test B (71.53)), LC-25K (96.28), HosC (85.45)). In addition, activation mappings testing the interpretability of the classification decision-making process are reported using Local Interpretable Model-Agnostic Explanations, Occlusion Sensitivity, and Gradient-Weighted Class Activation Mappings. This provides further evidence of the model's ability to learn, without any prerequisite annotations, patterns comparable to those considered relevant by pathologists. These activation-mapping visualizations were evaluated by proficient pathologists, who gave them a class-path validation score of (CRAG (9.31), GlaS (9.25), LC-25K (9.05), HosC (9.85)). Furthermore, a seg-path validation score of (GlaS (Test A (9.40), Test B (9.25)), CRAG (9.27), LC-25K (9.01), HosC (9.19)) given by multiple pathologists is included for the final segmented outcomes to substantiate clinical relevance and suitability for facilitation at the clinical level. The proposed model will aid pathologists in formulating an accurate diagnosis by providing a referential opinion during morphology assessment of histopathology images. It will reduce unintentional human error in cancer diagnosis and consequently enhance patient survival rates.
Affiliation(s)
- Manju Dabass
- EECE Department, The NorthCap University, Gurugram, 122017, India
- Sharda Vashisth
- EECE Department, The NorthCap University, Gurugram, 122017, India
- Rekha Vig
- EECE Department, The NorthCap University, Gurugram, 122017, India
19
Chen Y, Zhou T, Chen Y, Feng L, Zheng C, Liu L, Hu L, Pan B. HADCNet: Automatic segmentation of COVID-19 infection based on a hybrid attention dense connected network with dilated convolution. Comput Biol Med 2022; 149:105981. [PMID: 36029749 PMCID: PMC9391231 DOI: 10.1016/j.compbiomed.2022.105981]
Abstract
The automatic segmentation of lung infections in CT slices provides a rapid and effective strategy for diagnosing, treating, and assessing COVID-19 cases. However, the segmentation of infected areas presents several difficulties, including high intraclass variability and interclass similarity among infected areas, as well as blurred edges and low contrast. Therefore, we propose HADCNet, a deep learning framework that segments lung infections based on a dual hybrid attention strategy. HADCNet uses an encoder hybrid attention module to integrate feature information at different scales across the peer hierarchy to refine the feature map. Furthermore, a decoder hybrid attention module uses an improved skip connection to embed the semantic information of higher-level features into lower-level features by integrating multi-scale contextual structures and assigning the spatial information of lower-level features to higher-level features, thereby capturing the contextual dependencies of lesion features across levels and refining the semantic structure. This reduces the semantic gap between feature maps at different levels and improves the model's segmentation performance. We conducted fivefold cross-validations of our model on four publicly available datasets, with final mean Dice scores of 0.792, 0.796, 0.785, and 0.723. These results show that the proposed model outperforms popular state-of-the-art semantic segmentation methods and indicate its potential use in the diagnosis and treatment of COVID-19.
Affiliation(s)
- Ying Chen
- School of Software, Nanchang Hangkong University, Nanchang, 330063, PR China.
- Taohui Zhou
- School of Software, Nanchang Hangkong University, Nanchang, 330063, PR China
- Yi Chen
- Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035, PR China
- Longfeng Feng
- School of Software, Nanchang Hangkong University, Nanchang, 330063, PR China
- Cheng Zheng
- School of Software, Nanchang Hangkong University, Nanchang, 330063, PR China
- Lan Liu
- Department of Radiology, Jiangxi Cancer Hospital, Nanchang, 330029, PR China
- Liping Hu
- Department of Radiology, Jiangxi Cancer Hospital, Nanchang, 330029, PR China
- Bujian Pan
- Department of Hepatobiliary Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, PR China
20
Lei B, Zhang Y, Liu D, Xu Y, Yue G, Cao J, Hu H, Yu S, Yang P, Wang T, Qiu Y, Xiao X, Wang S. Longitudinal study of early mild cognitive impairment via similarity-constrained group learning and self-attention based SBi-LSTM. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109466]
21
Mehboob F, Rauf A, Jiang R, Saudagar AKJ, Malik KM, Khan MB, Hasnat MHA, AlTameem A, AlKhathami M. Towards robust diagnosis of COVID-19 using vision self-attention transformer. Sci Rep 2022; 12:8922. [PMID: 35618740 PMCID: PMC9134987 DOI: 10.1038/s41598-022-13039-x]
Abstract
The outbreak of COVID-19 has, since its appearance, affected about 200 countries and endangered millions of lives. COVID-19 is an extremely contagious disease, and it can quickly incapacitate healthcare systems if infected cases are not handled in time. Several convolutional neural network (CNN)-based techniques have been developed to diagnose COVID-19. These techniques require a large labelled dataset to train the algorithm fully, but few such labelled datasets exist. To mitigate this problem and facilitate the diagnosis of COVID-19, we developed a transformer-based approach with a self-attention mechanism that uses CT slices. The transformer architecture can exploit ample unlabelled datasets through pre-training. The paper aims to compare the performance of the self-attention transformer-based approach with CNN and ensemble classifiers for the diagnosis of COVID-19, using the binary Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection dataset and the multi-class Hybrid-learning for UnbiaSed predicTion of COVID-19 (HUST-19) CT scan dataset. To perform this comparison, we tested deep learning-based classifiers and ensemble classifiers against the proposed approach using CT scan images. The proposed approach is more effective in detecting COVID-19, with an accuracy of 99.7% on the multi-class HUST-19 dataset and 98% on the binary-class SARS-CoV-2 dataset. Cross-corpus evaluation achieves an accuracy of 93% when training the model with the HUST-19 dataset and testing with a Brazilian COVID dataset.
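The self-attention mechanism at the heart of such vision transformers can be sketched as single-head scaled dot-product attention over a sequence of patch embeddings. This is a generic illustration of the mechanism, not the paper's architecture; the token count, embedding dimension, and identity projection matrices are assumptions:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray):
    """Single-head scaled dot-product self-attention: softmax(QK^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # (n, n) attention matrix, rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 patch tokens, embedding dim 8
Wq = Wk = Wv = np.eye(8)                         # identity projections for illustration
out, A = self_attention(X, Wq, Wk, Wv)
```

Each output token is a weighted mixture of all input tokens, which is how the transformer relates every CT patch to every other patch in one step.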
Affiliation(s)
- Richard Jiang
- LIRA Center, Lancaster University, Lancaster, LA1 4YW, UK
- Abdul Khader Jilani Saudagar
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
- Khalid Mahmood Malik
- Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA
- Muhammad Badruddin Khan
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
- Mozaherul Hoque Abdul Hasnat
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
- Abdullah AlTameem
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
- Mohammed AlKhathami
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
22
Karthik R, Menaka R, M H, Won D. Contour-enhanced attention CNN for CT-based COVID-19 segmentation. Pattern Recognition 2022; 125:108538. [PMID: 35068591 PMCID: PMC8767763 DOI: 10.1016/j.patcog.2022.108538]
Abstract
Accurate detection of COVID-19 is one of the challenging research topics in today's healthcare sector for controlling the coronavirus pandemic. Automatic, data-powered insights for COVID-19 localization from a medical imaging modality like chest CT tremendously augment clinical care. In this research, a Contour-aware Attention Decoder CNN is proposed to segment COVID-19-infected tissues precisely and effectively. It introduces a novel attention scheme to extract boundary and shape cues from CT contours and leverages these features in refining the infected areas. For every decoded pixel, the attention module harvests contextual information in its spatial neighborhood from the contour feature maps. As a result of incorporating such rich structural details into decoding via dense attention, the CNN is able to capture even intricate morphological details. The decoder is also augmented with Cross-Context Attention Fusion Upsampling to robustly reconstruct deep semantic features into a high-resolution segmentation map. It employs a novel pixel-precise attention model that draws on relevant encoder features to aid effective upsampling. The proposed CNN was evaluated on 3D scans from the MosMedData and Jun Ma benchmark datasets. It achieved state-of-the-art performance with a high Dice similarity coefficient of 85.43% and a recall of 88.10%.
Affiliation(s)
- R Karthik
- Centre for Cyber Physical Systems (CCPS), Vellore Institute of Technology, Chennai, India
- R Menaka
- Centre for Cyber Physical Systems (CCPS), Vellore Institute of Technology, Chennai, India
- Hariharan M
- School of Computing Sciences and Engineering, Vellore Institute of Technology, Chennai, India
- Daehan Won
- System Sciences and Industrial Engineering, Binghamton University, United States
23
MDA-Unet: A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12073676]
Abstract
The advanced development of deep learning methods has recently brought significant improvements to medical image segmentation. Encoder-decoder networks such as U-Net have addressed some of the challenges in medical image segmentation with outstanding performance, which has promoted them to the most dominant deep learning architecture in this domain. Despite their outstanding performance, we argue that they still lack some aspects. First, there is an incompatibility in U-Net's skip connection between the encoder and decoder features due to the semantic gap between low-processed encoder features and highly processed decoder features, which adversely affects the final prediction. Second, it fails to capture multi-scale context information and ignores the contribution of all semantic information throughout the segmentation process. Therefore, we propose MDA-Unet, a novel multi-scale deep learning segmentation model. MDA-Unet improves upon U-Net and enhances its performance in segmenting medical images with variability in the shape and size of the region of interest. The model is integrated with a multi-scale spatial attention module, where spatial attention maps are derived from a hybrid hierarchical dilated convolution module that captures multi-scale context information. To ease the training process and reduce the gradient-vanishing problem, residual blocks are deployed instead of the basic U-Net blocks. Through a channel attention mechanism, the high-level decoder features guide the low-level encoder features to promote the selection of meaningful context information, ensuring effective fusion. We evaluated our model on two different datasets: a lung dataset of 2628 axial CT images and an echocardiographic dataset of 2000 images, each with its own challenges.
Our model has achieved a significant gain in performance with a slight increase in the number of trainable parameters in comparison with the basic U-Net model, providing a dice score of 98.3% on the lung dataset and 96.7% on the echocardiographic dataset, where the basic U-Net has achieved 94.2% on the lung dataset and 93.9% on the echocardiographic dataset.
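The dilated convolutions behind MDA-Unet's multi-scale context can be illustrated in one dimension: a kernel of size k with dilation d spans (k-1)·d+1 input positions, enlarging the receptive field with no extra parameters. A toy sketch (a generic illustration, not the authors' code; the input signal and kernel are assumptions):

```python
import numpy as np

def dilated_conv1d(x: np.ndarray, kernel: np.ndarray, dilation: int = 1):
    """'Valid' 1-D cross-correlation with a dilated kernel; returns (output, span)."""
    k = len(kernel)
    span = (k - 1) * dilation + 1                 # effective receptive field of one layer
    taps = np.arange(k) * dilation                # dilated tap positions
    out = np.array([np.dot(x[i + taps], kernel)   # slide the sparse kernel over x
                    for i in range(len(x) - span + 1)])
    return out, span

x = np.arange(10.0)
out, span = dilated_conv1d(x, np.array([1.0, 1.0, 1.0]), dilation=2)
# A size-3 kernel with dilation 2 spans 5 positions: out[i] = x[i] + x[i+2] + x[i+4]
```

Stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially, which is what lets such modules capture multi-scale context cheaply.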
24
Wu Y, Cheng M, Huang S, Pei Z, Zuo Y, Liu J, Yang K, Zhu Q, Zhang J, Hong H, Zhang D, Huang K, Cheng L, Shao W. Recent Advances of Deep Learning for Computational Histopathology: Principles and Applications. Cancers (Basel) 2022; 14:1199. [PMID: 35267505 PMCID: PMC8909166 DOI: 10.3390/cancers14051199]
Abstract
With the remarkable success of digital histopathology, we have witnessed a rapid expansion in the use of computational methods for the analysis of digital pathology and biopsy image patches. However, the unprecedented scale and heterogeneous patterns of histopathological images have presented critical computational bottlenecks requiring new computational histopathology tools. Recently, deep learning technology has been extremely successful in the field of computer vision, which has also boosted considerable interest in digital pathology applications. Deep learning and its extensions have opened several avenues to tackle many challenging histopathological image analysis problems, including color normalization, image segmentation, and the diagnosis/prognosis of human cancers. In this paper, we provide a comprehensive, up-to-date review of deep learning methods for digital H&E-stained pathology image analysis. Specifically, we first describe recent literature that uses deep learning for color normalization, one essential research direction for H&E-stained histopathological image analysis. Following the discussion of color normalization, we review applications of deep learning methods to various H&E-stained image analysis tasks such as nuclei and tissue segmentation. We also summarize several key clinical studies that use deep learning for the diagnosis and prognosis of human cancers from H&E-stained histopathological images. Finally, online resources and open research problems in pathological image analysis are also provided in this review for the convenience of researchers interested in this exciting field.
Affiliation(s)
- Yawen Wu
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; (Y.W.); (S.H.); (Z.P.); (Y.Z.); (J.L.); (K.Y.); (Q.Z.); (D.Z.)
- Michael Cheng
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Regenstrief Institute, Indiana University, Indianapolis, IN 46202, USA
- Shuo Huang
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Zongxiang Pei
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Yingli Zuo
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Jianxin Liu
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Kai Yang
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Qi Zhu
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Jie Zhang
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Regenstrief Institute, Indiana University, Indianapolis, IN 46202, USA
- Honghai Hong
- Department of Clinical Laboratory, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou 510006, China
- Daoqiang Zhang
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Kun Huang
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Regenstrief Institute, Indiana University, Indianapolis, IN 46202, USA
- Liang Cheng
- Departments of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Wei Shao
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
25
Arora R, Saini I, Sood N. Multi-label segmentation and detection of COVID-19 abnormalities from chest radiographs using deep learning. Optik 2021; 246:167780. [PMID: 34393275 PMCID: PMC8349421 DOI: 10.1016/j.ijleo.2021.167780]
Abstract
Due to COVID-19, demand for Chest Radiographs (CXRs) has increased exponentially. We therefore present CXAU-Net, a novel, fully automatic modified Attention U-Net multi-class segmentation deep model that can detect common findings of COVID-19 in CXR images. The architectural design of this model includes three novelties: first, an Attention U-Net with channel and spatial attention blocks is designed to precisely localize multiple pathologies; second, dilated convolutions enlarge the receptive field and improve the model's sensitivity to foreground pixels; and third, a newly proposed hybrid loss function combines both area and size information to optimize the model. The proposed model achieves average accuracy, DSC, and Jaccard index scores of 0.951, 0.993, and 0.984 for the image-based approach, and 0.921, 0.985, and 0.973 for the patch-based approach, for multi-class segmentation on the Chest X-ray 14 dataset. Average DSC and Jaccard index scores of 0.998 and 0.989 are achieved for binary-class segmentation on the Japanese Society of Radiological Technology (JSRT) CXR dataset. These results show that the proposed model outperforms state-of-the-art segmentation methods.
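The exact hybrid loss is defined in the paper; a common pattern it resembles is a weighted sum of a region-overlap term (soft Dice) and a pixel-wise term (binary cross-entropy). A minimal NumPy sketch of that pattern, with the `alpha` weighting being an illustrative assumption rather than the authors' choice:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-7):
    # Soft Dice: 1 - 2|P.T| / (|P| + |T|); penalizes region-overlap errors.
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def bce_loss(pred, target, eps=1e-7):
    # Pixel-wise binary cross-entropy, with clipping for numerical safety.
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))

def hybrid_loss(pred, target, alpha=0.5):
    # Weighted sum of a region term (Dice) and a pixel term (BCE).
    return alpha * soft_dice_loss(pred, target) + (1 - alpha) * bce_loss(pred, target)

# Toy 2x2 "segmentation": predicted probabilities vs. binary ground truth.
pred = np.array([[0.9, 0.1], [0.8, 0.2]])
target = np.array([[1.0, 0.0], [1.0, 0.0]])
loss = hybrid_loss(pred, target)
```

Mixing the two terms this way is a standard compromise: Dice is robust to foreground/background imbalance, while BCE gives smooth per-pixel gradients.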
Affiliation(s)
- Ruchika Arora
- Department of Electronics and Communication Engineering, Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Jalandhar 144011, India
- Indu Saini
- Department of Electronics and Communication Engineering, Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Jalandhar 144011, India
- Neetu Sood
- Department of Electronics and Communication Engineering, Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Jalandhar 144011, India
26
Dong S, Hangel G, Bogner W, Trattnig S, Rossler K, Widhalm G, De Feyter HM, De Graaf RA, Duncan JS. High-Resolution Magnetic Resonance Spectroscopic Imaging using a Multi-Encoder Attention U-Net with Structural and Adversarial Loss. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:2891-2895. [PMID: 34891851 DOI: 10.1109/embc46164.2021.9630146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Common to most medical imaging techniques, the spatial resolution of Magnetic Resonance Spectroscopic Imaging (MRSI) is ultimately limited by the achievable SNR. This work presents a deep learning method for 1H-MRSI spatial resolution enhancement, based on the observation that multi-parametric MRI images provide relevant spatial priors for MRSI enhancement. A Multi-encoder Attention U-Net (MAU-Net) architecture was constructed to process an MRSI metabolic map and three different MRI modalities through separate encoding paths. Spatial attention modules were incorporated to automatically learn spatial weights that highlight salient features for each MRI modality. MAU-Net was trained on in vivo brain imaging data from patients with high-grade gliomas, using a combined loss function consisting of pixel, structural, and adversarial terms. Experimental results showed that the proposed method is able to reconstruct high-quality metabolic maps at a high resolution of 64 × 64 from a low resolution of 16 × 16, with better performance than several baseline methods.
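The spatial attention modules described here learn a per-pixel weight map that rescales the feature tensor. A toy NumPy sketch of the general mechanism, where the learned convolution over pooled maps is replaced by a fixed sum (an assumption for illustration; the real module is trained end-to-end):

```python
import numpy as np

def spatial_attention(feat):
    """feat: (C, H, W) feature tensor. Returns feat rescaled by a
    per-pixel attention mask in (0, 1), shared across channels."""
    avg = feat.mean(axis=0)               # channel-wise average pool -> (H, W)
    mx = feat.max(axis=0)                 # channel-wise max pool -> (H, W)
    score = avg + mx                      # placeholder for the learned conv over [avg; max]
    mask = 1.0 / (1.0 + np.exp(-score))   # sigmoid gate
    return feat * mask[None, :, :]        # broadcast the mask over channels

feat = np.random.rand(8, 16, 16)
out = spatial_attention(feat)
```

The key property is that the mask depends only on spatial position, so salient regions are emphasized consistently across all feature channels.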
27
Canayaz M. C+EffxNet: A novel hybrid approach for COVID-19 diagnosis on CT images based on CBAM and EfficientNet. CHAOS, SOLITONS, AND FRACTALS 2021; 151:111310. [PMID: 34376926 PMCID: PMC8339545 DOI: 10.1016/j.chaos.2021.111310] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/14/2021] [Revised: 07/14/2021] [Accepted: 07/28/2021] [Indexed: 05/03/2023]
Abstract
COVID-19, one of the most serious diseases of our age, continues to spread rapidly around the world, and research into its diagnosis and treatment is advancing just as quickly. It is of great importance that infected individuals be isolated from the rest of society so that the disease does not spread further. In addition to laboratory tests, X-ray and computed tomography are used in the detection of patients. In this study, a new hybrid model is proposed that diagnoses COVID-19 from computed tomography images by combining EfficientNet, one of the current deep learning models, with a model consisting of attention blocks. In the first step of this model, channel attention, spatial attention, and residual blocks extract the most important features from the images. The extracted features are combined in accordance with the hyper-column technique and given as input to the EfficientNet models in the second step. The deep features obtained from this hybrid model were classified with a Support Vector Machine (SVM) classifier after feature selection, for which Principal Components Analysis was used. The approach predicts COVID-19 with a 99% accuracy rate. The first four versions of EfficientNet are used in the approach, and Bayesian optimization was used to estimate the hyperparameters of the SVM classifier. A comparative performance analysis of the approach against other approaches in the field is given.
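The pipeline reduces the deep features with Principal Components Analysis before SVM classification. A minimal NumPy sketch of only the PCA projection step, under standard PCA assumptions (center the data, project onto the top-k right singular vectors); the component count `k` and the downstream SVM are placeholders, not the paper's settings:

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples, n_features) onto the top-k
    principal components, computed via SVD of the centered data."""
    Xc = X - X.mean(axis=0)                              # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)    # rows of Vt = components
    return Xc @ Vt[:k].T                                 # (n_samples, k) scores

# Stand-in for deep features: 50 samples, 32-dimensional.
X = np.random.rand(50, 32)
Z = pca_reduce(X, 8)
```

In the paper's setup, the reduced features `Z` would then be fed to the SVM classifier whose hyperparameters are tuned by Bayesian optimization.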
Affiliation(s)
- Murat Canayaz
- Department of Computer Engineering, Van Yuzuncu Yil University, 65100, Van, Turkey
28
Thiam P, Hihn H, Braun DA, Kestler HA, Schwenker F. Multi-Modal Pain Intensity Assessment Based on Physiological Signals: A Deep Learning Perspective. Front Physiol 2021; 12:720464. [PMID: 34539444 PMCID: PMC8440852 DOI: 10.3389/fphys.2021.720464] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2021] [Accepted: 07/30/2021] [Indexed: 11/13/2022] Open
Abstract
Traditional pain assessment approaches, ranging from self-reporting methods to observational scales, rely on the ability of an individual to accurately assess and successfully report observed or experienced pain episodes. Automatic pain assessment tools are therefore more than desirable in cases where this ability is negatively affected by various psycho-physiological dispositions, as well as by distinct physical traits, as in the case of professional athletes, who usually have a higher pain tolerance than regular individuals. Hence, several approaches have been proposed over the past decades for implementing an autonomous and effective pain assessment system. These approaches range from conventional supervised and semi-supervised learning techniques applied to carefully hand-designed feature representations, to deep neural networks applied to preprocessed signals. Among the most prominent advantages of deep neural networks are the ability to automatically learn relevant features and the inherent adaptability of trained networks to related inference tasks. Yet significant drawbacks remain, such as the large amounts of data required to train deep models and the risk of over-fitting. Both problems are especially relevant to pain intensity assessment, where labeled data is scarce and generalization is of utmost importance. In the following work we address these shortcomings by introducing several novel multi-modal deep learning approaches (characterized by specific supervised as well as self-supervised learning techniques) for the assessment of pain intensity based on measurable bio-physiological data. While the proposed supervised deep learning approach attains state-of-the-art inference performance, our self-supervised approach significantly improves the data efficiency of the proposed architecture by automatically generating physiological data and simultaneously fine-tuning the architecture, which has previously been trained on a significantly smaller amount of data.
Affiliation(s)
- Patrick Thiam
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Institute of Neural Information Processing, Ulm University, Ulm, Germany
- Heinke Hihn
- Institute of Neural Information Processing, Ulm University, Ulm, Germany
- Daniel A Braun
- Institute of Neural Information Processing, Ulm University, Ulm, Germany
- Hans A Kestler
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
29
Kobayashi S, Saltz JH, Yang VW. State of machine and deep learning in histopathological applications in digestive diseases. World J Gastroenterol 2021; 27:2545-2575. [PMID: 34092975 PMCID: PMC8160628 DOI: 10.3748/wjg.v27.i20.2545] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/28/2021] [Revised: 03/27/2021] [Accepted: 04/29/2021] [Indexed: 02/06/2023] Open
Abstract
Machine learning (ML)- and deep learning (DL)-based imaging modalities have exhibited the capacity to handle extremely high-dimensional data for a number of computer vision tasks. While these approaches have been applied to numerous data types, this capacity can be especially well leveraged on histopathological images, which capture cellular and structural features in high-resolution, microscopic perspectives. These methodologies have already demonstrated promising performance in a variety of applications such as disease classification, cancer grading, structural and cellular localization, and prognostic prediction. A wide range of pathologies requiring histopathological evaluation exist in gastroenterology and hepatology, making these disciplines prime targets for the integration of such technologies. Gastroenterologists have also already been primed to consider the impact of these algorithms, as the development of real-time endoscopic video analysis software has been an active and popular field of research. This heightened clinical awareness will likely be important for future integration of these methods and for driving interdisciplinary collaborations on emerging studies. To provide an overview of the application of these methodologies to gastrointestinal and hepatological histopathology slides, this review discusses general ML and DL concepts, introduces recent and emerging literature using these methods, and covers challenges in further advancing the field.
Affiliation(s)
- Soma Kobayashi
- Department of Biomedical Informatics, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794, United States
- Joel H Saltz
- Department of Biomedical Informatics, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794, United States
- Vincent W Yang
- Department of Medicine, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794, United States
- Department of Physiology and Biophysics, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794, United States
30
Dabass M, Vashisth S, Vig R. Attention-Guided deep atrous-residual U-Net architecture for automated gland segmentation in colon histopathology images. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100784] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
31
Seo M, Kim M. Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5559. [PMID: 32998382 PMCID: PMC7583996 DOI: 10.3390/s20195559] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 09/25/2020] [Accepted: 09/26/2020] [Indexed: 11/16/2022]
Abstract
Speech emotion recognition (SER) classifies emotions using low-level features or a spectrogram of an utterance. When SER methods are trained and tested on different datasets, their performance drops. Cross-corpus SER research identifies speech emotion across different corpora and languages, and recent work has aimed to improve generalization. To improve cross-corpus SER performance, we pretrained on the log-mel spectrograms of the source dataset using our visual attention convolutional neural network (VACNN), which has a 2D CNN base model with channel- and spatial-wise visual attention modules. To train on the target dataset, we extracted a feature vector using a bag of visual words (BOVW) to assist the fine-tuned model. Because visual words represent local features in the image, the BOVW helps the VACNN learn both global and local features in the log-mel spectrogram by constructing a frequency histogram of visual words. The proposed method achieves overall accuracies of 83.33%, 86.92%, and 75.00% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EmoDB), and Surrey Audio-Visual Expressed Emotion (SAVEE), respectively. Experimental results on RAVDESS, EmoDB, and SAVEE demonstrate improvements of 7.73%, 15.12%, and 2.34% over existing state-of-the-art cross-corpus SER approaches.
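The BOVW step assigns each local descriptor to its nearest visual word (a cluster centroid) and builds a frequency histogram over the vocabulary. A minimal NumPy sketch of the histogram construction, assuming the codebook has already been learned (the dimensions below are illustrative, not the paper's):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """descriptors: (n, d) local features; codebook: (k, d) visual words.
    Returns a normalized k-bin frequency histogram of word assignments."""
    # Squared Euclidean distance from every descriptor to every word: (n, k).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                      # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                       # frequency histogram

rng = np.random.default_rng(0)
desc = rng.random((100, 16))                       # 100 local descriptors
codebook = rng.random((32, 16))                    # vocabulary of 32 visual words
hist = bovw_histogram(desc, codebook)
```

Because the histogram aggregates local assignments over the whole spectrogram, it summarizes local structure into a fixed-length global feature, which is why it complements the CNN's learned features.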
Affiliation(s)
- Myungho Kim
- Department of Software Convergence, Soongsil University, 369, Sangdo-ro, Dongjak-gu, Seoul 06978, Korea