1. Zhang H, Hu H, Zhou D, Zhang X, Cao B. Compact CNN module balancing between feature diversity and redundancy. Neural Netw 2025;188:107456. PMID: 40220561. DOI: 10.1016/j.neunet.2025.107456.
Abstract
Feature diversity and redundancy play a crucial role in enhancing a model's performance, although their effect on network design remains underexplored. Herein, we introduce BDRConv, a compact convolutional neural network (CNN) module that establishes a balance between feature diversity and redundancy to generate and retain features with moderate redundancy and high diversity while reducing computational costs. Specifically, input features are divided into a main part and an expansion part. The main part extracts intrinsic and diverse features, while the expansion part enhances diverse information extraction. Experiments on the CIFAR-10, ImageNet, and MS COCO datasets demonstrate that BDRConv-equipped networks outperform state-of-the-art methods in accuracy, with significantly fewer floating-point operations (FLOPs) and parameters. In addition, the BDRConv module is a plug-and-play component that can easily replace existing convolution modules, offering potential for broader CNN applications.
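The channel-split idea in this abstract (a main part plus an expansion part of the input features) can be illustrated with a short PyTorch sketch; the split ratio and the specific convolutions below are assumptions for illustration, not the published BDRConv design.

```python
# Minimal sketch of a channel-split convolution block in the spirit of BDRConv.
import torch
import torch.nn as nn

class SplitConvBlock(nn.Module):
    def __init__(self, channels: int, main_ratio: float = 0.5):
        super().__init__()
        self.main_ch = int(channels * main_ratio)       # "main part" channels
        self.exp_ch = channels - self.main_ch           # "expansion part" channels
        # Main part: a standard 3x3 conv extracting intrinsic, diverse features.
        self.main_conv = nn.Conv2d(self.main_ch, self.main_ch, 3, padding=1, bias=False)
        # Expansion part: a cheap depthwise 3x3 conv adding diverse information at low cost.
        self.exp_conv = nn.Conv2d(self.exp_ch, self.exp_ch, 3, padding=1,
                                  groups=self.exp_ch, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        main, exp = torch.split(x, [self.main_ch, self.exp_ch], dim=1)
        out = torch.cat([self.main_conv(main), self.exp_conv(exp)], dim=1)
        return self.act(self.bn(out))

if __name__ == "__main__":
    block = SplitConvBlock(64)
    y = block(torch.randn(1, 64, 32, 32))
    print(y.shape)  # torch.Size([1, 64, 32, 32])
```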
Affiliation(s)
- Huihuang Zhang: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province, Hangzhou, 310023, China
- Haigen Hu: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province, Hangzhou, 310023, China
- Deming Zhou: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province, Hangzhou, 310023, China
- Xiaoqin Zhang: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province, Hangzhou, 310023, China
- Bin Cao: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province, Hangzhou, 310023, China
2. Ahsan MJ, Abdel-Aty M, Abdelrahman AS. Can mid-block pedestrian signals (MPS) provide greater safety benefits than other mid-block pedestrian crossings? Accid Anal Prev 2025;218:108105. PMID: 40373590. DOI: 10.1016/j.aap.2025.108105.
Abstract
The Florida Department of Transportation (FDOT) has recently implemented a new midblock signal system known as the Midblock Pedestrian Signal (MPS) to enhance pedestrian safety. This study evaluates the effectiveness of MPSs by comparing their safety performance with other existing midblock crossing treatments. Portable CCTV video data were collected from 14 MPS-equipped locations, in addition to five reference sites, to calculate Conflict Modification Factors (CoMFs) using vehicle-pedestrian conflict data. Advanced computer vision techniques, specifically the RT-DETR model for object detection and the ByteTrack algorithm for tracking, were utilized to process the video data. The study employed both Cross-Sectional (CS) and Before-After methods, incorporating the Comparison Group (CG) and Empirical Bayesian (EB) approaches to evaluate the safety impacts of MPSs. To address repeated observations at the same locations and minimize bias, Safety Performance Functions (SPFs) were developed using Generalized Estimating Equations (GEE) with a Negative Binomial distribution, which proved more robust than traditional Generalized Linear Models (GLMs). The results demonstrate that MPS systems outperform Rectangular Rapid Flashing Beacons (RRFBs) and Flashing Beacons in reducing pedestrian-vehicle conflicts. Furthermore, when compared to Pedestrian Hybrid Beacons (PHBs), which share similar functionalities but differ in signal phase management, MPS systems provided additional safety benefits. Compared to PHBs, MPS systems reduced serious conflicts by 26-33% and all conflicts by 31-33% across the EB, CG, and CS methods. These reductions highlight the superior safety performance of MPS systems compared to PHBs and other midblock crossing treatments. With their adaptability, cost-effectiveness, and enhanced safety benefits, MPS systems are a promising alternative for upgrading existing pedestrian crossings or installing new signal systems to improve pedestrian safety at midblock locations.
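For readers unfamiliar with the modeling step, a minimal sketch of fitting a negative binomial Safety Performance Function with GEE (to handle repeated observations per site) is shown below using statsmodels; the variable names and synthetic data are placeholders, not the study's data.

```python
# Illustrative GEE / negative binomial SPF fit; synthetic data for demonstration only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_sites, n_periods = 19, 6   # e.g., 14 treated sites + 5 reference sites, repeated periods
df = pd.DataFrame({
    "site": np.repeat(np.arange(n_sites), n_periods),
    "log_aadt": rng.normal(9.5, 0.4, n_sites * n_periods),        # exposure proxy
    "ped_volume": rng.uniform(100, 300, n_sites * n_periods),
    "mps_installed": np.repeat(rng.integers(0, 2, n_sites), n_periods),
})
df["conflicts"] = rng.poisson(2 + 0.01 * df["ped_volume"])        # toy conflict counts

exog = sm.add_constant(df[["log_aadt", "ped_volume", "mps_installed"]])
model = sm.GEE(
    df["conflicts"], exog, groups=df["site"],          # repeated observations per site
    family=sm.families.NegativeBinomial(alpha=1.0),    # over-dispersed count outcome
    cov_struct=sm.cov_struct.Exchangeable(),           # within-site correlation
)
result = model.fit()
print(result.summary())
```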
Affiliation(s)
- Md Jamil Ahsan: Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32816, USA
- Mohamed Abdel-Aty: Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32816, USA
- Ahmed S Abdelrahman: Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32816, USA
3. Pawar P, McManus B, Anthony T, Yang J, Kerwin T, Stavrinos D. Artificial intelligence automated solution for hazard annotation and eye tracking in a simulated environment. Accid Anal Prev 2025;218:108075. PMID: 40339543. PMCID: PMC12123859. DOI: 10.1016/j.aap.2025.108075.
Abstract
High-fidelity simulators and sensors are commonly used in research to create immersive environments for studying real-world problems. This setup records detailed data, generating large datasets. In driving research, a full-scale car model repurposed as a driving simulator allows human subjects to navigate realistic driving scenarios. Data from these experiments are collected in raw form, requiring extensive manual annotation of roadway elements such as hazards and distractions. This process is often time-consuming, labor-intensive, and repetitive, causing delays in research progress. This paper proposes an AI-driven solution to automate these tasks, enabling researchers to focus on analysis and advance their studies efficiently. The solution builds on previous driving behavior research using a high-fidelity full-cab simulator equipped with gaze-tracking cameras. It extends the capabilities of the earlier system described in Pawar's (2021) "Hazard Detection in Driving Simulation using Deep Learning", which performed only hazard detection. The enhanced system now integrates both hazard annotation and gaze-tracking data. By combining vehicle handling parameters with drivers' visual attention data, the proposed method provides a unified, detailed view of participants' driving behavior across various simulated scenarios. This approach streamlines data analysis, accelerates research timelines, and enhances understanding of driving behavior.
Affiliation(s)
- Piyush Pawar: Institute for Social Science Research, University of Alabama, 306 Paul W. Bryant Drive East, Tuscaloosa, AL 35401, USA
- Benjamin McManus: Institute for Social Science Research, University of Alabama, 306 Paul W. Bryant Drive East, Tuscaloosa, AL 35401, USA
- Thomas Anthony: Analytical AI, 1500 1st Ave. N, Birmingham, AL 35022, USA
- Jingzhen Yang: Department of Pediatrics, College of Medicine, The Ohio State University, Center for Injury Research and Policy, Abigail Wexner Research Institute at Nationwide Children's Hospital, 700 Children's Dr. RBIII-WB5403, Columbus, OH 43205, USA
- Thomas Kerwin: Driving Simulation Laboratory, The Ohio State University, 1305 Kinnear Road, Suite 194, Columbus, OH 43212, USA
- Despina Stavrinos: Institute for Social Science Research, University of Alabama, 306 Paul W. Bryant Drive East, Tuscaloosa, AL 35401, USA
4. Chen H, Wang Z, Tao R, Wei H, Xie X, Sugiyama M, Raj B, Wang J. Impact of Noisy Supervision in Foundation Model Learning. IEEE Trans Pattern Anal Mach Intell 2025;47:5690-5707. PMID: 40117144. DOI: 10.1109/tpami.2025.3552309.
Abstract
Foundation models are usually pre-trained on large-scale datasets and then adapted to different downstream tasks through tuning. This pre-training and then fine-tuning paradigm has become a standard practice in deep learning. However, the large-scale pre-training datasets, often inaccessible or too expensive to handle, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper stands out as the first work to comprehensively understand and analyze the nature of noise in pre-training datasets and then effectively mitigate its impacts on downstream tasks. Specifically, through extensive experiments of fully-supervised and image-text contrastive pre-training on synthetic noisy ImageNet-1K, YFCC15M, and CC12M datasets, we demonstrate that, while slight noise in pre-training can benefit in-domain (ID) performance, where the training and testing data share a similar distribution, it always deteriorates out-of-domain (OOD) performance, where training and testing distributions are significantly different. These observations are agnostic to scales of pre-training datasets, pre-training noise types, model architectures, pre-training objectives, downstream tuning methods, and downstream applications. We empirically ascertain that the reason behind this is that the pre-training noise shapes the feature space differently. We then propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization; it is applicable in both parameter-efficient and black-box tuning manners, considering that one may not be able to access or fully fine-tune the pre-trained models. We additionally conduct extensive experiments on popular vision and language models, including APIs, which are supervised and self-supervised pre-trained on realistic noisy data for evaluation. Our analysis and results demonstrate the importance of this novel and fundamental research direction, which we term Noisy Model Transfer Learning.
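As a rough illustration of black-box feature-space tuning of the kind described here, the sketch below trains a small affine head on frozen pre-trained features; the layer sizes and objective are assumptions and do not reproduce the actual NMTune method.

```python
# Sketch: lightweight affine head on frozen (or API-extracted) features.
import torch
import torch.nn as nn

class AffineFeatureHead(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.affine = nn.Linear(feat_dim, feat_dim)   # reshapes the (noisy) feature space
        self.norm = nn.LayerNorm(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frozen_feats: torch.Tensor) -> torch.Tensor:
        z = self.norm(self.affine(frozen_feats))
        return self.classifier(z)

# Usage: features extracted once from a frozen pre-trained model, then only the
# small head is optimized on the downstream task.
feats = torch.randn(8, 768)                 # e.g., ViT-base pooled features (placeholder)
head = AffineFeatureHead(768, num_classes=10)
logits = head(feats)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (8,)))
loss.backward()
```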
5. Zhou W, Lin K, Zheng Z, Chen D, Su T, Hu H. DRTN: Dual Relation Transformer Network with feature erasure and contrastive learning for multi-label image classification. Neural Netw 2025;187:107309. PMID: 40048756. DOI: 10.1016/j.neunet.2025.107309.
Abstract
The objective of the multi-label image classification (MLIC) task is to simultaneously identify multiple objects present in an image. Several researchers directly flatten 2D feature maps into 1D grid feature sequences and utilize a Transformer encoder to capture the correlations of grid features to learn object relationships. Although these Transformer-based methods obtain promising results, they lose spatial information. In addition, current attention-based models often focus only on salient feature regions but ignore other potentially useful features that contribute to the MLIC task. To tackle these problems, we present a novel Dual Relation Transformer Network (DRTN) for the MLIC task, which can be trained in an end-to-end manner. Concretely, to compensate for the loss of spatial information of grid features resulting from the flattening operation, we adopt a grid aggregation scheme to generate pseudo-region features, which does not require expensive additional annotations to train an object detector. Then, a new dual relation enhancement (DRE) module is proposed to capture correlations between objects using two different visual features, thereby complementing the advantages provided by both grid and pseudo-region features. After that, we design a new feature enhancement and erasure (FEE) module to learn discriminative features and mine additional potentially valuable features. By using an attention mechanism to discover the most salient feature regions and removing them with a region-level erasure strategy, our FEE module is able to mine other potentially useful features from the remaining parts. Further, we devise a novel contrastive learning (CL) module to encourage the foregrounds of salient and potential features to be closer, while pushing their foregrounds further away from background features. This compels our model to learn discriminative and valuable features more comprehensively. Extensive experiments demonstrate that the DRTN method surpasses current MLIC models on three challenging benchmarks, i.e., the MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets.
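The region-level erasure idea (find the most attended region, zero it out, and mine the remainder) can be sketched as follows; the window size and the way the attention map is produced are assumptions for illustration, not the DRTN implementation.

```python
# Illustrative attention-guided region erasure in the spirit of the FEE module.
import torch

def erase_most_salient_region(feat: torch.Tensor, attn: torch.Tensor, k: int = 3) -> torch.Tensor:
    """feat: (B, C, H, W) feature maps; attn: (B, 1, H, W) spatial attention."""
    B, _, H, W = attn.shape
    flat_idx = attn.view(B, -1).argmax(dim=1)          # peak attention location per sample
    mask = torch.ones_like(attn)
    for b in range(B):
        y, x = int(flat_idx[b]) // W, int(flat_idx[b]) % W
        y0, y1 = max(y - k, 0), min(y + k + 1, H)
        x0, x1 = max(x - k, 0), min(x + k + 1, W)
        mask[b, :, y0:y1, x0:x1] = 0.0                 # erase a (2k+1) x (2k+1) window
    return feat * mask                                  # remaining features for further mining

feat = torch.randn(2, 256, 14, 14)
attn = feat.mean(dim=1, keepdim=True)                  # toy attention from channel pooling
erased = erase_most_salient_region(feat, attn)
print(erased.shape)
```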
Affiliation(s)
- Wei Zhou: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Kang Lin: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Zhijie Zheng: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Dihu Chen: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Tao Su: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Haifeng Hu: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
6. Wu X, Wang L, Huang J. AnimalRTPose: Faster cross-species real-time animal pose estimation. Neural Netw 2025;190:107685. PMID: 40516380. DOI: 10.1016/j.neunet.2025.107685.
Abstract
Recent advancements in computer vision have facilitated the development of sophisticated tools for analyzing complex animal behaviors, yet the diversity of animal morphology and environmental complexities present significant challenges to real-time animal pose estimation. To address these challenges, we introduce AnimalRTPose, a one-stage model designed for cross-species real-time animal pose estimation. At its core, AnimalRTPose leverages CSPNeXt†, a novel backbone network that integrates depthwise separable convolution with skip connections for high-frequency feature extraction, a channel attention mechanism (CAM) to enhance the fusion of high-frequency and low-frequency features, and spatial pyramid pooling (SPP) to capture multi-scale contextual information. This architecture enables robust feature representation across varying spatial resolutions, enhancing adaptability to diverse species and environments. Additionally, AnimalRTPose incorporates an efficient multi-scale feature fusion module that dynamically balances local detail and global structural consistency, ensuring high accuracy and robustness in pose estimation. Designed for scalability and versatility, AnimalRTPose supports single-animal, multi-animal, cross-species, and few-shot scenarios. Specifically, AnimalRTPose-N achieves 476 FPS on NVIDIA RTX 2080Ti, 769 FPS on NVIDIA RTX 3090, and 1111 FPS on NVIDIA A800, while demonstrating high throughput on edge devices with 196 FPS on the NVIDIA Jetson™ AGX Orin Developer Kit (275 TOPS, 15 W to 60 W), 77 FPS on the Raspberry Pi 5 with AI HAT+ (26 TOPS, 25 W), and 64 FPS on the Atlas 200I Developer Kit A2 (8 TOPS, 24 W), all with a 640 × 640 input resolution. These results surpass all existing one-stage models, showcasing its superior performance in real-time animal pose estimation. AnimalRTPose is thus highly applicable for scenarios requiring real-time animal behavior monitoring. Further details on the model configuration and dataset are available on the AnimalRTPose project website.
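Two of the building blocks named in this abstract, a depthwise separable convolution with a skip connection and a channel attention mechanism, can be sketched as follows; this is an illustrative approximation rather than the released CSPNeXt† code, and the reduction ratio and activation are assumptions.

```python
# Sketch: depthwise-separable block with skip connection and squeeze-excite-style channel attention.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                 # global average pool -> channel weights
        return x * w[:, :, None, None]

class DWSeparableBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU(inplace=True)
        self.cam = ChannelAttention(channels)

    def forward(self, x):
        out = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return x + self.cam(out)                        # skip connection preserves input content

x = torch.randn(1, 64, 80, 80)
print(DWSeparableBlock(64)(x).shape)                    # torch.Size([1, 64, 80, 80])
```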
Affiliation(s)
- Xin Wu: School of Physics, Northeast Normal University, Renmin Street 5268, Changchun, Jilin, 130024, China
- Lianming Wang: Yazhou Bay Innovation Institute of Hainan Tropical Ocean University, Yucai Road 1, Sanya, 572022, Hainan, China
- Jipeng Huang: School of Physics, Northeast Normal University, Renmin Street 5268, Changchun, Jilin, 130024, China
7. Vogg R, Lüddecke T, Henrich J, Dey S, Nuske M, Hassler V, Murphy D, Fischer J, Ostner J, Schülke O, Kappeler PM, Fichtel C, Gail A, Treue S, Scherberger H, Wörgötter F, Ecker AS. Computer vision for primate behavior analysis in the wild. Nat Methods 2025;22:1154-1166. PMID: 40211003. DOI: 10.1038/s41592-025-02653-y.
Abstract
Advances in computer vision and increasingly widespread video-based behavioral monitoring are currently transforming how we study animal behavior. However, there is still a gap between the prospects and practical application, especially in videos from the wild. In this Perspective, we aim to present the capabilities of current methods for behavioral analysis, while at the same time highlighting unsolved computer vision problems that are relevant to the study of animal behavior. We survey state-of-the-art methods for computer vision problems relevant to the video-based study of individualized animal behavior, including object detection, multi-animal tracking, individual identification and (inter)action understanding. We then review methods for effort-efficient learning, one of the challenges from a practical perspective. In our outlook on the emerging field of computer vision for animal behavior, we argue that the field should develop approaches to unify detection, tracking, identification and (inter)action understanding in a single, video-based framework.
Affiliation(s)
- Richard Vogg: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Timo Lüddecke: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Jonathan Henrich: Chairs of Statistics and Econometrics and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Sharmita Dey: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Matthias Nuske: Department for Computational Neuroscience, Third Physics Institute, University of Göttingen, Göttingen, Germany
- Valentin Hassler: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Derek Murphy: Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Department for Primate Cognition, Johann-Friedrich-Blumenbach Institute, University of Göttingen, Göttingen, Germany
- Julia Fischer: Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Department for Primate Cognition, Johann-Friedrich-Blumenbach Institute, University of Göttingen, Göttingen, Germany; Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany
- Julia Ostner: Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Behavioral Ecology Department, University of Göttingen, Göttingen, Germany; Social Evolution in Primates Group, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Oliver Schülke: Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Behavioral Ecology Department, University of Göttingen, Göttingen, Germany; Social Evolution in Primates Group, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Peter M Kappeler: Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Behavioral Ecology & Sociobiology Unit, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Department of Sociobiology/Anthropology, University of Göttingen, Göttingen, Germany
- Claudia Fichtel: Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Behavioral Ecology & Sociobiology Unit, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Alexander Gail: Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany; Sensorimotor Group, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Sensorimotor Neuroscience and Neuroprosthetics, Georg-Elias-Müller Institute, University of Göttingen, Göttingen, Germany
- Stefan Treue: Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany; Cognitive Neuroscience Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Biological Psychology & Cognitive Neuroscience, Georg-Elias-Müller-Institute of Psychology, University of Göttingen, Göttingen, Germany
- Hansjörg Scherberger: Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany; Primate Neurobiology, Johann-Friedrich-Blumenbach-Institute for Zoology & Anthropology, University of Göttingen, Göttingen, Germany; Neurobiology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Florentin Wörgötter: Department for Computational Neuroscience, Third Physics Institute, University of Göttingen, Göttingen, Germany; Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany
- Alexander S Ecker: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany; Leibniz ScienceCampus, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany; Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany; Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
8. Li H, Lo JTY. A review on the use of top-view surveillance videos for pedestrian detection, tracking and behavior recognition across public spaces. Accid Anal Prev 2025;215:107986. PMID: 40081266. DOI: 10.1016/j.aap.2025.107986.
Abstract
Top-view surveillance cameras have been adopted in public buildings such as stations and transport hubs because they maintain an unobstructed view and help protect privacy. This study provides a comprehensive review of recent developments and challenges related to the use of top-view surveillance videos in public places. The techniques using top-view images in pedestrian detection, tracking and behavior recognition are reviewed, specifically focusing on their influence on crowd control and safety management. The setup of top-view cameras and the characteristics of several available datasets are introduced. The methodologies, field of view, extracted features, region of interest, color space and used datasets for key literature are consolidated. This study contributes by identifying key advantages of top-view cameras, such as their ability to reduce occlusions and preserve privacy, while also addressing limitations, including restricted field of view and the challenges of adapting algorithms to this unique perspective. We highlight knowledge gaps in leveraging top-view cameras for transport hubs, such as the need for advanced algorithms and the lack of standardized datasets for dynamic crowd scenarios. Through this review, we aim to provide actionable insights for improving crowd management and safety measures in public buildings, especially transport hubs.
Affiliation(s)
- Hongliu Li: Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, PR China
- Jacqueline Tsz Yin Lo: Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, PR China
9. Gao G, Lv Z, Zhang Y, Qin AK. Advertising or adversarial? AdvSign: Artistic advertising sign camouflage for target physical attacking to object detector. Neural Netw 2025;186:107271. PMID: 40010291. DOI: 10.1016/j.neunet.2025.107271.
Abstract
Deep learning models are often vulnerable to adversarial attacks in both digital and physical environments. Particularly challenging are physical attacks that involve subtle, unobtrusive modifications to objects, such as patch-sticking or light-shooting, designed to maliciously alter the model's output when the scene is captured and fed into the model. Developing physical adversarial attacks that are robust, flexible, inconspicuous, and difficult to trace remains a significant challenge. To address this issue, we propose an artistic-based camouflage named Adversarial Advertising Sign (AdvSign) for the object detection task, especially in autonomous driving scenarios. Artistic patterns, such as brand logos and advertisement signs, have a high tolerance for visual incongruity and are so commonplace that they attract little suspicion. We design these patterns into advertising signs that can be attached to various mobile carriers, such as carry-bags and vehicle stickers, to create adversarial camouflage with strong untraceability. This method is particularly effective at misleading self-driving cars, for instance, causing them to misidentify these signs as 'stop' signs. Our approach combines a trainable adversarial patch with various signs of artistic patterns to create advertising patches. By leveraging the diversity and flexibility of these patterns, we draw attention away from the conspicuous adversarial elements, enhancing the effectiveness and subtlety of our attacks. We then use the CARLA autonomous-driving simulator to place these synthesized patches onto 3D flat surfaces in different traffic scenes, rendering 2D composite scene images from various perspectives. These varied scene images are then input into the target detector for adversarial training, resulting in the final trained adversarial patch. In particular, we introduce a novel loss with artistic pattern constraints, designed to differentially adjust pixels within and outside the advertising sign during training. Extensive experiments in both simulated (composite scene images with AdvSign) and real-world (printed AdvSign images) environments demonstrate the effectiveness of AdvSign in executing physical attacks on state-of-the-art object detectors, such as YOLOv5. Our training strategy, leveraging diverse scene images and varied artistic transformations to adversarial patches, enables seamless integration with multiple patterns. This enhances attack effectiveness across various physical settings and allows easy adaptation to new environments and artistic patterns.
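A heavily simplified sketch of the kind of constrained patch optimization described here follows; the loss weights, the mask-based pattern constraint, and the stand-in detector score are assumptions and do not reproduce the AdvSign loss or its CARLA rendering pipeline.

```python
# Sketch: one optimization step for a patch constrained to stay near an artistic pattern.
import torch

def advsign_step(patch, pattern, sign_mask, detection_score_fn,
                 lr=0.01, lambda_pattern=0.1):
    """patch, pattern: (3, H, W); sign_mask: (1, H, W) with 1 inside the sign region."""
    patch = patch.clone().requires_grad_(True)
    # Attack term: drive down the detector's confidence on the targeted class.
    attack_loss = detection_score_fn(patch)
    # Pattern term: keep in-sign pixels near the artistic pattern, ignore the rest.
    pattern_loss = (((patch - pattern) * sign_mask) ** 2).mean()
    loss = attack_loss + lambda_pattern * pattern_loss
    loss.backward()
    with torch.no_grad():
        patch = (patch - lr * patch.grad).clamp(0.0, 1.0)
    return patch.detach()

# `detection_score_fn` would render the patch into a scene and return the target
# detector's score; here it is just a stand-in for demonstration.
patch = torch.rand(3, 128, 128)
pattern = torch.rand(3, 128, 128)
mask = torch.ones(1, 128, 128)
patch = advsign_step(patch, pattern, mask, lambda p: p.mean())
```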
Affiliation(s)
- Guangyu Gao: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Zhuocheng Lv: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yan Zhang: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- A K Qin: Department of Computing Technologies, Swinburne University of Technology, Hawthorn, VIC 3122, Australia
10. Falisse A, Uhlrich SD, Chaudhari AS, Hicks JL, Delp SL. Marker Data Enhancement for Markerless Motion Capture. IEEE Trans Biomed Eng 2025;72:2013-2022. PMID: 40031222. DOI: 10.1109/tbme.2025.3530848.
Abstract
OBJECTIVE: Human pose estimation models can measure movement from videos at a large scale and low cost; however, open-source pose estimation models typically detect only sparse keypoints, which leads to inaccurate joint kinematics. OpenCap, a freely available service for researchers to measure movement from videos, mitigates this issue using a deep learning model, the marker enhancer, that transforms sparse video keypoints into dense anatomical markers. However, OpenCap performs poorly on movements not included in the training data. Here, we create a much larger and more diverse training dataset and develop a more accurate and generalizable marker enhancer. METHODS: We compiled marker-based motion capture data from 1176 subjects and synthesized 1433 hours of video keypoints and anatomical markers to train the marker enhancer. We evaluated its accuracy in computing kinematics using both benchmark movement videos and synthetic data representing unseen, diverse movements. RESULTS: The marker enhancer improved kinematic accuracy on benchmark movements (mean error: 4.1°, max: 8.7°) compared to using video keypoints (mean: 9.6°, max: 43.1°) and OpenCap's original enhancer (mean: 5.3°, max: 11.5°). It also better generalized to unseen, diverse movements (mean: 4.1°, max: 6.7°) than OpenCap's original enhancer (mean: 40.4°, max: 252.0°). CONCLUSION: Our marker enhancer demonstrates both improved accuracy and generalizability across diverse movements. SIGNIFICANCE: We integrated the marker enhancer into OpenCap, thereby offering its thousands of users more accurate measurements across a broader range of movements.
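Conceptually, the marker enhancer maps a sparse keypoint set to a denser anatomical marker set; the sketch below shows such a mapping as a small fully connected network, with keypoint/marker counts and layer sizes as placeholder assumptions rather than OpenCap's actual architecture.

```python
# Sketch: mapping sparse video keypoints to dense anatomical marker positions.
import torch
import torch.nn as nn

N_KEYPOINTS, N_MARKERS = 20, 43     # assumed counts for illustration

class MarkerEnhancer(nn.Module):
    def __init__(self, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_KEYPOINTS * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_MARKERS * 3),
        )

    def forward(self, keypoints_xyz: torch.Tensor) -> torch.Tensor:
        # keypoints_xyz: (batch, N_KEYPOINTS, 3) -> markers: (batch, N_MARKERS, 3)
        out = self.net(keypoints_xyz.flatten(1))
        return out.view(-1, N_MARKERS, 3)

enhancer = MarkerEnhancer()
markers = enhancer(torch.randn(4, N_KEYPOINTS, 3))
# Downstream, marker trajectories would feed inverse kinematics to obtain joint angles.
print(markers.shape)
```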
11. Xu X, Wang C, Yi Q, Ye J, Kong X, Ashraf SQ, Dearn KD, Hajiyavand AM. MedBin: A lightweight End-to-End model-based method for medical waste management. Waste Manag 2025;200:114742. PMID: 40088805. DOI: 10.1016/j.wasman.2025.114742.
Abstract
The surge in medical waste has highlighted the urgent need for cost-effective and advanced management solutions. In this paper, a novel medical waste management approach, "MedBin," is proposed for automated sorting, reusing, and recycling. A comprehensive medical waste dataset, "MedBin-Dataset," is established, comprising 2,119 original images spanning 36 categories, with samples captured in various backgrounds. The lightweight "MedBin-Net" model is introduced to enable detection and instance segmentation of medical waste, enhancing waste recognition capabilities. Experimental results demonstrate the effectiveness of the proposed approach, achieving an average precision of 0.91, recall of 0.97, and F1-score of 0.94 across all categories with just 2.51 million parameters, 5.20 billion floating-point operations (FLOPs), and 0.60 ms inference time. Additionally, the proposed method includes a World Health Organization (WHO) Guideline-Based Classifier that categorizes detected waste into 5 types, each with a corresponding disposal method, following WHO medical waste classification standards. The proposed method, along with the dedicated dataset, offers a promising solution that supports sustainable medical waste management and other related applications. To access the MedBin-Dataset samples, please visit https://universe.roboflow.com/uob-ylti8/medbin_dataset. The source code for MedBin-Net can be found at https://github.com/Wayne3918/MedbinNet.
Affiliation(s)
- Xiazhen Xu: Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Chenyang Wang: Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Qiufeng Yi: Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Jiaqi Ye: Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Xiangfei Kong: Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Shazad Q Ashraf: Queen Elizabeth Hospital, Mindelsohn Way, Birmingham B15 2GW, UK
- Karl D Dearn: Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Amir M Hajiyavand: Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
12. Guo L, Chang R, Wang J, Narayanan A, Qian P, Leong MC, Kundu PP, Senthilkumar S, Garlapati SC, Yong ECK, Pahwa RS. Artificial intelligence-enhanced 3D gait analysis with a single consumer-grade camera. J Biomech 2025;187:112738. PMID: 40378677. DOI: 10.1016/j.jbiomech.2025.112738.
Abstract
Gait analysis is crucial for diagnosing and monitoring various healthcare conditions, but traditional marker-based motion capture (MoCap) systems require expensive equipment, extensive setup, and trained personnel, limiting their accessibility in clinical and home settings. Markerless systems reduce setup complexity but often require multiple cameras and fixed calibration, and are not designed for widespread clinical adoption. This study introduces 3DGait, an artificial intelligence-enhanced markerless 3-Dimensional gait analysis system that operates with a single consumer-grade depth camera, providing a streamlined, accessible alternative. The system integrates advanced machine learning algorithms to produce 49 angular, spatial, and temporal gait biomarkers commonly used in mobility analysis. We validated 3DGait against a marker-based MoCap system (OptiTrack) using 16 trials from 8 healthy adults performing the Timed Up and Go (TUG) test. The system achieved an overall average mean absolute error (MAE) of 2.3°, with all MAEs under 5.2°, and a Pearson's correlation coefficient (PCC) of 0.75 for angular biomarkers. All spatiotemporal biomarkers had errors no greater than 15%. Temporal biomarkers (excluding TUG time) had errors under 0.03 s, corresponding to one video frame at 30 frames per second. These results demonstrate that 3DGait provides clinically acceptable gait metrics relative to marker-based MoCap, while eliminating the need for markers, calibration, or fixed camera placement. 3DGait's accessible, non-invasive, single-camera design makes it practical for use in non-specialist clinics and home settings, supporting patient monitoring and chronic disease management. Future research will focus on validating 3DGait with diverse populations, including individuals with gait abnormalities, to broaden its clinical applications.
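The two agreement statistics reported here, mean absolute error and Pearson's correlation between markerless and marker-based joint-angle curves, are straightforward to compute; the sketch below uses synthetic signals purely for illustration.

```python
# Sketch: MAE and Pearson correlation between a markerless and a reference angle trace.
import numpy as np

def mae_and_pcc(markerless: np.ndarray, mocap: np.ndarray):
    """Both inputs: 1D arrays of a joint-angle trajectory in degrees."""
    mae = np.mean(np.abs(markerless - mocap))
    pcc = np.corrcoef(markerless, mocap)[0, 1]
    return mae, pcc

t = np.linspace(0, 2 * np.pi, 200)
mocap_angle = 30 * np.sin(t)                               # reference (marker-based) signal
markerless_angle = mocap_angle + np.random.normal(0, 2.0, t.size)
mae, pcc = mae_and_pcc(markerless_angle, mocap_angle)
print(f"MAE = {mae:.2f} deg, PCC = {pcc:.2f}")
```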
Affiliation(s)
- Ling Guo: Carecam Pte Ltd., Singapore; Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
- Richard Chang: Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
- Jie Wang: Carecam Pte Ltd., Singapore; Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
- Amudha Narayanan: Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
- Peisheng Qian: Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
- Mei Chee Leong: Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
- Partha Pratim Kundu: Carecam Pte Ltd., Singapore; Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
- Ramanpreet Singh Pahwa: Carecam Pte Ltd., Singapore; Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
13. Tamin O, Moung EG, Dargham JA, Karim SAA, Ibrahim AO, Adam N, Osman HA. RGB and RGNIR image dataset for machine learning in plastic waste detection. Data Brief 2025;60:111524. PMID: 40275976. PMCID: PMC12020901. DOI: 10.1016/j.dib.2025.111524.
Abstract
The increasing volume of plastic waste is an environmental issue that demands effective sorting methods for different types of plastic. While spectral imaging offers a promising solution, it has several drawbacks, such as complexity, high cost, and limited spatial resolution. Machine learning has emerged as a potential solution for plastic waste due to its ability to analyse and interpret large volumes of data using algorithms. However, developing an efficient machine learning model requires a comprehensive dataset with information on the size, shape, colour, texture, and other features of plastic waste. Moreover, incorporating near-infrared (NIR) spectral data into machine learning models can reveal crucial information about plastic waste composition and structure that remains invisible in standard RGB images. Despite this potential, no publicly available dataset currently combines RGB with NIR spectral information for plastic waste detection. To address this research gap, we introduce a comprehensive dataset of plastic waste images captured onshore using both standard RGB and RGNIR (red, green, near-infrared) channels. Each of the two colour-space datasets includes 405 images taken along riverbanks and beaches. Both datasets underwent further pre-processing to ensure proper labelling and annotations to prepare them for training machine learning models. In total, 1,344 plastic waste objects have been annotated. The proposed dataset offers a unique resource for researchers to train machine learning models for plastic waste detection. While there are existing datasets on plastic waste, the proposed dataset sets itself apart by offering unique spectral information in the near-infrared region. It is hoped that these datasets will contribute to the advancement of the field of plastic waste detection and encourage further research in this area.
Affiliation(s)
- Owen Tamin: Faculty of Science and Natural Resources, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu, 88400, Sabah, Malaysia
- Ervin Gubin Moung: Data Technologies and Applications (DaTA) Research Group, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu, 88400, Sabah, Malaysia; Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu, 88400, Sabah, Malaysia
- Jamal Ahmad Dargham: Faculty of Engineering, Universiti Malaysia Sabah, Kota Kinabalu, 88400, Sabah, Malaysia
- Samsul Ariffin Abdul Karim: Institute of Strategic Industrial Decision Modelling (ISIDM), School of Quantitative Sciences, UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah Darul Aman, Malaysia
- Ashraf Osman Ibrahim: Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Malaysia; Positive Computing Research Center, Emerging & Digital Technologies Institute, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Malaysia
- Nada Adam: Department of Computer Science, The Applied College, Northern Border University, Arar 73213, Saudi Arabia
- Hadia Abdelgader Osman: Department of Computer Science, The Applied College, Northern Border University, Arar 73213, Saudi Arabia
14. Guru DS, Saritha N. Banana bunch image and video dataset for variety classification and grading. Data Brief 2025;60:111478. PMID: 40231149. PMCID: PMC11994905. DOI: 10.1016/j.dib.2025.111478.
Abstract
Banana, a major commercial fruit crop, holds high nutritional value and is widely consumed [4,8,10]. The global banana market, valued at USD 140.83 billion in 2024, is projected to reach USD 147.74 billion by 2030. Accurate variety identification and quality grading are crucial for marketing, pricing, and operational efficiency in food processing industries [9]. As wholesalers and food processing industries process bananas in bunches (not at the individual fruit level), our bunch-level dataset offers a more accurate assessment by capturing bunch-level characteristics, which are vital for grading. Existing datasets, such as [1,6], focus on individual bananas or have limited bunch-level data, highlighting the lack of large-scale bunch datasets. This dataset fills the gap by providing bunch-level images and videos of three widely consumed banana varieties (Elakki-bale, Pachbale, and Rasbale) from Mysuru, South Karnataka, India, serving as a valuable resource for food processing industries. Our dataset supports training machine learning models for bunch-level variety classification and grading of bananas and serves as a resource for research and education.
Affiliation(s)
- D.S. Guru: Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysuru, Karnataka, 570006, India
- Saritha N: Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysuru, Karnataka, 570006, India
15. Ebert N, Stricker D, Wasenmüller O. Enhancing robustness and generalization in microbiological few-shot detection through synthetic data generation and contrastive learning. Comput Biol Med 2025;191:110141. PMID: 40253923. DOI: 10.1016/j.compbiomed.2025.110141.
Abstract
In many medical and pharmaceutical processes, continuous hygiene monitoring is crucial, often involving the manual detection of microorganisms in agar dishes by qualified personnel. Although deep learning methods hold promise for automating this task, they frequently encounter a shortage of sufficient training data, a prevalent challenge in colony detection. To overcome this limitation, we propose a novel pipeline that combines generative data augmentation with few-shot detection. Our approach aims to significantly enhance detection performance, even with (very) limited training data. A key component of our method is a diffusion-based generator model that inpaints synthetic bacterial colonies onto real agar plate backgrounds. This data augmentation technique enhances the diversity of training data, allowing for effective model training with only 25 real images. Our method outperforms common training techniques, demonstrating a +0.45 mAP improvement compared to training from scratch and a +0.15 mAP advantage over the current SOTA in synthetic data augmentation. Additionally, we integrate a decoupled feature classification strategy, where class-agnostic detection is followed by lightweight classification via a feed-forward network, making it possible to detect and classify colonies with minimal examples. This approach achieves an AP50 score of 0.7 in a few-shot scenario on the AGAR dataset. Our method also demonstrates robustness to various image corruptions, such as noise and blur, proving its applicability in real-world scenarios. By reducing the need for large labeled datasets, our pipeline offers a scalable, efficient solution for colony detection in hygiene monitoring and biomedical research, with potential for broader applications in fields where rapid detection of new colony types is required.
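The decoupled strategy, class-agnostic detection followed by a lightweight feed-forward classifier on per-region features, can be sketched as below; the feature dimension, class count, and the pretend detector output are assumptions for illustration.

```python
# Sketch: lightweight classification head applied to features of class-agnostic detections.
import torch
import torch.nn as nn

class ColonyClassifier(nn.Module):
    """Small feed-forward network classifying per-region feature vectors."""
    def __init__(self, feat_dim: int = 256, num_species: int = 5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_species),
        )

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(region_feats)

# Pretend a class-agnostic detector returned 12 colony boxes with pooled features.
region_feats = torch.randn(12, 256)
classifier = ColonyClassifier()
species_logits = classifier(region_feats)
print(species_logits.argmax(dim=1))   # predicted class per detected colony
```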
Affiliation(s)
- Nikolas Ebert: Research and Transfer Center CeMOS, Technical University of Applied Sciences Mannheim, Mannheim, 68163, Germany; Department of Computer Science, University of Kaiserslautern-Landau (RPTU), Kaiserslautern, 67663, Germany
- Didier Stricker: Department of Computer Science, University of Kaiserslautern-Landau (RPTU), Kaiserslautern, 67663, Germany
- Oliver Wasenmüller: Research and Transfer Center CeMOS, Technical University of Applied Sciences Mannheim, Mannheim, 68163, Germany
16. Dong X, Zhang C, Wang P, Chen D, Tu GJ, Zhao S, Xiang T. A Novel Dual-Network Approach for Real-Time Liveweight Estimation in Precision Livestock Management. Adv Sci (Weinh) 2025;12:e2417682. PMID: 40285549. PMCID: PMC12165045. DOI: 10.1002/advs.202417682.
Abstract
The increasing demand for automation in livestock farming scenarios highlights the need for effective noncontact measurement methods. The current methods typically require either fixed postures and specific positions of the target animals or high computational demands, making them difficult to implement in practical situations. In this study, a novel dual-network framework is presented that extracts accurate contour information instead of segmented images from unconstrained pigs and then directly employs this information to obtain precise liveweight estimates. The experimental results demonstrate that the developed framework achieves high accuracy, providing liveweight estimates with an R2 value of 0.993. When contour information is used directly to estimate the liveweight, the real-time performance of the framework can reach 1131.6 FPS. This achievement sets a new benchmark for accuracy and efficiency in non-contact liveweight estimation. Moreover, the framework holds significant practical value, equipping farmers with a robust and scalable tool for precision livestock management in dynamic, real-world farming environments. Additionally, the Liveweight and Instance Segmentation Annotation of Pigs dataset is introduced as a comprehensive resource designed to support further advancements and validation in this field.
Affiliation(s)
- Ximing Dong: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Caiming Zhang: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Peiyuan Wang: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Dexuan Chen: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Gang Jun Tu: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Shuhong Zhao: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Tao Xiang: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
17. Chen C, Lv F, Guan Y, Wang P, Yu S, Zhang Y, Tang Z. Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets. IEEE Trans Vis Comput Graph 2025;31:3809-3821. PMID: 40323760. DOI: 10.1109/tvcg.2025.3567053.
Abstract
The performance of computer vision models in certain real-world applications (e.g., rare wildlife observation) is limited by the small number of available images. Expanding datasets using pre-trained generative models is an effective way to address this limitation. However, since the automatic generation process is uncontrollable, the generated images are usually limited in diversity, and some of them are undesired. In this paper, we propose a human-guided image generation method for more controllable dataset expansion. We develop a multi-modal projection method with theoretical guarantees to facilitate the exploration of both the original and generated images. Based on the exploration, users refine the prompts and re-generate images for better performance. Since directly refining the prompts is challenging for novice users, we develop a sample-level prompt refinement method to make it easier. With this method, users only need to provide sample-level feedback (e.g., which samples are undesired) to obtain better prompts. The effectiveness of our method is demonstrated through the quantitative evaluation of the multi-modal projection method, improved model performance in the case study for both classification and object detection tasks, and positive feedback from the experts.
18. Yip HF, Li Z, Zhang L, Lyu A. Large Language Models in Integrative Medicine: Progress, Challenges, and Opportunities. J Evid Based Med 2025;18:e70031. PMID: 40384541. PMCID: PMC12086751. DOI: 10.1111/jebm.70031.
Abstract
Integrating Traditional Chinese Medicine (TCM) and Modern Medicine faces significant barriers, including the absence of unified frameworks and standardized diagnostic criteria. While Large Language Models (LLMs) in Medicine hold transformative potential to bridge these gaps, their application in integrative medicine remains underexplored and methodologically fragmented. This review systematically examines the development, deployment, and challenges of LLMs in harmonizing Modern Medicine and TCM practices while identifying actionable strategies to advance this emerging field. First, it summarizes existing LLMs in the General Domain, Modern Medicine, and TCM from the perspective of their model structures, numbers of parameters, and domain-specific training data. It then highlights the limitations of existing LLMs on integrative medicine tasks through benchmark experiments, as well as the unique applications of LLMs in integrative medicine. We discuss the challenges encountered during development and propose possible solutions to mitigate them. This review synthesizes technical insights with practical clinical considerations, providing a roadmap for leveraging LLMs to bridge TCM's empirical wisdom with modern medical systems. These AI-driven synergies could redefine personalized care, optimize therapeutic outcomes, and establish new standards for holistic healthcare innovation.
Affiliation(s)
- Hiu Fung Yip: School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China; Institute of Systems Medicine and Health Sciences, Hong Kong Baptist University, Hong Kong, China
- Zeming Li: Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Lu Zhang: Institute of Systems Medicine and Health Sciences, Hong Kong Baptist University, Hong Kong, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Aiping Lyu: School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China; Institute of Systems Medicine and Health Sciences, Hong Kong Baptist University, Hong Kong, China; Guangdong-Hong Kong-Macau Joint Lab on Chinese Medicine and Immune Disease Research, Guangzhou, China
19. Cui H, Huang D, Feng W, Li Z, Ouyang Q, Zhong C. FIAEPI-KD: A novel knowledge distillation approach for precise detection of missing insulators in transmission lines. PLoS One 2025;20:e0324524. PMID: 40445919. PMCID: PMC12124544. DOI: 10.1371/journal.pone.0324524.
Abstract
Ensuring transmission line safety is crucial, and detecting insulator defects is a key task. UAV-based insulator detection faces challenges: complex backgrounds, scale variations, and high computational costs. To address these, we propose FIAEPI-KD, a knowledge distillation framework integrating Feature Indicator Attention (FIA) and Edge Preservation Index (EPI). The method employs ResNet and FPN for multi-scale feature extraction. The FIA module dynamically focuses on multi-scale insulator edges via dual-path attention mechanisms, suppressing background interference. The EPI module quantifies edge differences between teacher and student models through gradient-aware distillation. The training objective combines Euclidean distance, KL divergence, and FIA-EPI losses to align feature-space similarities and edge details, enabling multi-level knowledge distillation. Experiments demonstrate significant improvements on our custom dataset containing farmland and waterbody scenarios. The RetinaNet-ResNet18 student model achieves a 10.5% mAP improvement, rising from 42.7% to 53.2%. Meanwhile, the Faster R-CNN-ResNet18 model achieves a 7.4% mAP improvement, rising from 42.7% to 50.1%. Additionally, the RepPoints-ResNet18 model achieves a 7.7% mAP improvement, rising from 49.6% to 57.3%. These results validate the effectiveness of FIAEPI-KD in enhancing detection accuracy across diverse detector architectures and backbone networks. On the MS COCO dataset, FIAEPI-KD outperformed mainstream distillation methods such as FKD and PKD. Ablation studies confirmed FIA's role in feature focus and EPI's edge-difference quantification: using FIA alone increased RetinaNet-ResNet50's mAP by 0.9%, while combining FIA and EPI achieved a total 3.0% improvement. The method utilizes a lightweight student model for efficient deployment, providing an effective solution for detecting insulator defects in transmission lines.
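Of the three loss terms listed in this abstract, the Euclidean (feature) and KL-divergence (logit) components are standard distillation losses and are sketched below in PyTorch; the FIA and EPI terms depend on the paper's attention and edge modules and are not reproduced, and the weights and temperature are assumptions.

```python
# Sketch: feature (MSE) + logit (KL) distillation terms of a combined KD objective.
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                      temperature=2.0, w_feat=1.0, w_kl=1.0):
    # Feature-space alignment (Euclidean distance between student and teacher maps).
    feat_loss = F.mse_loss(student_feat, teacher_feat.detach())
    # Soft-label alignment via KL divergence at temperature T.
    t = temperature
    kl_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits.detach() / t, dim=1),
        reduction="batchmean",
    ) * (t * t)
    return w_feat * feat_loss + w_kl * kl_loss

s_feat, t_feat = torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32)
s_logits, t_logits = torch.randn(2, 80), torch.randn(2, 80)
print(distillation_loss(s_feat, t_feat, s_logits, t_logits))
```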
Collapse
Affiliation(s)
- Hanzhi Cui
- College of Computer Engineering, Qingdao City University, Qingdao, China
| | - Dawei Huang
- School of Intelligent Equipment, Shandong University of Science and Technology, Tai’an, China
| | - Wancheng Feng
- School of Intelligent Equipment, Shandong University of Science and Technology, Tai’an, China
| | - Zhengao Li
- School of Intelligent Equipment, Shandong University of Science and Technology, Tai’an, China
| | - Qiuxue Ouyang
- College of Computer Engineering, Qingdao City University, Qingdao, China
| | - Conghan Zhong
- College of Computer Engineering, Qingdao City University, Qingdao, China
| |
Collapse
|
20
|
Sirimewan D, Dayarathna S, Raman S, Bai Y, Arashpour M. A benchmark dataset for class-wise segmentation of construction and demolition waste in cluttered environments. Sci Data 2025; 12:885. [PMID: 40436975 PMCID: PMC12120074 DOI: 10.1038/s41597-025-05243-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2025] [Accepted: 05/20/2025] [Indexed: 06/01/2025] Open
Abstract
Efficient management of construction and demolition waste (CDW) is essential for enhancing resource recovery. The lack of publicly available, high-quality datasets for waste recognition limits the development and adoption of automated waste handling solutions. To facilitate data sharing and reuse, this study introduces 'CDW-Seg', a benchmark dataset for class-wise segmentation of CDW. The dataset comprises high-resolution images captured at authentic construction sites, featuring skip bins filled with a diverse mixture of CDW materials in-the-wild. It includes 5,413 manually annotated objects across ten categories: concrete, fill dirt, timber, hard plastic, soft plastic, steel, fabric, cardboard, plasterboard, and the skip bin, representing a total of 2,492,021,189 pixels. Each object was meticulously annotated through semantic segmentation, providing reliable ground-truth labels. To demonstrate the applicability of the dataset, an adapter-based fine-tuning approach was implemented using a hierarchical Vision Transformer, ensuring computational efficiency suitable for deployment in automated waste handling scenarios. The CDW-Seg has been made publicly accessible to promote data sharing, facilitate further research, and support the development of automated solutions for resource recovery.
Collapse
Affiliation(s)
- Diani Sirimewan
- Department of Civil Engineering, Faculty of Engineering, Monash University, Melbourne, Australia.
| | - Sanuwani Dayarathna
- Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, Australia
| | - Sudharshan Raman
- Civil Engineering Discipline, School of Engineering, Monash University, Subang Jaya, Malaysia
| | - Yu Bai
- Department of Civil Engineering, Faculty of Engineering, Monash University, Melbourne, Australia
| | - Mehrdad Arashpour
- Department of Civil Engineering, Faculty of Engineering, Monash University, Melbourne, Australia
| |
Collapse
|
21
|
Rajabi N, Zanettin I, Ribeiro AH, Vasco M, Björkman M, Lundström JN, Kragic D. Exploring the feasibility of olfactory brain-computer interfaces. Sci Rep 2025; 15:18404. [PMID: 40419502 DOI: 10.1038/s41598-025-01488-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2024] [Accepted: 05/05/2025] [Indexed: 05/28/2025] Open
Abstract
In this study, we explore the feasibility of single-trial predictions of odor registration in the brain using olfactory bio-signals. We focus on two main aspects: input data modality and the processing model. For the first time, we assess the predictability of odor registration from novel electrobulbogram (EBG) recordings, both in sensor and source space, and compare these with commonly used electroencephalogram (EEG) signals. Despite having fewer data channels, EBG shows comparable performance to EEG. We also examine whether breathing patterns contain relevant information for this task. By comparing a logistic regression classifier, which requires hand-crafted features, with an end-to-end convolutional deep neural network, we find that end-to-end approaches can be as effective as classic methods. However, due to the high dimensionality of the data, the current dataset is insufficient for either classifier to robustly differentiate odor and non-odor trials. Finally, we identify key challenges in olfactory BCIs and suggest future directions for improving odor detection systems.
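For the classic-baseline side of the comparison, a single-trial decoder with hand-crafted features might look like the following sketch; the synthetic data, channel count, and simple mean/variance features are assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: trials x channels x time (synthetic stand-in for EBG/EEG epochs)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4, 256))   # e.g. 4 EBG channels, 256 samples per epoch
y = rng.integers(0, 2, size=200)     # odor (1) vs. non-odor (0) labels

# Hand-crafted features: per-channel mean and variance of each epoch
feats = np.concatenate([X.mean(axis=2), X.var(axis=2)], axis=1)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, feats, y, cv=5)
print("single-trial decoding accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```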
Collapse
Affiliation(s)
- Nona Rajabi
- Department of Intelligent Systems, KTH Royal Institute of Technology, 10044, Stockholm, Sweden.
| | - Irene Zanettin
- Department of Clinical Neuroscience, Karolinska Institute, 17165, Stockholm, Sweden
| | - Antônio H Ribeiro
- Department of Information Technology, Uppsala University, 75105, Uppsala, Sweden
| | - Miguel Vasco
- Department of Intelligent Systems, KTH Royal Institute of Technology, 10044, Stockholm, Sweden
| | - Mårten Björkman
- Department of Intelligent Systems, KTH Royal Institute of Technology, 10044, Stockholm, Sweden
| | - Johan N Lundström
- Department of Clinical Neuroscience, Karolinska Institute, 17165, Stockholm, Sweden
| | - Danica Kragic
- Department of Intelligent Systems, KTH Royal Institute of Technology, 10044, Stockholm, Sweden
| |
Collapse
|
22
|
Alabi O, Toe KKZ, Zhou Z, Budd C, Raison N, Shi M, Vercauteren T. CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery. Sci Data 2025; 12:825. [PMID: 40394065 PMCID: PMC12092654 DOI: 10.1038/s41597-025-05163-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2024] [Accepted: 05/08/2025] [Indexed: 05/22/2025] Open
Abstract
In laparoscopic and robotic surgery, precise tool instance segmentation is an essential technology for advanced computer-assisted interventions. Although publicly available procedures of routine surgeries exist, they often lack comprehensive annotations for tool instance segmentation. Additionally, the majority of standard datasets for tool segmentation are derived from porcine (pig) surgeries. To address this gap, we introduce CholecInstanceSeg, the largest open-access tool instance segmentation dataset to date. Derived from the existing CholecT50 and Cholec80 datasets, CholecInstanceSeg provides novel annotations for laparoscopic cholecystectomy procedures in patients. Our dataset comprises 41.9k annotated frames extracted from 85 clinical procedures and 64.4k tool instances, each labelled with semantic masks and instance IDs. To ensure the reliability of our annotations, we perform extensive quality control, conduct label agreement statistics, and benchmark the segmentation results with various instance segmentation baselines. CholecInstanceSeg aims to advance the field by offering a comprehensive and high-quality open-access dataset for the development and evaluation of tool instance segmentation algorithms.
Collapse
Affiliation(s)
- Oluwatosin Alabi
- Kings College London, Surgical & Interventional Engineering, London, SE1 7EU, United Kingdom
| | - Ko Ko Zayar Toe
- Kings College Hospital Denmark Hill, department, London, SE5 9RS, United Kingdom
| | - Zijian Zhou
- Department of Informatics, King's College London, London, United Kingdom
| | - Charlie Budd
- Kings College London, Surgical & Interventional Engineering, London, SE1 7EU, United Kingdom
| | - Nicholas Raison
- Kings College London, Surgical & Interventional Engineering, London, SE1 7EU, United Kingdom
| | - Miaojing Shi
- Tongji University, College of Electronic and Information Engineering, Shanghai, 200092, China.
| | - Tom Vercauteren
- Kings College London, Surgical & Interventional Engineering, London, SE1 7EU, United Kingdom
| |
Collapse
|
23
|
Day AL, Wahl CB, Dos Reis R, Liao WK, Li Y, Kilic MNT, Mirkin CA, Dravid VP, Choudhary A, Agrawal A. Automated image segmentation for accelerated nanoparticle characterization. Sci Rep 2025; 15:17180. [PMID: 40382402 PMCID: PMC12085630 DOI: 10.1038/s41598-025-01337-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Accepted: 04/25/2025] [Indexed: 05/20/2025] Open
Abstract
Recent developments in materials science have made it possible to synthesize millions of individual nanoparticles on a chip. However, many steps in the characterization process still require extensive human input. To address this challenge, we present an automated image processing pipeline that optimizes high-throughput nanoparticle characterization using intelligent image segmentation and coordinate generation. The proposed method can rapidly analyze each image and return optimized acquisition coordinates suitable for multiple analytical STEM techniques, including 4D-STEM, EELS, and EDS. The pipeline employs computer vision and unsupervised learning to remove the image background, segment the particle into areas of interest, and generate acquisition coordinates. This approach eliminates the need for uniform grid sampling, focusing data collection on regions of interest. We validated our approach using a diverse dataset of over 900 high-resolution grayscale nanoparticle images, achieving a 96.0% success rate based on expert-validated criteria. Using established 4D-STEM acquisition times as a baseline, our method demonstrates a 25.0 to 29.1-fold reduction in total processing time. By automating this crucial preprocessing step and optimizing data acquisition, our pipeline significantly accelerates materials characterization workflows while reducing unnecessary data collection.
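A minimal sketch of the kind of pipeline described above, assuming a grayscale micrograph as input; Otsu thresholding and k-means are illustrative choices for background removal and coordinate generation, not necessarily the algorithms used in the published pipeline.

```python
import numpy as np
from skimage import filters, measure
from sklearn.cluster import KMeans

def acquisition_coordinates(image, n_points=16):
    """Threshold out the background, keep the largest connected region
    (the particle), and cluster its pixels into acquisition coordinates
    instead of sampling on a uniform grid."""
    # Otsu threshold separates particle from background in a grayscale image
    mask = image > filters.threshold_otsu(image)

    # Keep only the largest connected component as the region of interest
    labels = measure.label(mask)
    props = measure.regionprops(labels)
    largest = max(props, key=lambda p: p.area)
    ys, xs = np.nonzero(labels == largest.label)

    # Cluster foreground pixels; cluster centres become acquisition points
    pts = np.column_stack([xs, ys]).astype(float)
    km = KMeans(n_clusters=n_points, n_init=10, random_state=0).fit(pts)
    return km.cluster_centers_  # (n_points, 2) array of (x, y) coordinates
```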
Collapse
Affiliation(s)
- Alexandra L Day
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA
| | - Carolin B Wahl
- Department of Materials Science and Engineering, Northwestern University, Evanston, IL, 60208, USA
- International Institute for Nanotechnology, Northwestern University, Evanston, IL, 60208, USA
| | - Roberto Dos Reis
- Department of Materials Science and Engineering, Northwestern University, Evanston, IL, 60208, USA
- International Institute for Nanotechnology, Northwestern University, Evanston, IL, 60208, USA
- The NUANCE Center, Northwestern University, Evanston, IL, 60208, USA
| | - Wei-Keng Liao
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA
| | - Youjia Li
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA
| | | | - Chad A Mirkin
- Department of Materials Science and Engineering, Northwestern University, Evanston, IL, 60208, USA
- International Institute for Nanotechnology, Northwestern University, Evanston, IL, 60208, USA
- Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
| | - Vinayak P Dravid
- Department of Materials Science and Engineering, Northwestern University, Evanston, IL, 60208, USA
- International Institute for Nanotechnology, Northwestern University, Evanston, IL, 60208, USA
- The NUANCE Center, Northwestern University, Evanston, IL, 60208, USA
| | - Alok Choudhary
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA
| | - Ankit Agrawal
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA.
| |
Collapse
|
24
|
Wang J, Wang T, Xu Q, Gao L, Gu G, Jia L, Yao C. RP-DETR: end-to-end rice pests detection using a transformer. PLANT METHODS 2025; 21:63. [PMID: 40382633 DOI: 10.1186/s13007-025-01381-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/08/2025] [Accepted: 04/25/2025] [Indexed: 05/20/2025]
Abstract
Pest infestations in rice crops greatly affect yield and quality, making early detection essential. As most rice pests affect leaves and rhizomes, visual inspection of rice for pests is becoming increasingly important. In precision agriculture, fast and accurate automatic pest identification is critical. To tackle this issue, multiple models utilizing computer vision and deep learning have been applied. Owing to its high efficiency, deep learning is now the favored approach for detecting plant pests. In this regard, the paper introduces an effective rice pest detection framework utilizing the Transformer architecture, designed to capture long-range features. The paper enhances the original model by adding the self-developed RepPConv block, which reduces information redundancy during feature extraction in the model backbone and, to a certain extent, reduces the number of model parameters. The original model's CCFM structure is enhanced by integrating the Gold-YOLO neck, improving its ability to fuse multi-scale features. Additionally, the MPDIoU-based loss function enhances the model's detection performance. Using the self-constructed high-quality rice pest dataset, the model achieves higher identification accuracy while reducing the number of parameters. The proposed RP18-DETR and RP34-DETR models reduce parameters by 16.5% and 25.8%, respectively, compared to the original RT18-DETR and RT34-DETR models. With a threshold of 0.5, the calculated average accuracy of RP18-DETR is 1.2% higher than that of RT18-DETR.
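For reference, one common formulation of the MPDIoU term mentioned above can be computed as follows; the box format, corner choice, and normalisation by the squared image dimensions are assumptions based on the general MPDIoU definition, not the paper's exact code.

```python
import torch

def mpdiou(pred, target, img_w, img_h):
    """MPDIoU for axis-aligned boxes given as (x1, y1, x2, y2): IoU minus
    the squared distances between matching corners, normalised by the
    squared image width and height."""
    # Intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # Squared distances between top-left and bottom-right corner pairs
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    norm = img_w ** 2 + img_h ** 2

    return iou - d1 / norm - d2 / norm  # the loss would be 1 - mpdiou
```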
Collapse
Affiliation(s)
- Jinsheng Wang
- School of Information Engineering, Huzhou University, Huzhou, 313000, China
| | - Tao Wang
- School of Information Engineering, Huzhou University, Huzhou, 313000, China
| | - Qin Xu
- School of Information Engineering, Huzhou University, Huzhou, 313000, China
| | - Lu Gao
- School of Information Engineering, Huzhou University, Huzhou, 313000, China
| | - Guosong Gu
- School of Information Science and Engineering, Jiaxing University, Jiaxing, 314001, China.
| | - Liangquan Jia
- School of Information Engineering, Huzhou University, Huzhou, 313000, China.
| | - Chong Yao
- Huzhou Central Hospital, Huzhou, 313000, China.
| |
Collapse
|
25
|
Elnady M, Abdelmunim HE. A novel YOLO LSTM approach for enhanced human action recognition in video sequences. Sci Rep 2025; 15:17036. [PMID: 40379779 DOI: 10.1038/s41598-025-01898-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2025] [Accepted: 05/09/2025] [Indexed: 05/19/2025] Open
Abstract
Human Action Recognition (HAR) is a critical task in computer vision with applications in surveillance, healthcare, and human-computer interaction. This paper introduces a novel approach combining the strengths of You Only Look Once (YOLO) for feature extraction and Long Short-Term Memory (LSTM) networks for temporal modeling to achieve robust and accurate action recognition in video sequences. The YOLO model efficiently identifies key features from individual frames, enabling real-time processing, while the LSTM network captures temporal dependencies to understand sequential dynamics in human movements. The proposed YOLO-LSTM framework is evaluated on multiple publicly available HAR datasets, achieving an accuracy of 96%, precision of 96%, recall of 97%, and F1-score of 96% on the UCF101 dataset; 99% across all metrics on the KTH dataset; 100% on the WEIZMANN dataset; and 98% on the IXMAS dataset. These results demonstrate the superior performance of our approach compared to existing methods in terms of both accuracy and processing speed. Additionally, this approach effectively handles challenges such as occlusions, varying illumination, and complex backgrounds, making it suitable for real-world applications. The results highlight the potential of combining object detection and recurrent architectures for advancing state-of-the-art HAR systems.
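The detector-plus-LSTM idea can be sketched roughly as below, assuming per-frame feature vectors have already been pooled from the YOLO backbone; the feature dimensions and the single-layer LSTM are illustrative choices rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class FrameSequenceClassifier(nn.Module):
    """Per-frame feature vectors (e.g. pooled detector embeddings) are fed
    to an LSTM that models temporal dynamics, followed by a linear head."""
    def __init__(self, feat_dim=256, hidden_dim=128, n_classes=101):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, frame_feats):          # (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(frame_feats) # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])            # (batch, n_classes) logits

# Usage with dummy per-frame features for a 16-frame clip
model = FrameSequenceClassifier()
logits = model(torch.randn(8, 16, 256))
print(logits.shape)  # torch.Size([8, 101])
```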
Collapse
Affiliation(s)
- Mahmoud Elnady
- Computer and Systems Engineering, Ain Shams University, El Sarayat, Cairo, 11517, Egypt.
| | - Hossam E Abdelmunim
- Computer and Systems Engineering, Ain Shams University, El Sarayat, Cairo, 11517, Egypt
| |
Collapse
|
26
|
Bondarenko A, Jumutc V, Netter A, Duchateau F, Abrão HM, Noorzadeh S, Giacomello G, Ferrari F, Bourdel N, Kirk UB, Bļizņuks D. Object Detection in Laparoscopic Surgery: A Comparative Study of Deep Learning Models on a Custom Endometriosis Dataset. Diagnostics (Basel) 2025; 15:1254. [PMID: 40428247 PMCID: PMC12110204 DOI: 10.3390/diagnostics15101254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2025] [Revised: 05/07/2025] [Accepted: 05/07/2025] [Indexed: 05/29/2025] Open
Abstract
Background: Laparoscopic surgery for endometriosis presents unique challenges due to the complexity of and variability in lesion appearances within the abdominal cavity. This study investigates the application of deep learning models for object detection in laparoscopic videos, aiming to assist surgeons in accurately identifying and localizing endometriosis lesions and related anatomical structures. A custom dataset was curated, comprising 199 video sequences and 205,725 frames. Of these, 17,560 frames were meticulously annotated by medical professionals. The dataset includes object detection annotations for 10 object classes relevant to endometriosis, alongside segmentation masks for some classes. Methods: To address the object detection task, we evaluated the performance of two deep learning models-FasterRCNN and YOLOv9-under both stratified and non-stratified training scenarios. Results: The experimental results demonstrated that stratified training significantly reduced the risk of data leakage and improved model generalization. The best-performing FasterRCNN object detection model achieved a high average test precision of 0.9811 ± 0.0084, recall of 0.7083 ± 0.0807, and mAP50 (mean average precision at 50% overlap) of 0.8185 ± 0.0562 across all presented classes. Despite these successes, the study also highlights the challenges posed by the weak annotations and class imbalances in the dataset, which impacted overall model performance. Conclusions: In conclusion, this study provides valuable insights into the application of deep learning for enhancing laparoscopic surgical precision in endometriosis treatment. The findings underscore the importance of robust dataset curation and advanced training strategies in developing reliable AI-assisted tools for surgical interventions. The latter could potentially improve the guidance of surgical interventions and prevent blind spots occurring in difficult-to-reach abdominal regions. Future work will focus on refining the dataset and exploring more sophisticated model architectures to further improve detection accuracy.
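One straightforward way to realise the sequence-aware (stratified) splitting that prevents frame-level leakage is a group-wise split keyed on the source video, sketched here with synthetic identifiers; this illustrates the general idea, not the authors' exact protocol.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# labels are per-frame; video_ids identify the source sequence of each frame.
# Splitting by video keeps all frames of one procedure on the same side,
# which is one way to avoid the leakage the study describes.
rng = np.random.default_rng(0)
n_frames = 1000
video_ids = rng.integers(0, 50, size=n_frames)   # 50 hypothetical sequences
labels = rng.integers(0, 10, size=n_frames)      # 10 object classes

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(np.arange(n_frames), labels, groups=video_ids))

# No video contributes frames to both sides of the split
assert set(video_ids[train_idx]).isdisjoint(video_ids[test_idx])
print(len(train_idx), "training frames,", len(test_idx), "test frames")
```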
Collapse
Affiliation(s)
- Andrey Bondarenko
- Institute of Applied Computer Systems, Riga Technical University, LV-1048 Riga, Latvia; (A.B.); (V.J.)
| | - Vilen Jumutc
- Institute of Applied Computer Systems, Riga Technical University, LV-1048 Riga, Latvia; (A.B.); (V.J.)
| | - Antoine Netter
- Department of Obstetrics and Gynecology, Marseille Hospital, 13005 Marseille, France; (A.N.); (F.D.)
| | - Fanny Duchateau
- Department of Obstetrics and Gynecology, Marseille Hospital, 13005 Marseille, France; (A.N.); (F.D.)
| | | | - Saman Noorzadeh
- SurgAR, 63000 Clermont-Ferrand, France; (S.N.); (G.G.); (F.F.); (N.B.)
| | - Giuseppe Giacomello
- SurgAR, 63000 Clermont-Ferrand, France; (S.N.); (G.G.); (F.F.); (N.B.)
- Department of Obstetrics and Gynecology, Istituto Ospedaliero Fondazione Poliambulanza, 25124 Brescia, Italy
| | - Filippo Ferrari
- SurgAR, 63000 Clermont-Ferrand, France; (S.N.); (G.G.); (F.F.); (N.B.)
- Department of Obstetrics and Gynecology, Gynecologic Oncology and Minimally Invasive Pelvic Surgery, International School of Surgical Anatomy (ISSA), IRCCS “Sacro Cuore—Don Calabria” Hospital, Negrar di Valpolicella, 37024 Verona, Italy
| | - Nicolas Bourdel
- SurgAR, 63000 Clermont-Ferrand, France; (S.N.); (G.G.); (F.F.); (N.B.)
- Department of Clinical Research and Innovation, CHU Clermont Ferrand, 63100 Clermont-Ferrand, France
| | - Ulrik Bak Kirk
- Department of Public Health, Aarhus University, 8000 Aarhus, Denmark;
- The Research Unit for General Practice, 8000 Aarhus, Denmark
| | - Dmitrijs Bļizņuks
- Institute of Applied Computer Systems, Riga Technical University, LV-1048 Riga, Latvia; (A.B.); (V.J.)
| |
Collapse
|
27
|
Vinken K, Sharma S, Livingstone MS. Mapping Macaque to Human Cortex with Natural Scene Responses. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.05.11.653327. [PMID: 40462947 PMCID: PMC12132291 DOI: 10.1101/2025.05.11.653327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/16/2025]
Abstract
Neuroscience has long relied on macaque studies to infer human brain function, yet identifying functionally corresponding brain regions across species and measurement modalities remains a fundamental challenge. This is especially true for higher-order cortex, where functional interpretations are constrained by narrow hypotheses and anatomical landmarks are often non-homologous. We present a data-driven approach for mapping functional correspondence across species using rich, naturalistic stimuli. By directly comparing macaque electrophysiology with human fMRI responses to 700 natural scenes, we identify fine-grained alignment based on response pattern similarity, without relying on predefined tuning concepts or hand-picked stimuli. As a test case, we examine the ventral face patch system, a well-studied but contested domain in cross-species alignment. Our approach resolves a longstanding ambiguity, yielding a correspondence consistent with full-brain anatomical warping but inconsistent with prior studies limited by narrow functional hypotheses. These findings show that natural image-evoked response patterns provide a robust foundation for cross-species functional alignment, supporting scalable comparisons as large-scale primate recordings become more widespread.
Collapse
Affiliation(s)
- Kasper Vinken
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Saloni Sharma
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | | |
Collapse
|
28
|
Deng K, Wen Q, Yang F, Ouyang H, Shi Z, Shuai S, Wu Z. OS-DETR: End-to-end brain tumor detection framework based on orthogonal channel shuffle networks. PLoS One 2025; 20:e0320757. [PMID: 40359502 PMCID: PMC12074655 DOI: 10.1371/journal.pone.0320757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Accepted: 02/21/2025] [Indexed: 05/15/2025] Open
Abstract
OrthoNets use the Gram-Schmidt process to achieve orthogonality among filters but do not impose constraints on the internal orthogonality of individual filters. To reduce the risk of overfitting, especially in scenarios with limited data such as medical imaging, this study explores an enhanced network that ensures the internal orthogonality within individual filters, named the Orthogonal Channel Shuffle Network (OSNet). This network is integrated into the Detection Transformer (DETR) framework for brain tumor detection, resulting in the OS-DETR. To further optimize model performance, this study also incorporates deformable attention mechanisms and an Intersection over Union strategy that emphasizes the internal region influence of bounding boxes and the corner distance disparity. Experimental results on the Br35H brain tumor dataset demonstrate the significant advantages of OS-DETR over mainstream object detection frameworks. Specifically, OS-DETR achieves a Precision of 95.0%, Recall of 94.2%, mAP@50 of 95.7%, and mAP@50:95 of 74.2%. The code implementation and experimental results are available at https://github.com/dkx2077/OS-DETR.git.
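As a rough illustration of the orthogonality idea, a filter-bank orthonormality penalty can be written as follows; note that this only shows the generic between-filter constraint, whereas OSNet additionally enforces orthogonality within individual filters, which is not reproduced here.

```python
import torch

def filter_orthogonality_penalty(conv_weight):
    """Penalise deviation of the flattened filter bank from orthonormality,
    i.e. ||W W^T - I||_F^2, a common orthogonal regulariser for conv layers."""
    w = conv_weight.flatten(1)                      # (out_channels, in*k*k)
    gram = w @ w.t()                                # (out_channels, out_channels)
    eye = torch.eye(gram.size(0), device=w.device)
    return ((gram - eye) ** 2).sum()

# Example on a random 3x3 convolution weight tensor
penalty = filter_orthogonality_penalty(torch.randn(64, 32, 3, 3))
print(float(penalty))
```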
Collapse
Affiliation(s)
- Kaixin Deng
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, China
| | - Quan Wen
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, China
| | - Fan Yang
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, China
| | - Hang Ouyang
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, China
| | - Zhuohang Shi
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, China
| | - Shiyu Shuai
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, China
| | - Zhaowang Wu
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, China
| |
Collapse
|
29
|
Zhao P, Wang X, Yu S, Dong X, Li B, Wang H, Chen G. An open paradigm dataset for intelligent monitoring of underground drilling operations in coal mines. Sci Data 2025; 12:780. [PMID: 40355463 PMCID: PMC12069595 DOI: 10.1038/s41597-025-05118-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Accepted: 05/01/2025] [Indexed: 05/14/2025] Open
Abstract
The underground drilling environment in coal mines is critical and prone to accidents, with common accident types including rib spalling, roof falling, and others. High-quality datasets are essential for developing and validating artificial intelligence (AI) algorithms in the field of coal mine safety monitoring and automation. Currently, there is no comprehensive benchmark dataset for coal mine industrial scenarios, limiting the research progress of AI algorithms in this industry. For the first time, this study constructed a benchmark dataset (DsDPM 66) specifically for underground coal mine drilling operations, containing 105,096 images obtained from surveillance videos of multiple drilling operation scenes. The dataset has been manually annotated to support computer vision tasks such as object detection and pose estimation. In addition, this study conducted extensive benchmarking experiments on this dataset, applying various advanced AI algorithms including but not limited to YOLOv8 and DETR. The results indicate that the proposed dataset highlights areas for improvement in algorithmic models and fills the data gap in the coal mining industry, providing valuable resources for developing coal mine safety monitoring systems.
Collapse
Affiliation(s)
- Pengzhen Zhao
- School of Electrical Engineering, Shanghai DianJi University, Shanghai, 201306, China
- Intelligent Decision and Control Technology Institute, Shanghai DianJi University, Shanghai, 201306, China
| | - Xichao Wang
- School of Electrical Engineering, Shanghai DianJi University, Shanghai, 201306, China.
- Intelligent Decision and Control Technology Institute, Shanghai DianJi University, Shanghai, 201306, China.
| | - Shuainan Yu
- School of Electrical Engineering, Shanghai DianJi University, Shanghai, 201306, China
- Intelligent Decision and Control Technology Institute, Shanghai DianJi University, Shanghai, 201306, China
| | - Xiangqing Dong
- School of Electrical Engineering, Shanghai DianJi University, Shanghai, 201306, China
- Intelligent Decision and Control Technology Institute, Shanghai DianJi University, Shanghai, 201306, China
| | - Baojiang Li
- School of Electrical Engineering, Shanghai DianJi University, Shanghai, 201306, China
- Intelligent Decision and Control Technology Institute, Shanghai DianJi University, Shanghai, 201306, China
| | - Haiyan Wang
- School of Electrical Engineering, Shanghai DianJi University, Shanghai, 201306, China
- Intelligent Decision and Control Technology Institute, Shanghai DianJi University, Shanghai, 201306, China
| | - Guochu Chen
- School of Electrical Engineering, Shanghai DianJi University, Shanghai, 201306, China
- Intelligent Decision and Control Technology Institute, Shanghai DianJi University, Shanghai, 201306, China
| |
Collapse
|
30
|
Ruarte G, Bujia G, Care D, Ison MJ, Kamienkowski JE. Integrating Bayesian and neural networks models for eye movement prediction in hybrid search. Sci Rep 2025; 15:16482. [PMID: 40355508 PMCID: PMC12069626 DOI: 10.1038/s41598-025-00272-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2025] [Accepted: 04/28/2025] [Indexed: 05/14/2025] Open
Abstract
Visual search is crucial in daily human interaction with the environment. Hybrid search extends this by requiring observers to find any item from a given set. Recently, a few models were proposed to simulate human eye movements in visual search tasks within natural scenes, but none were implemented for Hybrid search under similar conditions. We present an enhanced neural network Entropy Limit Minimization (nnELM) model, grounded in a Bayesian framework and signal detection theory, and the Hybrid Search Eye Movements (HSEM) Dataset, containing thousands of human eye movements during hybrid tasks. A key Hybrid search challenge is that participants have to look for different objects at the same time. To address this, we developed several strategies involving the posterior probability distributions after each fixation. Adjusting peripheral visibility improved early-stage efficiency, aligning it with human behavior. Limiting the model's memory reduced success in longer searches, mirroring human performance. We validated these improvements by comparing our model with a held-out set within the HSEM and with other models in a separate visual search benchmark. Overall, the new nnELM model not only handles Hybrid search in natural scenes but also closely replicates human behavior, advancing our understanding of search processes while maintaining interpretability.
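The Bayesian core of such searchers, updating a posterior over target locations after each fixation, can be sketched on a toy grid as below; the likelihood model and the greedy next-fixation rule are illustrative simplifications, not the nnELM implementation.

```python
import numpy as np

def update_posterior(prior, likelihood_map):
    """One fixation step of a Bayesian searcher: multiply the prior over
    target locations by the likelihood of the observation at each cell
    and renormalise."""
    posterior = prior * likelihood_map
    return posterior / posterior.sum()

# Toy 2D grid: flat prior, one location made more likely by the "observation"
grid = np.full((20, 20), 1.0 / 400)
likelihood = np.ones((20, 20))
likelihood[12, 7] = 5.0           # evidence in favour of cell (12, 7)
posterior = update_posterior(grid, likelihood)
next_fixation = np.unravel_index(posterior.argmax(), posterior.shape)
print(next_fixation)              # (12, 7)
```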
Collapse
Affiliation(s)
- Gonzalo Ruarte
- Laboratorio de Inteligencia Artificial Aplicada (LIAA), Instituto de Ciencias de la Computación (ICC), CONICET - Universidad de Buenos Aires, Buenos Aires, Argentina.
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina.
| | - Gaston Bujia
- Laboratorio de Inteligencia Artificial Aplicada (LIAA), Instituto de Ciencias de la Computación (ICC), CONICET - Universidad de Buenos Aires, Buenos Aires, Argentina
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Damián Care
- Laboratorio de Inteligencia Artificial Aplicada (LIAA), Instituto de Ciencias de la Computación (ICC), CONICET - Universidad de Buenos Aires, Buenos Aires, Argentina
| | | | - Juan Esteban Kamienkowski
- Laboratorio de Inteligencia Artificial Aplicada (LIAA), Instituto de Ciencias de la Computación (ICC), CONICET - Universidad de Buenos Aires, Buenos Aires, Argentina
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Maestría en Explotación de Datos y Descubrimiento del Conocimiento, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
| |
Collapse
|
31
|
Alzahrani N, Bchir O, Ismail MMB. YOLO-Act: Unified Spatiotemporal Detection of Human Actions Across Multi-Frame Sequences. SENSORS (BASEL, SWITZERLAND) 2025; 25:3013. [PMID: 40431808 PMCID: PMC12115296 DOI: 10.3390/s25103013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/31/2025] [Revised: 05/04/2025] [Accepted: 05/08/2025] [Indexed: 05/29/2025]
Abstract
Automated action recognition has become essential in the surveillance, healthcare, and multimedia retrieval industries owing to the rapid proliferation of video data. This paper introduces YOLO-Act, a novel spatiotemporal action detection model that enhances the object detection capabilities of YOLOv8 to efficiently manage complex action dynamics within video sequences. YOLO-Act achieves precise and efficient action recognition by integrating keyframe extraction, action tracking, and class fusion. The model captures essential temporal dynamics without the computational overhead of continuous frame processing by leveraging the adaptive selection of three keyframes representing the beginning, middle, and end of the actions. Compared with state-of-the-art approaches such as the Lagrangian Action Recognition Transformer (LART), YOLO-Act exhibits superior performance with a mean average precision (mAP) of 73.28 in experiments conducted on the AVA dataset, resulting in a gain of +28.18 mAP. Furthermore, YOLO-Act achieves this higher accuracy with significantly lower FLOPs, demonstrating its efficiency in computational resource utilization. The results highlight the advantages of incorporating precise tracking, effective spatial detection, and temporal consistency to address the challenges associated with video-based action detection.
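The keyframe selection and class fusion steps can be illustrated with the following sketch; picking the first, middle, and last frames and averaging class scores are assumptions about the general idea rather than the paper's exact rules.

```python
import numpy as np

def select_keyframes(n_frames):
    """Pick the first, middle, and last frame indices of an action segment."""
    return [0, n_frames // 2, n_frames - 1]

def fuse_class_scores(per_frame_scores):
    """Average per-keyframe class confidences and return the fused label.
    Averaging is an illustrative fusion rule, not necessarily the paper's."""
    fused = np.mean(per_frame_scores, axis=0)
    return int(np.argmax(fused)), fused

# Toy example: detector class scores on three keyframes, 5 action classes
scores = np.array([[0.1, 0.6, 0.1, 0.1, 0.1],
                   [0.2, 0.5, 0.1, 0.1, 0.1],
                   [0.1, 0.7, 0.1, 0.05, 0.05]])
print(select_keyframes(48))           # [0, 24, 47]
print(fuse_class_scores(scores)[0])   # 1
```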
Collapse
Affiliation(s)
- Nada Alzahrani
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia; (O.B.); (M.M.B.I.)
- Computer Science Department, College of Computer Engineering and Science, Prince Sattam Bin Abdulaziz University, Al-Kharj 16278, Saudi Arabia
| | - Ouiem Bchir
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia; (O.B.); (M.M.B.I.)
| | - Mohamed Maher Ben Ismail
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia; (O.B.); (M.M.B.I.)
| |
Collapse
|
32
|
Fincato M, Vezzani R. DualPose: Dual-Block Transformer Decoder with Contrastive Denoising for Multi-Person Pose Estimation. SENSORS (BASEL, SWITZERLAND) 2025; 25:2997. [PMID: 40431791 PMCID: PMC12114973 DOI: 10.3390/s25102997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/03/2025] [Revised: 05/05/2025] [Accepted: 05/06/2025] [Indexed: 05/29/2025]
Abstract
Multi-person pose estimation is the task of detecting and regressing the keypoint coordinates of multiple people in a single image. Significant progress has been achieved in recent years, especially with the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation by leveraging a dual-block transformer decoding architecture. Class prediction and keypoint estimation are split into parallel blocks so each sub-task can be separately improved and the risk of interference is reduced. This architecture improves the precision of keypoint localization and the model's capacity to accurately classify individuals. To improve model performance, the Keypoint-Block uses parallel processing of self-attentions, providing a novel strategy that improves keypoint localization accuracy and precision. Additionally, DualPose incorporates a contrastive denoising (CDN) mechanism, leveraging positive and negative samples to stabilize training and improve robustness. Thanks to CDN, a variety of training samples are created by introducing controlled noise into the ground truth, improving the model's ability to discern between valid and incorrect keypoints. DualPose achieves state-of-the-art results outperforming recent end-to-end methods, as shown by extensive experiments on the MS COCO and CrowdPose datasets. The code and pretrained models are publicly available.
Collapse
Affiliation(s)
| | - Roberto Vezzani
- Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Via P. Vivarelli 10, 41125 Modena, Italy;
| |
Collapse
|
33
|
Tu J, Liu X, Huang Z, Hao Y, Hong R, Wang M. Cross-Modal Hashing via Diverse Instances Matching. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:2737-2749. [PMID: 40266858 DOI: 10.1109/tip.2025.3561659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/25/2025]
Abstract
Cross-modal hashing is a highly effective technique for searching relevant data across different modalities, owing to its low storage costs and fast similarity retrieval capability. While significant progress has been achieved in this area, prior investigations predominantly concentrate on a one-to-one feature alignment approach, where a singular feature is derived for similarity retrieval. However, the singular feature in these methods fails to adequately capture the varied multi-instance information inherent in the original data across disparate modalities. Consequently, the conventional one-to-one methodology is plagued by a semantic mismatch issue, as the rigid one-to-one alignment inhibits effective multi-instance matching. To address this issue, we propose a novel Diverse Instances Matching for Cross-modal Hashing (DIMCH), which explores the relevance between multiple instances in different modalities using a multi-instance learning algorithm. Specifically, we design a novel diverse instances learning module to extract a multi-feature set, which enables our model to capture detailed multi-instance semantics. To evaluate the similarity between two multi-feature sets, we adopt the smooth chamfer distance function, which enables our model to incorporate the conventional similarity retrieval structure. Moreover, to sufficiently exploit the supervised information from the semantic label, we adopt the weight cosine triplet loss as the objective function, which incorporates the multilevel similarity among the multi-labels into the training procedure and enables the model to mine the multi-label correlation effectively. Extensive experiments demonstrate that our diverse hashing embedding method achieves state-of-the-art performance in supervised cross-modal hashing retrieval tasks.
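The smooth chamfer comparison between two multi-feature sets can be sketched as follows, using a log-sum-exp soft maximum over cosine similarities; the temperature and exact aggregation are assumptions and may differ from the paper's definition.

```python
import torch
import torch.nn.functional as F

def smooth_chamfer(set_a, set_b, alpha=16.0):
    """Smooth chamfer score between two feature sets of shape (n_a, d) and
    (n_b, d): a soft maximum (log-sum-exp) over cosine similarities in each
    direction, averaged over both sets. Higher means more similar."""
    a = F.normalize(set_a, dim=1)
    b = F.normalize(set_b, dim=1)
    sim = a @ b.t()                                       # (n_a, n_b) cosine similarities

    a_to_b = torch.logsumexp(alpha * sim, dim=1) / alpha  # soft max over b for each a
    b_to_a = torch.logsumexp(alpha * sim, dim=0) / alpha  # soft max over a for each b
    return 0.5 * (a_to_b.mean() + b_to_a.mean())

print(float(smooth_chamfer(torch.randn(4, 128), torch.randn(6, 128))))
```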
Collapse
|
34
|
Zhou Q, Pan Z, Niu B. SFEF-Net: Scattering Feature Extraction and Fusion Network for Aircraft Detection in SAR Images. SENSORS (BASEL, SWITZERLAND) 2025; 25:2988. [PMID: 40431781 PMCID: PMC12114894 DOI: 10.3390/s25102988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/13/2025] [Revised: 04/27/2025] [Accepted: 05/07/2025] [Indexed: 05/29/2025]
Abstract
Synthetic aperture radar (SAR) offers robust Earth observation capabilities under diverse lighting and weather conditions, making SAR-based aircraft detection crucial for various applications. However, this task presents significant challenges, including extracting discrete scattering features, mitigating interference from complex backgrounds, and handling potential label noise. To tackle these issues, we propose the scattering feature extraction and fusion network (SFEF-Net). Firstly, we proposed an innovative sparse convolution operator and applied it to feature extraction. Compared to traditional convolution, sparse convolution offers more flexible sampling positions and a larger receptive field without increasing the number of parameters, which enables SFEF-Net to better extract discrete features. Secondly, we developed the global information fusion and distribution module (GIFD) to fuse feature maps of different levels and scales. GIFD possesses the capability for global modeling, enabling the comprehensive fusion of multi-scale features and the utilization of contextual information. Additionally, we introduced a noise-robust loss to mitigate the adverse effects of label noise by reducing the weight of outliers. To assess the performance of our proposed method, we carried out comprehensive experiments utilizing the SAR-AIRcraft1.0 dataset. The experimental results demonstrate the outstanding performance of SFEF-Net.
Collapse
Affiliation(s)
- Qiang Zhou
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; (Q.Z.); (B.N.)
- Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China
- School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zongxu Pan
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; (Q.Z.); (B.N.)
- Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China
- School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ben Niu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; (Q.Z.); (B.N.)
- Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China
- School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
| |
Collapse
|
35
|
Abo-Zahhad MM, Abo-Zahhad M. Real time intelligent garbage monitoring and efficient collection using Yolov8 and Yolov5 deep learning models for environmental sustainability. Sci Rep 2025; 15:16024. [PMID: 40341180 PMCID: PMC12062267 DOI: 10.1038/s41598-025-99885-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2025] [Accepted: 04/23/2025] [Indexed: 05/10/2025] Open
Abstract
Effective waste management is currently one of the most influential factors in enhancing the quality of life. Increased garbage production has been identified as a significant problem for many cities worldwide and a crucial issue for countries experiencing rapid urban population growth. According to the World Bank Organization, global waste production is projected to increase from 2.01 billion tonnes in 2018 to 3.4 billion tonnes by 2050 (Kaza et al. in What a Waste 2.0: A Global Snapshot of Solid Waste Management to 2050, The World Bank Group, Washington, DC, USA, 2018). In many cities, growing waste is the primary driver of environmental pollution. Nationally, governments have initiated several programs to improve cleanliness by developing systems that alert businesses when it is time to empty the bins. The current research proposes an enhanced, accurate, real-time object detection system to address the problem of trash accumulating around containers. This system involves numerous trash cans scattered across the city, each equipped with a low-cost device that measures the amount of trash inside. When a certain threshold is reached, the device sends a message with a unique identifier, prompting the appropriate authorities to take action. The system also triggers alerts if individuals throw trash bags outside the container or if the bin overflows, sending a message with a unique identifier to the authorities. Additionally, this paper addresses the need for efficient garbage classification while reducing computing costs to improve resource utilization. Two-stage lightweight deep learning models based on YOLOv5 and YOLOv8 are adopted to significantly decrease the number of parameters and processes, thereby reducing hardware requirements. In this study, trash is first classified into primary categories, which are further subdivided. The primary categories include full trash containers, trash bags, trash outside containers, and wet trash containers. YOLOv5 is particularly effective for classifying small objects, achieving high accuracy in identifying and categorizing different types of waste products on hardware without GPU capabilities. Each main class is further subdivided using YOLOv8 to facilitate recycling. A comparative study of YOLOv8, YOLOv5, and EfficientNet models on public and newly constructed garbage datasets shows that YOLOv8 and YOLOv5 have good accuracy for most classes, with the full-trash bin class achieving the highest accuracy and the wet trash container class the lowest, compared to the EfficientNet model. The results demonstrate that the system effectively addresses the reliability issues of previously proposed systems, including detecting whether a trash bin is full, identifying trash outside the bin, and ensuring proper communication with authorities for necessary actions. Further research is recommended to enhance garbage management and collection, considering target occlusion, CPU and GPU hardware optimization, and robotic integration with the proposed system.
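The bin-level alerting rule described above reduces to a simple threshold check per reading; the sketch below uses an assumed 85% fill threshold and a hypothetical message format, neither of which is specified in the abstract.

```python
from dataclasses import dataclass

FULL_THRESHOLD = 0.85  # assumed fraction of bin capacity that triggers an alert

@dataclass
class BinReading:
    bin_id: str
    fill_level: float        # 0.0 (empty) .. 1.0 (full), from the level sensor
    overflow_detected: bool   # e.g. the vision model sees bags outside the bin

def build_alerts(readings):
    """Report a bin when its measured fill level crosses the threshold or
    when waste is detected outside the container."""
    alerts = []
    for r in readings:
        if r.fill_level >= FULL_THRESHOLD or r.overflow_detected:
            alerts.append({"bin_id": r.bin_id,
                           "fill_level": round(r.fill_level, 2),
                           "overflow": r.overflow_detected})
    return alerts

print(build_alerts([BinReading("bin-017", 0.92, False),
                    BinReading("bin-018", 0.40, True)]))
```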
Collapse
Affiliation(s)
- Mohammed M Abo-Zahhad
- Department of Electrical Engineering, Faculty of Engineering, Sohag University, Sohag, New Sohag City, Egypt
| | - Mohammed Abo-Zahhad
- Department of Electronics and Communications Engineering, Egypt-Japan University of Science and Technology (E-JUST), New Borg El-Arab City, Alexandria, 21934, Egypt.
- Department of Electrical and Electronics Engineering, Assiut University, Assiut, 71515, Egypt.
| |
Collapse
|
36
|
Wu M, Sharapov J, Anderson M, Lu Y, Wu Y. Quantifying dislocation-type defects in post irradiation examination via transfer learning. Sci Rep 2025; 15:15889. [PMID: 40335501 PMCID: PMC12059087 DOI: 10.1038/s41598-025-00238-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2024] [Accepted: 04/25/2025] [Indexed: 05/09/2025] Open
Abstract
The quantitative analysis of dislocation-type defects in irradiated materials is critical to materials characterization in the nuclear energy industry. The conventional approach of an instrument scientist manually identifying any dislocation defects is both time-consuming and subjective, thereby potentially introducing inconsistencies in the quantification. This work approaches dislocation-type defect identification and segmentation using a standard open-source computer vision model, YOLO11, that leverages transfer learning to create a highly effective dislocation defect quantification tool while using only a minimal number of annotated micrographs for training. This model demonstrates the ability to segment both dislocation lines and loops concurrently in micrographs with high pixel noise levels and on two alloys not represented in the training set. Inference of dislocation defects using transmission electron microscopy on three different irradiated alloys relevant to the nuclear energy industry is examined in this work, with widely varying pixel noise levels and completely unrelated compositions and dislocation formations, for practical post irradiation examination analysis. Code and models are available at https://github.com/idaholab/PANDA.
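As a rough illustration of the transfer-learning workflow (not the released PANDA code), the Ultralytics API can fine-tune a pretrained YOLO11 segmentation checkpoint on a small annotated micrograph set; the dataset YAML path, checkpoint name, and hyperparameters below are placeholders.

```python
from ultralytics import YOLO

# Start from a pretrained YOLO11 segmentation checkpoint (transfer learning),
# then fine-tune on a small set of annotated micrographs. The dataset YAML,
# epoch count, and image size are illustrative placeholders.
model = YOLO("yolo11n-seg.pt")
model.train(data="dislocations.yaml", epochs=100, imgsz=640)

# Run inference on a held-out micrograph; each result carries instance masks
# for the segmented dislocation lines and loops.
results = model.predict("micrograph_test.png", conf=0.25)
for r in results:
    print(r.masks)  # segmentation masks predicted for this image
```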
Collapse
Affiliation(s)
- Michael Wu
- Idaho National Laboratory, Idaho Falls, ID, USA
| | | | | | - Yu Lu
- Boise State University, Boise, ID, USA
- Center for Advanced Energy Studies, Idaho Falls, ID, USA
| | - Yaqiao Wu
- Boise State University, Boise, ID, USA
- Center for Advanced Energy Studies, Idaho Falls, ID, USA
| |
Collapse
|
37
|
Zheng S, Wu Z, Xu Y, He C, Wei Z. Detector With Classifier 2: An End-to-End Multi-Stream Feature Aggregation Network for Fine-Grained Object Detection in Remote Sensing Images. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:2707-2720. [PMID: 40305241 DOI: 10.1109/tip.2025.3563708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/02/2025]
Abstract
Fine-grained object detection (FGOD) fundamentally comprises two primary tasks: object detection and fine-grained classification. In natural scenes, most FGOD methods benefit from higher instance resolution and less environmental variation, attributes more commonly associated with the latter task. In this paper, we propose a unified paradigm named Detector with Classifier2 (DC2), which provides a holistic paradigm by explicitly considering the end-to-end integration of object detection and fine-grained classification tasks, rather than prioritizing one aspect. Initially, our detection sub-network is restricted to determining only whether a proposal belongs to a coarse category, without delving into the specific sub-categories. Moreover, in order to reduce redundant pixel-level calculation, we propose an instance-level feature enhancement (IFE) module to model the semantic similarities among proposals, which poses great potential for locating more instances in remote sensing images (RSIs). After obtaining the coarse detection predictions, we further construct a classification sub-network, which is built on top of the former branch to determine the specific sub-categories of the aforementioned predictions. Importantly, the detection network operates on the complete image, while the classification network conducts secondary modeling of the detected regions. These operations can be viewed as extracting global contextual information and local intrinsic cues for each instance. Therefore, we propose a multi-stream feature aggregation (MSFA) module to integrate global-stream semantic information and local-stream discriminative cues. Our whole DC2 network follows an end-to-end learning fashion, which effectively excavates the internal correlation between the detection and fine-grained classification networks. We evaluate the performance of our DC2 network on two benchmark datasets, SAT-MTB and HRSC2016. Importantly, our method achieves new state-of-the-art results compared with recent works (approximately 7% mAP gains on SAT-MTB) and improves the baseline by a significant margin (43.2% vs. 36.7%) without any complicated post-processing strategies. Source codes of the proposed methods are available at https://github.com/zhengshangdong/DC2.
Collapse
|
38
|
Gao C, Ajith S, Peelen MV. Object representations drive emotion schemas across a large and diverse set of daily-life scenes. Commun Biol 2025; 8:697. [PMID: 40325234 PMCID: PMC12053605 DOI: 10.1038/s42003-025-08145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2025] [Accepted: 04/29/2025] [Indexed: 05/07/2025] Open
Abstract
The rapid emotional evaluation of objects and events is essential in daily life. While visual scenes reliably evoke emotions, it remains unclear whether emotion schemas evoked by daily-life scenes depend on object processing systems or are extracted independently. To explore this, we collected emotion ratings for 4913 daily-life scenes from 300 participants, and predicted these ratings from representations in deep neural networks and functional magnetic resonance imaging (fMRI) activity patterns in visual cortex. AlexNet, an object-based model, outperformed EmoNet, an emotion-based model, in predicting emotion ratings for daily-life scenes, while EmoNet excelled for explicitly evocative scenes. Emotion information was processed hierarchically within the object recognition system, consistent with the visual cortex's organization. Activity patterns in the lateral occipital complex (LOC), an object-selective region, reliably predicted emotion ratings and outperformed other visual regions. These findings suggest that the emotional evaluation of daily-life scenes is mediated by visual object processing, with additional mechanisms engaged when object content is uninformative.
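Predicting scene-level emotion ratings from network activations is essentially a regularised regression; a minimal sketch with synthetic stand-ins for AlexNet features and mean valence ratings is shown below (the study's feature extraction and cross-validation details may differ).

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins: one feature vector per scene (e.g. layer activations)
# and one averaged emotion rating per scene.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 4096))       # 500 scenes x 4096-d activations
ratings = rng.normal(size=500)                # mean emotion rating per scene

# Ridge regression with cross-validated regularisation strength
model = RidgeCV(alphas=np.logspace(-2, 4, 13))
r2 = cross_val_score(model, features, ratings, cv=5, scoring="r2")
print("cross-validated R^2: %.3f" % r2.mean())
```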
Collapse
Affiliation(s)
- Chuanji Gao
- School of Psychology, Nanjing Normal University, Nanjing, China.
| | - Susan Ajith
- Department of Medicine, Justus-Liebig-Universität Gießen, Gießen, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands.
| |
Collapse
|
39
|
Haupt M, Garrett DD, Cichy RM. Healthy aging delays and dedifferentiates high-level visual representations. Curr Biol 2025; 35:2112-2127.e6. [PMID: 40239656 DOI: 10.1016/j.cub.2025.03.062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2024] [Revised: 01/23/2025] [Accepted: 03/25/2025] [Indexed: 04/18/2025]
Abstract
Healthy aging impacts visual information processing with consequences for subsequent high-level cognition and everyday behavior, but the underlying neural changes in visual representations remain unknown. Here, we investigate the nature of representations underlying object recognition in older compared to younger adults by tracking them in time using electroencephalography (EEG), across space using functional magnetic resonance imaging (fMRI), and by probing their behavioral relevance using similarity judgments. Applying a multivariate analysis framework to combine experimental assessments, four key findings about how brain aging impacts object recognition emerge. First, aging selectively delays the formation of object representations, profoundly changing the chronometry of visual processing. Second, the delay in the formation of object representations emerges in high-level rather than low- and mid-level ventral visual cortex, supporting the theory that brain areas developing last deteriorate first. Third, aging reduces content selectivity in the high-level ventral visual cortex, indicating age-related neural dedifferentiation as the mechanism of representational change. Finally, we demonstrate that the identified representations of the aging brain are behaviorally relevant, ascertaining ecological relevance. Together, our results reveal the impact of healthy aging on the visual brain.
Collapse
Affiliation(s)
- Marleen Haupt
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee 45, Berlin 14195, Germany; Center for Lifespan Psychology, Max Planck Institute for Human Development, Lentzallee 94, Berlin 14195, Germany.
| | - Douglas D Garrett
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, 10-12 Russell Square, London WC1B 5EH, UK
| | - Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee 45, Berlin 14195, Germany; Berlin School of Mind and Brain, Faculty of Philosophy, Humboldt-Universität zu Berlin, Luisenstraße 56, Berlin 10117, Germany; Bernstein Center for Computational Neuroscience Berlin, Humbold-Universität zu Berlin, Philippstraße 13, Berlin 10115, Germany.
| |
Collapse
|
40
|
Toosi A, Harsini S, Divband G, Bénard F, Uribe CF, Oviedo F, Dodhia R, Weeks WB, Lavista Ferres JM, Rahmim A. Computer-Aided Detection (CADe) of Small Metastatic Prostate Cancer Lesions on 3D PSMA PET Volumes Using Multi-Angle Maximum Intensity Projections. Cancers (Basel) 2025; 17:1563. [PMID: 40361490 PMCID: PMC12071532 DOI: 10.3390/cancers17091563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2025] [Revised: 04/28/2025] [Accepted: 04/29/2025] [Indexed: 05/15/2025] Open
Abstract
OBJECTIVES We aimed to develop and evaluate a novel computer-aided detection (CADe) approach for identifying small metastatic biochemically recurrent (BCR) prostate cancer (PCa) lesions on PSMA-PET images, utilizing multi-angle Maximum Intensity Projections (MA-MIPs) and state-of-the-art (SOTA) object detection algorithms. METHODS We fine-tuned and evaluated 16 SOTA object detection algorithms (selected across four main categories of model types) applied to MA-MIPs as extracted from rotated 3D PSMA-PET volumes. Predicted 2D bounding boxes were back-projected to the original 3D space using the Ordered Subset Expectation Maximization (OSEM) algorithm. A fine-tuned Medical Segment-Anything Model (MedSAM) was then also used to segment the identified lesions within the bounding boxes. RESULTS The proposed method achieved a high detection performance for this difficult task, with the FreeAnchor model reaching an F1-score of 0.69 and a recall of 0.74. It outperformed several 3D methods in efficiency while maintaining comparable accuracy. Strong recall rates were observed for clinically relevant areas, such as local relapses (0.82) and bone metastases (0.80). CONCLUSION Our fully automated CADe tool shows promise in assisting physicians as a "second reader" for detecting small metastatic BCR PCa lesions on PSMA-PET images. By leveraging the strength and computational efficiency of 2D models while preserving 3D spatial information of the PSMA-PET volume, the proposed approach has the potential to improve detectability and reduce workload in cancer diagnosis and management.
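The MA-MIP extraction step can be sketched as repeated rotation of the PET volume followed by a maximum intensity projection; the rotation axes, interpolation order, and number of angles below are illustrative assumptions rather than the study's exact settings.

```python
import numpy as np
from scipy.ndimage import rotate

def multi_angle_mips(volume, n_angles=16):
    """Rotate a 3D volume about one anatomical axis and take a maximum
    intensity projection at each angle, yielding a stack of 2D MIP views."""
    mips = []
    for angle in np.linspace(0, 360, n_angles, endpoint=False):
        # Rotate in the plane of axes (1, 2), then project along axis 2
        rotated = rotate(volume, angle, axes=(1, 2), reshape=False, order=1)
        mips.append(rotated.max(axis=2))
    return np.stack(mips)                  # (n_angles, H, W)

# Toy volume with one bright "lesion"
vol = np.zeros((64, 64, 64), dtype=np.float32)
vol[40:43, 20:23, 30:33] = 1.0
print(multi_angle_mips(vol).shape)         # (16, 64, 64)
```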
Affiliation(s)
- Amirhosein Toosi
- Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada; (A.T.); (C.F.U.)
- Department of Radiology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada;
- AI for Good Lab, Microsoft Corporation, Redmond, WA 98052, USA; (F.O.); (R.D.); (W.B.W.); (J.M.L.F.)
- François Bénard
- Department of Radiology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada;
- BC Cancer, Vancouver, BC V5Z 1L3, Canada;
- Carlos F. Uribe
- Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada; (A.T.); (C.F.U.)
- Department of Radiology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada;
- BC Cancer, Vancouver, BC V5Z 1L3, Canada;
- Felipe Oviedo
- AI for Good Lab, Microsoft Corporation, Redmond, WA 98052, USA; (F.O.); (R.D.); (W.B.W.); (J.M.L.F.)
- Rahul Dodhia
- AI for Good Lab, Microsoft Corporation, Redmond, WA 98052, USA; (F.O.); (R.D.); (W.B.W.); (J.M.L.F.)
- William B. Weeks
- AI for Good Lab, Microsoft Corporation, Redmond, WA 98052, USA; (F.O.); (R.D.); (W.B.W.); (J.M.L.F.)
- Juan M. Lavista Ferres
- AI for Good Lab, Microsoft Corporation, Redmond, WA 98052, USA; (F.O.); (R.D.); (W.B.W.); (J.M.L.F.)
- Arman Rahmim
- Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada; (A.T.); (C.F.U.)
- Department of Radiology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada;
- Department of Physics and Astronomy, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
|
41
|
Demircioğlu A, Bos D, Quinsten AS, Umutlu L, Bruder O, Forsting M, Nassenstein K. Detecting the left atrial appendage in CT localizers using deep learning. Sci Rep 2025; 15:15333. [PMID: 40316718 PMCID: PMC12048584 DOI: 10.1038/s41598-025-99701-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2025] [Accepted: 04/22/2025] [Indexed: 05/04/2025] Open
Abstract
Patients with cardioembolic stroke often undergo CT of the left atrial appendage (LAA), for example, to determine whether thrombi are present in the LAA. To guide the imaging process, technologists first perform a localizer scan, which is a preliminary image used to identify the region of interest. However, the lack of well-defined landmarks makes accurate delimitation of the LAA in localizers difficult and often requires whole-heart scans, increasing radiation exposure and cancer risk. This study aims to automate LAA delimitation in CT localizers using deep learning. Four commonly used deep networks (VariFocalNet, Cascade-R-CNN, Task-aligned One-stage Object Detection Network, YOLO v11) were trained to predict the LAA boundaries on a cohort of 1253 localizers, collected retrospectively from a single center. The best-performing network in terms of delimitation accuracy was then evaluated on an internal test cohort of 368 patients, and on an external test cohort of 309 patients. The VariFocalNet performed best, achieving LAA delimitations with high accuracy (97.8% and 96.8%; Dice coefficients: 90.4% and 90.0%) and near-perfect clinical utility (99.8% and 99.3%). Compared to whole-heart scanning, the network-based delimitation reduced the radiation exposure by more than 50% (5.33 ± 6.42 mSv vs. 11.35 ± 8.17 mSv in the internal cohort, 4.39 ± 4.23 mSv vs. 10.09 ± 8.0 mSv in the external cohort). This study demonstrates that a deep learning network can accurately delimit the LAA in the localizer, leading to more accurate CT scans of the LAA, thereby significantly reducing radiation exposure to the patient compared to whole-heart scanning.
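For context, the sketch below (a generic overlap metric, not taken from the paper) shows how a Dice coefficient between a predicted and a reference delimitation can be computed when both are expressed as axis-aligned boxes in localizer pixel coordinates.

def box_dice(a, b) -> float:
    """Dice overlap of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))   # intersection width
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))   # intersection height
    inter = iw * ih
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    return 2.0 * inter / (area_a + area_b) if (area_a + area_b) > 0 else 0.0

print(box_dice((10, 20, 110, 90), (15, 25, 105, 95)))  # ≈ 0.88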
Affiliation(s)
- Aydin Demircioğlu
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany.
- Denise Bos
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany
- Anton S Quinsten
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany
- Lale Umutlu
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany
- Oliver Bruder
- Department of Cardiology and Angiology, Contilia Heart and Vascular Center, Elisabeth-Krankenhaus Essen, Klara-Kopp-Weg 1, 45138, Essen, Germany
- Faculty of Medicine, Ruhr University Bochum, 44801, Bochum, Germany
- Michael Forsting
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany
- Kai Nassenstein
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany
|
42
|
Yang L, He L, Hu D, Liu Y, Peng Y, Chen H, Zhou M. Variational Transformer: A Framework Beyond the Tradeoff Between Accuracy and Diversity for Image Captioning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:9500-9511. [PMID: 39374280 DOI: 10.1109/tnnls.2024.3440872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/09/2024]
Abstract
Accuracy and diversity are two critical, quantifiable performance metrics in the generation of natural and semantically accurate captions. While efforts are made to enhance one of them, the other suffers due to their inherently conflicting and complex relationship. In this study, we demonstrate that the suboptimal accuracy levels derived from human annotations are unsuitable for machine-generated captions. To boost diversity while maintaining high accuracy, we propose an innovative variational transformer (VaT) framework. By integrating an "invisible information prior" (IIP) and an "auto-selectable Gaussian mixture model" (AGMM), we enable its encoder to learn precise linguistic information and object relationships in various scenes, thus ensuring high accuracy. By incorporating the "range-median reward" (RMR) baseline, we preserve a wider range of candidates with higher rewards during the reinforcement-learning-based training process, thereby guaranteeing outstanding diversity. Experimental results indicate that our method achieves simultaneous improvements in accuracy and diversity of up to 1.1% and 4.8%, respectively, over the state of the art. Furthermore, our approach achieves the performance closest to human annotations in semantic retrieval, scoring 50.3 versus the human score of 50.6. Thus, the method can readily be put into industrial use.
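To make the baseline idea concrete, the following schematic sketch (our reading of the range-median reward, with illustrative shapes and numbers; not the authors' implementation) uses the per-image median of K sampled-caption rewards as the baseline, so roughly half of the candidates keep a positive advantage during policy-gradient training.

import numpy as np

def median_baseline_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: (batch, K) rewards of K sampled captions per image."""
    baseline = np.median(rewards, axis=1, keepdims=True)   # per-image median reward
    return rewards - baseline                               # advantages for the policy gradient

rewards = np.array([[0.9, 1.2, 0.4, 1.0],
                    [0.2, 0.8, 0.5, 0.7]])
print(median_baseline_advantages(rewards))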
|
43
|
Tang X, Ye S, Shi Y, Hu T, Peng Q, You X. Filter Pruning Based on Information Capacity and Independence. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:8401-8413. [PMID: 39231052 DOI: 10.1109/tnnls.2024.3415068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/06/2024]
Abstract
Filter pruning has gained widespread adoption for compressing and speeding up convolutional neural networks (CNNs). However, existing approaches remain far from practical application due to biased filter selection and heavy computation costs. This article introduces a new filter pruning method that selects filters in an interpretable, multi-perspective, and lightweight manner. Specifically, we evaluate the contributions of filters from both individual and overall perspectives. To quantify the amount of information contained in each filter, a new metric called information capacity is proposed. Inspired by information theory, we use the interpretable entropy to measure information capacity and develop a feature-guided approximation process. To capture correlations among filters, another metric called information independence is designed. Since both metrics are evaluated in a simple but effective way, we can identify and prune the least important filters at low computation cost. We conduct comprehensive experiments on benchmark datasets employing various widely used CNN architectures to evaluate the performance of our method. For instance, on ILSVRC-2012, our method outperforms state-of-the-art methods, reducing floating-point operations (FLOPs) by 77.4% and parameters by 69.3% for ResNet-50 with only a minor accuracy decrease of 2.64%.
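As a simplified stand-in for the information-capacity metric (the paper's feature-guided approximation is more involved, and all shapes below are illustrative), the sketch scores each filter by the entropy of its activation histogram over a batch of feature maps and keeps the highest-scoring filters.

import numpy as np

def filter_entropy_scores(feats: np.ndarray, bins: int = 32) -> np.ndarray:
    """feats: (N, C, H, W) activations; returns one entropy score per channel/filter."""
    scores = []
    for c in range(feats.shape[1]):
        x = feats[:, c].ravel()
        hist, _ = np.histogram(x, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        scores.append(-(p * np.log(p)).sum())   # Shannon entropy of activation values
    return np.array(scores)

feats = np.random.rand(8, 16, 14, 14)           # illustrative feature maps
scores = filter_entropy_scores(feats)
keep = np.argsort(scores)[::-1][:12]            # e.g., retain the 12 most "informative" filters
print(keep)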
|
44
|
Liu C, Li B, Shi M, Chen X, Ye Q, Ji X. Explicit Margin Equilibrium for Few-Shot Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:8072-8084. [PMID: 38980785 DOI: 10.1109/tnnls.2024.3422216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/11/2024]
Abstract
Under low-data regimes, few-shot object detection (FSOD) transfers related knowledge from base classes with sufficient annotations to novel classes with limited samples in a two-step paradigm comprising base training and balanced fine-tuning. In base training, the learned embedding space needs to be dispersed with large class margins to facilitate novel-class accommodation and avoid feature aliasing, whereas in balanced fine-tuning it should be properly concentrated with small margins to represent novel classes precisely. Although attention to this discrimination-representation dilemma has stimulated substantial progress, exploration of the equilibrium of class margins within the embedding space is still ongoing. In this study, we propose a class margin optimization scheme, termed explicit margin equilibrium (EME), that explicitly leverages the quantified relationship between base and novel classes. EME first maximizes base-class margins to reserve adequate space for novel-class adaptation. During fine-tuning, it quantifies the inter-class semantic relationships by calculating equilibrium coefficients based on the assumption that novel instances can be represented by linear combinations of base-class prototypes. EME finally reweights the margin loss using these equilibrium coefficients to adapt base knowledge for novel-instance learning with the help of instance disturbance (ID) augmentation. As a plug-and-play module, EME can also be applied to few-shot classification. Consistent performance gains over various baseline methods and benchmarks validate the generality and efficacy of EME. The code is available at github.com/Bohao-Lee/EME.
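A minimal sketch of the stated assumption follows (illustrative shapes and synthetic data; not the authors' code): if a novel-class prototype is approximately a linear combination of base-class prototypes, the combination coefficients recovered by least squares can play the role of the equilibrium coefficients used to reweight the margin loss.

import numpy as np

def equilibrium_coefficients(base_protos: np.ndarray, novel_proto: np.ndarray) -> np.ndarray:
    """base_protos: (n_base, d); novel_proto: (d,). Returns (n_base,) combination coefficients."""
    coeffs, *_ = np.linalg.lstsq(base_protos.T, novel_proto, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
base = rng.normal(size=(20, 128))               # 20 base-class prototypes
novel = 0.6 * base[3] + 0.4 * base[7]           # a novel prototype built from two base classes
c = equilibrium_coefficients(base, novel)
print(np.argsort(np.abs(c))[-2:])               # indices 3 and 7 dominate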
|
45
|
Song Y, Liu Z, Li G, Xie J, Wu Q, Zeng D, Xu L, Zhang T, Wang J. EMS: A Large-Scale Eye Movement Dataset, Benchmark, and New Model for Schizophrenia Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:9451-9462. [PMID: 39178070 DOI: 10.1109/tnnls.2024.3441928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/25/2024]
Abstract
Schizophrenia (SZ) is a common and disabling mental illness, and most patients encounter cognitive deficits. Eye-tracking technology has been increasingly used to characterize cognitive deficits because of its reasonable time and economic costs. However, there is no large-scale, publicly available eye movement dataset and benchmark for SZ recognition. To address these issues, we release a large-scale Eye Movement dataset for SZ recognition (EMS), which consists of eye movement data from 104 schizophrenics and 104 healthy controls (HCs) based on the free-viewing paradigm with 100 stimuli. We also conduct the first comprehensive benchmark, which has long been absent in this field, comparing 13 related psychosis recognition methods using six metrics. In addition, we propose a novel mean-shift-based network (MSNet) for eye-movement-based SZ recognition, which elaborately combines the mean shift algorithm with convolution to extract the cluster center as the subject feature. In MSNet, a stimulus feature branch (SFB) first enhances each stimulus feature with similar information from all stimulus features, and a cluster center branch (CCB) then generates the cluster center as the subject feature and updates it using the mean shift vector. The performance of our MSNet is superior to that of prior contenders; it can therefore act as a powerful baseline to advance subsequent studies. To pave the road in this research field, the EMS dataset, the benchmark results, and the code of MSNet are publicly available at https://github.com/YingjieSong1/EMS.
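To illustrate the core operation (an illustrative mean-shift routine with assumed feature dimensions, not the MSNet code), the sketch below shifts a subject's per-stimulus feature vectors toward a density mode and returns the converged center as a subject-level feature.

import numpy as np

def mean_shift_center(features: np.ndarray, bandwidth: float = 1.0, iters: int = 50) -> np.ndarray:
    """features: (n_stimuli, d); returns the (d,) cluster-center feature."""
    center = features.mean(axis=0)
    for _ in range(iters):
        dists2 = ((features - center) ** 2).sum(axis=1)
        weights = np.exp(-dists2 / (2 * bandwidth ** 2))    # Gaussian kernel weights
        new_center = (weights[:, None] * features).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_center - center) < 1e-6:
            break
        center = new_center
    return center

feats = np.random.default_rng(1).normal(size=(100, 32))     # 100 stimuli, 32-dim features
print(mean_shift_center(feats).shape)                        # (32,)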
|
46
|
Xu M, Dai N, Jiang L, Fu Y, Deng X, Li S. Recruiting Teacher IF Modality for Nephropathy Diagnosis: A Customized Distillation Method With Attention-Based Diffusion Network. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:2028-2040. [PMID: 40030767 DOI: 10.1109/tmi.2024.3524544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
The joint use of multiple modalities for medical image processing has been widely studied in recent years. The fusion of information from different modalities has demonstrated performance improvements for many medical tasks. For nephropathy diagnosis, immunofluorescence (IF) is one of the most widely used multi-modality medical image types due to its ease of acquisition and its effectiveness for certain nephropathies. However, existing methods mainly assume that different modalities have an equal effect on the diagnosis task, failing to exploit multi-modality knowledge in detail. To avoid this disadvantage, this paper proposes a novel customized multi-teacher knowledge distillation framework to transfer knowledge from trained single-modality teacher networks to a multi-modality student network. Specifically, a new attention-based diffusion network is developed for IF-based diagnosis, considering global, local, and modality attention. In addition, a teacher recruitment module and a diffusion-aware distillation loss are developed to select effective teacher networks based on the medical priors of the input IF sequence. Experimental results on the test and external datasets show that the proposed method has better nephropathy diagnosis performance and generalizability than state-of-the-art methods.
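As a schematic of the distillation objective (weights, temperature, and class counts below are illustrative assumptions, and the paper's recruitment module and diffusion network are not reproduced), the sketch pulls the student's softened predictions toward several single-modality teachers with per-teacher weights.

import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, weights, T=2.0):
    """KL(teacher || student), averaged over the batch and summed over weighted teachers."""
    s = softmax(student_logits, T)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        t = softmax(t_logits, T)
        loss += w * np.mean(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=-1))
    return loss * T * T                         # usual temperature scaling of the KD term

student = np.random.randn(4, 5)                 # batch of 4, 5 illustrative diagnosis classes
teachers = [np.random.randn(4, 5) for _ in range(3)]
print(multi_teacher_kd_loss(student, teachers, weights=[0.5, 0.3, 0.2]))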
|
47
|
Zhou H, Yang R, Zhang Y, Duan H, Huang Y, Hu R, Li X, Zheng Y. UniHead: Unifying Multi-Perception for Detection Heads. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:9565-9576. [PMID: 38905097 DOI: 10.1109/tnnls.2024.3412947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/23/2024]
Abstract
The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni perceptual capabilities, such as deformation perception (DP), global perception (GP), and cross-task perception (CTP). Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach: 1) introduces DP, enabling the model to adaptively sample object features; 2) proposes a dual-axial aggregation transformer (DAT) to adeptly model long-range dependencies, thereby achieving GP; and 3) devises a cross-task interaction transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code is available at https://github.com/zht8506/UniHead.
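For intuition about the dual-axial aggregation, the following sketch implements a generic axial-attention pattern (not the exact DAT module; single head, no learned projections): attention is applied along the height axis and then along the width axis of a feature map, giving long-range context at a much lower cost than full 2D self-attention.

import numpy as np

def attend(x):
    """x: (..., L, d) -> scaled dot-product self-attention over the length-L axis."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ x

def dual_axial_attention(feat):
    """feat: (H, W, d). Attention along H (per column), then along W (per row)."""
    out = np.transpose(attend(np.transpose(feat, (1, 0, 2))), (1, 0, 2))  # height axis
    return attend(out)                                                    # width axis

feat = np.random.rand(16, 16, 32)
print(dual_axial_attention(feat).shape)   # (16, 16, 32)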
|
48
|
Bao J, Zhang J, Zhang C, Bao L. DCTCNet: Sequency discrete cosine transform convolution network for visual recognition. Neural Netw 2025; 185:107143. [PMID: 39847941 DOI: 10.1016/j.neunet.2025.107143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2024] [Revised: 01/01/2025] [Accepted: 01/09/2025] [Indexed: 01/25/2025]
Abstract
The discrete cosine transform (DCT) has been widely used in computer vision tasks due to its high compression ratio and high-quality visual representation. However, conventional DCT is sensitive to the size of the transform region and suffers from blocking effects. Eliminating these blocking effects so that the transform can efficiently serve vision tasks is therefore significant and challenging. In this paper, we introduce the All Phase Sequency DCT (APSeDCT) into convolutional networks to extract multi-frequency information from deep features. Because APSeDCT is equivalent to a convolutional operation, we construct a corresponding convolution module, called APSeDCT Convolution (APSeDCTConv), that has transferability similar to vanilla convolution. We then propose an augmented convolutional operator called MultiConv built on APSeDCTConv. By replacing the last three bottleneck blocks of ResNet with MultiConv, our approach not only reduces computational costs and the number of parameters, but also exhibits strong performance in classification, object detection, and instance segmentation tasks. Extensive experiments show that APSeDCTConv augmentation leads to consistent performance improvements in image classification on ImageNet across various models and scales, including ResNet, Res2Net, and ResNeXt, and achieves 0.5%-1.1% and 0.4%-0.7% AP improvements for object detection and instance segmentation, respectively, on the COCO benchmark compared to the baseline.
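To show why a DCT can be cast as convolution (a plain 2D DCT-II filter bank, not the all-phase sequency variant described in the paper; kernel size and image are illustrative), the sketch below turns each 2D DCT basis function into a fixed kernel and correlates an image with the bank to obtain per-frequency response maps.

import numpy as np
from scipy.ndimage import correlate

def dct2_basis(n: int = 4) -> np.ndarray:
    """Return the n*n separable 2D DCT-II basis as (n*n, n, n) kernels."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)                         # DC row normalization
    return np.stack([np.outer(c[u], c[v]) for u in range(n) for v in range(n)])

image = np.random.rand(32, 32)
bank = dct2_basis(4)                                   # 16 frequency-selective kernels
responses = np.stack([correlate(image, kern, mode="reflect") for kern in bank])
print(responses.shape)                                 # (16, 32, 32)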
Affiliation(s)
- Jiayong Bao
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
- Jiangshe Zhang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China.
- Chunxia Zhang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
- Lili Bao
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
|
49
|
Zhang K, Zhu D, Min X, Zhai G. Unified Approach to Mesh Saliency: Evaluating Textured and Non-Textured Meshes Through VR and Multifunctional Prediction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:3151-3160. [PMID: 40063447 DOI: 10.1109/tvcg.2025.3549550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/26/2025]
Abstract
Mesh saliency aims to endow models with the adaptability to highlight regions that naturally attract visual attention. Existing advances primarily emphasize the crucial role of geometric shape in determining mesh saliency, but flexibly sensing the distinctive visual appeal introduced by realistic, complex texture patterns remains challenging. To investigate the interaction between geometric shapes and texture features in visual perception, we establish a comprehensive mesh saliency dataset, capturing saliency distributions for identical 3D models under both non-textured and textured conditions. Additionally, we propose a unified saliency prediction model applicable to various mesh types, providing valuable insights for both detailed modeling and realistic rendering applications. The model analyzes the geometric structure of the mesh while seamlessly incorporating texture features into the topological framework, ensuring coherence throughout appearance-enhanced modeling. Through extensive theoretical and empirical validation, our approach not only enhances performance across different mesh types, but also demonstrates the model's scalability and generalizability, particularly through cross-validation of various visual features.
|
50
|
Luo X, Duan Z, Qin A, Tian Z, Xie T, Zhang T, Tang YY. Layer-Wise Mutual Information Meta-Learning Network for Few-Shot Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:9684-9698. [PMID: 39255186 DOI: 10.1109/tnnls.2024.3438771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2024]
Abstract
The goal of few-shot segmentation (FSS) is to segment unlabeled images belonging to previously unseen classes using only a limited number of labeled images. The main objective is to transfer label information effectively from support images to query images. In this study, we introduce a novel meta-learning framework called layer-wise mutual information (LayerMI), which enhances the propagation of label information by maximizing the mutual information (MI) between support and query features at each layer. Our approach utilizes a LayerMI Block based on information-theoretic co-clustering. This block performs online co-clustering on the joint probability distribution obtained from each layer, generating a target-specific attention map. The LayerMI Block can be seamlessly integrated into the meta-learning framework and applied to all convolutional neural network (CNN) layers without altering the training objectives. Notably, the LayerMI Block not only maximizes MI between support and query features but also facilitates internal clustering within the image. Extensive experiments demonstrate that LayerMI significantly enhances the performance of the baseline and achieves competitive performance compared to state-of-the-art methods on three challenging benchmarks: PASCAL-$5^{i}$, COCO-$20^{i}$, and FSS-1000.
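As a toy version of the quantity being maximized (one simple way to form a joint distribution from two feature sets; the paper's co-clustering procedure is more elaborate, and all shapes are illustrative), the sketch below normalizes support-query similarities into a joint probability table and computes its mutual information.

import numpy as np

def feature_mutual_information(support: np.ndarray, query: np.ndarray) -> float:
    """support: (Ns, d), query: (Nq, d); returns MI of a similarity-based joint p(s, q)."""
    sim = support @ query.T                              # (Ns, Nq) similarity scores
    p = np.exp(sim - sim.max())
    p = p / p.sum()                                      # joint distribution over (s, q) pairs
    ps = p.sum(axis=1, keepdims=True)                    # marginal over support positions
    pq = p.sum(axis=0, keepdims=True)                    # marginal over query positions
    return float((p * (np.log(p + 1e-12) - np.log(ps @ pq + 1e-12))).sum())

rng = np.random.default_rng(0)
support = rng.normal(size=(64, 16))
query = support + 0.1 * rng.normal(size=(64, 16))        # correlated features -> higher MI
print(feature_mutual_information(support, query))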
|