Abstract
Deepfakes, enabled by recent advances in generative models, pose significant ethical, societal, and security risks. Although many detection methods achieve strong intra-dataset performance, they often degrade on low-quality or cross-domain data due to compression artifacts and unseen manipulations. To address this, we introduce LGSFNet, a robust deepfake detection framework that fuses local and global forgery semantics in a dual-path architecture. The design integrates a Spatial Resolution Adapter (SRA) to extract local low-level features and a novel Local Semantic Fusion Adapter (LSFA) to inject these cues into the DINOv3 transformer backbone for multi-stage feature fusion with parameter-efficient training. Experiments on FaceForensics++ demonstrate state-of-the-art results across all four manipulation types, achieving up to 99.98% AUC. Cross-corpora evaluations on Celeb-DF, DFD, and DFDC further highlight strong generalization, with improvements of up to +11.2% AUC over prior methods. A t-SNE visualization confirms that the learned forgery representations are discriminative, while ablation studies validate that three LSFA modules achieve the best trade-off between performance and complexity. Overall, LGSFNet provides a robust, efficient, and generalizable solution for detecting low-quality and unseen deepfakes, moving toward reliable real-world deployment. The source code is available at: https://github.com/zulkaifsajjad/LGSFNet
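The dual-path idea described above can be illustrated with a minimal toy sketch. All function names, shapes, and the weighted-addition fusion rule below are assumptions for illustration only; the paper's actual SRA and LSFA modules are learned neural components inside the DINOv3 backbone, not these simplified stand-ins.

```python
def sra_local_features(pixels):
    """Toy stand-in for the Spatial Resolution Adapter (SRA):
    averages adjacent pixel pairs to mimic extracting low-level local cues."""
    return [(pixels[i] + pixels[i + 1]) / 2 for i in range(0, len(pixels) - 1, 2)]

def lsfa_fuse(global_tokens, local_feats, weight=0.5):
    """Toy stand-in for the Local Semantic Fusion Adapter (LSFA):
    injects local cues into the backbone's token stream by a
    weighted elementwise addition (an assumed fusion rule)."""
    return [g + weight * l for g, l in zip(global_tokens, local_feats)]

def forward(pixels, global_tokens, num_stages=3):
    """Multi-stage fusion: repeat the LSFA-style injection at several
    backbone stages (three, matching the ablation's best trade-off)."""
    local = sra_local_features(pixels)
    tokens = global_tokens
    for _ in range(num_stages):
        tokens = lsfa_fuse(tokens, local)
    return tokens
```

The sketch only conveys the structure: one path derives local features from the input, and those features are repeatedly fused into the global transformer representation rather than being concatenated once at the end.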
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | British Machine Vision Conference |
| Edition | 1 |
| Publication status | Published - 24 Nov 2025 |
Fields of science
- 102003 Image processing
- 202002 Audiovisual media
- 102001 Artificial intelligence
- 102015 Information systems
- 102 Computer Sciences
JKU Focus areas
- Digital Transformation