Article

Deep Convolutional Pooling Transformer for Deepfake Detection

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3588574

Keywords

Deepfake detection; image keyframes; transformer

Abstract

Deepfake technology has recently drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the Deepfake videos spreading across the Internet become increasingly realistic, traditional detection techniques fail to distinguish real from fake. Most existing deep learning methods focus on local features and relations within the face image, using convolutional neural networks as a backbone. However, local features and relations alone do not supply enough general information for Deepfake detection, so existing methods have reached a performance bottleneck. To address this issue, we propose a deep convolutional Transformer that incorporates decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and improve efficacy. Moreover, we employ image keyframes, which prior work has rarely discussed, in model training to improve performance, and we visualize the gap in feature richness between key frames and normal frames caused by video compression. Finally, we demonstrate transferability through extensive experiments on several Deepfake benchmark datasets: the proposed solution consistently outperforms state-of-the-art baselines in both within- and cross-dataset experiments.
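
Since the abstract names the two attention-side ingredients, convolutional pooling and re-attention, here is a minimal sketch of how such an attention block might look in PyTorch. This is not the authors' released code: the class name, dimensions, and hyperparameters are illustrative assumptions, and the head-mixing step follows DeepViT-style re-attention.

import torch
import torch.nn as nn

class ConvPoolReAttention(nn.Module):
    def __init__(self, dim=384, heads=6, pool_stride=2):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.q = nn.Linear(dim, dim, bias=False)
        # Depthwise strided conv pools the key/value token map spatially,
        # shrinking the attention cost while preserving local structure.
        self.pool = nn.Conv2d(dim, dim, kernel_size=3, stride=pool_stride,
                              padding=1, groups=dim)
        self.kv = nn.Linear(dim, 2 * dim, bias=False)
        # Learnable head-mixing matrix for re-attention: regenerates more
        # diverse attention maps to counter attention collapse in deep stacks.
        self.theta = nn.Parameter(torch.eye(heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, hw):
        # x: (B, N, C) tokens; hw = (H, W) with H * W == N
        B, N, C = x.shape
        H, W = hw
        d = C // self.heads
        q = self.q(x).reshape(B, N, self.heads, d).transpose(1, 2)
        # Fold tokens back into a feature map, then pool keys/values.
        feat = x.transpose(1, 2).reshape(B, C, H, W)
        pooled = self.pool(feat).flatten(2).transpose(1, 2)  # (B, N', C)
        k, v = self.kv(pooled).chunk(2, dim=-1)
        k = k.reshape(B, -1, self.heads, d).transpose(1, 2)
        v = v.reshape(B, -1, self.heads, d).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Re-attention: mix the per-head attention maps head-to-head.
        attn = torch.einsum('hg,bgnm->bhnm', self.theta, attn)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: 196 tokens from a 14 x 14 feature map with 384 channels.
# y = ConvPoolReAttention()(torch.randn(2, 196, 384), (14, 14))  # (2, 196, 384)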
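
The keyframes the abstract refers to are presumably the intra-coded (I) frames that video codecs compress least, which would explain the feature gap relative to ordinary inter-coded frames. A minimal sketch of extracting them with ffmpeg follows; the paths are placeholders and ffmpeg is assumed to be installed.

import subprocess

def extract_keyframes(video_path, out_pattern="keyframe_%03d.png"):
    # Keep only intra-coded (I) frames: they are compressed least and so
    # retain more detail than the surrounding predicted (P/B) frames.
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-vf", "select='eq(pict_type,I)'",
        "-vsync", "vfr", out_pattern,
    ], check=True)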
