Survey of deep face manipulation and fake detection
XIE Tian1,2, YU Lingyun3,2, LUO Changwei4,5, XIE Hongtao3, ZHANG Yongdong3,2
1. AHU-IAI AI Joint Laboratory, Anhui University, Hefei 230601, China; 2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; 3. The School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China; 4. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; 5. Academy of Military Sciences, Beijing 100091, China
Abstract: [Significance] Deep face manipulation technology generates and manipulates human facial imagery using strategies such as identity swapping or face reenactment between a source face and a target face. On the one hand, its rise has inspired a range of applications, including video production and advertising. On the other hand, because face manipulation tools are usually open source or packaged as freely distributed apps, the barrier to tampering has fallen, and fake videos have proliferated. Moreover, when face manipulation technology is maliciously used to produce fake news, especially about important military and political officials, it can steer and distort public opinion, posing a serious threat to national security and social stability. Research on deep face forgery detection is therefore particularly important, and it is necessary to summarize existing work to guide the sound development of deep face manipulation and detection technology. [Progress] Deep face manipulation technology can be roughly divided into four types: identity swapping, face reenactment, face editing, and face synthesis. In identity swapping, Deepfakes brought real-world identity swapping to a new level of fidelity, and the region-aware face-swapping network models the identity information of the source face from both local and global perspectives, making the generated faces more natural. In face reenactment, Wav2Lip uses a pretrained lip-sync model as an expert, encouraging the generator to produce natural and accurate lip movements. In face editing, FENeRF, a 3D-aware generator based on a neural radiance field, spatially aligns semantic, geometric, and texture information and improves the consistency of the generated image across viewpoints while keeping the face editable.
In face synthesis, AnyFace proposes a cross-modal distillation module that aligns language and visual representations, allowing text to drive the generation of more diverse face images. Deep face forgery detection technology can be roughly divided into image-level and video-level methods. Among image-level methods, SBI (self-blended images) synthesizes realistic fake face images by blending augmented copies of a single real image, effectively improving the generalization ability of the detector. M2TR proposes a multimodal, multi-scale Transformer that detects local artifacts at different spatial scales of the image, and adds frequency-domain features as auxiliary information so that forgeries remain detectable in highly compressed images. Among video-level methods, RealForensics learns the natural correspondence between face and audio in real videos in a self-supervised way, enhancing the generalization and robustness of the model. [Conclusions and Prospects] Deep face manipulation and detection technologies are developing rapidly, and the corresponding methods are continually being updated. First, this survey reviews deep face manipulation and detection methods and discusses their strengths and weaknesses. Second, it summarizes the common datasets and the evaluation results of different manipulation and detection methods. Finally, it discusses the main challenges of face manipulation and fake detection and points out possible directions for future research.
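The self-blending idea mentioned for SBI can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a float RGB image in [0, 1], replaces the landmark-derived face mask with a fixed soft ellipse, and uses a simple per-channel color jitter as the only augmentation. The function names (`elliptical_mask`, `color_jitter`, `self_blend`) and all numeric parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def elliptical_mask(height, width, softness=0.15):
    """Soft elliptical mask in [0, 1] (assumed stand-in for a landmark-based face mask)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    # Normalised radial distance: 1.0 lies on the ellipse boundary.
    r = np.sqrt(((ys - cy) / (0.35 * height)) ** 2 + ((xs - cx) / (0.30 * width)) ** 2)
    # Linear ramp from 1 (inside) to 0 (outside) across a band around the boundary.
    return np.clip((1.0 + softness - r) / (2.0 * softness), 0.0, 1.0)


def color_jitter(image, rng, strength=0.1):
    """Random per-channel gain and bias; image is float in [0, 1] with shape (H, W, 3)."""
    gain = 1.0 + rng.uniform(-strength, strength, size=3)
    bias = rng.uniform(-strength, strength, size=3)
    return np.clip(image * gain + bias, 0.0, 1.0)


def self_blend(image, rng):
    """Build a pseudo-fake image and its blending mask from a single real image."""
    height, width, _ = image.shape
    source = color_jitter(image, rng)  # augmented copy playing the "source face"
    target = image                     # untouched copy playing the "target face"
    mask = elliptical_mask(height, width)[..., None]
    mask = mask * rng.uniform(0.5, 1.0)  # random blending ratio (assumed range)
    fake = mask * source + (1.0 - mask) * target
    return fake, mask[..., 0]


rng = np.random.default_rng(0)
real = rng.uniform(0.0, 1.0, size=(64, 64, 3))  # placeholder for a real face crop
fake, mask = self_blend(real, rng)
```

A detector would then be trained to separate `real` (label 0) from `fake` (label 1). Because the pseudo-fake differs from the real image only through the blending boundary and the statistical mismatch inside the mask, the classifier is pushed toward generic blending artifacts rather than the traces of one specific manipulation method.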