[1] Deepfakes. Deepfakes GitHub repository[EB/OL].[2022-09-14]. https://github.com/Deepfakes/faceswap.
[2] Zao. Zao app[EB/OL]. (2019-12-01)[2022-09-14]. https://zaodownload.com/download-zao-app-deepfake.
[3] FaceApp. FaceApp[EB/OL].[2022-09-14]. https://apps.apple.com/gb/app/faceapp-ai-face-editor/id1180884341.
[4] DOLHANSKY B, BITTON J, PFLAUM B, et al. The deepfake detection challenge (DFDC) dataset[EB/OL].[2022-09-14]. https://arxiv.org/abs/2006.07397.
[5] MIRSKY Y, LEE W. The creation and detection of deepfakes: A survey[J]. ACM Computing Surveys, 2022, 54(1): 7.
[6] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014: 2672-2680.
[7] XU C, ZHANG J N, HUA M, et al. Region-aware face swapping[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 7622-7631.
[8] PRAJWAL K R, MUKHOPADHYAY R, NAMBOODIRI V P, et al. A lip sync expert is all you need for speech to lip generation in the wild[C]// Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM, 2020: 484-492.
[9] LIANG B R, PAN Y, GUO Z Z, et al. Expressive talking head generation with granular audio-visual control[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 3377-3386.
[10] SCHWARZ K, LIAO Y Y, NIEMEYER M, et al. GRAF: Generative radiance fields for 3D-aware image synthesis[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2020: 1692.
[11] SUN J X, WANG X, ZHANG Y, et al. FENeRF: Face editing in neural radiance fields[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 7662-7672.
[12] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019: 4396-4405.
[13] SUN J X, DENG Q Y, LI Q, et al. AnyFace: Free-style text-to-face synthesis and manipulation[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 18666-18675.
[14] SHANG Z H, XIE H T, ZHA Z J, et al. PRRNet: Pixel-region relation network for face forgery detection[J]. Pattern Recognition, 2021, 116: 107950.
[15] FaceSwap. FaceSwap GitHub repository[EB/OL].[2022-09-14]. https://github.com/MarekKowalski/FaceSwap.
[16] LI L Z, BAO J M, YANG H, et al. Advancing high fidelity identity swapping for forgery detection[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 5073-5082.
[17] CHEN R W, CHEN X H, NI B B, et al. SimSwap: An efficient framework for high fidelity face swapping[C]// Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM, 2020: 2003-2011.
[18] XU Y Y, DENG B L, WANG J L, et al. High-resolution face swapping via latent semantics disentanglement[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 7632-7641.
[19] KIM J, LEE J, ZHANG B T. Smooth-Swap: A simple enhancement for face-swapping with smoothness[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 10769-10778.
[20] THIES J, ZOLLHÖFER M, STAMMINGER M, et al. Face2Face: Real-time face capture and reenactment of RGB videos[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2387-2395.
[21] THIES J, ZOLLHÖFER M, NIEßNER M. Deferred neural rendering: Image synthesis using neural textures[J]. ACM Transactions on Graphics, 2019, 38(4): 66.
[22] WILES O, KOEPKE A S, ZISSERMAN A. X2Face: A network for controlling face generation using images, audio, and pose codes[C]// Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018: 690-706.
[23] SIAROHIN A, LATHUILIÈRE S, TULYAKOV S, et al. First order motion model for image animation[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019: 641.
[24] HSU G S, TSAI C H, WU H Y. Dual-generator face reenactment[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 632-640.
[25] ZHOU Y, HAN X T, SHECHTMAN E, et al. MakeItTalk: Speaker-aware talking-head animation[J]. ACM Transactions on Graphics, 2020, 39(6): 221.
[26] ZHOU H, SUN Y S, WU W, et al. Pose-controllable talking face generation by implicitly modularized audio-visual representation[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 4174-4184.
[27] ZHANG C X, ZHAO Y F, HUANG Y F, et al. FACIAL: Synthesizing dynamic talking face with implicit attribute learning[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 3847-3856.
[28] YU L Y, XIE H T, ZHANG Y D. Multimodal learning for temporally coherent talking face generation with articulator synergy[J]. IEEE Transactions on Multimedia, 2022, 24: 2950-2962.
[29] SUWAJANAKORN S, SEITZ S M, KEMELMACHER-SHLIZERMAN I. Synthesizing Obama: Learning lip sync from audio[J]. ACM Transactions on Graphics, 2017, 36(4): 95.
[30] GUO Y D, CHEN K Y, LIANG S, et al. AD-NeRF: Audio driven neural radiance fields for talking head synthesis[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 5764-5774.
[31] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2022, 65(1): 99-106.
[32] SONG H K, WOO S H, LEE J, et al. Talking face generation with multilingual TTS[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 21393-21398.
[33] PERARNAU G, VAN DE WEIJER J, RADUCANU B, et al. Invertible conditional GANs for image editing[EB/OL].[2022-09-14]. https://arxiv.org/abs/1611.06355.
[34] HE Z L, ZUO W M, KAN M N, et al. AttGAN: Facial attribute editing by only changing what you want[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5464-5478.
[35] CHAN E R, MONTEIRO M, KELLNHOFER P, et al. pi-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 5795-5805.
[36] NIEMEYER M, GEIGER A. GIRAFFE: Representing scenes as compositional generative neural feature fields[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 11448-11459.
[37] SHEN Y J, GU J J, TANG X O, et al. Interpreting the latent space of GANs for semantic face editing[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 9240-9249.
[38] YAO X, NEWSON A, GOUSSEAU Y, et al. A latent transformer for disentangled face editing in images and videos[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 13769-13778.
[39] XU Y B, YIN Y Q, JIANG L M, et al. TransEditor: Transformer-based dual-space GAN for highly controllable facial editing[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 7673-7682.
[40] JIANG Y M, HUANG Z Q, PAN X G, et al. Talk-to-edit: Fine-grained facial editing via dialog[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 13779-13788.
[41] ODENA A, OLAH C, SHLENS J. Conditional image synthesis with auxiliary classifier GANs[C]// Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: PMLR, 2017: 2642-2651.
[42] ARJOVSKY M, BOTTOU L. Towards principled methods for training generative adversarial networks[C]// Proceedings of the 5th International Conference on Learning Representations. Toulon, France: OpenReview.net, 2017.
[43] KARRAS T, AILA T, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[C]// Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: OpenReview.net, 2018.
[44] KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 8107-8116.
[45] XIA W H, YANG Y J, XUE J H, et al. TediGAN: Text-guided diverse face image generation and manipulation[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 2256-2265.
[46] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. Virtual Event: PMLR, 2021: 8748-8763.
[47] LI L Z, BAO J M, ZHANG T, et al. Face X-ray for more general face forgery detection[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 5000-5009.
[48] ZHAO H Q, WEI T Y, ZHOU W B, et al. Multi-attentional deepfake detection[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 2185-2194.
[49] QIAN Y Y, YIN G J, SHENG L, et al. Thinking in frequency: Face forgery detection by mining frequency-aware clues[C]// Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020: 86-103.
[50] LI J M, XIE H T, LI J H, et al. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 6454-6463.
[51] ZHENG Y L, BAO J M, CHEN D, et al. Exploring temporal coherence for more general video face forgery detection[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 15024-15034.
[52] HALIASSOS A, MIRA R, PETRIDIS S, et al. Leveraging real talking faces via self-supervision for robust forgery detection[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 14930-14942.
[53] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 1800-1807.
[54] TAN M X, LE Q V. EfficientNet: Rethinking model scaling for convolutional neural networks[C]// Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR, 2019: 6105-6114.
[55] ZHAO T C, XU X, XU M Z, et al. Learning self-consistency for deepfake detection[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 15003-15013.
[56] SHIOHARA K, YAMASAKI T. Detecting deepfakes with self-blended images[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 18699-18708.
[57] CAO J Y, MA C, YAO T P, et al. End-to-end reconstruction-classification learning for face forgery detection[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022: 4103-4112.
[58] DONG S C, WANG J, LIANG J J, et al. Explaining deepfake detection by analysing image matching[C]// Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022: 18-35.
[59] WANG J K, WU Z X, OUYANG W H, et al. M2TR: Multi-modal multi-scale transformers for deepfake detection[C]// Proceedings of the 2022 International Conference on Multimedia Retrieval. Newark, USA: ACM, 2022: 615-623.
[60] MASI I, KILLEKAR A, MASCARENHAS R M, et al. Two-branch recurrent network for isolating deepfakes in videos[C]// Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020: 667-684.
[61] RUFF L, GÖRNITZ N, DEECKE L, et al. Deep one-class classification[C]// Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR, 2018: 4390-4399.
[62] HALIASSOS A, VOUGIOUKAS K, PETRIDIS S, et al. Lips don't lie: A generalisable and robust approach to face forgery detection[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 5037-5047.
[63] CHUNG J S, ZISSERMAN A. Lip reading in the wild[C]// Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer, 2017: 87-103.
[64] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
[65] TRAN D, WANG H, FEISZLI M, et al. Video classification with channel-separated convolutional networks[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE, 2019: 5551-5560.
[66] RÖSSLER A, COZZOLINO D, VERDOLIVA L, et al. FaceForensics++: Learning to detect manipulated facial images[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE, 2019: 1-11.
[67] NAGRANI A, CHUNG J S, ZISSERMAN A. VoxCeleb: A large-scale speaker identification dataset[C]// Proceedings of the 18th Annual Conference of the International Speech Communication Association. Stockholm, Sweden: ISCA, 2017: 2616-2620.
[68] CHUNG J S, NAGRANI A, ZISSERMAN A. VoxCeleb2: Deep speaker recognition[C]// Proceedings of the 19th Annual Conference of the International Speech Communication Association. Hyderabad, India: ISCA, 2018: 1086-1090.
[69] WANG K, WU Q Y, SONG L S, et al. MEAD: A large-scale audio-visual dataset for emotional talking-face generation[C]// Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020: 700-717.
[70] SUN J X, LI Q, WANG W N, et al. Multi-caption text-to-face synthesis: Dataset and algorithm[C]// Proceedings of the 29th ACM International Conference on Multimedia. Virtual Event, China: ACM, 2021: 2290-2298.
[71] WANG H, WANG Y T, ZHOU Z, et al. CosFace: Large margin cosine loss for deep face recognition[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 5265-5274.
[72] RUIZ N, CHONG E, REHG J M. Fine-grained head pose estimation without keypoints[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition workshops. Salt Lake City, USA: IEEE, 2018: 2074-2083.
[73] DENG Y, YANG J L, XU S C, et al. Accurate 3D face reconstruction with weakly-supervised learning: From single image to image set[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Long Beach, USA: IEEE, 2019: 285-295.
[74] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: From error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[75] CHEN L L, LI Z H, MADDOX R K, et al. Lip movements generation at a glance[C]// Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018: 538-553.
[76] CHUNG J S, ZISSERMAN A. Out of time: Automated lip sync in the wild[C]// Proceedings of the 13th Asian Conference on Computer Vision Workshops. Taipei, China: Springer, 2017: 251-263.
[77] DENG J K, GUO J, XUE N N, et al. ArcFace: Additive angular margin loss for deep face recognition[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019: 4685-4694.
[78] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc., 2017: 6629-6640.
[79] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 586-595.
[80] BIŃKOWSKI M, SUTHERLAND D J, ARBEL M, et al. Demystifying MMD GANs[C]// Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: OpenReview.net, 2018.
[81] MATERN F, RIESS C, STAMMINGER M. Exploiting visual artifacts to expose deepfakes and face manipulations[C]// Proceedings of 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW). Waikoloa, USA: IEEE, 2019: 83-92.
[82] KORSHUNOV P, MARCEL S. DeepFakes: A new threat to face recognition? Assessment and detection[EB/OL].[2022-09-14]. https://arxiv.org/abs/1812.08685.
[83] DeepfakeDetection. DeepfakeDetection GitHub repository[EB/OL].[2022-09-14]. https://github.com/ondyari/FaceForensics.
[84] DOLHANSKY B, HOWES R, PFLAUM B, et al. The deepfake detection challenge (DFDC) preview dataset[EB/OL].[2022-09-14]. https://arxiv.org/abs/1910.08854.
[85] JIANG L M, LI R, WU W, et al. DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 2886-2895.
[86] LI Y Z, YANG X, SUN P, et al. Celeb-DF (v2): A new dataset for deepfake forensics[EB/OL].[2022-09-14]. https://arxiv.org/abs/1909.12962.
[87] ZI B J, CHANG M H, CHEN J J, et al. WildDeepfake: A challenging real-world dataset for deepfake detection[C]// Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM, 2020: 2382-2390.
[88] ZHOU T F, WANG W G, LIANG Z Y, et al. Face forensics in the wild[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 5774-5784.
[89] HE Y N, GAN B, CHEN S Y, et al. ForgeryNet: A versatile benchmark for comprehensive forgery analysis[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021: 4358-4367.
[90] LE T N, NGUYEN H H, YAMAGISHI J, et al. OpenForensics: Large-scale challenging dataset for multi-face forgery detection and segmentation in-the-wild[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 10097-10107.
[91] SHAO R, WU T X, LIU Z W. Detecting and recovering sequential deepfake manipulation[C]// Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022: 712-728.
[92] KHALID H, TARIQ S, KIM M, et al. FakeAVCeleb: A novel audio-video multimodal deepfake dataset[C]// Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1. Curran Associates Inc., 2021.
[93] CAI Z X, STEFANOV K, DHALL A, et al. Do you really mean that? Content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization[EB/OL].[2022-09-14]. https://arxiv.org/abs/2204.06228.
[94] SANDERSON C. The VidTIMIT database[EB/OL].[2022-09-14]. https://conradsanderson.id.au/vidtimit/.
[95] KUZNETSOVA A, ROM H, ALLDRIN N, et al. The Open Images Dataset V4[J]. International Journal of Computer Vision, 2020, 128(7): 1956-1981.