Attack-guided diffusion model for Chinese adversarial samples generation

WU Houyue; LI Xianwei; ZHANG Shunxiang; ZHU Honghao; WANG Ting

doi:10.16511/j.cnki.qhdxxb.2024.21.027

Journal of Tsinghua University (Science and Technology), 2024, Vol. 64, Issue 12: 1997-2006


Special topic: Big data


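1. School of Computer and Information Engineering, Bengbu University, Bengbu 233030, China;
2. Anhui Engineering Research Center for Intelligent Applications and Security of Industrial Internet, Anhui University of Technology, Ma'anshan 243032, China;
3. School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan 232000, China;
4. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 240088, China;
5. School of Information Engineering, Huainan Union University, Huainan 232000, China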

Received date: 2024-06-16

Online published: 2024-11-22

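Funding

National Natural Science Foundation of China, General Program (62076006); Open Project of the State Key Laboratory of Cognitive Intelligence (COGOS-2023HE02); Anhui Provincial University Collaborative Innovation Project (GXXT-2021-008); Key Natural Science Research Projects of Anhui Provincial Universities (2022AH051921, 2022AH051909); Key Project of the Anhui Provincial University Excellent Young Talents Support Program (gxyqZD2021135); Bengbu University High-Level Talent Research Start-up Fund (BBXY2020KYQD02); Open Project of the Engineering Research Center, Anhui University of Technology (IASII22-08); Bengbu University 2024 University-Level General Research Projects (2024ZR02, 2024ZR03); Bengbu University 2024 University-Level Applied Research Project (2024YYX48pj)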


Abstract (Chinese version)

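Chinese adversarial sample generation is an important research topic in natural language processing and has long attracted wide attention from researchers. Previous methods for generating Chinese adversarial samples mainly substitute characters or words or change word order; the resulting adversarial samples attack poorly and are easily recognized by detection models. This paper proposes DiffuAdv, a Chinese adversarial sample generation method based on attack-guided diffusion. A diffusion model is introduced into Chinese adversarial sample generation, and its diffusion mechanism is enhanced by simulating the data distribution of text adversarial samples during an attack. The gradient of the change between adversarial and original samples serves as the guiding condition that directs the model's reverse diffusion process during pre-training, so that the generated adversarial samples are more natural and achieve higher attack success rates. Comparative experiments were conducted on multiple datasets, covering different natural language processing tasks and a range of competing methods. The results show that the adversarial samples generated by the proposed method achieve high attack success rates. In addition, ablation experiments verify that attack-gradient guidance improves the quality of the generated adversarial samples. In the perplexity (PPL) experiment, the adversarial samples generated by the proposed method have an average PPL of only 0.518, which verifies their strong robustness. DiffuAdv enriches the research perspective on text adversarial sample generation and broadens the research directions for tasks such as text sentiment classification, causal relation extraction, and emotion-cause pair extraction.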
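The PPL figure above is a fluency score. As a point of reference, here is a minimal sketch of the textbook definition, perplexity as the exponential of the mean per-token negative log-likelihood. The per-token probabilities are hypothetical inputs that stand in for whatever scoring language model is used; the abstract does not name one. Averaging sentence-level scores over the generated samples is likewise only one plausible way to aggregate. Under this standard definition PPL is always at least 1, so the reported 0.518 presumably uses a different normalization that the abstract does not describe.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood per token.

    `token_probs` holds p(w_i | w_<i) for every token of one sentence, as
    returned by some scoring language model (hypothetical input; no specific
    scoring model is assumed here).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def average_perplexity(corpus: Sequence[Sequence[float]]) -> float:
    """Mean sentence-level perplexity over a set of samples (one assumed aggregation)."""
    return sum(perplexity(s) for s in corpus) / len(corpus)


# Every token at probability 0.25 gives a perplexity of 4 (up to rounding);
# a model that is certain of every token gives the minimum value, 1.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(perplexity([1.0, 1.0]))
print(average_perplexity([[0.25] * 4, [1.0, 1.0]]))
```

Lower values mean the scoring model finds the text less surprising, which is why PPL is commonly read as a proxy for fluency and readability.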

Key words: adversarial sample generation; guided diffusion; conditional diffusion; diffusion model; text generation

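Cite this article

WU Houyue, LI Xianwei, ZHANG Shunxiang, ZHU Honghao, WANG Ting. Attack-guided diffusion model for Chinese adversarial samples generation[J]. Journal of Tsinghua University (Science and Technology), 2024, 64(12): 1997-2006. DOI: 10.16511/j.cnki.qhdxxb.2024.21.027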

Abstract

[Objective] Generating adversarial text samples is a significant area of research in natural language processing: such samples are used to test the robustness of machine learning models, and the topic has attracted wide attention from scholars. Owing to the complexity of Chinese semantics, generating Chinese adversarial samples remains a major challenge. Traditional methods for generating Chinese adversarial samples mainly rely on word replacement, deletion/insertion, and word-order adjustment. These methods often produce samples that are easily detected and have low attack success rates, so they struggle to balance attack effectiveness and semantic coherence. To address these limitations, this study introduces DiffuAdv, a novel method for generating Chinese adversarial samples. The approach enhances the generation process by simulating the data distribution during the adversarial attack phase. The gradient changes between adversarial and original samples serve as guiding conditions for the model's reverse diffusion during pre-training, yielding more natural and more effective adversarial samples.

[Methods] DiffuAdv introduces diffusion models into adversarial sample generation to raise attack success rates while keeping the generated text natural. It uses a gradient-guided diffusion process in which gradient information between original and adversarial samples serves as the guiding condition. The process has two stages: forward diffusion and reverse diffusion. In the forward diffusion stage, noise is progressively added to the original data until a noise-dominated state is reached. The reverse diffusion stage reconstructs samples, using the gradient changes between adversarial and original samples to maximize the adversarial objective.
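The abstract does not specify DiffuAdv's noise schedule or how text is mapped into a continuous space. The following is a minimal sketch of the forward stage. It assumes a standard DDPM-style Gaussian process over continuous token embeddings, as used by embedding-space text diffusion models such as DiffuSeq, together with a linear beta schedule whose endpoints are illustrative choices. With these assumptions, x_t can be sampled in closed form as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_s). The embedding `x0` is a hypothetical toy vector.

```python
import math
import random
from typing import List


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule beta_1..beta_T (illustrative values, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I); t is 1-based."""
    a = abar[t - 1]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_betas(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [0.3, -1.2, 0.8]                 # hypothetical embedding of one token
print(q_sample(x0, 10, abar, rng))    # early step: still close to x0
print(q_sample(x0, 1000, abar, rng))  # final step: noise-dominated
print(abar[-1])                       # remaining signal fraction at t = T, near 0
```

Because the marginal at any step has a closed form, training can sample a random t directly instead of simulating the whole chain. This is the main reason the Gaussian formulation is a common default.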
During the pre-training phase, data capture and feature learning take place under gradient guidance. The aim is to learn the data distribution of original samples and to analyze how adversarial samples deviate from it. In the reverse diffusion generation phase, adversarial perturbations are constructed from gradients and integrated into the reverse diffusion process, so that at each step samples evolve toward greater adversarial effectiveness. To validate the method, extensive experiments are conducted on multiple datasets and several natural language processing tasks, and its performance is compared with that of seven existing state-of-the-art methods.

[Results] Compared with existing methods for generating Chinese adversarial samples, DiffuAdv achieves higher attack success rates on three tasks: text sentiment classification, causal relation extraction, and emotion-cause pair extraction. Ablation experiments confirm that using the gradient changes between original and adversarial samples to guide generation improves the quality of the adversarial samples. Perplexity (PPL) measurements show that the adversarial samples generated by DiffuAdv have an average PPL of only 0.518, indicating that they are more reasonable and readable than the samples generated by the other methods.

[Conclusions] DiffuAdv effectively generates high-quality adversarial samples that closely resemble real text in fluency and naturalness. The adversarial samples it produces not only achieve high attack success rates but also exhibit strong robustness. DiffuAdv broadens the research perspective on generating adversarial text samples and opens new approaches to tasks such as text sentiment classification, causal relation extraction, and emotion-cause pair extraction.
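The abstract says gradient-based adversarial perturbations are folded into every reverse-diffusion step, but it does not give the update rule. A well-known way to inject a gradient into a reverse step is classifier guidance (Dhariwal and Nichol, 2021): the denoising mean is shifted by a scaled gradient times the step variance. The sketch below assumes that style of guidance. It shifts the DDPM mean along the gradient of a victim classifier's loss on the true label, so each step moves toward misclassification. Several pieces are hypothetical stand-ins rather than DiffuAdv's actual components: the linear logistic victim, the placeholder noise predictor `zero_eps`, the guidance `scale`, the schedule, and all function names.

```python
import math
import random
from typing import Callable, List

Vec = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def victim_loss_grad(x: Vec, w: Vec, b: float, y_true: int) -> Vec:
    """d/dx of the binary cross-entropy of a hypothetical linear victim classifier.

    With p = sigmoid(w.x + b), the gradient is (p - y_true) * w; moving x along
    it lowers the victim's confidence in the true label y_true.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y_true) * wi for wi in w]


def guided_reverse_step(x_t: Vec, t: int, betas: Vec, abar: Vec,
                        eps_model: Callable[[Vec, int], Vec],
                        grad_fn: Callable[[Vec], Vec], scale: float,
                        rng: random.Random) -> Vec:
    """One DDPM reverse step x_t -> x_{t-1} (t is 1-based) with attack guidance."""
    beta, a_bar = betas[t - 1], abar[t - 1]
    eps = eps_model(x_t, t)
    # Standard DDPM mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t).
    coef = beta / math.sqrt(1.0 - a_bar)
    mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x_t, eps)]
    # Classifier-guidance-style shift: scale * sigma_t^2 * gradient of the attack loss.
    mean = [m + scale * beta * g for m, g in zip(mean, grad_fn(x_t))]
    if t == 1:
        return mean  # the last step adds no noise
    return [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]


# Illustrative linear schedule and hypothetical victim parameters.
T = 100
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
abar, prod = [], 1.0
for beta_t in betas:
    prod *= 1.0 - beta_t
    abar.append(prod)

w, bias, y_true = [1.0, -0.5, 2.0], 0.0, 1


def zero_eps(x: Vec, t: int) -> Vec:
    """Placeholder for a trained noise predictor (hypothetical)."""
    return [0.0] * len(x)


def attack_grad(x: Vec) -> Vec:
    return victim_loss_grad(x, w, bias, y_true)


def logit(x: Vec) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


x_t = [0.4, 0.1, 0.7]
plain = guided_reverse_step(x_t, 50, betas, abar, zero_eps, attack_grad, 0.0, random.Random(0))
guided = guided_reverse_step(x_t, 50, betas, abar, zero_eps, attack_grad, 50.0, random.Random(0))
print(logit(plain), logit(guided))  # same noise draw; the guided step has the lower logit
```

Both calls share a seed, so they draw identical noise and differ only by the guidance term. The guided sample's logit for the true class is therefore strictly lower. The guidance scale trades attack strength against staying close to the learned data distribution.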
