Abstract: The importance of each word in a text sequence and the dependencies among words have a significant impact on identifying text categories. Capsule networks cannot selectively focus on the important words in a text, nor can they encode long-distance dependencies, so they have significant limitations when identifying texts with semantic transitions. To address these problems, this paper proposes a capsule network based on multi-head attention, which encodes the dependencies between words, captures the important words in a text, and encodes the text semantics, thereby effectively improving performance on text classification tasks. Experimental results show that the proposed model outperforms both convolutional neural networks and plain capsule networks on text classification, and is particularly effective on multi-label text classification. The results also demonstrate that the model benefits from the attention mechanism.
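The architecture the abstract describes, multi-head self-attention feeding a capsule layer with dynamic routing, can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the layer sizes, the mean-pooling step, and the routing details (dynamic routing with squashing, following Sabour et al., 2017) are assumptions made for illustration only.

```python
# A minimal sketch of a multi-head attention capsule classifier, assuming
# PyTorch. Hyperparameters, pooling, and routing details are illustrative
# assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    # Non-linear "squashing" so capsule vector lengths lie in (0, 1)
    # (Sabour et al., 2017).
    norm2 = (s * s).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class AttentionCapsuleClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=128, heads=8,
                 prim_caps=32, prim_dim=8, cls_dim=16, routing_iters=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Multi-head self-attention encodes dependencies between words and
        # lets the model focus on the important words in the text.
        self.attn = nn.MultiheadAttention(emb_dim, heads, batch_first=True)
        # Project the attended representation into primary capsules.
        self.primary = nn.Linear(emb_dim, prim_caps * prim_dim)
        self.prim_caps, self.prim_dim = prim_caps, prim_dim
        # Transformation matrices from primary capsules to class capsules.
        self.W = nn.Parameter(
            0.01 * torch.randn(1, prim_caps, num_classes, cls_dim, prim_dim))
        self.num_classes = num_classes
        self.routing_iters = routing_iters

    def forward(self, tokens):                       # tokens: (B, L) int ids
        x = self.embed(tokens)                       # (B, L, E)
        x, _ = self.attn(x, x, x)                    # self-attention, (B, L, E)
        x = x.mean(dim=1)                            # pool positions, (B, E)
        u = squash(self.primary(x).view(-1, self.prim_caps, self.prim_dim))
        # Predictions u_hat of every primary capsule for every class capsule.
        u_hat = (self.W @ u[:, :, None, :, None]).squeeze(-1)  # (B, P, C, D)
        b = torch.zeros(u.size(0), self.prim_caps, self.num_classes,
                        device=u.device)
        for _ in range(self.routing_iters):          # dynamic routing
            c = b.softmax(dim=2)                     # coupling coefficients
            v = squash((c[..., None] * u_hat).sum(dim=1))       # (B, C, D)
            b = b + (u_hat * v[:, None]).sum(dim=-1)  # agreement update
        return v.norm(dim=-1)                        # class-capsule lengths (B, C)
```

Because each class-capsule length falls independently in (0, 1), the output can be thresholded per class, which is one reason a capsule head suits the multi-label setting highlighted in the abstract; for example, `AttentionCapsuleClassifier(vocab_size=30000, num_classes=5)` applied to a batch of token-id tensors returns one score per class.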
JIA Xudong, WANG Li. Text classification model based on multi-head attention capsule networks. Journal of Tsinghua University (Science and Technology), 2020, 60(5): 415-421.