SPECIAL SECTION: COMPUTATIONAL LINGUISTICS
Deep learning multi-language topic alignment model across domains
YU Chuanming (1), YUAN Sai (2), HU Shasha (1), AN Lu (3)
1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
2. School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
3. School of Information Management, Wuhan University, Wuhan 430072, China
Abstract: Deep representation learning of domain topics was used to build a topic alignment model (TAM) that integrates bilingual word embeddings. The semantic alignment lexicon was extended with bilingual word embeddings, and a traditional bilingual topic model was used to construct an auxiliary distribution that strengthens semantic sharing among word distributions, improving topic alignment in cross-lingual and cross-domain settings. Two indicators, bilingual topic similarity (BTS) and bilingual alignment similarity (BAS), were developed to evaluate the alignment. Compared with a traditional multi-language common cultural theme analysis, the model improved bilingual alignment similarity for cross-language topic matching by about 1.5% and improved F1 for cross-domain topic alignment by about 10%. These results can support cross-language and cross-domain information processing.
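
The general idea of matching topics across languages through a shared bilingual embedding space can be illustrated with a minimal sketch. This is not the TAM described above, and the similarity used here is plain cosine similarity, not the paper's BTS or BAS indicators (whose definitions are not given in this abstract). The function names, the toy two-dimensional embeddings, and the topic labels are all hypothetical, and the sketch assumes each topic is represented by the unweighted mean of its top-word vectors.

```python
import math

def topic_vector(top_words, embeddings):
    """Average the embedding vectors of a topic's top words (unweighted; an assumption)."""
    vecs = [embeddings[w] for w in top_words if w in embeddings]
    if not vecs:
        raise ValueError("no top word has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align_topics(src_topics, tgt_topics, embeddings):
    """For each source topic, return (most similar target topic, cosine similarity)."""
    tgt_vecs = {name: topic_vector(words, embeddings)
                for name, words in tgt_topics.items()}
    result = {}
    for name, words in src_topics.items():
        vec = topic_vector(words, embeddings)
        best = max(tgt_vecs, key=lambda t: cosine(vec, tgt_vecs[t]))
        result[name] = (best, cosine(vec, tgt_vecs[best]))
    return result

# Toy shared bilingual embedding space (hypothetical values for illustration only).
EMB = {
    "bank": [0.9, 0.1], "loan": [0.8, 0.2],
    "银行": [0.88, 0.12], "贷款": [0.82, 0.18],
    "match": [0.1, 0.9], "goal": [0.2, 0.8],
    "比赛": [0.12, 0.88], "进球": [0.18, 0.82],
}
EN_TOPICS = {"en_finance": ["bank", "loan"], "en_sport": ["match", "goal"]}
ZH_TOPICS = {"zh_finance": ["银行", "贷款"], "zh_sport": ["比赛", "进球"]}

alignment = align_topics(EN_TOPICS, ZH_TOPICS, EMB)
for src, (tgt, sim) in alignment.items():
    print(f"{src} -> {tgt} (cosine {sim:.3f})")
```

In this toy setup each English topic is paired with the Chinese topic whose averaged vector points in nearly the same direction. A real system would obtain the shared space from trained bilingual word embeddings and the top words from a fitted topic model.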
Keywords
cross-lingual topic alignment
cross-domain topic alignment
deep learning
bilingual word embedding
knowledge alignment
Issue Date: 26 April 2020