Journal of Tsinghua University (Science and Technology), 2022, Vol. 62, Issue 9: 1442-1449    DOI: 10.16511/j.cnki.qhdxxb.2021.26.040
Self-supervised deep semantics-preserving Hashing for cross-modal retrieval
LU Bo1, DUAN Xiaodong1, YUAN Ye2
1. SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, Dalian 116600, China;
2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract: The key issue in cross-modal retrieval via cross-modal Hashing is how to maximally preserve the semantic relationships among heterogeneous media data after they are mapped into Hash codes. This paper presents a self-supervised deep semantics-preserving Hashing network (UDSPH) that generates compact Hash codes with an end-to-end architecture. First, two modality-specific deep Hashing networks are trained, for images and text respectively, to generate high-level semantic features and the corresponding Hash codes; a cross-modal attention mechanism measures the similarity between the high-level features of the two modalities, maximizing the local semantic correlation between heterogeneous media data. Second, the multi-label semantic information of the training data is used to build a deep semantic Hashing network that supervises, through self-supervised adversarial learning, the simultaneous training of the two modality-specific networks, thereby preserving the semantic associations between modalities from a global view and improving the discriminative capability of the generated Hash codes. Finally, tests on three widely used large-scale multimodal media datasets verify the effectiveness of the proposed framework.
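To make the pipeline concrete, the following is a minimal PyTorch sketch of what the two modality-specific Hashing networks could look like. Every name, layer size, and feature dimension here is an illustrative assumption, not the paper's actual architecture; the only fixed ideas are a tanh-relaxed code during training and sign() binarization at retrieval time.

```python
# A minimal sketch of modality-specific Hashing networks.
# All dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn

class ImageHashNet(nn.Module):
    """Maps a pre-extracted CNN image feature to a K-bit Hash code."""
    def __init__(self, feat_dim=4096, code_len=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, code_len), nn.Tanh())  # outputs in (-1, 1)

    def forward(self, x):
        return self.net(x)  # relaxed (continuous) code used for training

class TextHashNet(nn.Module):
    """Maps a text feature (e.g., a bag-of-words vector) to a K-bit Hash code."""
    def __init__(self, feat_dim=1386, code_len=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, code_len), nn.Tanh())

    def forward(self, x):
        return self.net(x)

# At retrieval time the relaxed codes are binarized with sign():
img_net, txt_net = ImageHashNet(), TextHashNet()
b_img = torch.sign(img_net(torch.randn(8, 4096)))  # 8 images -> {-1, +1}^64
b_txt = torch.sign(txt_net(torch.randn(8, 1386)))  # 8 texts  -> {-1, +1}^64
hamming = (64 - b_img @ b_txt.t()) / 2             # pairwise Hamming distances
```

Because the codes are ±1 vectors, the last line reads the pairwise Hamming distance directly off the inner product, which is what makes Hash-based cross-modal retrieval fast.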
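The local semantic correlation between modalities is scored with a cross-modal attention mechanism. Below is a hedged sketch of one plausible form, in the spirit of stacked cross-attention approaches: each word attends over image regions, and the image-text similarity is the mean cosine similarity between words and their attended visual context. The shapes and softmax temperature are assumptions, not values from the paper.

```python
# A hedged sketch of a cross-modal attention similarity score.
import torch
import torch.nn.functional as F

def cross_modal_similarity(regions, words, temperature=9.0):
    """regions: (R, d) image-region features; words: (W, d) word features.
    Returns a scalar image-text similarity."""
    r = F.normalize(regions, dim=-1)
    w = F.normalize(words, dim=-1)
    attn = F.softmax(temperature * (w @ r.t()), dim=-1)  # (W, R): each word attends to regions
    attended = attn @ regions                            # (W, d): visual context per word
    # Mean cosine similarity between each word and its attended image context.
    return F.cosine_similarity(attended, words, dim=-1).mean()

sim = cross_modal_similarity(torch.randn(36, 512), torch.randn(12, 512))
```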
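For the global, self-supervised part, the abstract describes a semantic network built from multi-label annotations that adversarially guides both modality branches. A minimal sketch of that adversarial step follows; all module names, sizes, and losses are assumptions. A label network produces "real" semantic features, and a discriminator pushes each modality's features to be indistinguishable from them.

```python
# A hedged sketch of self-supervised adversarial semantic supervision.
# Names, sizes, and losses are illustrative assumptions.
import torch
import torch.nn as nn

class LabelNet(nn.Module):
    """Maps a multi-label annotation vector to a semantic feature."""
    def __init__(self, num_labels=24, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_labels, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))
    def forward(self, y):
        return self.net(y)

class Discriminator(nn.Module):
    """Predicts whether a feature came from the label network (real)
    or from a modality-specific network (fake)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, f):
        return self.net(f)  # raw logit

bce = nn.BCEWithLogitsLoss()
label_net, disc = LabelNet(), Discriminator()
labels = torch.randint(0, 2, (8, 24)).float()        # multi-label ground truth
img_feat = torch.randn(8, 512, requires_grad=True)   # stand-in for image-branch features

real, fake = label_net(labels), img_feat
# Discriminator step: separate label-derived features from image-derived ones.
d_loss = bce(disc(real), torch.ones(8, 1)) + bce(disc(fake.detach()), torch.zeros(8, 1))
# Generator (modality branch) step: fool the discriminator, aligning the
# modality's features with the label-derived semantic features.
g_loss = bce(disc(fake), torch.ones(8, 1))
```

The same generator loss would apply symmetrically to the text branch, which is how a single semantic network can guide both modality-specific networks at once.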
Key words: deep cross-modal Hashing; adversarial learning; semantic Hashing; cross-modal attention
Received: 2021-07-22; Published: 2022-08-18
Corresponding author: DUAN Xiaodong, professor, E-mail: lubo@dlnu.edu.cn
Cite this article:
LU Bo, DUAN Xiaodong, YUAN Ye. Self-supervised deep semantics-preserving Hashing for cross-modal retrieval[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(9): 1442-1449.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2021.26.040  or  http://jst.tsinghuajournals.com/CN/Y2022/V62/I9/1442