Residual network-based stacked vector quantized autoencoder

Hui WANG, Xiaojun YE, Zewei DONG

Journal of Tsinghua University (Science and Technology) ›› 2025, Vol. 65 ›› Issue (11): 2259-2268.

DOI: 10.16511/j.cnki.qhdxxb.2025.21.030
Computer Science and Technology


Abstract

Although existing generative models have achieved excellent results in continuous domains such as audio and video, they suffer from large reconstruction errors on discrete sequential data such as that found in recommendation systems, which severely limits their feature-capturing ability and data generation quality. This paper uses residual connections to pass the continuous latent vectors output by a multi-level encoder to the corresponding layers of the decoder and, combined with vector quantization, proposes a residual network-based stacked vector quantized autoencoder. The method effectively avoids discarding the high-value continuous information in the encoder; at the same time, it builds multiple stacked codebooks via vector quantization to learn multidimensional discretized vector-quantized feature representations. By introducing an adversarial network during training, the gap between the quantization results and the encoder output is narrowed, overcoming the posterior collapse problem that is widespread in such encoders. The method not only achieves reconstruction quality comparable to a plain autoencoder, but can also generate discrete sequence data with consistent distributional properties by prior sampling in the codebook space. Experiments on several public datasets validate its effectiveness.

Abstract

Objective: Deep learning has made remarkable progress in personalized recommendation services. However, recommendation systems based on deep neural networks still face the challenge of data sparsity, which limits a model's ability to accurately capture subtle differences in user preferences and thereby affects the robustness of training. The problem is particularly prominent in scenarios with limited user interaction data. This paper therefore proposes a recommendation model that effectively addresses data sparsity, improving both user behavior modeling and overall performance.

Methods: To tackle data sparsity, this paper proposes a residual network-based stacked vector quantized autoencoder (RSVQ-AE). The model exploits residual connections by passing the continuous latent vectors output by multiple encoder layers directly to the corresponding decoder layers, which reduces the loss of high-value continuous information common in encoders and is crucial for maintaining the fidelity of the data representation. Meanwhile, vector quantization discretizes the latent space so that the model can accurately capture and represent the data. In addition, the model constructs multiple stacked codebooks, enabling it to learn multidimensional discrete vector quantization feature representations and to capture discretized representations of user interests across multiple dimensions. To further enhance stability and generative capability, an adversarial network is introduced as a regularizer during training to promote rapid convergence.

Results: Experiments were conducted on several public datasets widely used in recommendation research. RSVQ-AE exhibits excellent reconstruction performance across all of them. On the ML-1M (MovieLens-1M) dataset, with a sequence length of 20, the reconstruction loss of RSVQ-AE is only 0.1525 with an accuracy of 70.69%; when the sequence length increases to 100, the reconstruction loss further decreases to 0.0039 with an accuracy of 50.58%. On the Retail Rocket dataset, with a sequence length of 20, the reconstruction loss is as low as 2.42×10⁻⁴ with an accuracy of 81.26%; at a sequence length of 100, the reconstruction loss is 0.0019 with an accuracy of 74.21%. These results demonstrate that RSVQ-AE maintains low reconstruction loss and high accuracy across different sequence lengths; its performance is second only to the plain autoencoder, which cannot perform sampling-based generation.

Conclusions: The proposed RSVQ-AE offers a powerful solution for generating discrete sequence data in recommendation systems. By addressing the limitations of existing generative models and introducing innovations such as stacked codebooks, the model achieves remarkable improvements in reconstruction accuracy and data generation quality. It not only strengthens user behavior modeling but also suggests new approaches for the development of personalized recommendation services, with the potential to drive more efficient, behavior-centered recommendation systems. Moreover, the flexibility and robustness of its data generation make it applicable to a variety of recommendation system architectures.
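The paper itself does not include code on this page, but the stacked-codebook mechanism described in the Methods paragraph — each codebook quantizing the residual left over by the previous level — can be illustrated with a minimal NumPy sketch. The function names, toy dimensions, and the final skip-connection line are illustrative assumptions, not the authors' implementation; a real RSVQ-AE would learn the codebooks jointly with the encoder and decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(z, codebook):
    """Map each latent vector to its nearest codebook entry (L2 distance)."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) distances
    idx = d.argmin(axis=1)                                     # nearest-entry index
    return codebook[idx], idx

def stacked_quantize(z, codebooks):
    """Stacked quantization: each level encodes the residual the
    previous levels missed, giving a multi-level discrete code."""
    residual = z
    out = np.zeros_like(z)
    indices = []
    for cb in codebooks:
        q, idx = quantize(residual, cb)
        out = out + q              # accumulate the reconstruction
        residual = residual - q    # next level sees only what is left
        indices.append(idx)
    return out, indices

# Toy continuous latents and two stacked codebooks (illustrative sizes).
z = rng.normal(size=(8, 4))                          # 8 latents, dim 4
codebooks = [rng.normal(size=(16, 4)) for _ in range(2)]  # 2 levels, 16 entries each

zq, idx = stacked_quantize(z, codebooks)

# Residual skip connection (illustrative): the decoder would receive the
# quantized code plus the continuous encoder output from the matching layer.
decoder_input = zq + z
```

Each latent is thus represented by one index per codebook level, and the reconstruction is the sum of the selected entries — which is why extra levels let the model capture finer-grained, multidimensional interest representations.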


Key words

recommendation systems / generative models / residual networks / autoencoders

Cite this article

Hui WANG, Xiaojun YE, Zewei DONG. Residual network-based stacked vector quantized autoencoder[J]. Journal of Tsinghua University (Science and Technology), 2025, 65(11): 2259-2268. https://doi.org/10.16511/j.cnki.qhdxxb.2025.21.030
CLC number: TP393.1


Copyright

All rights reserved. Reproduction without authorization is prohibited.