Journal of Tsinghua University (Science and Technology), 2020, Vol. 60, Issue (2): 171-180    DOI: 10.16511/j.cnki.qhdxxb.2019.21.038
Research progress on drug representation learning
CHEN Xin, LIU Xien, WU Ji
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: The drug development process is characterized by high capital intensity, high risk, and long development cycles, and it therefore requires substantial capital, manpower, and material resources. Although traditional machine learning methods can assist drug development to some extent, they require molecular descriptors as feature inputs, and the choice of descriptors strongly affects model performance, so most traditional methods involve complex, time-consuming feature engineering. Emerging deep learning methods can learn features directly from the raw structures of drugs, bypassing feature engineering and shortening the development cycle. This paper divides existing drug representation learning methods into two classes: methods based on the simplified molecular input line entry specification (SMILES) and methods based on molecular graphs. It surveys recent progress in both classes, describing the innovations and limitations of each method, and finally identifies major challenges in current drug representation learning research and discusses possible solutions.
Key words: drug; representation learning; simplified molecular input line entry specification (SMILES); molecular graph
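As a concrete illustration of the two representation types compared in the abstract, the short sketch below (not part of the paper) parses a commonly quoted SMILES string for salicylic acid, the example compound of Table 1, into a molecular graph of atoms and bonds. It assumes the open-source RDKit toolkit is available.

```python
from rdkit import Chem  # open-source cheminformatics toolkit (assumed available)

# SMILES view: the molecule written as a linear string.
smiles = "OC(=O)c1ccccc1O"          # a common SMILES for salicylic acid
mol = Chem.MolFromSmiles(smiles)    # parse the string into an RDKit molecule

# Molecular-graph view: atoms are nodes, bonds are edges.
atoms = [(a.GetIdx(), a.GetSymbol()) for a in mol.GetAtoms()]
bonds = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondTypeAsDouble())
         for b in mol.GetBonds()]

print(atoms)   # e.g. [(0, 'O'), (1, 'C'), (2, 'O'), ...]
print(bonds)   # e.g. [(0, 1, 1.0), (1, 2, 2.0), ...]; 1.5 marks aromatic bonds
```

SMILES-based methods feed the string (or its tokens) to sequence models, whereas molecular graph based methods operate directly on such atom and bond lists.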
Received: 2019-07-19      Published: 2020-01-15
Corresponding author: WU Ji, professor, E-mail: wuji_ee@tsinghua.edu.cn
Cite this article:
CHEN Xin, LIU Xien, WU Ji. Research progress on drug representation learning[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(2): 171-180.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2019.21.038  or  http://jst.tsinghuajournals.com/CN/Y2020/V60/I2/171
Table 1 SMILES expression and molecular graph of salicylic acid
Table 2 Summary of research on SMILES-based drug representation learning
Table 3 Summary of research on molecular graph based drug representation learning
Figure 1 Overview of research work on drug representation learning
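To show what representation learning on such a molecular graph typically looks like, the following minimal sketch runs a single graph-convolution (neighborhood aggregation) step over a toy adjacency matrix and then pools the atom vectors into one molecule-level vector. The matrix, feature sizes, and propagation rule are illustrative assumptions only and do not reproduce any specific method covered by the survey.

```python
import numpy as np

np.random.seed(0)

# Toy molecular graph: 4 atoms, atom 1 bonded to atoms 0, 2 and 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # symmetric adjacency matrix
H = np.eye(4)                               # initial atom features (one-hot here)
W = 0.1 * np.random.randn(4, 8)             # weight matrix (learned in practice)

# One propagation step: every atom aggregates its neighbours (plus itself),
# applies a linear map and a nonlinearity -- the generic pattern behind
# graph convolution / message passing on molecular graphs.
A_hat = A + np.eye(4)                                   # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))  # degree normalization
H_next = np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)  # ReLU

# A whole-molecule ("graph-level") representation is then obtained by pooling
# over atoms, e.g. a simple sum, and can be fed to a property predictor.
molecule_vector = H_next.sum(axis=0)
print(molecule_vector.shape)                # (8,)
```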