Research progress on drug representation learning

CHEN Xin, LIU Xien, WU Ji

Journal of Tsinghua University (Science and Technology), 2020, Vol. 60, Issue 2: 171-180. DOI: 10.16511/j.cnki.qhdxxb.2019.21.038



Abstract

The drug development process is characterized by high capital intensity, high risk, and long cycles, and therefore requires large investments of capital, manpower, and material resources. Traditional machine learning methods can aid drug development to some extent, but they require molecular descriptors as input features, and the choice of descriptors strongly affects model performance, so these methods typically involve complex, time-consuming feature engineering. Emerging deep learning methods can instead extract features directly from the raw structure of a drug, bypassing feature engineering and shortening the development cycle. This paper divides existing drug representation learning methods into two classes, those based on simplified molecular input line entry specification (SMILES) expressions and those based on molecular graphs, surveys the latest progress in both classes, and analyzes the innovations and limitations of each method. Finally, it identifies the major challenges facing current drug representation learning research and discusses possible solutions.
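As a toy illustration of the two representation families surveyed here, a linear SMILES string versus a molecular graph of atoms and bonds, the following sketch hand-parses a heavily restricted SMILES subset (single-letter C/N/O atoms, branches, `=` double bonds; no rings, charges, or aromaticity). It is a hypothetical illustration only; real pipelines would use a cheminformatics toolkit such as RDKit.

```python
# Toy conversion from a SMILES string (linear text representation)
# to a molecular graph (node list + edge list), for illustration.
# Supports only a tiny subset: atoms C/N/O, branches (), '=' bonds.

def smiles_to_graph(smiles):
    """Parse a restricted SMILES string into (atoms, bonds)."""
    atoms = []    # node list: element symbols
    bonds = []    # edge list: (i, j, bond_order)
    stack = []    # branch points opened by '('
    prev = None   # index of the previously placed atom
    order = 1     # bond order for the next bond (1 unless '=' seen)
    for ch in smiles:
        if ch in "CNO":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev, order = idx, 1
        elif ch == "=":
            order = 2
        elif ch == "(":
            stack.append(prev)      # remember where the branch forks
        elif ch == ")":
            prev = stack.pop()      # return to the fork point
        else:
            raise ValueError(f"unsupported SMILES token: {ch!r}")
    return atoms, bonds

# Acetic acid, CC(=O)O: two carbons, two oxygens, one C=O double bond.
atoms, bonds = smiles_to_graph("CC(=O)O")
print(atoms)   # ['C', 'C', 'O', 'O']
print(bonds)   # [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
```

SMILES-based methods consume the string directly as a token sequence, while graph-based methods operate on the `(atoms, bonds)` structure; the two inputs encode the same molecule.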


Key words

drug / representation learning / simplified molecular input line entry specification (SMILES) / molecular graph

Cite this article

CHEN Xin, LIU Xien, WU Ji. Research progress on drug representation learning[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(2): 171-180. https://doi.org/10.16511/j.cnki.qhdxxb.2019.21.038


Corresponding author

WU Ji, Professor, E-mail: wuji_ee@tsinghua.edu.cn
