Cross-database technology-product category mapping link method for innovative information mining

Yutong WEI, Xinming XIA, Shaojie ZHOU

Journal of Tsinghua University (Science and Technology), 2025, Vol. 65, Issue 11: 2206-2220.

DOI: 10.16511/j.cnki.qhdxxb.2025.27.023
Resources and Environmental Issues in Global Value Chains


Abstract

Objective: As innovation plays an increasingly central role in economic and social development, and in response to the national need to "strengthen the evaluation of the relationship between patent activities and economic benefits", breaking down the barriers between patent databases and other economic databases is a substantial advance for economic and national-conditions research. Exploring the correlations, internal mechanisms, and heterogeneity of how innovations are applied and transformed requires connections and data networks across databases from different fields and classification systems, such as product and patent databases. Existing studies have focused mainly on mapping patents to industries. Because product classification is complex, however, no direct mapping exists between the International Patent Classification (IPC) and the Harmonized System (HS) codes, which has limited the analysis of technology transfer and of the mechanisms by which industries and technologies adapt to each other. This paper constructs a cross-database technology-product category mapping method to reveal the technological characteristics of industry subsectors and to provide data support for industrial innovation research.

Methods: Using the classification information in patent and product databases, this paper explores a full-category mapping between patent IPC classes and product HS codes in a Chinese-language setting. The approach combines natural language processing (NLP), cross-searching, and algorithmic links with probabilities (ALP). Product examples listed under each HS code in data released by the General Administration of Customs of China serve as an external word source for expanding the HS category keywords, yielding a higher-quality keyword list than NLP segmentation alone.
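
The keyword-expansion and cross-searching steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a regex tokenizer with a hypothetical stop-word list stands in for Chinese word segmentation, matching is plain substring search on patent titles, and every HS code, product example, patent title, and IPC code below is made-up toy data. The output is a table of (six-digit HS, three-digit IPC) co-occurrence counts, which is the kind of input a probabilistic weighting step would consume.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; stands in for a real segmenter's dictionary.
STOPWORDS = {"and", "other", "parts", "thereof", "with", "for"}

# Toy inputs (illustrative only, not real customs or patent records).
HS_DESCRIPTIONS = {"851713": "Smartphones", "850760": "Lithium-ion accumulators"}
PRODUCT_EXAMPLES = {"851713": ["mobile phone"], "850760": ["battery"]}
PATENTS = [
    ("Dual-processor mobile phone and control method", ["H04M1/725", "G06F15/16"]),
    ("Microwave synthesis of lithium ion accumulator material", ["H01M4/58", "C01B25/45"]),
    ("Mobile phone battery charging circuit", ["H02J7/00", "H01M10/44"]),
]


def segment(text):
    """Stand-in for NLP segmentation: lowercase words longer than 3 chars, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3 and w not in STOPWORDS}


def build_keywords(descriptions, examples):
    """Keyword set per HS code: segmented description plus external product examples."""
    return {
        hs: segment(desc) | {p.lower() for p in examples.get(hs, [])}
        for hs, desc in descriptions.items()
    }


def cross_search(keywords, patents):
    """Count (HS6, IPC3) pairs; a patent matches an HS code if any keyword is a substring of its title."""
    counts = defaultdict(Counter)
    for title, ipc_codes in patents:
        text = title.lower()
        ipc3 = {code[:3] for code in ipc_codes}  # e.g. "H04M1/725" -> "H04"
        for hs, kws in keywords.items():
            if any(k in text for k in kws):
                counts[hs].update(ipc3)
    return counts


keywords = build_keywords(HS_DESCRIPTIONS, PRODUCT_EXAMPLES)
counts = cross_search(keywords, PATENTS)
```

Note how the product example "battery" lets the third toy patent match HS 850760 even though the HS description itself never uses that word; this is the effect the external word source is meant to have.
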
Three weighting correction methods based on Bayes' theorem (raw weight, specificity weight, and hybrid weight) are then used to establish mapping links between six-digit HS codes and three-digit IPC classes, and these links are combined with the multilevel classification structure to analyze technological differences and associations in finer detail.

Results: The mapping shows that complex products are linked to a wide variety of technologies, whereas simple industrial and agricultural products are linked to fewer technology types, reflecting the heterogeneity of technological innovation across industries and products. Compared with the raw weight, the specificity and hybrid weights are more likely to reveal technology types unique to the production of particular product categories, which is important for further identifying specialized, sophisticated, and novel patents. The development of strategic emerging industries is closely tied to technologies in IPC sections G (Physics) and H (Electricity), which points to the importance of basic research for these industries.

Conclusions: The IPC-HS link method built on cross-searching and ALP can effectively quantify the strength of technology-product associations. From the perspective of transforming innovation achievements, it breaks through the barrier between the classification systems for technologies and products and provides data-driven empirical support for the transformation of technological achievements. The mapping can reveal the technological characteristics of, and differences between, industry subsectors, and it can improve understanding of technology diffusion in the innovation ecosystem, the application of technology in strategic emerging industries, and the mechanisms by which technologies and industries adapt to each other.
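
The abstract does not spell out the weighting formulas, so the following is only a minimal sketch, under assumptions, of how Bayes'-theorem-based weights might be derived from (six-digit HS, three-digit IPC) co-occurrence counts. Here the raw weight is assumed to be P(IPC | HS), the share of each IPC class among patents matched to an HS code; the specificity weight is assumed to be P(HS | IPC), obtained with Bayes' theorem and renormalized over IPC classes; and the hybrid weight is assumed to be a convex mix of the two controlled by a hypothetical parameter `alpha`. The counts, codes, and function names are illustrative. A roll-up from IPC classes to IPC sections and a count of linked classes per HS code stand in for the multilevel analysis.

```python
from collections import defaultdict

# Toy (HS6 -> IPC3 -> count) co-occurrences from cross-searching (illustrative only).
COUNTS = {
    "851713": {"H04": 1, "G06": 1, "H02": 1, "H01": 1},
    "850760": {"H01": 2, "C01": 1, "H02": 1},
}


def normalize(weights):
    """Rescale a dict of non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def alp_weights(counts, alpha=0.5):
    """Return assumed raw, specificity, and hybrid weights w(IPC | HS)."""
    n_total = sum(sum(row.values()) for row in counts.values())
    n_hs = {h: sum(row.values()) for h, row in counts.items()}
    n_ipc = defaultdict(int)
    for row in counts.values():
        for ipc, n in row.items():
            n_ipc[ipc] += n

    raw, spec, hybrid = {}, {}, {}
    for h, row in counts.items():
        # Raw weight: P(IPC | HS), the share of each IPC class among matched patents
        raw[h] = {i: n / n_hs[h] for i, n in row.items()}
        # Bayes' theorem: P(HS | IPC) = P(IPC | HS) * P(HS) / P(IPC)
        p_hs = n_hs[h] / n_total
        posterior = {i: raw[h][i] * p_hs / (n_ipc[i] / n_total) for i in row}
        # Specificity weight: favours IPC classes concentrated in this HS code
        spec[h] = normalize(posterior)
        # Hybrid weight: assumed convex mix of raw and specificity weights
        hybrid[h] = {i: alpha * raw[h][i] + (1 - alpha) * spec[h][i] for i in row}
    return raw, spec, hybrid


def section_shares(weights):
    """Roll IPC3 weights up to IPC sections (first letter, e.g. 'G', 'H')."""
    shares = defaultdict(float)
    for ipc, w in weights.items():
        shares[ipc[0]] += w
    return dict(shares)


raw, spec, hybrid = alp_weights(COUNTS)
breadth = {h: len(row) for h, row in COUNTS.items()}  # number of linked IPC3 classes
smartphone_sections = section_shares(raw["851713"])
```

In the toy data, H01 is shared by both HS codes, so its specificity weight for 851713 falls from 0.25 to 2/17 (about 0.12), while H04, which appears only with 851713, rises to 6/17 (about 0.35). On made-up numbers, this mirrors the abstract's point that specificity and hybrid weights surface technologies particular to a product.
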

Key words

cross-database cross-searching / natural language processing / algorithmic links with probabilities / mapping / innovative applications

Cite this article

Yutong WEI, Xinming XIA, Shaojie ZHOU. Cross-database technology-product category mapping link method for innovative information mining[J]. Journal of Tsinghua University (Science and Technology), 2025, 65(11): 2206-2220. https://doi.org/10.16511/j.cnki.qhdxxb.2025.27.023


RIGHTS & PERMISSIONS

All rights reserved. Unauthorized reproduction is prohibited.