Cross-database technology-product category mapping link method for innovative information mining
Yutong WEI, Xinming XIA, Shaojie ZHOU
Journal of Tsinghua University (Science and Technology), 2025, Vol. 65, Issue 11: 2206-2220.
Objective: Innovation plays an increasingly central role in economic and social development, and China has a practical need to "strengthen the evaluation of the relationship between patent activities and economic benefits". Breaking through the barriers between patent databases and other economic databases is therefore a substantial advance for economic and national-level analysis. Connections and data networks across databases in different fields and systems, such as product and patent databases, are required to explore the correlation, internal mechanisms, and heterogeneity of innovative applications and transformations. Existing studies have mainly focused on the mapping between patents and industries. However, the complexity of product classification has left no direct mapping between the International Patent Classification (IPC) and Harmonized System (HS) codes, which limits the analysis of technology transfer and of industry-technology adaptation mechanisms. This paper aims to construct a cross-database technology-product category mapping method, reveal the technological characteristics of segmented industries, and provide data support for industrial innovation research.
Methods: This paper uses the classification information in patent and product databases to explore the full-category mapping between IPC patent classes and HS product classes in a Chinese-language setting. The approach combines natural language processing (NLP), cross-searching, and algorithmic links with probabilities (ALP). Product examples corresponding to the HS codes, taken from data released by the General Administration of Customs of China, serve as an external word source to expand the HS category keywords, which yields a keyword list of higher quality than one generated by NLP segmentation. Three weighting correction methods based on Bayes' theorem (raw weight, specificity weight, and hybrid weight) are then used to establish mapping links between six-digit HS codes and three-digit IPC classes; these are combined with multilevel classification to refine the analysis of technological differences and associations.
Results: The mapping results show that complex products are associated with a wide variety of technologies, whereas simple industrial and agricultural products are associated with fewer technology types. This reflects the heterogeneity of technological innovation across industries and products. Compared with the raw weight, the specificity and hybrid weights are more likely to reveal the technology types that are unique to the production of a given product category, which is important for further identifying specialized, sophisticated, and novel patents. The development of strategic emerging industries is closely tied to technological support from IPC sections G (Physics) and H (Electricity), which indicates the importance of basic research for these industries.
Conclusions: The IPC-HS link method constructed using cross-searching and ALP can effectively quantify the strength of technology-product associations, break through the barriers between technology and product classification systems from the perspective of innovation achievement transformation, and provide data-driven empirical support for the transformation of technological achievements. The mapping can reveal the technological characteristics and differences of segmented industries, and it can contribute to the understanding of technology diffusion in the innovation ecosystem, the application of technology in strategic emerging industries, and the adaptation mechanism between technology and industry.
Keywords: cross-database cross-searching / natural language processing / algorithmic links with probabilities / mapping / innovative applications
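The cross-searching and weighting steps named in the Methods can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the toy keyword lists, and the toy patent records are all hypothetical, and the three formulas are only one plausible reading of "raw", "specificity", and "hybrid" weights: the raw weight is taken as P(IPC | HS), the specificity weight as P(HS | IPC) renormalized per HS code, and the hybrid weight as their renormalized product. A real pipeline would search Chinese patent text with segmented and expanded keywords; English strings are used here only for readability.

```python
from collections import defaultdict


def cross_search_counts(hs_keywords, patents):
    """Count, for each (HS code, IPC class) pair, the patents whose text
    contains at least one keyword of that HS category."""
    counts = defaultdict(int)
    for ipc_class, text in patents:
        for hs_code, keywords in hs_keywords.items():
            if any(kw in text for kw in keywords):
                counts[(hs_code, ipc_class)] += 1
    return dict(counts)


def _normalize_per_hs(scores):
    """Rescale scores so that, for each HS code, they sum to 1 over IPC classes."""
    totals = defaultdict(float)
    for (hs_code, _), value in scores.items():
        totals[hs_code] += value
    return {key: value / totals[key[0]] for key, value in scores.items()}


def alp_weights(counts):
    """Turn hit counts into three weight tables (assumed formulas, see lead-in)."""
    hs_totals = defaultdict(int)
    ipc_totals = defaultdict(int)
    for (hs_code, ipc_class), n in counts.items():
        hs_totals[hs_code] += n
        ipc_totals[ipc_class] += n

    raw, specificity, hybrid = {}, {}, {}
    for (hs_code, ipc_class), n in counts.items():
        if n <= 0:
            continue  # pairs without hits carry no link
        p_ipc_given_hs = n / hs_totals[hs_code]      # share of the HS code's hits
        p_hs_given_ipc = n / ipc_totals[ipc_class]   # share of the IPC class's hits
        raw[(hs_code, ipc_class)] = p_ipc_given_hs
        specificity[(hs_code, ipc_class)] = p_hs_given_ipc
        hybrid[(hs_code, ipc_class)] = p_ipc_given_hs * p_hs_given_ipc

    return {
        "raw": raw,
        "specificity": _normalize_per_hs(specificity),
        "hybrid": _normalize_per_hs(hybrid),
    }


# Hypothetical toy data: the HS codes and IPC classes serve only as labels.
hs_keywords = {
    "851712": ["mobile phone", "smartphone"],
    "100630": ["rice"],
}
patents = [
    ("H04", "smartphone antenna"),
    ("H04", "mobile phone handover method"),
    ("G06", "smartphone operating system"),
    ("G06", "image recognition for rice grading"),
    ("A01", "rice transplanting machine"),
]

counts = cross_search_counts(hs_keywords, patents)
weights = alp_weights(counts)
for name, table in weights.items():
    for (hs_code, ipc_class), w in sorted(table.items()):
        print(f"{name:12s} HS {hs_code} -> IPC {ipc_class}: {w:.3f}")
```

In this toy data the broad class G06 appears under both products, so the raw weight gives A01 and G06 equal shares for the rice code, while the specificity weight moves A01 to about two thirds. That is the kind of shift toward product-specific technology classes that the Results attribute to the specificity and hybrid weights.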