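The massive data generated during the construction of large hydropower projects are often scattered across multiple business entities such as design, construction, and supervision units, and large differences in data types together with high professional barriers severely restrict efficient data sharing and circulation. To address these problems, this paper proposes a method for constructing a data-sharing agent for large hydropower projects. First, a sharing framework centered on the data-sharing agent is proposed that can handle three classes of data across the full life cycle of a hydropower project: structured, semi-structured, and unstructured. Second, for these three classes of data, external tools (a time-series database, a knowledge graph, and a text vector store, respectively) are developed, and the data-sharing agent is built on them. Third, a tool-learning method for the data-sharing agent is proposed: a tool-learning dataset is established, and the 1.5B and 7.0B versions of the DeepSeek-R1-Distill-Qwen model are fine-tuned with the standard language modeling objective as the loss function. Finally, taking a hydropower project as an example, a data-sharing agent is built around environmental monitoring data, engineering specifications, and literature and manuscripts, which achieves efficient data sharing among the project participants and promotes the mining of data value. The results show that, after fine-tuning, the planning accuracy of the 1.5B and 7.0B models increased by 270% and 104%, respectively, and the task success rates reached 65.83% and 90.83%, respectively. The findings help to fully mine and exploit the value of data and provide a reference for the intelligent construction of large hydropower projects in high-altitude regions.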
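For reference, the standard language modeling objective mentioned above is usually written as the negative log-likelihood of each token given the tokens before it. The form below is a sketch under assumptions: the symbols ($\theta$ for model parameters, $x_t$ for the $t$-th token, $T$ for sequence length), the averaging over $T$, and whether prompt tokens are masked out of the sum are not specified in this abstract.

```latex
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_{\theta}\left(x_t \mid x_{<t}\right)
```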
Objective: Large-scale hydropower projects generate large volumes of heterogeneous data that are dispersed across design, construction, and supervision units. Differences in data structures and professional contexts limit interoperability among these stakeholders, so information sharing remains inefficient. Existing studies typically focus on specific data types or lifecycle stages and lack a unifying framework for comprehensive, full-lifecycle data sharing. To address this gap, this study develops a data-sharing agent tailored to hydropower engineering. The agent accommodates structured, semi-structured, and unstructured data and integrates external tools such as time-series databases, knowledge graphs, and text vector databases, which enables accurate, on-demand data retrieval. By enhancing the tool-learning capabilities of large language models, the agent bridges data silos, strengthens cross-domain collaboration, and lays a technical foundation for intelligent construction in complex hydropower projects. Methods: The research begins with a systematic analysis of data-sharing requirements across the full lifecycle of hydropower projects, covering time-series monitoring data, technical documentation, and parametric design files. Based on this analysis, an agent framework is designed to support multi-modal data interoperability. To make the framework practical, a supporting tool system is constructed that integrates modules for database retrieval, knowledge graph querying, and rule-based inference. Furthermore, an action-planning dataset of more than 4000 samples is developed to train the agent in decision-making and tool invocation. 
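The idea of tool invocation driven by an action plan can be illustrated with a minimal sketch. Everything named here is hypothetical: the tool functions, their argument keys, the `PlanStep` structure, and the sample layout are illustrative assumptions, not the interfaces or dataset schema used in the study. The stand-in tools return strings instead of querying real back ends.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins for the three kinds of external tools.
def query_timeseries(args: dict) -> str:
    """Pretend lookup in a time-series database (structured data)."""
    return f"timeseries:{args['sensor']}:{args['start']}..{args['end']}"


def query_knowledge_graph(args: dict) -> str:
    """Pretend lookup in a knowledge graph (semi-structured data)."""
    return f"kg:{args['entity']}->{args['relation']}"


def search_text_vectors(args: dict) -> str:
    """Pretend similarity search in a text vector store (unstructured data)."""
    return f"vectors:top{args.get('top_k', 3)}:{args['query']}"


# Registry mapping tool names (assumed) to callables.
TOOLS: Dict[str, Callable[[dict], str]] = {
    "query_timeseries": query_timeseries,
    "query_knowledge_graph": query_knowledge_graph,
    "search_text_vectors": search_text_vectors,
}


@dataclass
class PlanStep:
    tool: str   # name of the tool to call
    args: dict  # structured parameters extracted from the user query


def execute_plan(plan: List[PlanStep]) -> List[str]:
    """Run each step in order and collect the tool outputs."""
    results = []
    for step in plan:
        if step.tool not in TOOLS:
            raise KeyError(f"unknown tool: {step.tool}")
        results.append(TOOLS[step.tool](step.args))
    return results


# One hypothetical action-planning sample: a query paired with the plan
# that a fine-tuned model would be expected to produce.
SAMPLE = {
    "query": "Was last week's air temperature within cold-weather concreting limits?",
    "plan": [
        PlanStep("query_timeseries",
                 {"sensor": "air_temperature", "start": "2024-01-01", "end": "2024-01-07"}),
        PlanStep("search_text_vectors", {"query": "cold-weather concreting limits"}),
    ],
}

if __name__ == "__main__":
    for line in execute_plan(SAMPLE["plan"]):
        print(line)
```

In this sketch, planning accuracy would correspond to whether the model emits the right sequence of tool names and arguments; the executor itself is deliberately trivial.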
Two versions of the DeepSeek-R1-Distill-Qwen model (1.5B and 7.0B parameters) are fine-tuned on this dataset to improve structured parameter extraction, multi-step reasoning, and action planning. To assess performance, a benchmark test set of hundreds of real-world business queries derived from hydropower project workflows is established and manually annotated to ensure fairness and reproducibility. Results: Fine-tuning substantially improved planning and reasoning performance. Compared with their pre-fine-tuning counterparts, the 1.5B and 7.0B models improved planning accuracy by 270% and 104%, respectively. On the business query test set, the task success rates were 65.83% and 90.83%, respectively, confirming that fine-tuning markedly improved model reliability and practical utility. The 7.0B model consistently outperformed the 1.5B model, which highlights the larger model's capacity for complex, multi-step reasoning tasks. The agent-based data-sharing platform was deployed for a real hydropower project in a representative watershed. Under static and structured data-sharing conditions, the agent maintained an average response time of less than 20 s. In contrast, dynamic monitoring scenarios involving high-frequency data streams showed average latencies exceeding 30 s, with peaks exceeding 60 s under intensive analytical loads. Conclusions: This study proposes a comprehensive framework for constructing a data-sharing agent that addresses critical challenges in current hydropower data-sharing practice, particularly in high-altitude, data-scarce environments. 
By aligning agent design with engineering-specific requirements and integrating a fine-tuned large language model with a domain-oriented tool ecosystem, the proposed method improves the efficiency, intelligence, and semantic interoperability of data sharing. The agent lowers cross-disciplinary access barriers, improves system responsiveness, and supports knowledge-driven decision-making. Field application results confirm its potential for practical use in intelligent construction platforms. The findings also provide a scalable, generalizable technical foundation for data-driven management and intelligent decision-support systems in complex hydropower projects.
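The planning-accuracy gains of 270% and 104% reported in the Results are most naturally read as relative improvements over the pre-fine-tuning accuracy, which is an assumption here because the abstract does not state the baseline values. Under that reading, a 270% gain means the fine-tuned accuracy is 3.7 times the baseline. The short sketch below shows the arithmetic; the accuracy values 0.20 and 0.74 are made-up inputs chosen only to reproduce a 270% gain and are not figures from the study.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage gain of `after` over `before`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


def multiplier(improvement_pct: float) -> float:
    """Factor by which the baseline is scaled for a given percentage gain."""
    return 1.0 + improvement_pct / 100.0


if __name__ == "__main__":
    # Hypothetical accuracies, not values reported in the paper.
    print(round(relative_improvement(0.20, 0.74), 2))
    # Scaling factors implied by the reported 270% and 104% gains.
    print(round(multiplier(270), 2), round(multiplier(104), 2))
```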