
15 May 2026, Volume 66 Issue 5
    

    CONSTRUCTION MANAGEMENT
  • ZHENG Xiaosheng, LYU Qian, WANG Juan, FANG Dongping, GU Botao
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 877-887. https://doi.org/10.16511/j.cnki.qhdxxb.2025.21.046
    [Objective] Although construction accidents in China have declined over the past decade, safety management in the construction industry continues to face persistent challenges. Weak safety leadership remains a critical factor contributing to the attenuation of safety requirements across organizational levels. This study aims to assess the current state of safety leadership among construction enterprise managers and to propose targeted improvement strategies grounded in the leadership-culture-behavior (LCB) theoretical framework. [Methods] Drawing on LCB theory, a safety leadership assessment scale was developed, covering four dimensions—leading by example, vision motivation, care and respect, and performance control—and consisting of 20 items tailored to construction enterprises. A questionnaire survey was administered to 1 115 managers in a Shenzhen-based construction enterprise. Following rigorous validity checks, 1 032 valid responses were retained (response rate: 92.5%). Descriptive statistics, one-way analysis of variance, and qualitative interviews were employed to evaluate the status of safety leadership. Differences across demographic variables, including gender, age, work experience, position, and educational background, were also examined. [Results] The overall safety leadership score was 4.18, suggesting a relatively high level according to established standards. However, notable imbalances were observed across dimensions: vision motivation (4.05) scored significantly lower than leading by example (4.27) and performance control (4.19). This result reflects a typical pattern of “strong institutional control but weak vision-driven leadership,” aligning with China's current phase of strict supervision in work safety.
Three critical weaknesses were identified: (1) safety-prioritized decision-making (item L13, score 4.19), (2) innovative safety incentive mechanisms (item L24, 3.85, with only 29.5% reporting full compliance), and (3) implementation of reward and punishment systems (item L43, 3.96, with 66.7% reporting inadequate execution). Moreover, educational background was inversely correlated with leadership scores: high school graduates achieved the highest score (4.34), compared with bachelor's (4.15) and master's degree holders (3.95). This finding supports the “experience compensation effect” described in the efficiency-thoroughness trade-off theory, suggesting that less-educated front-line managers rely on practical experience and microlevel perspectives, whereas highly educated managers adopt norm-oriented frameworks with higher expectations of leadership effectiveness. [Conclusions] A three-tier intervention framework is proposed to address the identified challenges. First, vision-driven leadership should be strengthened through safety innovation incentives, such as innovation funds and quarterly microinnovation competitions. Second, employee care should be enhanced by establishing bidirectional communication channels, including monthly nonwork-related leader-subordinate interactions and the introduction of mental health days with counseling services. Third, reward systems should be continuously refined by aligning incentives to position-specific risks, linking safety performance to career advancement, and adjusting policies through regular evaluations. The findings emphasize that cultivating effective safety leadership requires organizational-level interventions that consider the diverse cognitive backgrounds of managers. The proposed scale offers a reliable tool for ongoing assessment and targeted improvement. 
Overall, this study provides practical guidance for strengthening safety leadership, preventing the erosion of safety management requirements, and enhancing safety culture within construction enterprises. Future research should extend validation of the framework across broader regions and examine the predictive relationship between safety leadership development and safety performance outcomes.
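The one-way analysis of variance used in the methods above can be sketched in a few lines. The group scores below are invented placeholders grouped by educational background, not the study's survey data.

```python
def one_way_anova(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    # Between-group sum of squares: group size times squared mean offset
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical respondent scores grouped by educational background
high_school = [4.5, 4.3, 4.2, 4.4]
bachelor = [4.2, 4.1, 4.0, 4.3]
master = [3.9, 4.0, 3.8, 4.1]
f_stat = one_way_anova([high_school, bachelor, master])
print(round(f_stat, 2))
```

A large F statistic relative to the F distribution's critical value would indicate that mean leadership scores differ across the groups.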
  • CHAO Lemeng, DENG Xiaomei
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 888-897. https://doi.org/10.16511/j.cnki.qhdxxb.2026.21.004
    [Objective] Financial instruments such as performance bonds and insurance play central roles in controlling construction risk. Performance bonds aim to enforce contractors' contractual obligations, whereas inherent defect insurance (IDI) compensates for latent quality defects revealed after project completion. Existing studies have typically treated these mechanisms as independent systems, despite the fact that in practice, subjective and objective risks in construction projects are often intertwined. Insufficient contractor effort may lead to extensive quality defects that exceed insurers' risk-bearing capacity, prompting insurers to commission technical inspection services (TIS) to supervise construction quality during project execution. This arrangement results in an overlap between the risk-control functions of performance bonds and IDI. Therefore, this study aims to examine whether performance bonds and IDI can form a synergistic risk-control mechanism, evaluating its implications for construction quality and market evolution using a dynamic analytical framework. [Methods] A tripartite evolutionary game model was developed to represent interactions among performance bond agencies, contractors, and TIS institutions commissioned by IDI insurers. Bond agencies choose between strict and lenient pre-issuance reviews, contractors engage in high- or low-level performance effort, and TIS decides between rigorous and non-rigorous supervision. The participants maximize expected payoffs by balancing the revenues, management costs, and losses arising from defaults or quality defects. The model incorporates 21 parameters, including bond coverage ratio, premium, management costs, and compensation ratios. Payoff matrices were constructed to derive decision fitness functions and replicator dynamic equations describing strategy evolution under bounded rationality.
The stability of pure and mixed strategy equilibria was examined using Jacobian matrices and Lyapunov stability criteria, and parameter values were calibrated using expert interviews and data from domestic and international construction markets. Moreover, sensitivity analyses were conducted on key factors, including bond coverage ratio, compensation levels, contractor performance differentials, screening accuracy of bond agencies, and the effectiveness of TIS in reducing bond payouts and insurance claims. [Results] The results of numerical simulations indicate that collaborative risk control between performance bond agencies and TIS consistently outperforms independent supervision by each party. Coordinated oversight strengthens contractors' incentives to maintain high performance levels, reducing default probabilities and claim frequencies for bond agencies and insurers. In contrast, when bond agencies and insurers operate independently, contractors tend to adopt lower-effort strategies, increasing quality risks and financial burdens. Among all parameters, the bond coverage ratio exerts the strongest influence on evolutionary outcomes. Higher bond coverage induces stricter early-stage screening by bond agencies, restricting market entry for low-performing contractors and improving overall market efficiency. Effective TIS supervision reinforces these effects by lowering contractor default rates and indirectly reducing bond compensation costs. Furthermore, the simulations indicate that initial market conditions, in particular, disparities in contractor performance, affect the speed and stability of convergence. In markets with a high level of initial heterogeneity among contractors, coordinated and stringent supervision accelerates evolution toward high-performance equilibria, demonstrating the corrective function of synergistic risk control. 
[Conclusions] The findings confirm that performance bonds and IDI act as a synergistic risk-control mechanism, facilitated by TIS, improving contractors' performance and reducing systemic risk more effectively than these instruments in isolation. By applying evolutionary game theory, the study reveals how dynamic collaborations among key stakeholders shape long-term stability. The results support the use of integrated risk-control frameworks that combine bond and insurance functions. Policy and market design may benefit from prioritizing mandatory bond-insurance schemes for projects with significant public safety implications, such as residential buildings, strengthening the authority and remuneration mechanisms of TIS, enhancing information-sharing and professional training across the bond and insurance sectors, and improving credit availability in the construction industry to promote transparency and efficient market entry. Together, these measures contribute to a healthier construction market and support high-quality industry development.
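The replicator-dynamic analysis described above can be illustrated with a minimal simulation. The linear fitness terms and constants below are hypothetical stand-ins for the paper's 21-parameter calibration, chosen only to show convergence toward a cooperative equilibrium.

```python
def replicator_step(x, y, z, dt=0.01):
    """One Euler step of the three-population replicator dynamics.

    x: share of bond agencies choosing strict review
    y: share of contractors choosing high-level effort
    z: share of TIS institutions choosing rigorous supervision
    """
    # Hypothetical fitness advantages of each strategy over its
    # alternative, linear in the other players' strategy shares.
    fx = 1.0 * y + 0.5 * z - 0.3
    fy = 1.2 * x + 0.9 * z - 0.4
    fz = 0.6 * x + 1.1 * y - 0.3
    # Replicator equation: dx/dt = x(1 - x) * fx, and likewise for y, z
    x += dt * x * (1 - x) * fx
    y += dt * y * (1 - y) * fy
    z += dt * z * (1 - z) * fz
    return x, y, z

x, y, z = 0.3, 0.3, 0.3  # initial strategy shares in the market
for _ in range(20000):
    x, y, z = replicator_step(x, y, z)
print(round(x, 3), round(y, 3), round(z, 3))
```

With these illustrative payoffs, all three populations evolve toward the strict/high-effort/rigorous strategies, mirroring the coordinated-supervision equilibrium reported above.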
  • ZHAO Xuefeng, ZHANG Rui, FAN Xiongtao, GUO Fei, YAN Wenkai
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 898-910. https://doi.org/10.16511/j.cnki.qhdxxb.2026.21.001
    [Objective] In the complex domain of metro tunnel construction, shield-machine station traversal represents a critical operational phase. Within these confined subterranean spaces, the sheer volume and mass of the shield machine pose substantial safety risks, particularly the risk of collisions with the main tunnel structure or peripheral temporary facilities. The clearance between the machinery and the tunnel walls is often minimal, rendering the operation extremely hazardous. As physical rehearsals for such large-scale operations are cost-prohibitive, logistically complex, and difficult to replicate under varying conditions, conventional risk management strategies have typically relied on limited sensor data and manual measurements. However, these methods are inherently labor-intensive and lack adequate real-time perception capabilities, providing only discrete data points rather than a continuous, holistic view of spatial relationships within the tunnel. Furthermore, purely virtual simulations often fail to accurately capture the complex, dynamic, and unscripted characteristics of the actual construction environment. To address these significant limitations, this paper reports a novel simulation method based on in situ virtual-real interaction, designed to provide real-time, high-precision risk early warning and decision support specifically during shield-machine station traversal. [Method] First, multisource design data of the shield machine were fused with on-site sensing information to construct high-fidelity, drivable virtual models of the shield machine and the construction environment using lightweight processing techniques and multilevel-of-detail modeling. These models were subsequently optimized for real-time rendering. Second, a robust markerless three-dimensional registration algorithm based on mixed reality technology was applied. 
This enabled high-precision spatial alignment of the virtual models with the physical environment without requiring intrusive physical markers, thereby ensuring dynamic synchronization of virtual and real scenes. To further enhance accuracy, the system integrated multisource data, including inertial measurements, inclination sensing, and guidance system inputs. By incorporating these inputs into an extended Kalman filter, the system obtained a stable, real-time solution for the six-degrees-of-freedom pose and motion simulation of the shield machine, effectively mitigating sensor drift. Simultaneously, a comprehensive collision-detection mechanism was established using the Unity physics engine. By implementing a mixed configuration of rigid bodies and triggers, the system achieved real-time interference identification for static and dynamic obstacles, facilitating multimodal warning feedback and forming a closed-loop system encompassing perception, simulation, and early warning. [Result] The proposed system was subjected to rigorous field validation in an actual engineering project at the Beijing Pinggu metro station. The results demonstrated that the system achieved a virtual-real spatial registration accuracy of ±4.5 mm within a 30 m test section. The core collision-detection latency was <6 ms, and the rendering frame rate remained stable at 45 fps, ensuring a smooth visual experience for operators and excellent real-time stability. In diverse complex scenarios, including static obstacles, unpredictable dynamic personnel intrusions, and cluttered temporary facilities, the system consistently triggered real-time highlighting warnings for collision zones. [Conclusion] Compared with conventional manual measurement methods, this approach significantly improved inspection efficiency, effectively enhancing risk-identification accuracy and real-time responsiveness. 
Furthermore, it substantially mitigated personnel safety risks and potential economic losses associated with equipment collisions and project delays. The simulation method based on in situ virtual-real interaction proposed in this paper overcomes the real-time and precision limitations of conventional techniques. By enabling proactive identification and immediate warning of potential collision risks, it transforms risk management from a lagging, passive mode into a proactive one characterized by risk anticipation and intervention. Ultimately, this approach significantly enhances construction safety and economic efficiency while providing a reliable technical pathway and decision-making basis for advancing intelligent risk management and digital twin applications in complex underground engineering projects.
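The multisensor pose fusion described above rests on Kalman filtering. A one-dimensional scalar filter (illustrative noise levels, not the paper's six-degrees-of-freedom extended filter) shows how the predict-update cycle damps measurement noise and drift.

```python
import random

random.seed(0)

def kalman_1d(measurements, q=0.01, r=0.25):
    """Scalar Kalman filter: q = process noise, r = measurement noise."""
    x, p = 0.0, 1.0              # state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                   # predict: process noise inflates variance
        k = p / (p + r)          # Kalman gain
        x += k * (z - x)         # correct with the measurement residual
        p *= 1 - k
        estimates.append(x)
    return estimates

true_pos = 5.0                   # metres along the traverse, illustrative
zs = [true_pos + random.gauss(0, 0.5) for _ in range(200)]
est = kalman_1d(zs)
print(round(est[-1], 2))
```

The extended Kalman filter in the paper generalizes this to a nonlinear 6-DoF state fed by inertial, inclination, and guidance inputs, but the same gain-weighted blending of prediction and measurement is what suppresses sensor drift.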
  • DING Yanqiong, WANG Xue, TANG Zhili, XU Qianjun
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 911-918. https://doi.org/10.16511/j.cnki.qhdxxb.2025.21.040
    [Objective] Accurately and quickly determining investment estimation for utility tunnels is crucial for cost optimization and investment decision-making. Owing to the rapid development of artificial intelligence technology and the continuous accumulation of engineering investment databases, research on engineering investment estimation based on machine learning has become a hot topic. However, existing studies on utility tunnel investment estimation suffer from problems such as small data samples, reliance on single methods, lack of performance comparisons among multiple algorithms, low accuracy, and poor generalization performance. These issues result in significant prediction errors in practical applications that fail to meet the needs of engineering practice. Therefore, there is an urgent need to develop a universal investment estimation model for utility tunnels based on machine learning and data-driven approaches. [Methods] This study presents a systematic approach to constructing a utility tunnel investment estimation model, covering the data collection, preprocessing, feature engineering, multi-algorithm comparison, hyperparameter optimization, performance evaluation, and model application processes. Six key factors affecting utility tunnel investment estimation were selected as the input variables of the model, including tunnel length, number of chambers, excavation depth, cross-sectional size, construction method, and construction city, while the civil engineering cost of utility tunnels was taken as the output variable. A dataset containing 98 utility tunnel investment samples was created. Three data preprocessing methods were adopted to standardize the input variables of the dataset, including Min-Max normalization, Z-Score standardization, and RobustScaler. 
Based on Pearson's correlation analysis of the input variables and civil engineering cost, as well as the results of the feature importance analysis, nine groups of feature combinations that play a decisive role in predicting civil engineering cost were screened out. For multi-algorithm comparison, five classic machine learning algorithms were used to construct the utility tunnel investment estimation model: categorical boosting regression, gradient boosting decision tree, decision tree, extreme gradient boosting (XGB), and K-nearest neighbors. The Optuna hyperparameter optimization algorithm was used to optimize the model hyperparameters, and its performance was compared with that of the model without hyperparameter optimization. The performance of the estimation model was evaluated based on the coefficient of determination (R2 value) under three scenarios: three different preprocessing methods, nine different feature combinations, and with or without Optuna hyperparameter optimization. Through this evaluation, the optimal data preprocessing method and feature combination were determined, as well as the performance of Optuna hyperparameter optimization. Finally, the optimal estimation model was identified. Based on the optimal estimation model, an empirical prediction analysis of investment estimation was conducted for two utility tunnels in Beijing. [Results] The results show that the RobustScaler method is the optimal data preprocessing method for the dataset and the five algorithm models in this paper. Using the F-1 feature combination yields the highest average R2 value (0.623) among the five algorithm models, making F-1 the optimal feature combination. Hyperparameter optimization using the Optuna algorithm improves the performance of the five models by up to 40.4%, compared with no optimization. The Optuna-XGB algorithm model performed best after optimization with an R2 value of 0.843. 
The prediction deviation rates for the two utility tunnels in Beijing are 5.63% and 6.50%, respectively, for the Optuna-XGB algorithm model (the best-performing model), which are significantly lower than the 10% deviation requirement. [Conclusions] This study presents a data-driven investment estimation model for the civil engineering of utility tunnels, utilizing machine learning. The model's performance is examined in relation to the impact of data preprocessing methods, feature combinations, and the Optuna hyperparameter optimization algorithm. The optimal model proposed in this paper is highly accurate, which is significant for optimizing utility tunnel costs and making investment decisions, as well as ensuring their sustainable development.
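The RobustScaler preprocessing found optimal above can be sketched directly: centre each feature on its median and divide by the interquartile range, which limits the influence of outlier projects. This mirrors the behaviour of scikit-learn's RobustScaler with default settings; the tunnel-length values are invented for illustration.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile on a pre-sorted list."""
    idx = (len(sorted_vals) - 1) * q
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def robust_scale(values):
    """Centre on the median and scale by the interquartile range."""
    s = sorted(values)
    med = percentile(s, 0.5)
    iqr = percentile(s, 0.75) - percentile(s, 0.25)
    return [(v - med) / iqr for v in values]

# Illustrative tunnel lengths (km) with one outlier project
tunnel_lengths = [1.2, 2.5, 3.1, 2.8, 2.6, 15.0]
scaled = robust_scale(tunnel_lengths)
print([round(v, 2) for v in scaled])
```

Unlike Min-Max normalization, the outlier project stretches only its own scaled value rather than compressing all the others, which is why this scaler suits small, heterogeneous cost datasets.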
  • XIONG Qian, LONG Jian, LAI Wangfa, YANG Zuobin, MAO Hua, ZHAO Haizhong, TANG Wenzhe
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 919-936. https://doi.org/10.16511/j.cnki.qhdxxb.2025.21.041
    [Objective] The development of clean energy bases is one of the key initiatives for advancing China's energy transition. Clean energy sources require the collaborative development of several projects, characterized by lengthy construction timelines, numerous stakeholders, and a complex management structure. Current research mostly focuses on key factors that affect how each clean energy project is managed. It does not systematically examine the underlying managerial mechanisms. Moreover, empirical research on the characteristics of clean energy bases is still lacking. This study develops a comprehensive theoretical model to identify key factors in the integration management of clean energy bases from a program management perspective. [Methods] The study uses a mixed research approach combining questionnaire surveys and semi-structured interviews. This study covers 38 clean energy projects located across 14 Chinese provinces or autonomous regions to ensure a diverse representation of regional development practices. Survey participants and interviewees were selected for their extensive project management and technical experience in clean energy domains, including owners, designers, constructors, supervisors, and operators. A robust dataset of 293 valid questionnaires was obtained for subsequent empirical analysis in two stages. The first stage included text analysis to identify key managerial demands for integration management in clean energy bases. Accordingly, a word cloud was created to visualize frequently cited keywords by stakeholders, and a Latent Dirichlet Allocation (LDA) model systematically categorized these demands into major themes, thereby offering an empirical basis for theoretical model development. In the second stage, the questionnaire data were analyzed using Partial Least Squares Structural Equation Modeling (PLS-SEM) with SmartPLS 4. 
    This helped examine latent constructs and clarify the relationships among key program management factors and their impact on program performance. [Results] The text analysis identifies four key managerial demands in clean energy base integration management: strengthening integrated implementation, meeting external requirements, building robust management systems, and overcoming technological bottlenecks. The PLS-SEM results verify four key pathways of impact: (A) external conditions→management systems→partnerships→program operations→program performance; (B) external conditions→(management systems)→(resource allocation)→program implementation→program performance; (C) owners' program management capability→(partnerships)→program operations→program performance; (D) owners' program management capability→(management systems)→resource allocation→program implementation→program performance. Based on these findings, this study proposes five management strategies for integrating clean energy bases: (1) develop multi-energy complementary systems; (2) enhance digital management of clean energy programs; (3) promote integrated management across design, procurement, construction, and operation; (4) build partnerships among owners, governments, grid companies, and local communities; and (5) foster technological innovation to overcome critical bottlenecks. [Conclusions] Overall, this study identifies key challenges and focus areas in managing the integration of clean energy bases. The findings show that external conditions and owners' management skills, supported by well-designed management systems, optimized resource allocation, and effective partnership development, can substantially improve program implementation and operational efficiency. These measures would ultimately improve overall performance. The study provides theoretical and practical insights to enhance sustainability, efficiency, and competitiveness in China's clean energy transition.
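The first text-analysis step described above, counting keyword frequencies in stakeholder responses as input for the word cloud, can be sketched with a simple counter (the LDA categorization would follow). The responses and stopword list below are invented English placeholders, not the survey corpus.

```python
from collections import Counter

# Invented stakeholder responses standing in for the interview corpus
responses = [
    "strengthen integrated implementation across projects",
    "meet external grid and policy requirements",
    "build robust management systems for the base",
    "integrated management needs robust systems",
]
stopwords = {"the", "and", "for", "across", "needs"}
tokens = [w for r in responses for w in r.split() if w not in stopwords]
freq = Counter(tokens)           # term frequencies drive the word cloud
print(freq.most_common(3))
```

Words with high counts would be rendered larger in the word cloud, making the dominant managerial demands visible before topic modeling refines them into themes.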
  • XING Fenglin, ZHOU Hua, TANG Wenzhe
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 937-946. https://doi.org/10.16511/j.cnki.qhdxxb.2025.21.039
    [Objective] Integration of multi-energy complementarity is key to evolving modern power systems to achieve China's “Dual Carbon Goals”. Pumped storage hydropower (PSH) plants, as flexible and reliable regulatory energy sources, offer considerable advantages including technological maturity, economic viability, and strong compatibility with renewable generation such as wind and photovoltaic power. They also significantly contribute to grid stability, renewable energy absorption, system flexibility, and enhanced power quality. However, few studies have empirically addressed the key influencing factors and organizational mechanisms of PSH operations from a systematic multi-energy complementarity perspective, and empirical validations remain limited. To bridge this gap, this study develops and empirically validates a comprehensive management model adapted to PSH plants within multi-energy complementary systems. [Methods] The study adopts a mixed-method approach, combining quantitative surveys and qualitative case analyses. Data were collected from 230 professionals engaged in various aspects of PSH project development, planning, operation, and management, ensuring wide coverage of critical operational segments. The proposed model includes six latent variables: multi-energy complementarity, overall PSH conditions, electricity pricing and market mechanisms, plant scheme selection, operational management, and project performance. The study uses structural equation modeling (SEM) with partial least squares (PLS) analysis to examine eight hypothesized pathways among these constructs. [Results] Results demonstrate significant positive relationships, with all path coefficients statistically supported (p<0.001), and key endogenous variables show high explanatory power (R2 values between 0.601 and 0.826).
Specifically, multi-energy complementarity and overall PSH conditions exert substantial influence on operational management and eventual project performance through mediating factors such as pricing mechanisms and scheme selection. The empirical findings highlight the importance of market-based electricity pricing and complementary energy planning in enhancing the operational efficiency and economic viability of PSH stations. Furthermore, the study introduces a validated measurement system that offers robust tools for future empirical assessments in similar contexts. Based on these insights, actionable recommendations are proposed, including the advancement of integrated multi-energy coordination mechanisms, the establishment of a multidimensional evaluation framework for PSH site and technology selection, reforms in electricity pricing and market participation models to reflect flexible regulation value, and the integration of digital-intelligent technologies for optimized operation and market responsiveness. [Conclusions] This study provides both theoretical insights and practical strategies to improve the management and performance of PSH plants in multi-energy systems, thereby supporting the sustainable and high-quality development of PSH plants and contributing to national carbon neutrality objectives.
  • FU Zichao, LONG Jian, GONG Youlong, TANG Wenzhe
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 947-956. https://doi.org/10.16511/j.cnki.qhdxxb.2026.28.003
    [Objective] In support of China's objectives of “Carbon Peaking and Carbon Neutrality,” there is a significant expansion of clean-energy initiatives, accompanied by concurrent advancements in electricity market reforms. In reality, numerous large-scale clean-energy facilities still face a fundamental decision-making challenge: how to synchronize multi-market bidding with the operational constraints of cascade and pumped-storage hydropower (PSH) systems to maximize the total revenue of a hydro-photovoltaic (PV)-PSH portfolio under realistic settlement rules. Existing studies have offered valuable insights into renewable energy involvement and hybrid dispatch strategies; however, they often (i) focus on a single type of renewable technology, (ii) have a narrow focus on generation-side strategic optimization, or (iii) weakly integrate physical constraints with market participation. This study develops a practical revenue-optimization framework for a clean-energy base that explicitly benefits from multi-functional complementarity among cascade hydropower, PV generation, and PSH. [Methods] A two-level optimization framework is defined to maximize the total expected trading revenue across mid- to long-term contracting and the day-ahead and real-time markets. PV is treated as a priority injection, whereas cascade hydropower and PSH offer balancing and arbitrage functions to adjust the net delivery profile. The objective function combines multi-market revenues and includes settlement elements that adhere to practical guidelines, such as deviation settlement, benchmark settlement, mid- and long-term contract congestion charges, and penalties. From the physical aspects, comprehensive hydropower and PSH restrictions are imposed, including generator output bounds, ramping limits, water balance, reservoir level bounds, release constraints, and hydraulic coupling along the cascade.
    PSH is represented by mutually exclusive pumping and generating modes with power-flow conversion and head-related constraints. To prevent short-termism in intraday operations, an intraday balancing assumption is introduced to ensure that the daily outflow does not exceed the inflow. Uncertainty in PV output and market prices was captured by developing representative scenarios: historical PV and price time series were reduced using principal component analysis and a clustered K-means approach to produce typical-day scenarios, which were then combined through a Cartesian product with related probabilities. The outer layer optimizes bidding allocation across markets using Gaussian Process Bayesian Optimization, while the inner layer calculates the scenario-wise optimal cascade hydropower and PSH dispatch through a multi-layer nested dynamic programming algorithm that breaks the cascade into subsystems to reduce complexity and memory usage. [Results] An upstream clean-energy base on the L River, with transactions settled in Guangdong's electricity market, is taken as a case study. This portfolio comprises multiple cascade hydropower stations, a large PV installation, and planned PSH capacity. Results show that, under the pricing structure observed during the studied period, allocating all energy to medium- and long-term contracting maximizes revenue. This suggests that the generator has limited incentives to shift volumes into day-ahead or real-time markets. Under this allocation, hydropower and PSH remain highly complementary to PV. During peak PV output hours, cascade hydropower output is reduced and PSH tends to pump, while during high-price, low-PV periods, both hydropower and PSH generate. Accordingly, the model identifies economically optimal pumping and generating intervals for PSH.
    Monte Carlo simulations also show that volatility-type forecast noise in PV output and in day-ahead and real-time prices has a negligible influence on long-term expected revenue, whereas biases in mean PV output and mean price forecasts can significantly change revenue. This indicates that improving mean forecasts, particularly of expected PV production and price levels, matters more for revenue-focused bidding and operation than fine-tuning short-term volatility patterns. [Conclusions] This study provides a consistent and physically detailed methodology for jointly optimizing market participation and hydro-PSH dispatch for clean-energy bases under uncertainty, thereby effectively narrowing the gap between market bidding and constrained multi-reservoir operations. The findings also suggest policy-relevant implications: if medium- to long-term prices remain structurally more attractive than spot prices, the real-time price discovery and balancing value of spot trading may not be fully realized. The limitations include the day-scale focus and the lack of endogenous price impacts from large strategic participants. Future research can broaden the framework to cover multi-day or monthly periods by incorporating intertemporal water value and strategic interaction via game-theoretic market models.
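The scenario-reduction step described in the methods can be sketched with a tiny one-dimensional k-means that weights each representative price scenario by its cluster share. The daily price levels below are invented; the paper applies PCA plus k-means to full daily PV and price profiles.

```python
def kmeans_1d(values, k=2, iters=50):
    """Tiny 1-D k-means; returns cluster centres and their probabilities."""
    centers = [min(values), max(values)][:k]   # spread the initial centres
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[i].append(v)           # assign to the nearest centre
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    probs = [len(c) / len(values) for c in clusters]
    return centers, probs

# Invented daily average prices (CNY/MWh) over a sample of days
daily_prices = [310, 295, 305, 480, 470, 300, 490, 315]
centers, probs = kmeans_1d(daily_prices)
expected_price = sum(c * p for c, p in zip(centers, probs))
print(centers, probs)
```

Each centre becomes a typical-day scenario, and the cluster shares serve as the probabilities combined through the Cartesian product with the PV scenarios.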
  • KNOWLEDGE GRAPH AND SEMANTIC COMPUTING
  • QIU Yu, FENG Jun, ZHENG Zhehui, ZHAO Yi, SONG Haomin, CHEN Zuge, WANG Shaolan
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 957-966. https://doi.org/10.16511/j.cnki.qhdxxb.2026.21.002
    Abstract ( ) Download PDF ( )   Knowledge map   Save
    [Objective] Significant progress has been made in applying large language models (LLMs) to knowledge-based visual question answering (VQA), where systems jointly reason over visual content and external knowledge to produce accurate answers. However, existing approaches are limited in specialized vertical domains, particularly the power industry. A major challenge lies in framing effective prompts for LLMs. Given the scarcity of domain-specific textual corpora and the highly technical nature of power industry system operations, traditional prompt engineering methods often fail to provide sufficient contextual grounding. Consequently, even powerful general-purpose LLMs are unable to fully exploit their reasoning capabilities, resulting in suboptimal performance and limited practical utility. Moreover, most existing studies rely heavily on proprietary, closed-source models, such as GPT-4, for inference in VQA tasks. Despite these models' impressive zero-shot capabilities, their use incurs substantial computational costs, application programming interface latency, and a reliance on third-party services, hindering scalability, reproducibility, and real-world deployment, particularly in industrial settings that require data privacy, low-latency responses, and cost efficiency. These constraints underscore the need for an open, efficient, and domain-adapted alternative that can deliver high accuracy without sacrificing autonomy or affordability. [Methods] This paper proposes a novel large-scale model-based visual question-answering framework that is tailored to the power industry and centered on contextual knowledge prompting. This method leverages a foundational vision-language model that generates initial contextual knowledge examples from input image-question pairs. These examples encapsulate relevant visual semantics and preliminary reasoning traces. 
Subsequently, we introduce a lightweight answer selection layer that produces a set of plausible candidate answers from multimodal features. Crucially, the generated contextual knowledge examples and candidate answers are dynamically integrated into a structured prompt template, which is then fed to an LLM for final reasoning and answer refinement. This design effectively bridges the gap between generic visual understanding and domain-specific knowledge, enabling the LLM to “reason with context” rather than relying on its internal (and often incomplete) pre-trained knowledge. In alignment with our goals of accessibility and sustainability, we deliberately use LLaMA, an open-source, freely available LLM, as the backbone of our system, replacing expensive alternatives such as GPT-4. To further enhance domain adaptation, we curate a small but high-quality dataset comprising annotated image-question-answer triples from real-world power infrastructure scenarios (e.g., substation equipment identification, fault diagnosis from thermal images, and safety compliance checks). This dataset is used to fine-tune the LLaMA-based VQA pipeline using parameter-efficient techniques, such as low-rank adaptation, to achieve rapid adaptation with minimal computational overhead. [Results] We evaluate our proposed method on two established knowledge-intensive VQA benchmarks: EVQA and A-OKVQA. The experimental results demonstrate that our contextual knowledge-prompting strategy significantly outperforms state-of-the-art baselines, achieving absolute accuracy gains of 8.8% on EVQA and 14.5% on A-OKVQA, validating the efficacy of our prompt construction mechanism and the viability of open-source LLMs in specialized industrial applications. [Conclusions] This work advances the technical frontier of domain-specific VQA and provides a practical, cost-effective, and reproducible blueprint for deploying large-model intelligence in critical infrastructure sectors.
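The structured prompting step described above can be illustrated with a minimal sketch. The template wording, field names, and the power-industry example values are assumptions for illustration, not the authors' exact format.

```python
# Hypothetical structured prompt template: contextual knowledge examples
# and candidate answers are slotted in before the prompt is sent to the LLM.
PROMPT_TEMPLATE = (
    "Context knowledge:\n{context}\n\n"
    "Question: {question}\n"
    "Candidate answers: {candidates}\n"
    "Choose the best answer and briefly justify it."
)

def build_prompt(context_examples, question, candidate_answers):
    """Assemble the final prompt fed to the LLM for answer refinement."""
    context = "\n".join(f"- {c}" for c in context_examples)
    return PROMPT_TEMPLATE.format(
        context=context,
        question=question,
        candidates=", ".join(candidate_answers),
    )

prompt = build_prompt(
    ["The image shows a substation transformer.",      # illustrative knowledge
     "A hotspot is visible near the bushing."],        # generated elsewhere
    "Which component is likely faulty?",
    ["bushing", "radiator", "tap changer"],            # from the selection layer
)
```

In the described pipeline, the context examples would come from the vision-language model and the candidates from the lightweight answer selection layer.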
  • HAN Tailai, TAN Chuanyuan, SHAO Wenbiao, XIONG Hao, CHEN Wenliang
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 967-976. https://doi.org/10.16511/j.cnki.qhdxxb.2025.21.051
    Abstract ( ) Download PDF ( )   Knowledge map   Save
    [Objective] This study systematically evaluated the ability of large language models (LLMs) to abstain from answering unanswerable questions—those that lack sufficient, reliable, or coherent information for a definitive response. The goal was to unify diverse unanswerable-question datasets and testing paradigms to examine how model scale, architecture, and prompting strategies influence abstention behavior across both factual and nonfactual scenarios. [Methods] Five representative datasets were categorized as either factual unanswerable or nonfactual unanswerable. Two task paradigms were defined: (1) a binary-classification task requiring explicit “Yes/No” judgments on answerability and (2) an open-domain generation task requiring natural language answers or an abstention token when appropriate. Two prompting strategies were compared—direct prompting and chain-of-thought (CoT) prompting, where CoT prompting required intermediate reasoning steps before a final judgment. Experiments were conducted in zero-shot settings with the temperature fixed at 0. Models evaluated included both open-source and proprietary LLMs spanning small to large parameter scales. Performance metrics included overall accuracy (Acc), accuracy on unanswerable items (AcU), accuracy on answerable items (AcA), and F1 score. Outputs were parsed using standardized rules to detect explicit abstentions and typical abstention-related phrases. [Results] The performance gap between large and small LLMs was limited on nonfactual unanswerable datasets. Larger models often produced more fluent but incorrect answers, reflecting a tendency to rely on linguistic fluency rather than true abstention capability. Conversely, the models performed better on factual unanswerable datasets: most achieved >70% AcU on FalseQA and NEC, and larger models showed higher F1 scores with a balanced trade-off between AcA and AcU. 
However, the UAQFact dataset remained challenging—even GPT-4o achieved only a 72.03% F1 score, with notably lower AcA, indicating that multifact reasoning and temporal consistency still pose challenges. Prompting strategies also played a significant role. CoT prompting improved accuracy and stability for some models, such as Qwen2.5-7B, Qwen-Plus, and GPT-4o; but for others (e.g., Llama2-7B and DeepSeek-v3), direct prompting yielded higher F1 scores, suggesting that solvability judgment can benefit from concise prompts, while CoT reasoning may introduce redundant steps that obscure decision boundaries. This study also suggests that LLM performance generally improves with scale, but not linearly. Some larger LLMs prioritize answering ability at the expense of abstention capability, reducing robustness and safety. Version upgrades also do not consistently improve the F1 score, indicating limited gains from standard iteration. Based on these results, both the binary-classification and open-domain task paradigms should be considered when evaluating small LLMs. The results further demonstrate that binary classification does not necessarily make models more susceptible to abstention, ensuring that the evaluation framework does not overestimate model safety. [Conclusions] Under a unified evaluation framework, LLMs exhibited meaningful progress in refusing factual unanswerable questions but remained unreliable on nonfactual unanswerable items. Abstention capability was found to depend not only on scale but also on model alignment, instruction tuning, and prompt design, which substantially influence outcomes. CoT prompting is not universally beneficial and can either help or harm refusal behavior. These findings indicate that targeted training and evaluation methods are required to improve LLM reliability in real-world scenarios in which safe abstention is critical.
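The abstention-aware metrics named above (Acc, AcU, AcA) can be sketched as follows. This is a simplified illustration: the abstention phrase list is an assumption, and answer-content correctness on answerable items is ignored here, whereas the study's full evaluation also scores answer content and F1.

```python
# Illustrative abstention phrases; the paper's standardized parsing rules
# are not reproduced here.
ABSTAIN_PHRASES = ("i cannot answer", "unanswerable", "i don't know")

def is_abstention(output: str) -> bool:
    text = output.lower()
    return any(p in text for p in ABSTAIN_PHRASES)

def accuracy_breakdown(outputs, answerable_flags):
    """Overall accuracy, accuracy on unanswerable items (AcU), and
    accuracy on answerable items (AcA).

    Simplification: a prediction counts as correct when the model abstains
    on an unanswerable item or answers on an answerable one.
    """
    correct = [is_abstention(o) != a for o, a in zip(outputs, answerable_flags)]
    acc = sum(correct) / len(correct)
    unans = [c for c, a in zip(correct, answerable_flags) if not a]
    ans = [c for c, a in zip(correct, answerable_flags) if a]
    return acc, sum(unans) / len(unans), sum(ans) / len(ans)

acc, acu, aca = accuracy_breakdown(
    ["Paris.", "I cannot answer that.", "I don't know.", "Unanswerable."],
    [True, False, True, False],   # True = the question is answerable
)
```

Here the third item is an unwarranted abstention on an answerable question, so AcA drops while AcU stays at 1.0.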
  • BIG DATA
  • YU Jinze, LIU Yanwen, CHEN Yongquan, LI Shuo, SUN Qingyun, ZHOU Haoyi, LI Jianxin
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 977-990. https://doi.org/10.16511/j.cnki.qhdxxb.2026.21.005
    Abstract ( ) Download PDF ( )   Knowledge map   Save
    [Objective] Vision-language models (VLMs), such as contrastive language-image pretraining (CLIP), achieve cross-modal image-text alignment by employing large-scale contrastive pretraining and enable prompt-driven zero-shot classification; thus, they overcome the limitations of closed category spaces in traditional supervised classification models. However, their generalization performance is constrained by distribution shifts between pretraining and downstream tasks, as well as by inherent knowledge boundaries, particularly in specialized domains with scarce labeled data (e.g., pathology). Various CLIP variants and extensions have emerged; these variants exhibit complementary but inconsistent downstream performance, which is attributed to their distinct model architectures and pretraining datasets. Although various parameter-efficient fine-tuning techniques have been proposed for adapting individual CLIP models to downstream tasks, existing studies have mainly focused on optimizing single pretrained models; thus, they fail to effectively exploit the complementary advantages of heterogeneous models. This study aimed to exploit the complementary strengths of heterogeneous pretrained CLIP models for pathology image classification. Specifically, we conduct a systematic comparison among ensemble strategies at both the model output and middle-layer feature levels; additionally, we propose a novel feature-level ensemble framework termed Mix-of-CLIP-Experts (MoCE). [Methods] Initially, we evaluated multiple pretrained CLIP models for pathology image classification tasks under the zero-shot setting to demonstrate their complementary strengths and weaknesses across different datasets. Next, we designed and evaluated various ensemble strategies. At the output level, we investigated simple averaging and a weighted combination of predictions based on model confidence scores or learned gating networks. 
At the feature level, we applied the proposed MoCE method to the fusion of image features obtained from heterogeneous CLIP models. The main challenge in feature-level CLIP model ensembling is the misalignment of embeddings across incompatible cross-modal spaces of different CLIP models. To address this challenge, we combined MoCE adapter-based fine-tuning with the mix-of-experts (MoE) framework. Using this approach, the pretrained models can be simultaneously adapted to the downstream pathology task, and their image (and aligned text) features can be projected onto a unified embedding space. A learned router was employed to dynamically weight and aggregate these aligned image features to generate a fused representation; this representation was subsequently compared against text prompts, which were encoded using a single text encoder to perform the final classification. This process reduces computational redundancy by eliminating the need for multiple text encoders; additionally, it improves downstream performance via adapter-based fine-tuning and MoE routing to fully exploit model complementarity. [Results] We comprehensively evaluated the proposed framework and baseline ensemble strategies using multiple public pathology datasets under various few-shot settings. The results showed that MoCE consistently outperformed single-model fine-tuning baselines and output-level ensemble methods, demonstrating the advantages of feature-level model ensembling via adapter-based alignment and dynamic routing. Detailed ablation studies validated the effectiveness of the proposed MoCE framework and its specific components. [Conclusions] To the best of our knowledge, MoCE is the first feature-level ensembling framework for heterogeneous pretrained CLIP models. 
By combining adapter-based cross-model feature alignment with MoCE routing, effective fusion of diverse CLIP backbones can be achieved; additionally, the pathology image classification under limited-data regimes can be substantially improved. The parameter-efficient cross-model alignment and dynamic expert fusion mechanisms are broadly applicable beyond pathology to other specialized domains. In future work, we will focus on scaling to larger model pools, applying model distillation for inference efficiency, and extending the framework to dense prediction tasks such as object detection and image segmentation.
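The adapter-align-route-fuse idea described above can be sketched in a few lines. Everything here is a toy assumption: the dimensions, the linear "adapters," and the dot-product router stand in for the learned components of the actual MoCE framework.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moce_fuse(expert_feats, adapters, router_w):
    """Project each expert's feature into a shared space (adapter),
    compute softmax routing weights, and return the weighted fusion."""
    projected = [matvec(W, f) for W, f in zip(adapters, expert_feats)]
    gates = softmax([dot(w, p) for w, p in zip(router_w, projected)])
    fused = [sum(g * p[i] for g, p in zip(gates, projected))
             for i in range(len(projected[0]))]
    return fused, gates

# Two hypothetical CLIP experts with different embedding sizes (3 and 2),
# both projected into a shared 2-dimensional space.
feats = [[1.0, 0.0, 1.0], [0.5, 0.5]]
adapters = [[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],   # 2x3 projection
            [[1.0, 1.0], [1.0, -1.0]]]            # 2x2 projection
router_w = [[1.0, 0.0], [0.0, 1.0]]               # toy router weights

fused, gates = moce_fuse(feats, adapters, router_w)
```

In the actual framework, the fused representation would then be compared against text-prompt embeddings from the single shared text encoder to yield class scores.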
  • QI Lin, BAO Peng, LIU Zhongyi, LI Liang
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 991-1004. https://doi.org/10.16511/j.cnki.qhdxxb.2026.21.003
    Abstract ( ) Download PDF ( )   Knowledge map   Save
    [Objective] With the widespread application of graph neural networks (GNNs) in domains such as social networks, bioinformatics, and recommender systems, complex high-order structural relationships embedded in graphs have become increasingly important factors determining model expressiveness and performance. Traditional GNN models are mostly built on pairwise edges, which limits their ability to effectively capture high-order interactions among multiple nodes. In recent years, hypergraph neural networks (HGNNs) have significantly extended the modeling capacity of GNNs by introducing hyperedges, enabling the representation of multi-way interactions and more expressive topological patterns. However, existing methods typically rely on a single hyperedge construction strategy, which restricts their ability to simultaneously capture the heterogeneity and structural diversity of different types of high-order relationships. This often leads to insufficient information fusion and limited structural generalization, rendering these methods inadequate for real-world graph data that exhibit multi-level and heterogeneous high-order semantics, such as overlapping communities or functional substructures. To address these challenges, we propose a multisource adaptive HGNN (MSA-HGNN), a unified and flexible high-order representation framework that integrates diverse structural information with structural awareness. [Methods] The proposed method introduces a multisource hyperedge construction framework that incorporates various structural types, including pairwise connectivity, k-hop neighborhoods, motif patterns, and frequent subgraphs. This design ensures comprehensive coverage of high-order associations across different semantic granularities and topological scales, thereby enhancing the completeness of structural modeling. These diverse sources complement one another and jointly reflect latent node dependencies from multiple relational perspectives. 
On this basis, we design a hyperedge fusion strategy that utilizes a learnable weight matrix to adaptively adjust the relative importance of different structural sources. This strategy enhances the discriminative capability of structure integration and effectively mitigates interference from semantic conflicts among heterogeneous structures. Furthermore, during the high-order information propagation stage, we incorporate an attention mechanism to assign structure-aware weights and context-sensitive relevance scores to node pairs across different high-order structures. This enhances the directionality, precision, and robustness of feature aggregation, particularly under noisy or incomplete structural conditions. [Results] We conducted extensive experiments on five benchmark datasets covering homogeneous and heterogeneous graph scenarios, including three citation networks (Cora, Citeseer, and PubMed) and two heterogeneous graphs (House and Senate). The results demonstrate that MSA-HGNN achieves the highest classification accuracy across all datasets. Specifically, on the Cora dataset, it outperforms the leading baseline, ED-HNN, by 2.6%, while maintaining comparable or superior performance on Citeseer and PubMed. On the more challenging heterogeneous datasets, MSA-HGNN improves accuracy by 12.1% and 6.7% over HyperGCN on House and Senate, respectively, demonstrating strong generalization ability. Further ablation studies confirm the necessity of each structural component and validate the effectiveness of the adaptive fusion mechanism for integrating multisource high-order structures. [Conclusions] This study presents a unified, adaptive, and structure-aware approach for modeling high-order relations in hypergraphs. By integrating multiple hyperedge construction sources with learnable fusion and attention-based propagation, MSA-HGNN effectively captures diverse structural semantics and addresses the limitations of single-source hypergraph modeling. 
Extensive experimental results demonstrate the model's superior robustness and generalization capability. These findings suggest that multisource structural integration is a promising direction for high-order graph representation learning and establish a solid foundation for developing more expressive and generalizable GNNs.
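One of the multisource hyperedge constructions named above, k-hop neighborhoods, can be sketched as follows; the toy graph and choice of k are illustrative assumptions, and the motif- and frequent-subgraph-based sources would be built analogously before adaptive fusion.

```python
from collections import deque

def k_hop_hyperedges(adj, k):
    """Build one hyperedge per node: the set of nodes within k hops.

    adj: {node: set(neighbors)} for an undirected graph.
    Returns a list of frozensets, one per node, in insertion order.
    """
    hyperedges = []
    for start in adj:
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:                     # breadth-first expansion
            node, depth = frontier.popleft()
            if depth == k:
                continue
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    frontier.append((nb, depth + 1))
        hyperedges.append(frozenset(seen))
    return hyperedges

# A small path graph 0-1-2-3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
edges = k_hop_hyperedges(adj, 2)
```

In MSA-HGNN these structural sources would then be weighted by the learnable fusion matrix rather than used directly.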
  • HYDRAULIC ENGINEERING
  • WU Chuandong, CHENG Yishu, YANG Dawen, TANG Lihua, CHEN Licheng, GONG Ke
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 1005-1014. https://doi.org/10.16511/j.cnki.qhdxxb.2026.27.018
    Abstract ( ) Download PDF ( )   Knowledge map   Save
    [Objective] The midstream of the Yarlung Zangbo River Basin (YZRB-M) is rich in hydropower resources, and several hydropower plants have already been constructed. Under China's 14th Five-Year Plan, the development of hydro-wind-solar (HWS) renewable energy bases that leverage the dispatch flexibility of hydropower plants has been proposed. Although HWS resources are abundant, integrating wind and solar power into the grid remains challenging due to their high intermittency, variability, and randomness. Scientifically exploiting HWS generation complementarity—defined as the ability of one energy source to compensate for the low availability of others—is an effective strategy for improving wind and solar generation. However, existing complementarity indexes often lack clear physical significance and are difficult to interpret. Therefore, developing a complementarity index with explicit physical significance, and clarifying the spatiotemporal patterns and enhancement pathways of wind-solar output complementarity based on a rigorous assessment of wind-solar resource distributions in the YZRB-M, are essential for the construction and stable operation of an HWS renewable energy base. [Methods] Accordingly, this study develops a quantitative indicator for HWS generation complementarity based on power generation curves. After evaluating the development potential of wind and solar resources, the spatiotemporal characteristics of HWS energy complementarity are assessed for Shannan City in the YZRB-M. [Results] The results show that the technological and economic development potentials for solar energy are 183.9 and 183.3 GW, respectively. The technological development potential for wind energy is 75.5 GW, and the economic development potential accounts for only 42.6% of that potential (i.e., 32.2 GW). 
Spatially, wind-solar energy complementarity is significantly stronger in Lhoka County and southern Nagarze County than in other regions of Shannan City, with the northwest-north zone showing the highest interregional complementarity. To build a grid-friendly clean energy base, we recommend either concurrent development of wind and solar resources in Gonggar, Qonggyai, and Qusum Counties, or prioritized standalone development in Comai County. Temporally, wind speed and solar radiation in Shannan City have declined over the past 44 years, accompanied by a decreasing trend in wind-solar complementarity. Nevertheless, hydropower effectively mitigates this decline: the interannual variability of HWS complementarity is reduced by ~36.8% compared with wind-solar complementarity alone, with the most pronounced improvement (3.4 times) occurring in summer. [Conclusions] Future long-term declines in wind speed and solar radiation in the YZRB-M may further weaken wind-solar complementarity. However, abundant hydropower resources can mitigate the volatility of wind and solar power generation, thereby enhancing the long-term stability of power generation from an HWS renewable energy base. Our findings provide a scientific basis for the planning and construction of renewable energy bases in the YZRB-M.
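The paper's complementarity indicator is derived from power generation curves; its exact form is not reproduced here, so the sketch below uses a common proxy as a stated assumption: complementarity as the relative reduction in the variability (standard deviation) of the combined output versus the summed variability of the individual sources. The hourly profiles are hypothetical.

```python
import statistics

def complementarity(series_a, series_b):
    """1.0 = perfectly complementary (combined output is flat),
    0.0 = no variability reduction from combining the sources."""
    combined = [a + b for a, b in zip(series_a, series_b)]
    sd_combined = statistics.pstdev(combined)
    sd_individual = statistics.pstdev(series_a) + statistics.pstdev(series_b)
    return 1.0 - sd_combined / sd_individual

# Hypothetical profiles: wind is high exactly when solar is low.
solar = [0, 0, 3, 6, 6, 3, 0, 0]
wind  = [6, 6, 3, 0, 0, 3, 6, 6]
c = complementarity(solar, wind)   # combined output is constant -> c = 1.0
```

Under this proxy, hydropower's role in the abstract corresponds to flattening the combined curve further, which raises the indicator and reduces its interannual variability.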
  • WANG Shuqiang, FAN Haoxiang, ZHOU Yuguo, HU Dongxu, GENG Ji, ZHENG Xin, ZHANG Yongxian, XU Mengzhen
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 1015-1023. https://doi.org/10.16511/j.cnki.qhdxxb.2026.27.021
    Abstract ( ) Download PDF ( )   Knowledge map   Save
    [Objective] The invasion and attachment of the golden mussel (Limnoperna fortunei) in hydropower engineering structures pose a significant threat to the safe and efficient operation of power stations. Biofouling caused by this species leads to obstruction of trash racks, blockage in cooling water pipes, corrosion of metal structures, and risks of unplanned shutdowns. However, existing studies primarily focus on biofouling in single engineering components, lacking a systemic understanding of the attachment characteristics and diffusion laws across the entire hydraulic system of large hydropower stations. Furthermore, the regulatory mechanisms of engineering scheduling, such as unit start-stop frequency and flow velocity thresholds, on biofouling remain unclear. Taking a large hydropower station in the Jinsha River Basin as the study object, this paper aims to systematically investigate the spatiotemporal distribution, growth rhythm, and reproductive patterns of Limnoperna fortunei in the complex hydraulic environment. Moreover, it seeks to quantitatively reveal the synergistic regulation mechanisms of hydrodynamic conditions, environmental factors, and engineering operations, thereby providing a scientific basis for constructing a comprehensive prevention and control strategy. [Methods] A systematic field investigation and sampling campaign were conducted during the maintenance periods of the study area in February and April 2024. Three key engineering structures prone to biofouling were selected for the study: trash racks, quick gate slots, and main transformer cooling water pipes. Belt transect sampling with 15 cm×15 cm quadrats was employed. For the trash racks, vertical sampling covered the intermittent submersion, stable submersion, and deep-water zones. For the quick gate slots, samples were taken from the bottom sedimentation zone and sidewalls at different elevations. For the cooling water pipes, sampling was performed in enclosed pipelines. 
The FiSAT II software suite was employed for population parameter analysis. Specifically, the Bhattacharya method was used to separate cohorts from length-frequency data to identify reproductive groups. The annual reproductive pattern was inferred based on growth rates and validated by monitoring synchronous planktonic larvae from 2024 to 2025. The von Bertalanffy growth function parameters (asymptotic length L∞ and growth coefficient K) were estimated using the ELEFAN I module to assess growth potential. In addition, the correlation between biological data and environmental/operational variables, including water level fluctuations, flow velocity, water temperature, and the start-stop records of generating units (specifically comparing Units 6 and 7), was analyzed. [Results] The results revealed considerable spatial heterogeneity and gradient diffusion characteristics of Limnoperna fortunei attachment. 1) In terms of spatial distribution, a clear “source-sink” pattern was observed. The attachment density follows the order: cooling water pipes > trash racks > quick gate slots. In the trash racks, the density showed a vertical single-peak distribution, with the peak located 5-20 m below the minimum operating water level, shaped by the tradeoff between desiccation stress in the upper fluctuation zone and high flow shear in the deep zone. In the quick gate slots, the bottom density was markedly higher than that of the sidewalls due to gravity settlement during shutdowns. Notably, the attachment density in the gate slot of Unit 7 (66 start-stops/year) was markedly lower than that of Unit 6 (21 start-stops/year), indicating that frequent hydraulic disturbance inhibits colonization. An extreme disparity was found in the cooling water pipes; the left and right bank pipes had densities of 31 400 and 0 ind./m2, respectively, attributable to a flow velocity threshold effect. 
The right bank velocity (0.23-0.28 m/s) was below the critical threshold (~0.3 m/s) required for maintaining sufficient food flux, whereas the left bank velocity (0.45-0.58 m/s) was optimal. 2) The population exhibited a “parent-offspring-grand-offspring” structure in terms of growth and reproduction, confirming a reproductive pattern of three generations per year. Growth showed strong seasonality, accelerating during the high-temperature season (June-October) and stagnating in the low-temperature season (February-April). Spatially, the growth potential (represented by L∞ and the growth performance index ϕ') decreased along the flow path from the trash racks to the cooling water pipes. Despite their high density, individuals in the cooling water pipes were considerably smaller, attributable to metabolic inhibition due to the consistently lower water temperature (15.5-16.0 ℃) in the deep-water intake. [Conclusions] This study confirms that the distribution and growth of Limnoperna fortunei in large hydropower stations are co-regulated by hydrodynamic conditions, physiological metabolism, and engineering scheduling. The trash rack area serves as the core colonization “source,” continuously supplying larvae to downstream “sinks,” such as quick gate slots and cooling water pipes. Two key physical regulatory mechanisms were identified: the “start-stop frequency effect,” in which frequent hydraulic disturbance and scouring effectively limit colonization, and the “flow velocity threshold effect,” where velocities below ~0.3 m/s inhibit survival due to insufficient trophic flux. Furthermore, low water temperature in deep intakes restricts somatic growth through metabolic suppression. On the basis of these findings, a comprehensive control strategy of “engineering protection as the main measure and operational optimization as auxiliary” is proposed. 
Recommendations include applying long-acting antifouling coatings in core colonization zones (trash racks), implementing UV/heat treatment and mechanical cleaning for closed systems (cooling water pipes), and attempting intermittent increases in unit start-stop frequency during noncritical periods to use hydraulic disturbance to suppress biofouling.
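The von Bertalanffy growth function used in the Methods is the standard form L(t) = L∞(1 − e^(−K(t − t0))); a minimal sketch follows, with parameter values that are hypothetical rather than the paper's ELEFAN I estimates.

```python
import math

def vbgf(t, l_inf, k, t0=0.0):
    """Von Bertalanffy growth function: length at age t.

    l_inf: asymptotic length, k: growth coefficient (per unit of t),
    t0: theoretical age at zero length.
    """
    return l_inf * (1.0 - math.exp(-k * (t - t0)))

# Assumed illustrative parameters (not the study's estimates):
# asymptotic length 30 mm, growth coefficient 1.2 per year.
l_inf, k = 30.0, 1.2
lengths = [vbgf(t, l_inf, k) for t in (0.0, 0.5, 1.0, 2.0)]
```

The curve rises steeply at first and saturates toward L∞, which is why the abstract's comparison of L∞ and the growth performance index ϕ' along the flow path indicates declining growth potential.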
  • QI Zhiyong, CHEN Chao, XIANG Zhiqian, WANG Jinting, WU Yongheng, ZHOU Zhou
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 1024-1035. https://doi.org/10.16511/j.cnki.qhdxxb.2026.28.007
    Abstract ( ) Download PDF ( )   Knowledge map   Save
[Objective] High-arch dams require reliable long-term monitoring to ensure safety in complex operating environments and under extreme loads. Vibration-based operational modal analysis under ambient excitation is well-suited for continuous deployment due to its passive and minimally intrusive nature. However, the vibration response under these conditions is often weak and susceptible to noise and non-stationary excitation. When covariance-driven stochastic subspace identification (SSI-COV) is applied, the stabilization diagram frequently becomes cluttered with a mixture of physical poles and spurious poles, complicating manual pole selection and diminishing the effectiveness of automated pole clustering, particularly for densely spaced modes and weakly excited higher-order modes. This study aims to enhance automated modal identification for high-arch dams by refining the clustering distance metric to better separate physical poles from spurious ones. [Methods] A classical workflow that combines SSI-COV with density-based spatial clustering of applications with noise (DBSCAN) is adopted and enhanced by redesigning the clustering distance metric. Conventional metrics typically use a weighted summation of frequency difference and mode-shape similarity, which may not fully capture the relationships between the two features and may falter when identifying densely spaced modes. In this study, a coupled distance formulation is introduced that divides the absolute frequency deviation by the modal assurance criterion (MAC), placing the MAC in the denominator. When the mode-shape correlation between two poles is weak and the MAC approaches 0, the distance increases significantly. By contrast, when the correlation is strong and the MAC approaches 1, the distance reduces to the absolute frequency deviation. 
Consequently, pole pairs that simultaneously exhibit small frequency differences and highly consistent mode shapes are assigned minimal clustering distances, whereas those with large frequency differences or inconsistent mode shapes are pushed apart. This leads to a clearer separation of physical and spurious modes in the stabilization diagram, thus meeting requirements for automated clustering-based interpretation. A statistical analysis of clustering distances is then performed using the stabilization diagram from a high-arch dam dataset. Finally, the method is validated through two case studies. The first involves a five-degree-of-freedom numerical system excited by broadband white noise with added measurement noise; the responses are segmented into consecutive windows to test both single-window identification and continuous modal tracking. The second case utilizes multisensor field vibration data from an actual high-arch dam, including a representative short-duration record and a multiday dataset for continuous monitoring. For each case, the proposed formulation computes clustering distances, DBSCAN clusters the poles, and modal frequencies and damping ratios are extracted to evaluate clustering accuracy and the performance of automated identification. [Results] The distance-based statistical analysis reveals that the proposed metric enhances separability. Pole pairs that meet both feature-consistency conditions are clustered within a compact distance interval, whereas partially consistent or inconsistent pairs shift toward larger distances. In the numerical example, the proposed method produces physical clusters that are less prone to absorbing noise points compared to the baseline metric, leading to an approximately 31% increase in identified modal poles for weakly excited higher-order modes. 
In the real dam case, the baseline metric generates excessive clusters that are closely packed, making it difficult to form effective clusters with clear and interpretable boundaries. By contrast, the proposed method clearly identifies three clusters for the high-arch dam and achieves a 34% increase in recognized poles for the relatively higher-order mode during continuous identification. This suggests that the improvement is most significant for relatively higher-order modes, where the number of identified modal poles increases by approximately one-third compared to the baseline approach. [Conclusions] By integrating frequency and MAC in a division-based formulation, the proposed metric enhances the compactness of the identified clusters and enables stable distinction between physical and spurious poles, while also improving the identification of weakly excited higher-order vibration modes. This directly enhances the robustness of DBSCAN-based automated modal identification and continuous modal tracking for high-arch dams under ambient excitation. The method can be easily incorporated into existing SSI-COV workflows, as it mainly updates the distance-computation step, providing a practical solution for reliable long-term vibration-based dam monitoring.
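A formulation consistent with the behavior described, distance equal to the absolute frequency deviation divided by the MAC, can be sketched as follows; the exact expression in the paper may differ, and the epsilon guard against a zero MAC is an added assumption.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mac(phi_i, phi_j):
    """Modal assurance criterion between two real mode-shape vectors:
    MAC = (phi_i . phi_j)^2 / ((phi_i . phi_i)(phi_j . phi_j))."""
    return dot(phi_i, phi_j) ** 2 / (dot(phi_i, phi_i) * dot(phi_j, phi_j))

def coupled_distance(f_i, f_j, phi_i, phi_j, eps=1e-6):
    """|Δf| / MAC: reduces to |Δf| when shapes agree (MAC -> 1),
    grows large when shapes disagree (MAC -> 0)."""
    return abs(f_i - f_j) / max(mac(phi_i, phi_j), eps)

phi_a = [1.0, 2.0, 3.0]
phi_b = [2.0, 4.0, 6.0]    # same shape, scaled -> MAC = 1
phi_c = [3.0, 0.0, -1.0]   # orthogonal to phi_a -> MAC = 0

d_same = coupled_distance(2.00, 2.02, phi_a, phi_b)  # close to |Δf| = 0.02
d_diff = coupled_distance(2.00, 2.02, phi_a, phi_c)  # very large
```

Under this metric, DBSCAN sees consistent pole pairs at small distances and shape-inconsistent pairs pushed far apart, which is the separation effect the Results describe.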
  • DATABASE
  • ZHANG Ziqian, WANG Chaokun, FENG Hao, WU Cheng, NIU Fang
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 1036-1045. https://doi.org/10.16511/j.cnki.qhdxxb.2026.28.009
[Objective] With the rapid growth of Internet e-commerce, recommendation systems have become key components for online platforms to provide efficient, accurate, and personalized product suggestions to users. These recommendations directly improve user experience, increase user retention, and drive sales growth. Incorporating social relationships into the traditional user-product interaction network has been widely shown to enhance recommendation quality and to enable innovative scenarios such as friend recommendations and sharing-based recommendations. However, mainstream social e-commerce recommendation methods face significant limitations: they rely heavily on external social data from third-party platforms, which are often difficult to fully access due to privacy policies, platform restrictions, and data silos. Moreover, most existing solutions have only been tested on datasets with millions of users and struggle to scale to hundreds of millions of users owing to high computational costs and limited user coverage, posing substantial barriers to their deployment on large-scale e-commerce platforms. To overcome these challenges, this study focuses on automatically building a user relationship network with clear real-world social meaning, using only internal behavioral data from e-commerce platforms without external social information. The main goals are to develop a scalable self-supervised method for inferring social relationships among large user bases, improve the efficiency of user relationship prediction at the scale of hundreds of millions of users, enrich the understanding of user preferences at the platform level, and expand the performance and application scope of e-commerce recommendation systems. [Methods] Based on the homophily principle from communication studies, the proposed framework includes four sequential and interrelated stages: pseudo-label network construction, user relationship inference, efficient candidate matching, and relationship type inference.
First, two typical behavioral signals, co-purchase behavior and spatiotemporal co-occurrence, are extracted from e-commerce logs to build pseudo-label social networks that reflect family ties and geographic connections, respectively, serving as weak supervision signals. Next, a user relationship inference model based on multilayer perceptrons is designed to learn user representations from these networks; positive samples are obtained from observed pseudo-label edges, while negative samples are generated by randomly pairing users, and the model is trained using a binary cross-entropy loss. To address the high computational cost of examining all user pairs in billion-scale scenarios, an efficient candidate matching strategy based on multilevel clustering of user embeddings is proposed, significantly reducing the number of candidate pairs while maintaining high recall. Finally, a multitask inference module is built that first predicts whether a candidate pair has an actual social connection and then classifies the relationship into five detailed types (senior-junior, spouse, neighbor, schoolmate, and colleague) using rules that combine pseudo-labels with user attributes such as age, gender, time, and location. [Results] Extensive experiments on real data from a large e-commerce platform (Company T) show that co-purchase relationship prediction achieves a precision of 71.70%, a recall of 87.44%, an accuracy of 76.49%, and an F1-score of 0.79. The multilevel clustering candidate matching strategy reduces the computational load and supports stable online deployment at the scale of hundreds of millions of users. Relationship classification attains a precision of 93.80% for spouse relations and 64.57% for senior-junior relations.
The resulting heterogeneous social graph includes billions of edges across five relationship types, and online A/B tests confirm that incorporating social relationship information into recommendation models significantly improves accuracy, especially for category-sensitive items like medical products. [Conclusions] This research offers a practical solution for social e-commerce recommendations without relying on external social data, addressing privacy and platform restrictions. It enables the automatic construction of semantically rich social graphs using only internal behavioral data and supports large-scale applications through efficient clustering-based candidate matching. The proposed framework effectively incorporates social semantics into traditional recommendation systems, enhances user preference modeling, and boosts recommendation accuracy. This study not only demonstrates that the homophily principle applies to e-commerce behavior analysis but also provides scalable, interpretable methods for building large-scale social graphs and improving socially aware recommendations in real-world industry scenarios.
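The clustering-based candidate matching stage can be sketched as follows. This is a minimal single-level illustration (the framework above uses multilevel clustering) with a toy k-means over synthetic embeddings; the `kmeans` helper, the deterministic initialization, and all data are illustrative assumptions, not the platform's actual implementation. The point is that candidate pairs are drawn only within clusters, avoiding the quadratic cost of scoring every user pair.

```python
import numpy as np
from itertools import combinations

def kmeans(X, k, iters=20):
    # tiny k-means; deterministic spread-out initialization for the sketch
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# toy user embeddings: two well-separated behavioral groups of 50 users each
X = np.vstack([rng.normal(0.0, 0.1, (50, 8)), rng.normal(3.0, 0.1, (50, 8))])

labels = kmeans(X, k=2)
# candidate pairs are formed only within a cluster, not across the whole base
candidates = [(i, j) for i, j in combinations(range(len(X)), 2)
              if labels[i] == labels[j]]
all_pairs = len(X) * (len(X) - 1) // 2
print(len(candidates), "of", all_pairs, "pairs kept")
```

At real scale the reduction is far more dramatic than in this toy: with clusters of bounded size, the number of scored pairs grows roughly linearly in the number of users instead of quadratically, which is what makes deployment at hundreds of millions of users feasible.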
  • MATERIALS SCIENCE AND TECHNOLOGY
  • HU Xingquan, WU Yao, LI Linshu, CAI Zhipeng, LIN Jian
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 1046-1054. https://doi.org/10.16511/j.cnki.qhdxxb.2026.27.020
[Objective] Fatigue failure remains a primary mechanism of catastrophic damage in engineering structures, necessitating highly accurate prediction of the fatigue crack growth threshold (ΔKth) for ensuring structural integrity and performing life assessment. Traditional two-parameter models, particularly the widely used Vasudevan model, are based on the fundamental assumption that the maximum stress intensity factor (Kmax) and the stress intensity factor range (ΔK) contribute independently to the crack driving force. However, this assumption often leads to significant prediction inaccuracies across varying stress ratios (R), particularly in heterogeneous materials such as dissimilar metal welded joints (DMWJs). This study aimed to minimize these inaccuracies by developing a physically grounded, modified dual driving force model that incorporates micromechanical dislocation interactions, thereby bridging microscale damage mechanisms with macroscale fracture mechanics parameters and enhancing predictive precision. [Methods] Systematic fatigue threshold investigations were conducted on a DMWJ consisting of base metals A and C and weld metal B. Compact tension specimens were prepared in accordance with GB/T 6398—2017 to assess the heat-affected zones and weld metal. Testing was performed at ambient temperature (23℃) and at an elevated temperature (550℃) at stress ratios (R) of 0.1, 0.5, and 0.7. Crack length was precisely monitored using the direct current potential drop method. Analysis of the experimental data revealed a clear deviation from the classical Vasudevan “L-shaped” curve. Accordingly, a new model was developed based on crack-tip plasticity analysis. This theoretical model proposes that fatigue damage is governed not by two independent parameters but by the synergistic interaction of forward dislocations, governed by Kmax, and reverse dislocations, governed by ΔK.
Crack extension is initiated only when the product of the forward and reverse dislocation densities reaches a critical threshold ρ*, resulting in a new hyperbolic predictive relationship. [Results] The experimental results demonstrated that the relationship between ΔKth and Kmax.th does not conform to the rigid “L-shaped” boundaries predicted by the Vasudevan model, confirming the inadequacy of this model for complex welded structures. In contrast, the proposed modified model accurately captured the continuous, nonlinear variation of the fatigue threshold over the full range of stress ratios. The model exhibited significantly improved predictive accuracy, particularly near the critical stress ratio (R*), where conventional models frequently fail. In addition, the model redefined the crack growth boundaries, indicating that certain loading conditions previously considered sufficient for crack propagation are, in fact, insufficient because of inadequate dislocation interaction. The robustness of the model was further validated using independent literature data for Ti-6Al-4V, AZ31B, and IN720 alloys, for which it consistently outperformed the original two-parameter model. [Conclusions] This study establishes a refined dual driving force model for the accurate prediction of fatigue thresholds. The results demonstrate that although crack-tip forward and reverse plasticity are governed by Kmax.th and ΔKth, respectively, their effects are intrinsically coupled. Fatigue crack extension depends critically on the interaction of dislocations, requiring the product of their densities to reach a specific threshold. Compared with existing models, the proposed model provides more accurate, physically consistent predictions across a range of stress ratios.
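The difference between the classical independent-threshold criterion and the product-type criterion described above can be illustrated numerically. The calibrated form of the hyperbolic relationship is not given in the abstract, so the function names and all constants below are hypothetical; the sketch only shows why a product criterion excludes loading states that an L-shaped criterion admits.

```python
def vasudevan_grows(dK, Kmax, dK_star=2.0, Kmax_star=3.0):
    # classical L-shaped criterion: both intrinsic thresholds must be
    # exceeded independently for the crack to grow
    return dK >= dK_star and Kmax >= Kmax_star

def product_grows(dK, Kmax, C=10.0):
    # product-type criterion (illustrative form): growth requires the product
    # of the two driving forces, a proxy for the forward/reverse
    # dislocation-density product reaching rho*, to attain a critical value C
    return dK * Kmax >= C

# a loading state just past both L-shaped bounds
dK, Kmax = 2.1, 3.1
print(vasudevan_grows(dK, Kmax))  # True: classical model predicts growth
print(product_grows(dK, Kmax))    # False: 2.1 * 3.1 = 6.51 < 10
```

In the ΔK-Kmax plane the condition ΔK · Kmax = C traces a hyperbola rather than the two perpendicular legs of the "L", which is exactly the continuous, nonlinear threshold boundary reported in the results.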
  • AEROSPACE
  • AN Yang, WU Huzi, MU Zhan, WANG Tianshu
    Journal of Tsinghua University(Science and Technology). 2026, 66(5): 1055-1060. https://doi.org/10.16511/j.cnki.qhdxxb.2026.28.006
[Objective] Aerial refueling technology dramatically extends the combat radius and flight endurance of fighter aircraft, serving as a crucial enabler of success in modern warfare. The hose-drogue aerial refueling system has been widely adopted by most nations with aerial refueling capabilities. In aerial refueling operations using the hose-drogue system, the whipping phenomenon caused by excessive slack in the refueling hose is a critical factor that significantly affects both operational success and safety. Proper deployment and retraction of the hose can effectively prevent whipping. Accurate dynamic analysis of the hose during steady towing and deployment/retraction is essential for ensuring safe and reliable aerial refueling. [Methods] The absolute nodal coordinate formulation (ANCF) is a flexible multibody dynamics modeling technique based on finite element theory and continuum mechanics. In the ANCF, all nodal coordinates are defined in the global coordinate system, and the rotational coordinates used in conventional finite element methods are replaced with slope vectors. This approach not only yields higher accuracy in modeling flexible multibody systems but also performs well in scenarios involving large deformations of flexible bodies. To model the refueling hose during steady towing and deployment/retraction, this study develops a dynamic model of variable-length three-dimensional beam elements using the principle of virtual work combined with the ANCF. Equivalent stiffness terms and gyroscopic terms, both arising from variations in element length and potentially affecting system stability, are derived. When the element length is constant, the model reduces to the traditional ANCF with fixed-length elements.
By applying a unified time-dependent function to set the lengths of all elements, so that the undeformed lengths undergo simultaneous and identical changes, the method enables dynamic simulation of hoses of arbitrary nonzero length while reducing the overall number of degrees of freedom. Based on Green-Lagrange strain theory, both axial and bending deformations of the hose are incorporated into the analysis. Additionally, internal damping forces are included via a damping coefficient. [Results] The validity of the dynamic model developed in this work was rigorously verified against a classical benchmark problem from ANCF studies. The simulation results demonstrate that under three distinct scenarios (fixed, extended, and shortened element lengths), the proposed dynamic model consistently achieves precise simulation of the temporal morphological evolution of flexible hoses. The constant-length cases are widely adopted as benchmark configurations in existing research, and the proposed model achieves accuracy comparable to previously reported results. [Conclusions] This study offers an effective methodology for creating a precise dynamic model of hose deployment and retrieval during aerial refueling. It provides a high-fidelity model for hose dynamics analysis and whipping prevention. The model enables dynamic simulation of refueling hoses of arbitrary nonzero length. In practical aerial refueling operations, the refueling hose is subjected to various forces, including aerodynamic forces, hydraulic forces from the fluid medium, and collision forces generated during docking. Building on the dynamic model established in this study, future research can incorporate these multiphysics interactions to conduct comprehensive dynamic simulations and analyses of the different phases of the aerial refueling process.
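Two ingredients of the formulation above can be sketched compactly: the unified time-dependent undeformed element length, and the Green-Lagrange axial strain of an ANCF cable/beam element, which follows from the global slope vector r' as ε = (r'·r' − 1)/2. The linear retraction profile, the function names, and all numerical values are illustrative assumptions; the actual length schedule would be prescribed by the refueling scenario.

```python
import numpy as np

def element_length(t, L0=20.0, rate=-1.5, n_elems=10, L_min=2.0):
    # unified time-dependent undeformed length: the total hose length follows
    # one prescribed profile (here an assumed linear retraction with a floor)
    # and is shared identically by all elements, so every element shortens
    # simultaneously and the element count stays fixed
    L = max(L0 + rate * t, L_min)
    return L / n_elems

def axial_strain(r_prime):
    # Green-Lagrange axial strain from the global slope vector r' = dr/dx:
    # zero when |r'| = 1 (undeformed), positive when stretched
    return 0.5 * (r_prime @ r_prime - 1.0)

print(element_length(0.0))  # 2.0: full 20 m hose over 10 elements
print(element_length(4.0))  # 1.4: hose retracted to 14 m
print(axial_strain(np.array([1.0, 0.0, 0.0])))  # 0.0, undeformed
print(axial_strain(np.array([1.1, 0.0, 0.0])))  # ~0.105, stretched
```

Because every element keeps the same (time-varying) undeformed length, the rate of change of `element_length` enters the equations of motion, which is where the equivalent stiffness and gyroscopic terms mentioned above originate.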