Cooperative perception of connected and autonomous vehicles based on point feature fine alignment
Mingfang ZHANG, Luyu CUI, Jingjing FAN, Li WANG, Ying LIU
Journal of Tsinghua University (Science and Technology), 2026, Vol. 66, Issue 4: 770-782.
Objective: With the rapid development of vehicular networks (vehicle-to-everything) and autonomous driving technologies, cooperative perception has become a crucial technology for enhancing the environmental perception capability of connected and autonomous vehicles (CAVs). By sharing individual perception information among CAVs, cooperative perception can effectively expand the sensing range, reduce occlusion effects, and improve perception redundancy. However, vehicle localization errors are unavoidable in real driving scenarios owing to sensor noise, environmental interference, and communication uncertainty. These errors often cause spatial misalignment among the point clouds from multiple vehicles, thereby degrading multivehicle cooperative perception performance. Mitigating the impact of localization errors on cooperative perception while maintaining the computational efficiency required for on-board deployment remains a challenging problem. Methods: To address these issues, this paper proposed a cooperative perception method for CAVs based on point feature fine alignment. First, a lightweight point feature extraction module was designed using PointConvFormer to process the point cloud data collected by individual vehicles. By integrating PointConvFormer layers into bottleneck residual blocks, the proposed module preserves the three-dimensional spatial structure of the point cloud while capturing local geometric features and global contextual information. Second, a cross-vehicle hierarchical point feature fine alignment module was designed to address spatial misalignment in cross-vehicle data fusion. This module used the global poses of the CAVs, obtained from their positioning systems, to coarsely align the point features of surrounding CAVs with those of the ego vehicle.
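The pose-based coarse alignment described above can be sketched as a rigid transform of each neighbor's points into the ego frame. This is a minimal illustration only, assuming global poses are available as 4×4 homogeneous matrices (the paper's actual interface and pose representation are not specified here):

```python
import numpy as np

def coarse_align(points_nbr: np.ndarray,
                 pose_nbr: np.ndarray,
                 pose_ego: np.ndarray) -> np.ndarray:
    """Project a neighbor CAV's points (N, 3) into the ego frame.

    pose_nbr and pose_ego are assumed to be 4x4 homogeneous
    vehicle-to-world poses reported by each positioning system.
    """
    # Relative transform: neighbor frame -> world -> ego frame.
    T_rel = np.linalg.inv(pose_ego) @ pose_nbr
    # Homogenize, transform, and drop the trailing ones.
    homo = np.hstack([points_nbr, np.ones((points_nbr.shape[0], 1))])
    return (homo @ T_rel.T)[:, :3]
```

Because the poses come from noisy positioning systems, this coarse step alone leaves residual misalignment, which motivates the fine alignment stage below.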
A fine-grained alignment strategy was then implemented through local overlapping point-cloud registration to improve the spatial feature consistency of the aggregated point cloud, and the point feature similarity within overlapping regions was exploited to maximize cross-vehicle feature correspondence and alleviate the alignment deviation caused by localization errors. Furthermore, a multiscale feature fusion module was built to integrate local fine-grained features with global contextual information; it employed multiscale mask sampling to retain the structural information of the aligned aggregated point cloud at various spatial resolutions. Results: Extensive experiments and ablation studies were conducted on the V2V4Real and V2XSet datasets to comprehensively evaluate the performance of the proposed method. The experimental results demonstrated that the proposed approach achieved superior perception accuracy and robustness compared with other state-of-the-art methods across traffic scenarios with varying levels of localization error. Moreover, the proposed method maintained high computational efficiency and satisfied the real-time requirements of on-board deployment. Conclusions: The proposed cooperative perception method based on point feature fine alignment integrates a lightweight point feature extraction module, a cross-vehicle point feature fine alignment module, and a multiscale feature fusion module. It effectively addresses the perception performance degradation caused by vehicle localization errors and improves the accuracy and robustness of cooperative perception among CAVs. In future work, we will enhance the cooperative perception performance of CAVs in complex scenarios, such as rain and fog, by integrating information from multimodal sensors, including cameras and millimeter-wave radar.
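The fine-alignment idea of matching overlapping points by feature similarity and correcting the residual pose error can be sketched as follows. This is a simplified stand-in, not the paper's method: it uses cosine similarity with mutual-nearest-neighbor matching and a closed-form Kabsch (SVD) rigid fit, and all array layouts are assumptions:

```python
import numpy as np

def fine_align(pts_ego, feat_ego, pts_nbr, feat_nbr):
    """Correct coarsely aligned neighbor points using feature matches.

    pts_* are (N, 3) point coordinates; feat_* are (N, D) per-point
    features (hypothetical layout). Returns the neighbor points after
    applying the estimated residual rigid transform.
    """
    # Cosine similarity between L2-normalized point features.
    fe = feat_ego / np.linalg.norm(feat_ego, axis=1, keepdims=True)
    fn = feat_nbr / np.linalg.norm(feat_nbr, axis=1, keepdims=True)
    sim = fe @ fn.T                        # (N_ego, N_nbr)

    # Mutual nearest neighbors as putative correspondences.
    i2j = sim.argmax(axis=1)
    j2i = sim.argmax(axis=0)
    idx_e = np.array([i for i in range(len(i2j)) if j2i[i2j[i]] == i])
    idx_n = i2j[idx_e]

    # Kabsch/SVD: residual rotation R and translation t mapping
    # matched neighbor points onto their ego counterparts.
    A, B = pts_nbr[idx_n], pts_ego[idx_e]
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return pts_nbr @ R.T + t
```

In practice a learned, hierarchical alignment such as the one the paper describes would replace this closed-form fit, but the sketch shows why feature similarity in the overlap region suffices to recover a small residual pose error.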
connected autonomous vehicle / collaborative perception / point feature / feature alignment / feature fusion