Cooperative perception of connected and autonomous vehicles based on point feature fine alignment
Mingfang ZHANG, Luyu CUI, Jingjing FAN, Li WANG, Ying LIU
Journal of Tsinghua University (Science and Technology), 2026, Vol. 66, Issue 4: 770-782.
Objective: With the rapid development of vehicular networks (vehicle-to-everything) and autonomous driving technologies, cooperative perception has become a crucial technology for enhancing the environmental perception capability of connected and autonomous vehicles (CAVs). By sharing individual perception information among CAVs, cooperative perception can effectively expand the sensing range, reduce occlusion effects, and improve perception redundancy. However, vehicle localization errors are unavoidable in real driving scenarios owing to sensor noise, environmental interference, and communication uncertainty. These errors often cause spatial misalignment among the point clouds from multiple vehicles, thereby degrading multivehicle cooperative perception performance. Mitigating the impact of localization errors on cooperative perception while maintaining the computational efficiency required for on-board deployment remains a challenging problem. Methods: To address these issues, this paper proposed a cooperative perception method for CAVs based on point feature fine alignment. First, a lightweight point feature extraction module was designed using PointConvFormer to process the point cloud data collected by individual vehicles. By integrating PointConvFormer layers into bottleneck residual blocks, the proposed module preserves the three-dimensional spatial structure of the point cloud while capturing local geometric features and global contextual information. Second, a cross-vehicle hierarchical point feature fine alignment module was designed to address spatial misalignment in cross-vehicle data fusion. This module used the global poses of the CAVs, obtained from their positioning systems, to coarsely align the point features of surrounding CAVs with those of the ego vehicle.
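The pose-based coarse alignment described above can be sketched as a rigid transform of each neighbor's points into the ego frame. This is a minimal illustration only, assuming global poses are available as 4×4 homogeneous matrices (the paper's actual interface and pose representation are not specified here):

```python
import numpy as np

def coarse_align(points_nbr: np.ndarray,
                 pose_nbr: np.ndarray,
                 pose_ego: np.ndarray) -> np.ndarray:
    """Project a neighbor CAV's points (N, 3) into the ego frame.

    pose_nbr and pose_ego are assumed to be 4x4 homogeneous
    vehicle-to-world poses reported by each positioning system.
    """
    # Relative transform: neighbor frame -> world -> ego frame.
    T_rel = np.linalg.inv(pose_ego) @ pose_nbr
    # Homogenize, transform, and drop the trailing ones.
    homo = np.hstack([points_nbr, np.ones((points_nbr.shape[0], 1))])
    return (homo @ T_rel.T)[:, :3]
```

Because the poses come from noisy positioning systems, this coarse step alone leaves residual misalignment, which motivates the fine alignment stage below.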
A fine-grained alignment strategy was then implemented through local overlapping point-cloud registration to improve the spatial feature consistency of the aggregated point cloud, and the point feature similarity within overlapping regions was exploited to maximize cross-vehicle feature correspondence and alleviate the alignment deviation caused by localization errors. Furthermore, a multiscale feature fusion module was built to integrate local fine-grained features with global contextual information; it employed multiscale mask sampling to retain the structural information of the aligned aggregated point cloud at various spatial resolutions. Results: Extensive experiments and ablation studies were conducted on the V2V4Real and V2XSet datasets to comprehensively evaluate the performance of the proposed method. The experimental results demonstrated that the proposed approach achieved superior perception accuracy and robustness compared with other state-of-the-art methods across traffic scenarios with varying levels of localization error. Moreover, the proposed method maintained high computational efficiency and satisfied the real-time requirements of on-board deployment. Conclusions: The proposed cooperative perception method based on point feature fine alignment integrates a lightweight point feature extraction module, a cross-vehicle point feature fine alignment module, and a multiscale feature fusion module. It effectively addresses the perception performance degradation caused by vehicle localization errors and improves the accuracy and robustness of cooperative perception among CAVs. In future work, we will enhance the cooperative perception performance of CAVs in complex scenarios, such as rain and fog, by integrating information from multimodal sensors, including cameras and millimeter-wave radar.
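The fine-alignment idea of matching overlapping points by feature similarity and correcting the residual pose error can be sketched as follows. This is a simplified stand-in, not the paper's method: it uses cosine similarity with mutual-nearest-neighbor matching and a closed-form Kabsch (SVD) rigid fit, and all array layouts are assumptions:

```python
import numpy as np

def fine_align(pts_ego, feat_ego, pts_nbr, feat_nbr):
    """Correct coarsely aligned neighbor points using feature matches.

    pts_* are (N, 3) point coordinates; feat_* are (N, D) per-point
    features (hypothetical layout). Returns the neighbor points after
    applying the estimated residual rigid transform.
    """
    # Cosine similarity between L2-normalized point features.
    fe = feat_ego / np.linalg.norm(feat_ego, axis=1, keepdims=True)
    fn = feat_nbr / np.linalg.norm(feat_nbr, axis=1, keepdims=True)
    sim = fe @ fn.T                        # (N_ego, N_nbr)

    # Mutual nearest neighbors as putative correspondences.
    i2j = sim.argmax(axis=1)
    j2i = sim.argmax(axis=0)
    idx_e = np.array([i for i in range(len(i2j)) if j2i[i2j[i]] == i])
    idx_n = i2j[idx_e]

    # Kabsch/SVD: residual rotation R and translation t mapping
    # matched neighbor points onto their ego counterparts.
    A, B = pts_nbr[idx_n], pts_ego[idx_e]
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return pts_nbr @ R.T + t
```

In practice a learned, hierarchical alignment such as the one the paper describes would replace this closed-form fit, but the sketch shows why feature similarity in the overlap region suffices to recover a small residual pose error.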
connected autonomous vehicle / collaborative perception / point feature / feature alignment / feature fusion