2-D Hilbert encoding and decoding algorithms on skewed data
JIA Lianyin1,3, KONG Ming1, WANG Weichen1, LI Mengjuan2, YOU Jinguo1,3, DING Jiaman1,3
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 2. Library, Yunnan Normal University, Kunming 650500, China; 3. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
摘要高效的Hilbert曲线的编解码算法作为Hilbert曲线应用的基础,具有重要的研究意义。现有算法多未考虑数据偏斜分布的影响,因此在数据偏斜分布时效率较低。该文发现:对于特定的前m阶坐标,其对应的前m阶编码值与其第1阶编码值呈现特定的倍数关系;对于特定的前m阶编码值,其对应的前m阶坐标与其第1阶坐标呈现特定的倍数关系。基于这一发现,在融合高效位操作、快速置位检测等技术的基础上,提出了跳过前m阶的编码(skipping the first m orders Hilbert encoding,SFO-HE)算法和跳过前m阶的解码(skipping the first m orders Hilbert decoding,SFO-HD)算法。这2个算法无需对前m阶逐阶编解码,可有效提高数据向Hilbert空间4个顶点偏斜时的编解码效率。扩展实验表明:该文算法对数据偏斜分布具有更好的适应性,在特定偏斜分布时效率大幅优于现有算法。
Abstract:Hilbert encoding and decoding are fundamental steps in many Hilbert curve based applications. However, existing algorithms are not very efficient when the data distribution is skewed. This paper shows that for a coordinate with the specific first m orders, the code of the first m orders is a multiple of its corresponding first order code. For a code with the specific first m orders, the coordinate of the first m orders is a multiple of its corresponding first order coordinate. These findings were used to develop an algorithm that skips the first m orders of the Hilbert encoding (SFO-HE) and another algorithm that skips the first m orders of the Hilbert decoding (SFO-HD). These algorithms exploit efficient bit operations and fast bit set detections to improve the encoding and decoding efficiencies for data skewed to the 4 corners of the Hilbert space. Extensive tests show that these two algorithms have good skewness adaptability and outperform existing algorithms on specific skewed data.
[1] SINGH H, BAWA S. A survey of traditional and mapreducebased spatial query processing approaches[J]. ACM SIGMOD Record, 2017, 46(2): 18-29. [2] QI J Z, LIU G L, JENSEN C S, et al. Effectively learning spatial indices[J]. Proceedings of the VLDB Endowment, 2020, 13(12): 2341-2354. [3] SCHRACK G, STOCCO L. Generation of spatial orders and space-filling curves[J]. IEEE Transactions on Image Processing, 2015, 24(6): 1791-1800. [4] ZHOU L, JOHNSON C R, WEISKOPF D. Data-driven space-filling curves[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 27(2): 1591-1600. [5] LU J, ZHAO Y H, TAN K L, et al. Distributed density peaks cluste6ring revisited[C]//2021 IEEE 37th International Conference on Data Engineering. Chania, Greece: IEEE Press, 2021: 2352-2353. [6] MORTON G M. A computer oriented geodetic data base and a new technique in file sequencing[J]. Physics of Plasmas, 2015, 24(7): 159-173. [7] HILBERT D. Ueber die stetige abbildung einer line auf ein flächenstück[J]. Mathematische Annalen, 1970, 38(3): 459-460. [8] BUTZ A R. Alternative algorithm for Hilbert's space-filling curve[J]. IEEE Transactions on Computers, 1971, 100(4): 424-426. [9] BURKARDT J. Hilbert-curve convert between 1D and 2D coordinates of Hilbert curve [R/OL]. (2015-12-23)[2021-08-02].http://people.math.sc.edu/Burkardt/c_src/hilbert_curve/hilbert_curve.html. [10] MOORE D. Fast Hilbert curve generation, sorting, and range queries [R/OL]. (2016-02-05)[2021-08-02].https://github.com/Cheedoong/hilbert. [11] FISHER A J. A new algorithm for generating Hilbert curves[J]. Software: Practice and Experience, 1986, 16(1): 5-12. [12] 李绍俊, 钟耳顺, 王少华, 等. 基于状态转移矩阵的Hilbert码快速生成算法[J]. 地球信息科学学报, 2014, 16(6): 846-851. LI S J, ZHONG E S, WANG S H, et al. An algorithm for Hilbert ordering code based on state-transition matrix[J]. Journal of Geo-information Science, 2014, 16(6): 846-851.(in Chinese) [13] XCTORRES. Hilbert spatial index [R/OL].(2017-03-30)[2021-08-02].https://github.com/xcTorres/hilbert_spatial_index. [14] 贾连印, 陈明鲜, 李孟娟, 等. 基于状态视图的高效Hilbert编码和解码算法[J]. 电子与信息学报, 2020, 42(6): 1494-501. JIA L Y, CHEN M X, LI M J, et al. State view based efficient Hilbert encoding and decoding algorithms[J]. Journal of Electronics & Information Technology, 2020, 42(6): 1494-1501. (in Chinese) [15] ROSETTA. Find first and last set bit of a long integer [R/OL].(2021-06-10). [2021-08-02].http://rosettacode.org/wiki/Find_first_and_last_set_bit_of_a_long_integer. [16] ZHENG Y. T-Drive trajectory data sample [R/OL].(2011-08-15)[2021-08-02].https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/.