COMPUTER SCIENCE AND TECHNOLOGY |
|
|
|
|
|
Multi-dimensional Hash table structure for massive data applications |
WU Quanyuan, PENG Can, ZHENG Yi, BU Junli |
School of Computer, National University of Defense Technology, Changsha 410073, China |
|
|
Abstract Traditional Hash table can quickly locate the target by calculating the Hash value of the target data to enable fast data access and retrieval. Good storage performance requires that the Hash table maintains a loose state by sacrificing 10%-25% of the space. This is a tremendous waste of space in massive data storage systems. This paper presents a multi-dimensional Hash table structure that by increasing the logical dimension of the Hash table to significantly reduce the collision rate in the Hash table for satisfactory performance with a high filling rate. Tests show that with ten million entries, the collision rate of a two-dimensional Hash table is 2-4 orders of magnitude lower than a traditional Hash table and the overall performance is improved by 1 order of magnitude. In addition, a failure rate concept is proposed to improve Hash table performance evaluations.
|
Keywords
multi-dimension
Hash table
massive data storage
failure rate
|
|
Issue Date: 15 June 2017
|
|
|
[1] |
Zwolenski M, Weatherill L. The digital universe rich data and the increasing value of the Internet of things[J]. Australian Journal of Telecommunications & the Digital Economy, 2014, 2(3):471-479.
url: http://dx.doi.org/alian Journal of Telecommunications
|
[2] |
DeCandia G, Hastorun D, Jampani M, et al. Dynamo:Amazon's highly available key-value store[J]. ACM SIGOPS Operating Systems Review, 2007, 41(6):205-220.
|
[3] |
Chang F, Dean J, Ghemawat S, et al. Bigtable:A distributed storage system for structured data[J]. ACM Transactions on Computer Systems (TOCS), 2008, 26(2):4-4.
|
[4] |
Lakshman A, Malik P. Cassandra:A decentralized structured storage system[J]. ACM SIGOPS Operating Systems Review, 2010, 44(2):35-40.
|
[5] |
Henke C, Schmoll C, Zseby T. Empirical evaluation of Hash functions for multipoint measurements[J]. ACM SIGCOMM Computer Communication Review, 2008, 38(3):39-50.
|
[6] |
Christian H, Carsten S, Tanja Z. Empirical evaluation of Hash functions for packetID generation in sampled multipoint measurements[C]//Passive and Active Network Measurement, International Conference 2009. Seoul, Korea:Proceedings, 2009:197-206.
|
[7] |
Rivest R L. The MD5 message-digest algorithm[J]. Network Working Group IETF, 1992, 473(10):303-311.
|
[8] |
Eastlake Rd D, Jones P. US Secure Hash Algorithm 1(SHA1)[M]. Los Argeles County:RFC Editor, 2001.
|
[9] |
Peterson W W, Brown D T. Cyclic codes for error detection[J]. Proceedings of the IRE, 1961, 49(1):228-235.
|
[10] |
Enbody R J, Du H C. Dynamic hashing schemes[J]. ACM Computing Surveys (CSUR), 1988, 20(2):850-113.
|
[11] |
Larson P A. Dynamic hashing[J]. BIT Numerical Mathematics, 1978, 18(2):184-201.
|
[12] |
Singh B, Yadav I, Agarwal S, et al. An efficient word searching algorithm through splitting and hashing the offline text[C]//Advances in Recent Technologies in Communication and Computing. Kottayam, India:IEEE Press, 2009:387-389.
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|