Quadratic probing is often recommended as an alternative to linear probing because it incurs less clustering.[1] Quadratic probing exhibits better locality of reference than many other hash table such as chaining; however, for queries, quadratic probing does not have as good locality as linear probing, causing the latter to be faster in some settings.[2]
History
Quadratic probing was first introduced by Ward Douglas Maurer in 1968.[3] Several subsequent variations of the data structure were proposed in the 1970s in order to guarantee that the probe sequence hits every slot without cycling prematurely.[4][5] Quadratic probing is widely believed to avoid the clustering effects that make linear probing slow at high load factors. It serves as the basis for many widely-used high-performance hash-tables[6][7][8], including Google's open-source Abseil[9] hash table.[10]
It is conjectured[3][11] that quadratic probing, when filled to full, supports insertions in expected time . Proving this, or even proving any non-trivial time bound for quadratic probing remains open.[11] The closest result, due to Kuszmaul and Xi, shows that, at load factors of less than , insertions take expected time.[11]
Quadratic function
Let h(k) be a hash function that maps an element k to an integer in [0, m−1], where m is the size of the table. Let the ith probe position for a value k be given by the function
where c2 ≠ 0 (If c2 = 0, then h(k,i) degrades to a linear probe). For a given hash table, the values of c1 and c2 remain constant.
Examples:
If , then the probe sequence will be
For m = 2n, a good choice for the constants are c1 = c2 = 1/2, as the values of h(k,i) for i in [0, m−1] are all distinct (in fact, it is a permutation on [0, m−1][12]). This leads to a probe sequence of (the triangular numbers) where the values increase by 1, 2, 3, ...
For prime m > 2, most choices of c1 and c2 will make h(k,i) distinct for i in [0, (m−1)/2]. Such choices include c1 = c2 = 1/2, c1 = c2 = 1, and c1 = 0, c2 = 1. However, there are only m/2 distinct probes for a given element, requiring other techniques to guarantee that insertions will succeed when the load factor exceeds 1/2.
For , where m, n, and p are integer greater or equal 2 (degrades to linear probe when p = 1), then gives cycle of all distinct probes. It can be computed in loop as: , and
For any m, full cycle with quadratic probing can be achieved by rounding up m to closest power of 2, compute probe index: , and skip iteration when . There is maximum skipped iterations, and these iterations do not refer to memory, so it is fast operation on most modern processors. Rounding up m can be computed by:
If the sign of the offset is alternated (e.g. +1, −4, +9, −16, etc.), and if the number of buckets is a prime number congruent to 3 modulo 4 (e.g. 3, 7, 11, 19, 23, 31, etc.), then the first offsets will be unique (modulo ).[further explanation needed] In other words, a permutation of 0 through is obtained, and, consequently, a free bucket will always be found as long as at least one exists.
References
^Cormen, Thomas H.; Leiserson, Charles Eric; Rivest, Ronald Linn; Stein, Clifford (2009). Introduction to algorithms (3rd ed.). Cambridge, Massachusetts London, England: MIT Press. ISBN978-0-262-53305-8.
^ abcKuszmaul, William; Xi, Zoe (2024). Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (eds.). "Towards an Analysis of Quadratic Probing". 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs). 297. Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik: 103:1–103:19. doi:10.4230/LIPIcs.ICALP.2024.103. ISBN978-3-95977-322-5.
^The Art of Computer Science Volume 3 Sorting and Searching, Chapter 6.4, exercise 20, Donald Knuth