GOAL

The goal is to use a chained hash table, with a hash function h that satisfies the Simple Uniform Hashing Assumption (SUHA), to store some numbers; the same number, if inserted multiple times, is stored multiple times in the linked list associated with h(n). We have three functions to implement: one that inserts a new number n and counts its occurrences, one that deletes a number n and counts its remaining occurrences, and one that just counts the occurrences of n.

NOTES

We have to use pseudocode (similar to Python). We can use the attribute .head, which returns the head of a linked list, and the functions LIST_PREPEND and LIST_SEARCH, which respectively prepend an item (given as a pointer) to a linked list and search for a key in it (returning the associated pointer). We can also use CHAINED_HASH_DELETE, which deletes an item (given as a pointer) from the linked list it belongs to. Note that the variable x always denotes a pointer to an item in one of the linked lists of the hash table, n is always an integer, which we use as the key, and M is the chained hash table. The syntax and "programming" really don't matter; what matters is the algorithm.
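
For concreteness, here is one way the assumed list interface could look in Python. This is only my sketch of the CLRS-style doubly linked list (the class and attribute names are my own), with h and M assumed to be the hash function and table from the question:

    class Item:                      # one node of a doubly linked list
        def __init__(self, key):
            self.key = key           # the stored integer n
            self.prev = None         # NIL is modeled as None
            self.next = None

    class LinkedList:
        def __init__(self):
            self.head = None         # empty list: head is NIL

    def LIST_PREPEND(L, x):          # insert item x at the front of list L
        x.next = L.head
        x.prev = None
        if L.head is not None:
            L.head.prev = x
        L.head = x

    def LIST_SEARCH(L, n):           # first item with key n, or NIL (None)
        x = L.head
        while x is not None and x.key != n:
            x = x.next
        return x

    def CHAINED_HASH_DELETE(M, x):   # unlink item x from its chain in O(1)
        L = M[h(x.key)]              # h and M assumed defined as in the question
        if x.prev is not None:
            x.prev.next = x.next
        else:
            L.head = x.next
        if x.next is not None:
            x.next.prev = x.prev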

MY CODE

COUNT(M, n):
    count = 0
    x = M[h(n)].head          # head of the chain that n hashes to
    while x != NIL:           # scan the whole list; if it is empty, x is already NIL
        if x.key == n:
            count = count + 1
        x = x.next
    return count              # 0 when n does not occur

DELETE(M, n):
    x = LIST_SEARCH(M[h(n)], n)   # pointer to the first item with key n, or NIL
    if x != NIL:                  # delete only if n actually occurs
        CHAINED_HASH_DELETE(M, x)
    return COUNT(M, n)            # occurrences remaining after the deletion

INSERT(M, n):
    <let x be a pointer to n>     # a new item whose key is n
    LIST_PREPEND(M[h(n)], x)      # O(1) insertion at the head of the chain
    return COUNT(M, n)            # occurrences including the one just inserted
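
For example, assuming an initially empty table M, the fixed versions above (which return the count) would behave like this hypothetical trace:

    INSERT(M, 5)   # returns 1: one occurrence of 5
    INSERT(M, 5)   # returns 2: duplicates are stored separately
    DELETE(M, 5)   # returns 1: one occurrence remains
    COUNT(M, 5)    # returns 1
    DELETE(M, 7)   # returns 0: deleting an absent key is a no-op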

MY QUESTION

  1. The idea behind my code is probably a little inefficient, because every time I have to re-count the number of occurrences. Still, the average complexity of all three functions (COUNT, INSERT, DELETE) should be O(1+α), where α is the load factor, because LIST_SEARCH and COUNT both have complexity O(1+α). Is the average complexity correct? Are the algorithms themselves correct?

  2. Another solution would be to use an attribute .counter on the first item of the linked list in M[h(n)] and add 1 or subtract 1 each time I insert or delete an item. Since the resulting complexities are also O(1+α), because I still have to find the first item equal to n, it shouldn't matter much in a theoretical analysis, correct?

1 Answer

Hashing is inherently deterministic. However, we can use probabilistic analysis to understand the average-case performance of hash table operations under the assumption of a uniform distribution of keys.

Denote the load factor by α = N/M, where N is the number of entries in the hash table and M is the number of buckets (here M is the table size, not the table itself as in your pseudocode). Under SUHA, each key is equally likely to hash to any bucket, so we expect the keys to be evenly distributed across the buckets. On average there will be α entries per bucket, which leads to an average time complexity of O(1 + α) for lookups, insertions and deletions. You might find the post "Simple Uniform Hashing Assumption and Worst-Case Complexity for Hash Tables" insightful; it provides different takes on SUHA.
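
As a concrete example (numbers chosen arbitrarily): with N = 2000 entries and M = 500 buckets, α = 2000/500 = 4, so an average chain holds 4 items and a lookup inspects about 4 of them on top of the O(1) cost of computing h(n).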

Note that if the size M of the hash table is at least proportional to the number N of elements in the table, we have N = O(M). Hence α = N/M = O(M)/M = O(1), and a lookup takes O(1) time on average. For O(1) performance on lookups, α should therefore be bounded above by a small constant, preferably below 1.

If α grows beyond an acceptable limit, we may choose to rehash the hash table. This involves enlarging the size of the table and re-inserting the existing entries into the new, larger table. Since M becomes larger relative to N, the load factor α decreases accordingly.
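
As an illustration, here is a minimal sketch of such a rehash step in Python; the list-of-lists table layout, the 0.75 threshold and the helper name are my own assumptions, not something from the question:

    MAX_LOAD = 0.75                          # assumed threshold; any small constant works

    def maybe_rehash(table, num_entries):
        # Double the bucket array once the load factor num_entries/len(table)
        # exceeds MAX_LOAD; chains are modeled as plain Python lists here.
        if num_entries / len(table) <= MAX_LOAD:
            return table                     # load factor acceptable, keep the table
        new_table = [[] for _ in range(2 * len(table))]
        for bucket in table:                 # re-insert every entry under the new size
            for n in bucket:
                new_table[hash(n) % len(new_table)].append(n)
        return new_table

Since each entry is moved exactly once, a rehash costs O(N + M), but with table doubling that cost amortizes to O(1) extra work per insertion.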

For your second question: you are able to keep the cost of your COUNT function to O(1), so do just that. By using a .counter attribute on the first item of the linked list, you can add 1 or subtract 1 each time you insert or delete an item.
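
For illustration, here is one minimal sketch of that variant in the question's own pseudocode, with one counter node per distinct key in the chain (the node layout is my own; whether the counter lives per key or per bucket is exactly what the comments below discuss):

    INSERT(M, n):
        x = LIST_SEARCH(M[h(n)], n)       # is there already a node for key n?
        if x == NIL:
            <let x be a new node with x.key = n and x.counter = 0>
            LIST_PREPEND(M[h(n)], x)
        x.counter = x.counter + 1
        return x.counter

    DELETE(M, n):
        x = LIST_SEARCH(M[h(n)], n)
        if x == NIL:
            return 0                      # n does not occur
        x.counter = x.counter - 1
        if x.counter == 0:
            CHAINED_HASH_DELETE(M, x)     # remove the node once its count hits zero
        return x.counter

    COUNT(M, n):
        x = LIST_SEARCH(M[h(n)], n)       # one list search: O(1 + α)
        if x == NIL:
            return 0
        return x.counter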


3 Comments

thanks! the only missing thing is COUNT's asymptotic analysis: is it correct to say that its complexity is O(1+α)?
By the way, I think in any case the complexity of COUNT remains O(1+α), because you can't just keep the .counter attribute at the beginning of the list: you might have multiple integers assigned to that linked list (collisions), so the best way is to go through the list until you find the first integer identical to the one you're searching for and keep the .counter on it. But this still yields O(1+α), correct?
@selenio34 I would suggest simply incrementing on a collision and decrementing on removal (per bucket). These are constant-time operations. Your COUNT operation can then just index into the array and return the count attribute. So COUNT would be O(1).
