CS3240: Data Structures and Algorithms

Hash Function

Output = Hash(key)

There are many, many hash functions that have been proposed.

 

How well any one hash function works (how evenly it distributes keys) is directly related to the distribution of the key values themselves.

 

For this reason, hash functions are proposed in relation to the distribution of your data. If you know this distribution a priori, it may be possible to design a good, if not great, hash function.

Keys

  • ints / numbers
  • strings
  • mixed data types

General Hash Function Ideas:

  • Division-remainder method:

The number of items in the table is estimated. That number (N, the size of the table) is then used as a divisor into each original value or key to extract a quotient and a remainder. The remainder is the hashed value. (Since this method is liable to produce a number of collisions, any search mechanism would have to be able to recognize a collision and offer an alternate search mechanism.)

    Hash(key) = key % N

    key = number (integer)
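
As a minimal sketch of the division-remainder method (the function name `division_hash` and the table size 7 are illustrative choices, not part of the notes), the remainder can be computed and a collision observed as follows:

```python
def division_hash(key: int, table_size: int) -> int:
    """Division-remainder method: the remainder of key / table_size is the slot."""
    if table_size <= 0:
        raise ValueError("table_size must be positive")
    return key % table_size


# Hypothetical small table, chosen only for illustration.
table_size = 7
for key in (10, 17, 25, 1234):
    print(key, "->", division_hash(key, table_size))
# 10 and 17 differ by a multiple of 7, so both land in slot 3 (a collision).
```

Keys that differ by a multiple of N always collide, which is why the search mechanism has to handle collisions.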



  • String - sum of characters:

    Hash(key) =

    v = 0;
    for(i=0 to key.length-1) { v += ascii value (key[i]) }

    return v;

     

    • Adds up ASCII values of characters in the string
    • Advantage: Simple to implement and computes quickly
    • Disadvantage: If TableSize is large, the function does not distribute keys well
      
      • Example: Keys are at most 8 characters.
        Maximum sum is 8*255 = 2040 with 8-bit character
        codes (8*127 = 1016 for 7-bit ASCII), but
        TableSize is 10007. At most about 20 percent of
        the slots can ever be used.
    • Problem: with the current Hash() function you can also have the situation where the table size is smaller than the largest possible sum (about 2048 in the example above), so the sum falls outside the table. In this case, modify the function as follows:

      Hash(key) = Hash(key) % Table_Size
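
The sum-of-characters hash, including the final reduction by the table size, might look like the sketch below. The name `sum_hash` is an illustrative assumption; `ord()` returns the character code, which equals the ASCII value for ASCII text.

```python
def sum_hash(key: str, table_size: int) -> int:
    """Add up the character codes of the key, then reduce modulo the table size."""
    total = 0
    for ch in key:
        total += ord(ch)
    return total % table_size


table_size = 10007
print(sum_hash("abc", table_size))  # 97 + 98 + 99 = 294

# Anagrams contain the same characters, so they always collide.
print(sum_hash("listen", table_size) == sum_hash("silent", table_size))  # True

# Eight 7-bit ASCII characters can never sum past 8 * 127 = 1016,
# so most of a 10007-slot table is unreachable.
print(sum_hash("\x7f" * 8, table_size))  # 1016
```

The anagram collision shows a second weakness of plain summation: the order of the characters has no effect on the result.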

 

 

  • Horner's Rule for String Hashing:

Let's suppose that we have keys made up of the lowercase alphabet a-z plus the digits 0 to 9. This gives 26 + 10 = 36 possible characters; the multiplier 37 used below is the next integer up, and it is prime. (You can generalize this to any number of characters, for example to include uppercase and punctuation.)

  • Idea: to get a better spread of values than simply adding the ASCII values, let's multiply (shift) the current sum by 37 before adding the next ASCII value in our character (string) sequence. We get the following:

    Hash(key) =

    v = 0;
    for(i=0 to key.length-1) { v = 37*v + ascii value (key[i]) }

    return v % Table_Size; //in case v is larger than Table_Size

  • Before the % operation, the range of output for our previous example of 8-character strings is now from 0 to less than 37^8 * max_ascii_value, which is about 3.5125 x 10^12 * max_ascii_value (the exact maximum is max_ascii_value * (37^8 - 1)/36). This is much larger than in our previous example.
  • Note: you should replace 37 with a multiplier at least as large as the number of different characters you expect. If that multiplier is a power of 2, you can use faster bit shifting instead of multiplication in the equation above.
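
A possible Python rendering of the Horner's rule hash is sketched below. The name `horner_hash` and the `base` parameter are illustrative assumptions. This sketch also reduces modulo the table size on every step rather than once at the end; by modular arithmetic the result is the same, and in languages with fixed-width integers it keeps the running value from overflowing.

```python
def horner_hash(key: str, table_size: int, base: int = 37) -> int:
    """Horner's rule: treat the key as digits of a base-`base` number, mod table_size."""
    value = 0
    for ch in key:
        # Shift the running value by one "digit", add the next character code,
        # and reduce immediately so the value stays below table_size.
        value = (base * value + ord(ch)) % table_size
    return value


table_size = 10007
print(horner_hash("ab", table_size))  # (37 * 97 + 98) % 10007 = 3687
print(horner_hash("ba", table_size))  # (37 * 98 + 97) % 10007 = 3723
```

Unlike the plain sum, the position of each character now affects the result, so "ab" and "ba" no longer collide.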

There are more extensions to this algorithm, and you can see it discussed in more detail here.

 

  • Folding:

This method divides the original value (digits in this case) into several parts, adds the parts together, and then uses the last four digits (or some other arbitrary number of digits that will work) as the hashed value or key.
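
One way to sketch folding in Python is shown below. The part width of 3 digits and the names `folding_hash`, `part_len`, and `keep_digits` are assumptions made for illustration; the notes do not fix how the value is split.

```python
def folding_hash(key: int, part_len: int = 3, keep_digits: int = 4) -> int:
    """Split the decimal digits into parts, add the parts, keep the last digits."""
    digits = str(abs(key))
    # Cut the digit string into consecutive chunks of part_len digits.
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    # Keeping the last keep_digits digits is the same as taking mod 10**keep_digits.
    return sum(parts) % (10 ** keep_digits)


print(folding_hash(123456789))               # 123 + 456 + 789 = 1368
print(folding_hash(9876543210, part_len=4))  # 9876 + 5432 + 10 = 15318 -> 5318
```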

  • Radix transformation:

Where the value or key is digital, the number base (or radix) can be changed resulting in a different sequence of digits. (For example, a decimal numbered key could be transformed into a hexadecimal numbered key.) High-order digits could be discarded to fit a hash value of uniform length.
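
A rough sketch of radix transformation follows. The helper names `to_radix` and `radix_hash`, and the choice to return the hashed key as a fixed-length digit string, are assumptions for illustration only.

```python
DIGITS = "0123456789abcdef"


def to_radix(key: int, radix: int) -> str:
    """Write a non-negative integer as a digit string in the given radix (2..16)."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 16")
    if key < 0:
        raise ValueError("key must be non-negative")
    if key == 0:
        return "0"
    out = []
    while key > 0:
        key, remainder = divmod(key, radix)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def radix_hash(key: int, radix: int = 16, keep_digits: int = 4) -> str:
    """Change the number base, then discard high-order digits to a uniform length."""
    digits = to_radix(key, radix)
    return digits[-keep_digits:].rjust(keep_digits, "0")


print(radix_hash(1000000))                      # hex f4240 -> "4240"
print(radix_hash(345, radix=9, keep_digits=3))  # 345 is 423 in base 9 -> "423"
```

To use the result as a table index, the retained digits would still need to be converted back to a number and reduced to the table size.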

  • Digit rearrangement:

This is simply taking part of the original value or key such as digits in positions 3 through 6, reversing their order, and then using that sequence of digits as the hash value or key.
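
Digit rearrangement can be sketched as below, using the positions 3 through 6 from the description (counted from 1, left to right). The function name and parameters are illustrative assumptions.

```python
def digit_rearrangement_hash(key: int, start: int = 3, end: int = 6) -> int:
    """Take the digits in positions start..end (1-based, inclusive) and reverse them."""
    digits = str(abs(key))
    selected = digits[start - 1:end]
    # Keys shorter than `start` digits have nothing to select; map them to 0.
    return int(selected[::-1]) if selected else 0


print(digit_rearrangement_hash(12345678))   # digits "3456" reversed -> 6543
print(digit_rearrangement_hash(987654321))  # digits "7654" reversed -> 4567
```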

 

Universal Hashing

A problem: for any one hash function there will be some set of (bad) keys that all map to the same slot.

Solution: create a set of hash functions and randomly select from this set which hash function to use. RANDOMIZATION will help reduce the probability of our problem. This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary (trying to find those bad keys).

More accurately, Universal Hashing requires a set H of hash functions such that, for any two distinct keys, a hash function h drawn at random from H maps them to the same slot with probability at most 1 / m, where m is the size of our hash table.
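
The notes do not name a particular family H. One standard choice for integer keys, used here purely as an illustration, is h(k) = ((a*k + b) mod p) mod m with p a prime larger than any key and a, b drawn at random. The sketch below assumes keys lie in [0, p) with p = 2^31 - 1; all names are hypothetical.

```python
import random

# A prime larger than any key we expect (assumption: keys lie in [0, P)).
P = 2_147_483_647  # 2**31 - 1


def make_universal_hash(table_size, rng=None):
    """Draw one function h(k) = ((a*k + b) mod P) mod table_size at random."""
    rng = rng or random.Random()
    a = rng.randrange(1, P)  # a must be nonzero
    b = rng.randrange(0, P)

    def h(key: int) -> int:
        return ((a * key + b) % P) % table_size

    return h


def collision_rate(key1, key2, table_size, trials, seed=0):
    """Estimate how often two fixed keys collide over random draws from the family."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_universal_hash(table_size, rng)
        if h(key1) == h(key2):
            hits += 1
    return hits / trials


# With m = 10 slots the estimate should come out near or below 1/m = 0.1.
print(collision_rate(12, 47, table_size=10, trials=20000))
```

The hash function is drawn once, when the table is created, and then used for every key. The randomness is in that single draw, not in each lookup.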

Uses: Cryptography

 

 

© Lynne Grewe