Hash Function
Output = Hash(key)
There are many, many hash functions that have been proposed.
How well any one hash function distributes keys evenly is directly related to the distribution of the key values themselves.
For this reason, hash functions are proposed in relation to the distribution of your data. If you know this distribution a priori, it may be possible to design a good, if not great, hash function.
Keys
- ints / numbers
- strings
- mixed data types
General Hash Function Ideas:
-
Division-remainder method:
The number of
items in the table is estimated. That number (N, the size of the table) is then used as a
divisor into each original value or key to extract a quotient
and a remainder. The remainder is the hashed value. (Since this
method is liable to produce a number of collisions, any search
mechanism would have to be able to recognize a collision and offer
an alternate search mechanism.)
Hash(key) = key % N
key = number (integer)
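As a rough illustration, the division-remainder method can be sketched in Python as below. The function name division_hash and the table size of 10 are assumptions made for the example, not part of the notes.

```python
def division_hash(key: int, table_size: int) -> int:
    """Return the slot for an integer key: the remainder of key / table_size."""
    if table_size <= 0:
        raise ValueError("table_size must be positive")
    return key % table_size


# Keys that share a remainder land in the same slot (a collision),
# no matter how far apart the keys themselves are.
slots = [division_hash(k, 10) for k in (15, 25, 42)]
print(slots)  # [5, 5, 2]
```

Here 15 and 25 collide in slot 5, which is the kind of collision the search mechanism has to detect and resolve.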
-
String - sum of characters:
Hash(key) =
v = 0
for (i = 0 to key.length - 1) { v += ascii value(key[i]) }
return v
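A minimal Python sketch of the sum-of-characters idea follows. The name char_sum_hash is hypothetical; in practice the sum would usually be reduced with % N to fit the table, which is left out here to mirror the pseudocode.

```python
def char_sum_hash(key: str) -> int:
    """Add up the character codes of every character in the key."""
    total = 0
    for ch in key:
        total += ord(ch)  # ord() gives the code point (ASCII value for ASCII text)
    return total


print(char_sum_hash("abc"))  # 294  (97 + 98 + 99)

# Addition ignores order, so keys that are rearrangements of each other collide.
print(char_sum_hash("listen") == char_sum_hash("silent"))  # True
```

The second print shows a weakness of this function: character order is ignored, so anagrams always hash to the same value.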
-
Horner's Rule for String Hashing:
Let's suppose that we have keys made up of the lowercase alphabet a-z plus the digits 0 to 9. This means a total of 36 possible characters. (You can generalize this to any number of characters, for example to include uppercase and punctuation.)
There are more extensions to this algorithm, and you can see it discussed in more detail here.
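The notes do not spell out the formula, so the following is only a sketch of the usual approach: treat the string as a number whose digits are its characters, with the alphabet size as the base, and evaluate that polynomial with Horner's rule (multiply the running value by the base, then add the next character). The character-to-value mapping (a maps to 0, and so on), the names, and taking the remainder at every step are assumptions of this example.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # a-z followed by 0-9
BASE = len(ALPHABET)                                # one "digit" value per character
CHAR_VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def horner_hash(key: str, table_size: int) -> int:
    """Evaluate the key as a base-BASE number with Horner's rule, modulo table_size."""
    h = 0
    for ch in key:
        # Raises KeyError for characters outside the assumed alphabet.
        # Reducing at each step keeps the intermediate value small.
        h = (h * BASE + CHAR_VALUE[ch]) % table_size
    return h


# Unlike a plain sum of characters, character order now matters.
print(horner_hash("ab", 1000) != horner_hash("ba", 1000))  # True
```

Because each character is weighted by its position, rearranged keys such as "ab" and "ba" no longer collide automatically.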
-
Folding method:
This method divides the original value (digits
in this case) into several parts, adds the parts together, and
then uses the last four digits (or some other arbitrary number
of digits that will work) as the hashed value or key.
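A small Python sketch of this split-add-truncate procedure is given below. The function name, the choice of four-digit parts, and splitting from the left are assumptions for illustration only.

```python
def folding_hash(key: int, part_len: int = 4, keep_digits: int = 4) -> int:
    """Split the decimal digits into parts, add the parts, keep the last digits."""
    digits = str(abs(key))
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    # The remainder modulo 10**keep_digits is exactly the last keep_digits digits.
    return sum(parts) % (10 ** keep_digits)


# 1234567890 -> parts 1234, 5678, 90 -> sum 7002 -> last four digits 7002
print(folding_hash(1234567890))  # 7002
```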
-
Radix transformation:
Where the value or key is digital,
the number base (or radix) can be changed, resulting in a different
sequence of digits. (For example, a decimal numbered key could
be transformed into a hexadecimal numbered key.) High-order digits
could be discarded to fit a hash value of uniform length.
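One possible reading of this base-change idea, sketched in Python: convert the decimal key to another base, then keep only the low-order digits so every hash value has the same length. The function name, the default base of 16, and the three-digit length are assumptions of the example.

```python
def radix_hash(key: int, base: int = 16, keep_digits: int = 3) -> str:
    """Rewrite the key in another base and keep only the low-order digits."""
    symbols = "0123456789abcdef"
    if not 2 <= base <= len(symbols):
        raise ValueError("base must be between 2 and 16")
    n = abs(key)
    out = ""
    while True:
        n, remainder = divmod(n, base)
        out = symbols[remainder] + out
        if n == 0:
            break
    # Discard high-order digits; pad short results so the length is uniform.
    return out[-keep_digits:].rjust(keep_digits, "0")


# 123456 in decimal is 1e240 in hexadecimal; the last three digits are kept.
print(radix_hash(123456))  # 240
```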
-
Digit rearrangement:
This simply takes part of the
original value or key, such as digits in positions 3 through 6,
reverses their order, and then uses that sequence of digits
as the hash value or key.
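The select-and-reverse step can be sketched in Python as follows. The function name is hypothetical, and positions are assumed to be counted from 1 starting at the leftmost digit.

```python
def digit_rearrangement_hash(key: int, start: int = 3, end: int = 6) -> int:
    """Take the digits in positions start..end (1-based, inclusive) and reverse them."""
    digits = str(abs(key))
    selected = digits[start - 1:end]
    # Keys with too few digits yield an empty selection; map those to 0.
    return int(selected[::-1]) if selected else 0


# 12345678 -> positions 3 through 6 are "3456" -> reversed "6543"
print(digit_rearrangement_hash(12345678))  # 6543
```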
Universal Hashing
A problem: for any one hash function there will be some set of (bad) keys that all map to the same slot.
Solution: create a set of hash functions and randomly select from this set which hash function to use. RANDOMIZATION will help reduce the probability of our problem. This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary (trying to find those bad keys).
More precisely, Universal Hashing requires a set H of hash functions such that, for any two distinct keys, a hash function h drawn at random from H maps them to the same slot with probability at most 1 / m, where m is the size of our hash table.
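The notes do not name a particular family H. As one well-known example (not necessarily the one intended here), the Carter-Wegman family uses h(k) = ((a*k + b) mod p) mod m, with p a prime larger than any key and a, b chosen at random. The sketch below assumes integer keys in the range 0 to p - 1; the names P and make_universal_hash are hypothetical.

```python
import random
from typing import Callable, Optional

P = 2_147_483_647  # the prime 2**31 - 1; keys are assumed to lie in [0, P)


def make_universal_hash(table_size: int,
                        rng: Optional[random.Random] = None) -> Callable[[int], int]:
    """Draw one hash function at random from the family ((a*k + b) % P) % m."""
    rng = rng or random.Random()
    a = rng.randrange(1, P)  # a is never 0, so distinct keys stay distinct mod P
    b = rng.randrange(0, P)

    def h(key: int) -> int:
        return ((a * key + b) % P) % table_size

    return h


# Each call picks a fresh function; a set of keys that is "bad" for one
# function is unlikely to be bad for the next one drawn.
h1 = make_universal_hash(10)
h2 = make_universal_hash(10)
print(h1(12345), h2(12345))
```

The random choice is made once, when the table is created, and the same function is then used for every insert and lookup on that table.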
Uses: Cryptography