CS3240: Data Structures and Algorithms

Hashing

  • is a technique for storing and accessing elements in a table quickly (the goal is O(1) time per operation, on average) by applying a function to the key value to compute each element's location in the table.

  • The function of the key value is called a hash function.

  • hash_function(key) = storage cell location

    The hash function maps a key representing the data to the storage location where the data will be stored.

    Example 1: we want to store the data for an employee, and the key is the person's Social Security number (a 9-digit number).

  • GOAL: design a hash function that distributes the keys "evenly" among the storage cells. We must also handle the situation where 2 keys map to the same location (a collision).
  • PROBLEM: there is a finite number of storage cells, but often a very large, sometimes practically unlimited, number of possible keys.

    Selection of key: ideally, the key should be unique to each data item.
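To make the goal and the problem concrete, here is a minimal sketch that hashes the 9-digit Social Security numbers from Example 1 into a table. The table size of 1000 and the names TABLE_SIZE and hash_ssn are hypothetical, chosen only for illustration; the key is simply treated as an integer.

```python
from collections import Counter

TABLE_SIZE = 1000  # hypothetical number of storage cells

def hash_ssn(ssn: int) -> int:
    """Map a 9-digit Social Security number to a cell index in [0, TABLE_SIZE)."""
    return ssn % TABLE_SIZE

# PROBLEM: 10**9 possible keys but only 1000 cells, so distinct keys must share cells.
# Both of these keys end in 789, so they collide.
print(hash_ssn(123456789), hash_ssn(987654789))  # 789 789

# GOAL: an even spread. 1000 consecutive keys put exactly one key in each cell.
counts = Counter(hash_ssn(k) for k in range(100_000_000, 100_001_000))
print(len(counts), max(counts.values()))  # 1000 1
```

Keeping only the last three digits is easy to compute, but it spreads keys evenly only if those digits are themselves evenly distributed in the real data.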



    A Simple Example

The HandyParts company makes no more than 100 different parts, but the parts all have four-digit part numbers.

 

The following hash function, with key = part number, can be used to store and retrieve parts in an array of 100 cells:

Hash(key) = partNum % 100

Example: Placing an element into the hash table

Use the hash function Hash(key) = partNum % 100 to place the element with part number 5502 in the array.

Hash(5502) = 5502 % 100 = 2, so part 5502 is stored in values[2].

Problem: Collision in hash entry

Next, place part number 6702 in the array.

Hash(6702) = 6702 % 100 = 2

But values[2] is already occupied by part 5502: a COLLISION.
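The placement and the collision above can be reproduced with a minimal sketch. The array name values comes from the notes; the helper names part_hash and place are hypothetical, and place simply reports a collision instead of resolving it.

```python
TABLE_SIZE = 100  # HandyParts makes no more than 100 different parts

def part_hash(part_num: int) -> int:
    """Hash(key) = partNum % 100."""
    return part_num % TABLE_SIZE

values = [None] * TABLE_SIZE  # one cell per possible hash value

def place(part_num: int) -> bool:
    """Store part_num at values[Hash(key)]; return False on a collision."""
    index = part_hash(part_num)
    if values[index] is not None:
        return False  # cell already occupied: COLLISION
    values[index] = part_num
    return True

print(place(5502), values[2])  # True 5502
print(place(6702))             # False: 6702 % 100 is also 2
```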

 

Solution 1: at each array location, store a linked list of all the elements that hash to that location (chained hashing, or separate chaining).

Example Code
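The chaining idea can be sketched as follows. This is a minimal illustration, not the course's original example code: the names Node, ChainedHashTable, insert, contains, and chain are hypothetical, and new elements are pushed onto the front of their chain.

```python
from __future__ import annotations
from typing import Optional

TABLE_SIZE = 100

class Node:
    """One link in a collision chain."""
    def __init__(self, part_num: int, next_node: Optional[Node] = None):
        self.part_num = part_num
        self.next = next_node

class ChainedHashTable:
    def __init__(self, size: int = TABLE_SIZE):
        self.size = size
        self.buckets: list[Optional[Node]] = [None] * size  # head of each chain

    def _hash(self, part_num: int) -> int:
        return part_num % self.size

    def insert(self, part_num: int) -> None:
        index = self._hash(part_num)
        # Push onto the front of the chain; a collision just makes the list longer.
        self.buckets[index] = Node(part_num, self.buckets[index])

    def contains(self, part_num: int) -> bool:
        node = self.buckets[self._hash(part_num)]
        while node is not None:  # search only the chain this key hashes to
            if node.part_num == part_num:
                return True
            node = node.next
        return False

    def chain(self, index: int) -> list[int]:
        """Return the parts stored in one bucket, front to back."""
        out, node = [], self.buckets[index]
        while node is not None:
            out.append(node.part_num)
            node = node.next
        return out

table = ChainedHashTable()
table.insert(5502)
table.insert(6702)
print(table.chain(2))        # [6702, 5502]
print(table.contains(6702))  # True
```

Both colliding parts live in the list at index 2. A lookup costs O(1) plus the length of one chain, so the method stays fast only while the chains remain short.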

 

Solution 2: Linear Probing

ReHash(HashValue) = (HashValue + 1) % 100

Apply ReHash repeatedly until an empty location is found for part number 6702. Here values[3] is empty, so 6702 is stored there.
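Linear probing can be sketched like this, assuming the same 100-cell array. The function names part_hash, rehash, and insert_linear are hypothetical. The loop tries at most one full pass over the table, and the % 100 in rehash wraps the probe from index 99 back to 0.

```python
TABLE_SIZE = 100

def part_hash(part_num: int) -> int:
    """Hash(key) = partNum % 100."""
    return part_num % TABLE_SIZE

def rehash(hash_value: int) -> int:
    """ReHash = (HashValue + 1) % 100."""
    return (hash_value + 1) % TABLE_SIZE

def insert_linear(values: list, part_num: int) -> int:
    """Insert part_num using linear probing and return the index used."""
    index = part_hash(part_num)
    for _ in range(TABLE_SIZE):  # at most one full pass over the table
        if values[index] is None:
            values[index] = part_num
            return index
        index = rehash(index)    # cell occupied: try the next one
    raise RuntimeError("hash table is full")

values = [None] * TABLE_SIZE
print(insert_linear(values, 5502))  # 2
print(insert_linear(values, 6702))  # 3: values[2] is taken, so probe to 3
```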

Where would the part with number 4598 be placed using linear probing?

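Solution 3: open addressing with other probe sequences

Like Solution 2 (linear probing is itself the simplest form of open addressing), but the probe does not necessarily increment by 1. Instead, a set probing algorithm must determine which location to try next.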
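Quadratic probing is one common probing algorithm of this kind. It is used here as an illustration and is not necessarily the one the course intends. Instead of trying HashValue + 1, + 2, + 3, ..., it tries offsets 0, 1, 4, 9, ... from the home cell. The function name insert_quadratic is hypothetical.

```python
TABLE_SIZE = 100  # HandyParts table size from the notes

def insert_quadratic(values: list, part_num: int) -> int:
    """Open addressing with quadratic probing: try home, home+1, home+4, home+9, ..."""
    home = part_num % TABLE_SIZE
    for i in range(TABLE_SIZE):
        index = (home + i * i) % TABLE_SIZE
        if values[index] is None:
            values[index] = part_num
            return index
    # The sequence i*i % 100 skips some cells, so this can fail before the table is full.
    raise RuntimeError("no empty cell found along the probe sequence")

values = [None] * TABLE_SIZE
print(insert_quadratic(values, 5502))  # 2
print(insert_quadratic(values, 6702))  # 3  (offset 1)
print(insert_quadratic(values, 7802))  # 6  (offsets 0 and 1 taken, offset 4)
```

Spreading the probes out avoids the long runs of occupied cells that linear probing builds up. The cost is that some cells may never be probed. With a prime table size and a table at most half full, quadratic probing is guaranteed to find an empty cell.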

Example Hash Functions

  • There are many, and new ones are being proposed all the time.
  • If you know something about the data a priori, you can customize the hash function to fit it.
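As one widely used example for string keys, here is a polynomial hash evaluated with Horner's rule. It treats the characters as digits in base 31 and reduces the result modulo the table size. The multiplier 31 and the name string_hash are illustrative choices, not taken from the notes.

```python
def string_hash(key: str, table_size: int) -> int:
    """Polynomial string hash: characters are 'digits' in base 31, reduced mod table_size."""
    h = 0
    for ch in key:
        # Horner's rule; taking % at every step keeps h small.
        h = (h * 31 + ord(ch)) % table_size
    return h

print(string_hash("ab", 100))  # 5
print(string_hash("ba", 100))  # 35
```

Simply summing the character codes would give 195 for both "ab" and "ba". Weighting each position differently means anagrams usually land in different cells.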

Rehashing

When the table gets too full, operations take too long and insertions start to fail in practice (with open addressing, they fail outright once every cell is occupied).

Create a new, larger table (e.g., 2 times as big) with a new hash function. Then scan down the original hash table and insert each element into the new table.

This is relatively expensive: O(N) for N stored elements.
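The rehashing step can be sketched as follows, assuming an open-addressing table that uses linear probing. The name rehash_table is hypothetical, and key % new_size stands in for "a new hash function". Many texts also round the new size up to a prime, which this sketch does not do.

```python
def rehash_table(old_values: list) -> list:
    """Create a table twice as big and reinsert every element: O(N) overall."""
    new_size = 2 * len(old_values)
    new_values = [None] * new_size
    for key in old_values:        # scan down the original table
        if key is None:
            continue
        index = key % new_size    # new hash function: key % new_size
        while new_values[index] is not None:
            index = (index + 1) % new_size  # linear probing in the new table
        new_values[index] = key
    return new_values

# A size-4 table filled by linear probing with keys 4, 8, 5 (3 of 4 cells used).
old = [4, 8, 5, None]
new = rehash_table(old)
print(new)  # [8, None, None, None, 4, 5, None, None]
```

In the old table, 8 collided with 4 and 5 was pushed out of its home cell. In the larger table, every key lands in its home cell.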

 

Applications of Hashing

  • database storage and retrieval

  • cryptography

  • error checking (e.g., checksums)

 

© Lynne Grewe