Calculating Time
Binary Search Revisited
Recall the recursive binary search algorithm presented earlier. The running
time of search(a,low,high,value), used to determine whether one of a[low],
a[low+1], ..., a[high] is equal to value, depends on the size of high-low.
As high-low increases, the running time increases. We use T(n) to denote the
number of steps used to execute search(a,low,high,value), where n=high-low+1.
Calling search(a,low,high,value) could result in one of four possibilities:
1. low > high so the algorithm returns -1.
2. low <= high and value = a[mid] so the algorithm returns mid.
3. low <= high and value > a[mid] so the algorithm returns search(a,mid+1,high,value).
4. low <= high and value < a[mid] so the algorithm returns search(a,low,mid-1,value).
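The four possibilities above can be sketched in code. The following Python
version is an illustrative rendering, not necessarily the exact algorithm
presented earlier; the comments mark which possibility each branch realizes:

```python
def search(a, low, high, value):
    """Recursive binary search over the sorted slice a[low..high].

    Returns an index i with a[i] == value, or -1 if value is absent.
    """
    if low > high:                            # possibility 1: empty range
        return -1
    mid = (low + high) // 2                   # mid = floor((low+high)/2)
    if value == a[mid]:                       # possibility 2: match found
        return mid
    elif value > a[mid]:                      # possibility 3: right half
        return search(a, mid + 1, high, value)
    else:                                     # possibility 4: left half
        return search(a, low, mid - 1, value)
```

For example, search([1,3,5,7,9], 0, 4, 7) returns 3, while searching for a
value not in the array returns -1.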
The first two possibilities each use some constant number of steps, and
the last two, by the definition of T(n), use T(high-(mid+1)+1) and
T(mid-1-low+1) steps plus some constant overhead, respectively. Thus, we see
that:
T(n) = c1 if n < 1;
T(n) = c2 if n >= 1 and value = a[mid];
T(n) = T(high-(mid+1)+1) + c3 if n >= 1 and value > a[mid]; and
T(n) = T(mid-1-low+1) + c4 if n >= 1 and value < a[mid]
where c1, c2, c3 and c4 are constants. We can rewrite these equations in
terms of n rather than using low and high:
high-(mid+1)+1 = high-mid
= high-floor((high+low)/2)
= high+ceiling(-(high+low)/2) because -floor(x) = ceiling(-x)
= ceiling(high - (high+low)/2)
= ceiling((high-low)/2)
= ceiling((n-1)/2)
mid-1-low+1 = mid-low
= floor((high+low)/2)-low
= floor((high+low)/2 - low)
= floor((high-low)/2)
= floor((n-1)/2)
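The two identities just derived can be checked mechanically. This short
Python check (an added verification sketch, not part of the original text)
confirms that the right and left subproblem sizes equal ceiling((n-1)/2) and
floor((n-1)/2) for many (low, high) pairs:

```python
import math

# For each range a[low..high] of size n = high - low + 1, the recursive
# calls operate on high-(mid+1)+1 and mid-1-low+1 elements. Verify these
# equal ceil((n-1)/2) and floor((n-1)/2), respectively.
for low in range(0, 20):
    for high in range(low, 40):
        n = high - low + 1
        mid = (low + high) // 2               # mid = floor((low+high)/2)
        assert high - (mid + 1) + 1 == math.ceil((n - 1) / 2)
        assert mid - 1 - low + 1 == (n - 1) // 2   # floor((n-1)/2)
```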
Thus, we have:
T(n) = c1 if n < 1;
T(n) = c2 if n >= 1 and value = a[mid];
T(n) = T(ceil((n-1)/2)) + c3 if n >= 1 and value > a[mid]; and
T(n) = T(floor((n-1)/2)) + c4 if n >= 1 and value < a[mid]
This is called a recurrence equation for T(n). Unfortunately, recurrence
equations do not tell us much about the actual running time, so we need to
derive a direct equation for T(n). This is difficult to do with the floor
and ceiling functions, so instead we obtain a recurrence inequality:
T(n) = c1 if n < 1;
T(n) <= T(n/2) + k1 otherwise (where k1=max(c3,c4))
This is true because n/2 >= ceil((n-1)/2) and n/2 >= floor((n-1)/2), and
because binary search uses the same or a larger number of steps when searching
larger subsequences. We ignore the case when value = a[mid] because
we are interested in the worst case running time of binary search.
Finding a match never gives a worst case running time because the search
stops as soon as a match is found.
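The recurrence inequality can be evaluated directly. The sketch below assumes
unit costs (c1 = k1 = 1) and uses floor(n/2) for the subproblem size, which is
valid because ceil((n-1)/2) = floor(n/2) and T is nondecreasing; the function
name T_bound is our own:

```python
import math

def T_bound(n, c1=1, k1=1):
    """Evaluate T(n) <= T(n/2) + k1 with base case T(n) = c1 for n < 1.

    Unit costs c1 = k1 = 1 are an illustrative assumption; the worst
    subproblem of a size-n search has ceil((n-1)/2) = floor(n/2) elements.
    """
    if n < 1:
        return c1
    return T_bound(n // 2, c1, k1) + k1

# The recurrence value never exceeds the closed-form bound derived below:
# c1 + (log2(n) + 1) * k1.
for n in [1, 2, 7, 100, 10**5]:
    assert T_bound(n) <= 1 + (math.log2(n) + 1)
```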
We can now find an upper bound for T(n) as follows:
T(n) <= T(n/2) + k1
<= (T(n/4) + k1) + k1 because T(n/2) <= T(n/4) + k1
= T(n/4) + 2k1
<= (T(n/8) + k1) + 2k1 because T(n/4) <= T(n/8) + k1
= T(n/8) + 3k1
<= (T(n/16) + k1) + 3k1 because T(n/8) <= T(n/16) + k1
= T(n/16) + 4k1
<= (T(n/32) + k1) + 4k1 because T(n/16) <= T(n/32) + k1
= T(n/32) + 5k1
T(n) <= T(n/2^i) + i*k1 if we generalize from above.
T(n) = c1 when n < 1, so we can replace T(n/2^i) with c1 when n/2^i < 1:
n/2^i < 1 if and only if n < 2^i if and only if log2(n) < i.
Thus, T(n/2^(log2(n)+1)) = c1, so:
T(n) <= T(n/2^i) + i*k1 <= c1 + (log2(n)+1)*k1
We have just proven that T(n) <= a*log2(n) + b, where a = k1 and b = c1 + k1
are constants. Therefore, T(n) is O(log2(n)).
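As an empirical check of this bound, we can count the recursive calls an
unsuccessful search actually makes. This Python sketch (count_calls is a name
introduced here for illustration) forces the worst case by searching for a
value larger than every element, so the call count should stay within
floor(log2(n)) + 2:

```python
import math

def count_calls(n):
    """Count recursive calls of an unsuccessful binary search on n elements."""
    a = list(range(n))
    calls = 0

    def search(low, high, value):
        nonlocal calls
        calls += 1
        if low > high:
            return -1
        mid = (low + high) // 2
        if value == a[mid]:
            return mid
        if value > a[mid]:
            return search(mid + 1, high, value)
        return search(low, mid - 1, value)

    search(0, n - 1, n)      # value n exceeds every element: worst case
    return calls

# Call counts grow logarithmically, matching T(n) in O(log2(n)).
for n in [1, 10, 100, 1000, 10**6]:
    assert count_calls(n) <= math.floor(math.log2(n)) + 2
```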