Calculating Time
Binary Search Revisited
Recall the recursive binary search algorithm presented earlier. The running
time of search(a,low,high,value), used to determine whether one of a[low],
a[low+1], ..., a[high] is equal to value, depends on the size of high-low.
As high-low increases, the running time increases. We use T(n) to denote the
number of steps used to execute search(a,low,high,value), where n=high-low+1.
Calling search(a,low,high,value) could result in one of four possibilities:
1. low > high so the algorithm returns -1.
2. low <= high and value = a[mid] so the algorithm returns mid.
3. low <= high and value > a[mid] so the algorithm returns search(a,mid+1,high,value).
4. low <= high and value < a[mid] so the algorithm returns search(a,low,mid-1,value).
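The four possibilities above can be sketched in code. The following Python
version is an illustrative rendering, not necessarily the exact algorithm
presented earlier; the comments mark which possibility each branch realizes:

```python
def search(a, low, high, value):
    """Recursive binary search over the sorted slice a[low..high].

    Returns an index i with a[i] == value, or -1 if value is absent.
    """
    if low > high:                            # possibility 1: empty range
        return -1
    mid = (low + high) // 2                   # mid = floor((low+high)/2)
    if value == a[mid]:                       # possibility 2: match found
        return mid
    elif value > a[mid]:                      # possibility 3: right half
        return search(a, mid + 1, high, value)
    else:                                     # possibility 4: left half
        return search(a, low, mid - 1, value)
```

For example, search([1,3,5,7,9], 0, 4, 7) returns 3, while searching for a
value not in the array returns -1.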
The first two possibilities each use some constant number of steps, and
the last two, by the definition of T(n), use T(high-(mid+1)+1) and
T(mid-1-low+1) steps plus some constant overhead, respectively. Thus, we see
that:
T(n) = c1 if n < 1;
T(n) = c2 if n >= 1 and value = a[mid];
T(n) = T(high-(mid+1)+1) + c3 if n >= 1 and value > a[mid]; and
T(n) = T(mid-1-low+1) + c4 if n >= 1 and value < a[mid]
where c1, c2, c3 and c4 are constants. We can rewrite these equations in
terms of n rather than using low and high:
high-(mid+1)+1 = high-mid
= high-floor((high+low)/2)
= high+ceiling(-(high+low)/2) because -floor(x) = ceiling(-x)
= ceiling(high - (high+low)/2)
= ceiling((high-low)/2)
= ceiling((n-1)/2)
mid-1-low+1 = mid-low
= floor((high+low)/2)-low
= floor((high+low)/2 - low)
= floor((high-low)/2)
= floor((n-1)/2)
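The two identities just derived can be checked mechanically. This short
Python check (an added verification sketch, not part of the original text)
confirms that the right and left subproblem sizes equal ceiling((n-1)/2) and
floor((n-1)/2) for many (low, high) pairs:

```python
import math

# For each range a[low..high] of size n = high - low + 1, the recursive
# calls operate on high-(mid+1)+1 and mid-1-low+1 elements. Verify these
# equal ceil((n-1)/2) and floor((n-1)/2), respectively.
for low in range(0, 20):
    for high in range(low, 40):
        n = high - low + 1
        mid = (low + high) // 2               # mid = floor((low+high)/2)
        assert high - (mid + 1) + 1 == math.ceil((n - 1) / 2)
        assert mid - 1 - low + 1 == (n - 1) // 2   # floor((n-1)/2)
```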
Thus, we have:
T(n) = c1 if n < 1;
T(n) = c2 if n >= 1 and value = a[mid];
T(n) = T(ceil((n-1)/2)) + c3 if n >= 1 and value > a[mid]; and
T(n) = T(floor((n-1)/2)) + c4 if n >= 1 and value < a[mid]
This is called a recurrence equation for T(n). Unfortunately, recurrence
equations do not tell us much about the actual running time, so we need to
derive a direct equation for T(n). This is difficult to do with the floor
and ceiling functions, so instead we obtain a recurrence inequality:
T(n) = c1 if n < 1;
T(n) <= T(n/2) + k1 otherwise (where k1=max(c3,c4))
This is true because n/2 >= ceil((n-1)/2) and n/2 >= floor((n-1)/2), and
because binary search uses the same or a larger number of steps when searching
larger subsequences. We ignore the case when value = a[mid] because
we are interested in the worst case running time of binary search.
Finding a match never gives a worst case running time because the search
stops as soon as a match is found.
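The recurrence inequality can be evaluated directly. The sketch below assumes
unit costs (c1 = k1 = 1) and uses floor(n/2) for the subproblem size, which is
valid because ceil((n-1)/2) = floor(n/2) and T is nondecreasing; the function
name T_bound is our own:

```python
import math

def T_bound(n, c1=1, k1=1):
    """Evaluate T(n) <= T(n/2) + k1 with base case T(n) = c1 for n < 1.

    Unit costs c1 = k1 = 1 are an illustrative assumption; the worst
    subproblem of a size-n search has ceil((n-1)/2) = floor(n/2) elements.
    """
    if n < 1:
        return c1
    return T_bound(n // 2, c1, k1) + k1

# The recurrence value never exceeds the closed-form bound derived below:
# c1 + (log2(n) + 1) * k1.
for n in [1, 2, 7, 100, 10**5]:
    assert T_bound(n) <= 1 + (math.log2(n) + 1)
```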
We can now find an upper bound for T(n) as follows:
T(n) <= T(n/2) + k1
<= (T(n/4) + k1) + k1 because T(n/2) <= T(n/4) + k1
= T(n/4) + 2k1
<= (T(n/8) + k1) + 2k1 because T(n/4) <= T(n/8) + k1
= T(n/8) + 3k1
<= (T(n/16) + k1) + 3k1 because T(n/8) <= T(n/16) + k1
= T(n/16) + 4k1
<= (T(n/32) + k1) + 4k1 because T(n/16) <= T(n/32) + k1
= T(n/32) + 5k1
T(n) <= T(n/2^i) + i*k1 if we generalize from above.
T(n) = c1 when n < 1, so we can replace T(n/2^i) with c1 when n/2^i < 1:
n/2^i < 1 if and only if n < 2^i if and only if log2(n) < i.
Thus, T(n/2^(log2(n)+1)) = c1, so:
T(n) <= T(n/2^i) + i*k1 <= c1 + (log2(n)+1)*k1
We have just proven that T(n) <= a*log2(n) + b, where a = k1 and b = c1 + k1
are constants. Therefore, T(n) is O(log2(n)).
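As an empirical check of this bound, we can count the recursive calls an
unsuccessful search actually makes. This Python sketch (count_calls is a name
introduced here for illustration) forces the worst case by searching for a
value larger than every element, so the call count should stay within
floor(log2(n)) + 2:

```python
import math

def count_calls(n):
    """Count recursive calls of an unsuccessful binary search on n elements."""
    a = list(range(n))
    calls = 0

    def search(low, high, value):
        nonlocal calls
        calls += 1
        if low > high:
            return -1
        mid = (low + high) // 2
        if value == a[mid]:
            return mid
        if value > a[mid]:
            return search(mid + 1, high, value)
        return search(low, mid - 1, value)

    search(0, n - 1, n)      # value n exceeds every element: worst case
    return calls

# Call counts grow logarithmically, matching T(n) in O(log2(n)).
for n in [1, 10, 100, 1000, 10**6]:
    assert count_calls(n) <= math.floor(math.log2(n)) + 2
```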