# Analysis of Binary Search The analysis of a binary search is not the same as that of linear search because the loop of a binary search does not follow the pattern of going from the start of the array all the way to the end. It starts in the middle of an array and jump around. Not every element will be considered during the search process so this will be a bit different. {% hint style="info" %} A ***binary search*** can only be done to lists if two criteria are true: 1. the list must be sorted 2. access to any element given its position must have a constant run time. {% endhint %} ```cpp template int binarySearch(const vector& arr, const TYPE& key){ int rc=-1; int low=0; int high=arr.size()-1; int mid; while(low<=high && rc==-1){ mid=(low+high)/2; if(key < arr[mid]) high=mid-1; else if(key > arr[mid] ) low=mid+1; else rc=mid; }/*while*/ return rc; } ``` In the above code, there are two variables low and high. These represent the first and last index of the elements where key may be located. In other words, the only possible places key may exist is somewhere between arr\[low] and arr\[high] inclusive. In each iteration of the while loop we check the middle element between them. If key is smaller than the middle element, we move the high marker so that it is to one element before the middle one. If key is bigger, we move the low marker one element after the middle one. In other words we eliminate half of the remaining possibilities of key's location after each iteration of the while loop. We stop when we either find it or we have eliminated all possible locations for key. ### Analysis of an unsuccessful search Similar to a linear search we should analyze the run time for what happens in its worst case. In this case, it is also performing an unsuccessful search Let n represent the size of the array Let T(n) represent the number of operations needed to find key within the array In a binary search the following operators are done exactly one time: ``` int rc=-1; int low=0; int high=arr.size()-1; int mid; return rc; ``` The following are done for each iteration within the loop: ```cpp low<=high && rc==-1 //<-- 3 ops mid=(low+high)/2; //<-- 3 ops ``` Within the loop there is also an if/else if/else statement. there are 3 paths through this statement. The path that would would lead the the highest operation count would be to follow the else-if. The reason is that we would end up doing both checks. and there are more operations in the else-if than the else statement. Thus, we should use that pathway ```cpp if(arr[mid] > key) //<---check this (1 op) high=mid-1; //<---skip, check fails else if(arr[mid] < key) //<---check (1 op) low=mid+1; //<---do this (2 ops) else //<---skip rc=mid; //<---skip ``` Thus what we can say is that each time the loop runs, it will perform 10 operations at worst #### Number of Loop Iterations The next question then is to figure out how many times this loop will run. We are of course assuming that the search will fail. Thus, the only way for this loop to stop is for low<=high to become untrue. When we started off we have n elements to search. After one iteration through the loop we have approximately half of the original number of elements. After one more iteration we eliminate another half of the remaining elements. the last iteration would involve only 1 element because low == high == mid, after that we would end up with low > high Lets look at the pattern of the relationship between number elements within the search set and the iteration number. | Loop iteration | Number of elements between low and high inclusive (approx) | Denominator | | -------------- | ---------------------------------------------------------- | ------------ | | 1 | $$n = {n \over 1}$$ | $$1 = 2^0$$ | | 2 | $$n\over2$$ | $$2 = 2^1$$ | | 3 | $${n\over2}/2 = {n\over4}$$ | $$4 = 2^2$$ | | 4 | $${n\over4}/2 = {n\over8}$$ | $$8 = 2^3$$ | | 5 | $${n\over8}/2 = {n\over16}$$ | $$16 = 2^4$$ | | ... | ... | ... | | ?? | $$1 = {n \over n}$$ | $$n = 2^ ?$$ | Now look at the exponent 2 was raised to in the denominator and the relationship with its loop iteration. If we can find what we need to raise 2 to in order to get n we will be able to find the number of iterations the loop will run. By definition: $${b^x} = a$$ iff $$log\_ba = x$$ In other words, $$x$$ is the exponent we need to raise $$b$$ to in order to get $$a$$ . That value is $$log\_ba$$. Applied to our situation, the exponent we must raise $$2$$ to in order to get $$n$$ would be $$log\_2n$$. Our loop runs one time more than the exponent... thus our loop will run $$1+logn$$ time. Putting these ideas together, we can summarize the operation counts as follows: ```cpp template int binarySearch(const vector& arr, const TYPE& key){ int rc=-1; //1 int low=0; //1 int high=arr.size()-1; //2 (1 for = , 1 for - ) int mid; //1 while(low<=high && rc==-1){ //3*(1+logn) mid=(low+high)/2; //3*(1+logn) if(key < arr[mid]) //1*(1+logn) high=mid-1; else if(key > arr[mid] ) //1 *(1+logn) low=mid+1; //2 *(1+logn) else rc=mid; }/*while*/ return rc; //1 } ``` Adding it all up we find the following expression for $$T(n)$$ $$\begin{align\*} T(n) &= 10 ( 1 + logn) + 6 \ &= 10 + 10logn + 6 \ &= (10( log n)) + 16 \end{align\*}$$ The dominating term is $$10(logn)$$ . Thus, $$T(n)$$ is $$O(logn)$$