
Analysis of Linear Search

To see how an analysis is carried out, we will start with a performance analysis of the following C++ function for a linear search:

#include <vector>
using std::vector;

// Returns the index of key in arr, or -1 if key is not found.
template <class TYPE>
int linearSearch(const vector<TYPE>& arr, const TYPE& key){
    int rc=-1;
    for(int i=0;i<arr.size()&& rc==-1;i++){
        if(arr[i]==key){
            rc=i;
        }
    }
    return rc;
}

Note: in the above code we treat the arr.size() call as if it were a constant value or variable. We can do this only because the size() function of the vector class runs in constant time (see https://en.cppreference.com/w/cpp/container/vector/size, under the heading Complexity). If its complexity were anything else, we would need to account for that in the analysis. For example, strlen() in the cstring library does not have a constant run time, so we could not count it as one operation.
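To make the difference concrete, here is a small hypothetical contrast (countVector and countCString are illustrative sketches, not part of the original notes): a size() call in a loop condition adds only constant work per iteration, while a strlen() call rescans the string every time the condition is checked.

#include <cstddef>
#include <cstring>
#include <vector>

// v.size() is O(1), so evaluating it in the condition costs a constant
// amount of extra work on each iteration.
int countVector(const std::vector<int>& v){
    int count = 0;
    for(std::size_t i = 0; i < v.size(); i++){
        count++;
    }
    return count;
}

// std::strlen() walks the whole string every time the condition is checked,
// so each iteration does O(n) extra work rather than O(1).
int countCString(const char* s){
    int count = 0;
    for(std::size_t i = 0; i < std::strlen(s); i++){
        count++;
    }
    return count;
}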

Assumptions

Looking at this code, we can see that there is a loop. The loop stops on one of two conditions: it reaches the end of the array, or rc becomes something other than -1. rc changes only if key is found in the array. So, what are the possibilities when we run this function?

1) key is not in the array... in that case the loop stops when it gets to the end of the array

2) key is in the array... but where? If it is at the front, the loop stops right away; if it is at the back, the loop needs to go through all the elements... clearly these cases affect how many operations we carry out.

The worst case scenario is an unsuccessful search... the loop needs to run the most times in that case. If the search is successful, however, where will the key be? To perform an average case analysis, we need to consider how likely the key is to be found at each position. Making the assumption that the key is equally likely to be in any element of the array, we should expect to go through half the array on average before finding key.

We will look at each of these cases in turn below. The short driver that follows illustrates the scenarios.
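A minimal sketch of those scenarios (the sample values and the main() driver are illustrative only, not part of the original notes; it assumes the linearSearch template from above is in scope):

#include <iostream>
#include <vector>

int main(){
    std::vector<int> arr{10, 20, 30, 40, 50};

    // key at the front: the loop stops after the first iteration
    std::cout << linearSearch(arr, 10) << "\n";   // prints 0

    // key at the back: the loop walks through all n elements
    std::cout << linearSearch(arr, 50) << "\n";   // prints 4

    // key not in the array: the loop also walks through all n elements
    std::cout << linearSearch(arr, 99) << "\n";   // prints -1

    return 0;
}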

Analysis of an Unsuccessful Search

Let n represent the size of the array arr. Let T(n) represent the number of operations necessary to perform linear search on an array of n items.

Next we go through the code and count how many operations are done. We can do this by simply adding a comment to each line with the total operation count for that line. That is, we count how many operations are on that line AND the number of times that particular line runs.

template <class TYPE>
int linearSearch(const vector<TYPE>& arr, const TYPE& key){
    int rc=-1;                                //1
    for(int i=0;i<arr.size()&& rc==-1;i++){   //1 + 5n
        if(arr[i]==key){                      //n
            rc=i;                             //0
        }
    }
    return rc;                                //1
}

Where do the counts come from?

  • int rc=-1; is only ever done once per function call, and there is one operation on that line, so the count is 1

  • for(int i=0;i<arr.size()&& rc==-1;i++): int i=0 is only done once per function call, so it is counted as 1. The other operations <, . (dot), &&, == and ++ happen on every loop iteration, and since the loop runs n times, each of those 5 operations is multiplied by n

  • if(arr[i]==key): this is done every time we are in the loop; it is 1 operation per iteration and the loop runs n times, so it is counted as n

  • rc=i: it is worth noting that because we are assuming our search is unsuccessful, the if statement never evaluates to true, so this line of code never runs. Thus, it is not counted

  • return rc; like the initialization statement, this return statement only happens once, so we count it as 1

Come up with the expression for T(n)

Recall that T(n) represents the total number of operations needed for an unsuccessful linear search. As such, we get T(n) by summing up the number of operations. Thus:

\begin{align*} T(n) &= 1 + 1 + 5n + n + 1 \\ &= 6n + 3 \end{align*}

The 6n comes from the 6 operations that we do each time through the loop; since the array has n elements, the loop runs n times. The +3 comes from the 3 operations that we always have to do no matter what.

Thus the expression for the number of operations that is performed is:

T(n) = 6n + 3

Now, imagine n becoming a very, very large number. As n gets bigger and bigger, the 3 matters less and less, so in the end we only care about 6n. 6n is the dominating term of the polynomial. This is similar to the following idea: if you had 6 dollars, then adding 3 dollars is a significant increase to the amount of money you have. However, if you had 6 million dollars, then 3 dollars matter very little. Similarly, if you had to search through an array with 6 million elements, 3 extra operations would hardly be noticeable.

Using the dominating term, we can say that the run time of linear search never exceeds some constant multiple of n. In other words, linear search is O(n).

Now... can we actually prove this?

According to the formal definition of big-O:

"T(n) is O(f(n))" if for some constants c and n_0, T(n) ≤ c·f(n) for all n ≥ n_0.

In other words... to prove that T(n) = 6n + 3 is O(n), we must find two constants c and n_0 such that the statement T(n) ≤ cn is true for all n ≥ n_0.

The important part is to realize that we can pick any constants we want, as long as they meet the criteria. For c, I can pick any number greater than 6. I will pick 10 (there isn't a good reason for this other than it is bigger than 6 and makes the math easy to do in our heads), so c = 10. I will also pick 1 for n_0.

If you graph both T(n) = 6n + 3 and cf(n) = 10n, you will see that the line for 6n + 3 falls under 10n early on and never gets above it again after that point.

Thus, we have proven that the function is O(n), because we were able to find the two constants c and n_0 needed for T(n) to be O(n).
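Spelling out the algebra behind this choice of constants (a small added check, not in the original notes, using the same c = 10):

\begin{align*} 6n + 3 \leq 10n &\iff 3 \leq 4n \\ &\iff n \geq \tfrac{3}{4} \end{align*}

So the inequality holds for every n ≥ 1, which is why n_0 = 1 works here.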

This is also the reason why it did not matter whether we counted the operations for the . operator or the [ ] operator. If we counted both of them, we would end up with something like:

T(n) = 8n + 3

The dominating term would be 8n, but this is still O(n). The proof would go exactly the same way, except that we can't use n_0 = 1, since T(n) ≤ cn is not true when n = 1. However, when n = 2, T(n) = 19 while cn is 20, so the statement is true there, and it stays true for all n ≥ 2.
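The same kind of algebraic check (again an added step, with the same c = 10) shows where the new n_0 comes from:

\begin{align*} 8n + 3 \leq 10n &\iff 3 \leq 2n \\ &\iff n \geq \tfrac{3}{2} \end{align*}

The smallest whole number satisfying this is 2, which matches the n_0 = 2 used above.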

Analysis of a Successful Search

In the previous analysis, we assumed we would not find the data, so the worst possible case of searching through the entire array was used for the analysis. However, what if the item is in the array?

Since we will at some point find the item, the statement inside the if will also be performed. Thus, we have 4 operations that run exactly once:

int rc = -1;
int i = 0;
rc = i;
return rc;

These other operations will also run for each iteration of the loop:

i<arr.size() && rc == -1   --> 3 operations
i++                        --> 1 operation
if(arr[i]==key)            --> 2 operations

The difference is that we do not expect the loop to run through the entire array. Assuming that key is equally likely to be in any element, we would have to search through n elements at worst and about n/2 elements on average (the short derivation below spells out where the n/2 comes from).
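A small added step, not spelled out in the original notes: if key is equally likely to be at any of the n positions, the expected number of elements examined is

\begin{align*} E[\text{elements examined}] &= \frac{1}{n}\sum_{i=1}^{n} i = \frac{1}{n}\cdot\frac{n(n+1)}{2} \\ &= \frac{n+1}{2} \approx \frac{n}{2} \end{align*}

which is about half of the array once n is large.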

Thus:

T(n) = 6 * (n/2) + 4 = 3n + 4

Now... as we said before, the constant on the dominating term does not actually matter: 3n is still O(n). The proof would go exactly as it did for the search where the key was not found.