
2017-12-04 Big O Notation

posted Dec 4, 2017, 10:13 AM by Konstantinovich Samuel   [ updated Dec 4, 2017, 8:21 PM ]
Big-O notation and Run-time Analysis.

Overall Idea:

We are interested in how quickly an algorithm runs relative to its input. Generally, we think of N as the input size (a list with N elements), but N could also be the actual input value (such as when calculating the Nth term of a series, or the Nth prime).

We do not care about an exact number of seconds. Instead, think along the lines:
"How MUCH LONGER does it take when we double N? And when we double N again?"

If the time stays the same, we say it runs in constant time.

If the time grows in proportion to N (doubling N doubles the time, tripling N triples the time), we say it runs in linear time. The constant of proportionality (the slope of the line) is not important.

We care about very large input sizes: we want to know how the algorithm behaves as N gets "arbitrarily large."
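
To make the doubling idea concrete, here is a rough timing sketch (the class name TimingDemo, the method sumAll, and the array sizes are illustrative additions, not part of these notes): time a linear-time method on an array of size N, then 2N, then 4N, and watch how the measured time grows.

//Rough timing sketch (illustrative): for a linear-time method,
//the measured time should roughly double each time n doubles
//(ignoring JIT warm-up and measurement noise).
public class TimingDemo {
    public static long sumAll(int[] ary) {
        long total = 0;
        for (int x : ary) {
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1_000_000; n <= 8_000_000; n *= 2) {
            int[] data = new int[n];   //contents do not matter here
            long start = System.nanoTime();
            sumAll(data);
            long elapsed = System.nanoTime() - start;
            System.out.println("n = " + n + "   time = " + elapsed + " ns");
        }
    }
}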


Examples:

//constant time:  O(1)
public int foo(int [] ary){
    return ary[0];
}

//linear time: O(N)
public int foo2(int [] ary){
    int total = 0;
    for( int i : ary){
      total += i;
    }
    return total;
}


//STILL linear time, because the work is proportional to the
//total number of elements. We can consider:
// N = rows * cols
//BUT we can also write this as O(rows*cols). It depends on
//what you consider the N to be.

public int foo2(int [][] ary){
    int total = 0;
    for( int[] row : ary){
        for( int j : row){    
          total += j;
        }
    }
    return total;
}



//quadratic time O(N^2)
public int foo3(int [] ary){
    int total = 0;
    //outer loop runs N times.
    for( int i = 0; i < ary.length - 1; i++){
        //inner loop runs about N/2 times on average - still proportional to N
        for( int j = i+1 ; j < ary.length; j++){   
          if(ary[i] == ary[j]){
             total +=1; 
           }
        }
    }
    return total;
}


NO CONSTANTS, and only count the FASTEST GROWING term:
If f(x) is a sum of several terms and one of them has the largest growth rate, keep that term and omit all the others.
If f(x) is a product of several factors, any constants (factors that do not depend on x) can be omitted.

Consider the functions:

public int foo2(int [] ary){
    int total = 0;
    for( int i : ary){
      total += i;
    }
    return total;
}

public int foo4(int [] ary){
    int total = 0;
    for( int i : ary){
      total += i;
    }
    for( int i : ary){
      total += i;
    }
    return total;
}

Since both double their runtime when we double the input size, we consider both
N and 2*N to be just O(N), where N is the length of the array.

Most Significant Term Only

Since we do not distinguish between N and 2*N, it should follow that we also do not worry about N + 10. These are all O(N).

Similarly:
N^2  versus   N^2 + 5N  are both considered to be just O(N^2)

The reason is that as N becomes very large, the smaller terms become far less significant. Consider: when N is a million, N^2 is a trillion while 5N is only five million, so the 5N term barely changes the total.
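
For example, a method that runs a nested loop and then one extra single loop does about N^2 + N steps, but we still call it O(N^2). (The method foo5 below is a sketch added to illustrate this rule; it is not one of the original examples.)

//about N^2 + N steps, but still O(N^2):
//the single loop at the end is insignificant for large N.
public int foo5(int [] ary){
    int total = 0;
    //nested loops: about N*N steps
    for( int i = 0; i < ary.length; i++){
        for( int j = 0; j < ary.length; j++){
          total += ary[i] * ary[j];
        }
    }
    //one extra pass: N more steps, dominated by the N^2 above
    for( int i : ary){
      total += i;
    }
    return total;
}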

Worst Case:

Sometimes the algorithms we write can stop early. This is great! Except we tend to consider the worst case scenario.

O(N) is an upper bound; in other words the worst case.
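
For example, a linear search can stop as soon as it finds the target, but in the worst case (target missing, or at the very end) it examines all N elements, so we still call it O(N). (The find method below is a sketch added for illustration.)

//linear search: may stop early, but the worst case looks at all N elements,
//so it is O(N).
public int find(int [] ary, int target){
    for( int i = 0; i < ary.length; i++){
        if(ary[i] == target){
           return i;    //best case: found immediately
        }
    }
    return -1;          //worst case: checked every element
}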

Formal Definition (semi formal):

You can say that 
f(x) = O(g(x))  as  x approaches infinity

if and only if there are positive constants k and x0 such that:
|f(x)| <= k*|g(x)|  for all x >= x0
note: here |f(x)| is how much work your function requires on input x.

e.g.
Your function f(x) is O(g(x)) if and only if |f(x)| <= k*|g(x)| (for some constant k, once x is large enough).
Your function foo() is O(N) if and only if the work foo() does is <= k*N.
Your function bar() is O(N^2) if and only if the work bar() does is <= k*N^2.
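
A quick worked check (added here for illustration): suppose a method does f(N) = 3N + 10 steps.
3N + 10 <= 4N   whenever N >= 10
So with k = 4 and x0 = 10 the inequality holds for all N >= x0, and the method is O(N).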

Upper Bound:
"<=" means that the big-O notation is an upper bound!
When f(x) is O(N) , it is alo O(N^2) but that is not very useful. We generally want the CLOSEST bound for the runtime!



