2016-12-12

posted Dec 12, 2016, 7:15 AM by Samuel Konstantinovich   [ updated Dec 13, 2016, 6:00 AM ]
Big-O notation and Run-time Analysis.

Semi-Formal Definition:

You can say that
f(x) = O(g(x))  as  x approaches infinity

if and only if there are positive constants k and x0 such that:
|f(x)| <= k*|g(x)|  for all  x >= x0

This means that the big-O notation is an upper bound!
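As a concrete instance of the definition (the function and the constants k = 4, x0 = 5 are my own choices, not from the notes), f(x) = 3x + 5 is O(x), because 3x + 5 <= 4x for every x >= 5. A quick brute-force check:

```java
public class BigOCheck {
    public static void main(String[] args) {
        // Claim: f(x) = 3x + 5 is O(x), witnessed by k = 4 and x0 = 5,
        // because 3x + 5 <= 4x whenever x >= 5.
        int k = 4, x0 = 5;
        for (int x = x0; x <= 1000; x++) {
            int f = 3 * x + 5;   // f(x)
            int g = x;           // g(x)
            if (f > k * g) {
                throw new AssertionError("bound fails at x = " + x);
            }
        }
        System.out.println("3x + 5 <= 4x held for every tested x >= 5");
    }
}
```

Checking a finite range is not a proof, of course, but it matches the algebra: 3x + 5 <= 4x rearranges to 5 <= x.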

Overall Idea:

We are interested in how quickly an algorithm runs relative to the size of its input. Generally, we think of N as the input size (a list with N elements), but N could be the actual input (such as when calculating the Nth term of a series or the Nth prime).

We do not care about an exact number of seconds. Instead, think along these lines:
"How many more seconds does it take when we double N? And when we double N again?"

If the time stays the same, we say it runs in constant time.

If the time doubles when we double N (and doubles again when we double N again), we say it runs in linear time. The constant factor (the slope of the line) is not important.

We care about very large input sizes: we want to know how the algorithm behaves as N gets "arbitrarily large."
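The "double N and time it again" experiment can be sketched directly in code. The method name sumAll and the sizes below are my own choices, and real timings are noisy (JIT warm-up, garbage collection), so treat the printed ratios as approximate:

```java
public class DoublingTimer {
    // A linear-time method to measure: one pass over the array.
    static int sumAll(int[] ary) {
        int total = 0;
        for (int i : ary) total += i;
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1_000_000; n <= 8_000_000; n *= 2) {
            int[] ary = new int[n];
            long start = System.nanoTime();
            sumAll(ary);
            long elapsed = System.nanoTime() - start;
            System.out.println("N = " + n + "  time = " + elapsed + " ns");
            // For a linear algorithm, each line should take roughly
            // twice as long as the one above it.
        }
    }
}
```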


Examples:

//constant time:  O(1)
public int foo(int [] ary){
    return ary[0];
}

//linear time: O(N)
public int foo2(int [] ary){
    int total = 0;
    for( int i : ary){
      total += i;
    }
    return total;
}


//STILL linear time, relative to the size of the input:
//here N = rows * cols, the total number of elements,
//and each element is visited exactly once.
public int foo2(int [][] ary){
    int total = 0;
    for( int[] row : ary){
        for( int j : row){    
          total += j;
        }
    }
    return total;
}



//quadratic time O(N^2)
public int foo3(int [] ary){
    int total = 0;
    //outer loop runs N-1 times.
    for( int i = 0; i < ary.length - 1; i++){
        //inner loop runs about N/2 times on average,
        //so the body runs N*(N-1)/2 times in total.
        for( int j = i+1 ; j < ary.length; j++){   
          if(ary[i] == ary[j]){
             total +=1; 
           }
        }
    }
    return total;
}
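We can count exactly how many times foo3's inner body runs. For each i, the inner loop runs (N - 1 - i) times, and the sum (N-1) + (N-2) + ... + 1 + 0 equals N*(N-1)/2, which grows like N^2/2. The helper below (countComparisons is my own name) mirrors foo3's loop structure and just counts:

```java
public class PairCount {
    // Same loop bounds as foo3, but counting iterations of the
    // inner body instead of comparing elements.
    static long countComparisons(int n) {
        long count = 0;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[]{4, 10, 100}) {
            System.out.println("N = " + n + " -> " + countComparisons(n)
                + " iterations  (N*(N-1)/2 = " + (long) n * (n - 1) / 2 + ")");
        }
    }
}
```

Since N*(N-1)/2 = N^2/2 - N/2, the most significant term is N^2 (times a constant we ignore), so foo3 is O(N^2).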


NO CONSTANTS:

Consider the functions:

public int foo2(int [] ary){
    int total = 0;
    for( int i : ary){
      total += i;
    }
    return total;
}

public int foo4(int [] ary){
    int total = 0;
    for( int i : ary){
      total += i;
    }
    for( int i : ary){
      total += i;
    }
    return total;
}

Since they both double their runtime when we double the input size, we consider both
N and 2*N to be just O(N).
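One way to see this is to count operations instead of timing them. The helper below (passes is my own name) counts how many times a "total += i" style step would run when we make numPasses passes over an array of length n, as foo2 (one pass) and foo4 (two passes) do:

```java
public class ConstantFactors {
    // Count the loop-body executions for numPasses passes over n elements.
    static long passes(int n, int numPasses) {
        long ops = 0;
        for (int p = 0; p < numPasses; p++) {
            for (int i = 0; i < n; i++) ops++; // one "total += i" per step
        }
        return ops;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("foo2-style, N:  " + passes(n, 1));     // N ops
        System.out.println("foo2-style, 2N: " + passes(2 * n, 1)); // 2N ops
        System.out.println("foo4-style, N:  " + passes(n, 2));     // 2N ops
        System.out.println("foo4-style, 2N: " + passes(2 * n, 2)); // 4N ops
    }
}
```

foo4 always does exactly twice the work of foo2, but doubling N doubles the count for both, so the growth rate is the same: O(N).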

Most Significant Term Only

When compared to N, we do not worry about 2*N, so it should follow that we should not worry about N + 10 either. These are all O(N).

Similarly:
N^2  versus   N^2 + 5N  are both considered to be just O(N^2)

The reason is that as N becomes very large, the smaller terms become far less significant. Consider: when N is a million, N^2 is a trillion while 5N is only five million, so the 5N term barely changes the total.
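The arithmetic for N = 1,000,000 can be checked directly (the class name is mine; note the values need a long, since they overflow an int):

```java
public class DominantTerm {
    public static void main(String[] args) {
        long n = 1_000_000L;
        long nSquared = n * n;   // 1,000,000,000,000
        long fiveN = 5 * n;      //         5,000,000
        // 5N is a vanishing fraction of N^2 + 5N:
        double share = (double) fiveN / (nSquared + fiveN);
        System.out.println("N^2 = " + nSquared);
        System.out.println("5N  = " + fiveN);
        System.out.println("5N as a fraction of N^2 + 5N: " + share);
    }
}
```

Here 5N contributes roughly 0.0005% of the total, and the fraction keeps shrinking as N grows, which is why only the N^2 term survives in the big-O.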

Worst Case:

Sometimes the algorithms we write can stop early. This is great! But when we analyze them, we consider the worst-case scenario.

O(N) is an upper bound; in other words, it describes the worst case.
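A classic stop-early example (the method contains and the sample data are my own illustration, not from the notes) is a linear search: in the best case the target is at index 0 and we do one comparison, but in the worst case (target absent) we look at all N elements, so we still call it O(N):

```java
public class EarlyExit {
    // Linear search that stops as soon as it finds the target.
    static boolean contains(int[] ary, int target) {
        for (int x : ary) {
            if (x == target) return true; // lucky: stop early
        }
        return false; // unlucky: examined all N elements
    }

    public static void main(String[] args) {
        int[] ary = {7, 3, 9, 1};
        System.out.println(contains(ary, 7));  // found at index 0 -> true
        System.out.println(contains(ary, 42)); // scanned everything -> false
    }
}
```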
  
