Algorithm Design

Problem 1

Give the best and worst case run times in O-notation.

a. The following code segment searches for an item needle in a sorted array using a variation on binary search. Instead of splitting the array in half on each iteration, it chooses a random position to split at.

function random_binary_search( sorted_array, needle ) {
    var left = 0;
    var right = sorted_array.length;
    var pivot;

    while( right >= left ) {
        pivot = Math.random() * (right - left) + left;
        if( sorted_array[pivot] == needle ) return true;
        if( sorted_array[pivot] > needle ) right = pivot - 1;
        if( sorted_array[pivot] < needle ) left = pivot + 1;
    }

    return false;
}


Suppose Math.random() always returned 0, then this code will just do a linear scan and run in something like $O(n)$.

However, it’s probably safe to assume that Math.random() is a uniform random variable. As such, the value of pivot is most likely going to be approximately halfway between left and right. Cutting the search space in half on every iteration yields something like $O(\lg n)$.

To be pedantic, you could say this code is bounded by $O(n)$ and $\Omega(\lg n)$.

b. The following code segment sorts an array of boxes, and then checks for overlapping pairs. Each box is defined by a min and max value for each axis. Two boxes overlap if they overlap on every axis.


Overlap only if they overlap on every axis.

function sort_and_sweep(boxes, sort_axis) {
    var overlaps = []

    // Sort the boxes along a given axis.
    boxes.sort(sort_axis); // O(n lg n)

    for(var i = 0; i < boxes.length; i++) {
        for(var j = i + 1; j < boxes.length; j++) {
            // Since boxes are sorted this gives us an early exit.
            if(boxes[i].max[sort_axis] < boxes[j].min[sort_axis] )
                break;

            // Check if they overlap along all axes.
            if( detectOverlap(boxes[i], boxes[j]) )
                overlaps.push( [boxes[i], boxes[j]] );
        }
    }

    return overlaps;
}


Sorting the list of rectangles is $O(n\lg n)$ assuming a decent sorting algorithm.

For best case behavior, assuming none of them are overlapping along sort_axis, the nested loops are $O(n)$, since the inner loop would be $O(1)$ (or close to it in near-optimal cases).

What happens when they cluster along a single axis (e.g. if we compared along the y-axis instead of x)? The inner loop won’t terminate early, and so it could potentially compare boxes[i] to every other element. Comparing an element to every element that follows it works out to $O(\frac{1}{2}n(n-1)) \rightarrow O(n^2)$.

So, the worst case is $O(n^2)$ and the best case is $O(n\lg n)$.


Problem 2

Design an algorithm to solve the following problems.

a. Given an array of $n + 1$ integers, all of which are between 1 and $k$, containing
exactly one duplicate, give a $O(n + k)$ time algorithm for finding the duplicated
element. It is possible for $k$ to be much larger than $n$.


Create a temporary array of length $k$ with all values set to zero (or false). Scan through the input array and use each element to index into the temporary array and increment the value. If the value is already 1 (or true), then you found the duplicate element. Initializing a temporary array is $O(k)$ and scanning the input is $O(n)$, so the result is the desired (not optimal) $O(n + k)$.

The optimal solution is to use a hash table or dictionary, which is $O(n)$.

b. Given $n$ integers, find the pair with the smallest difference between them. What is the run time of your solution?


One way to do this is to sort the integers and then find the difference between each consecutive pair – an $O(n\lg n)$ solution.

c. Given $n$ 2D points, find the pair with the smallest Euclidean distance between them. What is the run time of your solution?


A good chance to use divide-and-conquer. The algorithm is as follows:

The combine step is a little tricky, and one way to do it is to remove the points that are further away from the dividing line than the distance between the minimum pair found in the conquer step. Sort the remaining points, and try to find a closer pair from these (e.g. the pairs that go across the dividing line).

You get a recurrence of $T(n) \leq 2T(n/2) + O(n \lg n)$ which reduces to $T(n) \in O(n \lg^2 n)$. However, this problem can be done in $O(n \lg n)$ if you merge the results from the conquer step instead of sort them (its a little more complicated).

For reference, here’s how the code might look like:

function closest_pair( points ) {
    midpoint = compute_midpoint(points);

    // both of these are T( n/2 )
    left_delta = closest_pair( midpoint.getLeft() );
    right_delta = closest_pair( midpoint.getRight() );

    min_delta = Math.min(left_delta, right_delta);

    // Delete all points further than min_delta from midpoint 
    points = midpoint.shrink( min_delta ); // O(n)

    // Sort remaining points by y-coordinate.
    points.sortAlongY(); // O(n lg n)

    // O(n)
    for(var i = 1; i < points.length; i++) {
        if(distance( points[i-1], points[i] ) < min_delta) {
            min_delta = distance( points[i-1], points[i] );
        }
    }

    return min_delta;
}

You can see that each call is $2T(n/2) + O(n \lg n) + 2O(n) = 2T(n/2) + O(n \lg n)$, which yields $O(n \lg^2 n)$. See Wikipedia for more, and searching “closest pair of points” will give you a ton of information.