Sunday, October 11, 2009

When Noise is Your Friend: Smoothed Analysis

Have you ever encountered the phrase "the algorithm has exponential running time, in the worst-case scenario, but in practice we observed it to be pretty efficient"? It is the phrase that divides theoreticians and practitioners. Many theoretical computer scientists focus on the analysis of the worst case complexity, generating often results that contradict practice.

For example, the simplex algorithm for linear programming is well known to be pretty efficient in practice. In theory, the worst-case complexity of simplex is exponential, classifying the simplex algorithm as a "non-efficient" algorithm. However, simplex has exponential running time only for very special cases. Most practitioners would even argue that you will never encounter such strange cases in practice. Only an adversary could potentially design such inputs.

Similarly, the Traveling Salesman Problem is a hallmark example of an NP-complete problem, i.e., unlikely to have an efficient algorithm anytime soon. However, there are many implementations of TSP that can provide almost optimal solutions for TSP, for pretty big inputs.

K-means is another such algorithm. It has a horrible worst-case scenario but ask the millions of people that use it for clustering. One of the most efficient clustering algorithms, despite its wost-case exponential complexity.

So, how can we reconcile theory and practice?

A very nice approach towards this reconciliation is the case of smoothed analysis. I first learned about this approach for analyzing algorithms by attending the (fascinating) job talk of Jon Kelner. Jon showed that if you pertubate a little bit the input before feeding it to the simplex algorithm, then it is almost impossible for the pertubed input to generate an exponential running time. In other words, by adding a little bit of noise in the data, there is the guarantee that we avoid the "tricky" parts of the input space.

What is the beauty of this approach? It explains why in many cases "inefficient" algorithms work well in practice: Most real data contain noise, and this noise can actually be beneficial! The other big lesson is that sometimes an algorithm ends up having a horrible worst-case performance just due to a small number of potential inputs, that are almost adversarial. Adding noise, may take care of these strange cases.

The last issue of Communications of ACM, has a great review article by Spielman and Teng on Smoothed Analysis. Explains the difference between worst-case, average-case, and smoothed analysis, and points to a wide variety of problems that have been analyzed using this technique. Highly recommended!

5 comments:

  1. Reminds me of a topology course I took in which the instructor repeatedly used the term "general position" to rule out conditions that wouldn't hold if you jiggle things a little bit. For example, you might invoke general position to assume a simplex has no face perpendicular to a coordinate axis.

    ReplyDelete
  2. Wouldn't it be possible to assign probabilities to the problematic regions of the input space?

    I can imagine this would be possible if the question "how far away from the really bad point do you have to be so that the algorithm is efficient", given some measure of efficiency, can be answered.

    But in those happy cases where Kelner's result applies, the probability of accidentally hitting a bad input point should be almost zero.

    ReplyDelete
  3. Argh. The article's pay-per-view.

    Theoreticians have been proving polynomial time expected-case analyses of NP complete algorithms for quite a while now. Wikipedia says smoothed analysis is a "hybrid" of worst case and expected case.

    This also reminds me of the typical set analyses used in information-theoretic or statistical proofs. If you can prove a property of "almost every" set in some limit, then sets with the property are "typical" and you can reason about only typical sets. For instance a long series of fair coin flips with percentage of heads near 50% is typical (the boundaries of typicality here given by the central limit theorem).

    ReplyDelete
  4. The CACM article is a review article, essentially discussing the variety of results that came out of this research direction since its introduction in 2001.

    I knew about the proof for simplex and I have recently heard about k-means. I also noticed a result stating that an 2-competitive approximation for TSP (i.e., guaranteed to be at most twice as bad as the optimal) is smoothed-polynomial.

    > Wouldn't it be possible to assign
    > probabilities to the problematic
    > regions of the input space?

    In principle, we may assign probabilities to every part of the space. But it is unclear what the probabilities should be.

    So smoothed analysis assigns equal probabilities in all points but "smooths" each input by adding noise to it. Kind of a "worst-case" of "locally averaged cases".

    I am also getting really pissed with this ACM paywall, especially for CACM, its flagship journal. If not all the ACM publications, at least CACM should be open to everyone. And if not in its entirety, at least the current issue.

    ReplyDelete
  5. >If you pertubate a little bit the input before feeding it to the simplex algorithm, then it is almost impossible for the pertubed input to generate an exponential running time.

    This starts to look an awful lot like using randomized algorithms. E.g., choosing the pivot in quicksort randomly makes it virtually impossible to get the worst-case running time of O(n^2). Basically, you're using random bits as noise.

    ReplyDelete