## Sunday, March 13, 2016

### AlphaGo, Beat the Machine, and the Unknown Unknowns

In Game 4, of the 5-game series between AlphaGo and Lee Sedol, the human Go champion, Lee Sedol managed to get his first win. According to the NY Times article:

Lee had said earlier in the series, which began last week, that he was unable to beat AlphaGo because he could not find any weaknesses in the software's strategy. But after Sunday's match, the 33­ year­ old South Korean Go grandmaster, who has won 18 international championships, said he found two weaknesses in the artificial intelligence program. Lee said that when he made an unexpected move, AlphaGo responded with a move as if the program had a bug, indicating that the machine lacked the ability to deal with surprises.

This part reminded me of one of my favorite papers:  Beat the Machine: Challenging Humans to Find a Predictive Model’s “Unknown Unknowns”

In the paper, we tried to use humans to "beat the machine" and identify vulnerabilities in a machine learning system. The key idea was to reward humans whenever they identify cases where the machine fails, while also being confident that it provides the correct answer. In other words, we encouraged humans to find "unexpected" errors, not just cases where naturally the machine was going to be uncertain.

As an example case, consider a system that detects adult content on the web. Our baseline machine learning system had an accuracy of ~99%. Then, we asked Mechanical Turk workers to do the following task: Find web pages with adult content that the machine learning system classifies as non-adult with high confidence. The humans had no information about the system, and the only thing they can do was to submit a URL and get back an answer.

The reward structure was the following: Humans get \$1 for each URL that the machine misses, otherwise they get \$0.001. In other words, we provided a strong incentive to find problematic cases.

After some probing, humans were quick to uncover underlying vulnerabilities: For example, adult pages in Japanese, Arabic, etc., were classified by our system as non-adult, despite their obvious adult content. Similarly for other categories, such as hate speech, violence, etc. Humans were quickly able to "beat the machine" and identify the "unknown unknowns".

Simply told, humans were able to figure out what are the likely cases that the system may have missed during training. At the end of the day, the training data is provided by humans, and no system has access to all possible training data. We operate in an "open world" while training data implicitly assume a "closed world".

As we see from the AlphaGo example, since most machine learning systems rely on existence of training data (or some immediate feedback for their actions), machines may get into problems when they have to face examples that are unlike any examples they have processed their training data.

We designed our Beat The Machine system to encourage humans to discover such vulnerabilities early.

In a sense, our BTM system is s like hiring hackers to break into your network, to identify security vulnerabilities before they become a real problem. The BTM system applies this principle for machine learning systems, encouraging a period of intense probing for vulnerabilities, before deploying the system in practice.

Well, perhaps Google hired Lee Sedol with the same idea: Get the human to identify cases where the machine will fail, and reward the human for doing so. Only in that case, AlphaGo managed to eat its cake (figure out a vulnerability) and have it too (beat Lee Sedol, and not pay the \\$1M prize) :-)