Tuesday, January 17, 2017

Why was my Amazon Mechanical Turk registration denied?

(This is my answer to a question posted on Quora)

Mechanical Turk is a platform for work. Workers get paid, which makes now Amazon a payment processor. Payment processors are moving money on behalf of other people, and therefore are under heavy scrutiny from the US government for issues related to money laundering (AML), counter-terrorism, tax compliance, etc.

One of the key things that is required from financial institutions is to have a “Customer Identification Program” (CIP), also known as “Know Your Customer” (KYC) process. The CIP/KYC is a set of procedures that the financial institution needs to follow to establish that they know the true identity of a customer. The processes that each financial institutions follows vary, and the exact processes are rarely available to the public, as they are considered security measures. Furthermore, the practices are regularly monitored by regulators (OCC, Fed, FinCEN, etc) and change over time to follow best practices.

In your particular case, the most likely reason is that Amazon was not able to verify your identity.

If you are in the US, Amazon most probably can get your SSN and other personal details and verify whether you are a real person. However, even if you live in the US, if you have no credit history, no bank accounts, and so on, the verification will come back with low confidence. Following standard risk management processes, Amazon could plausibly reject such applications, as part of their CIP processes: it is better to have a false negative (rejecting a normal account) than having a false positive (e.g., accepting an account that will be involved in money laundering or tax-evasion schemes).

For other countries, the ability of Amazon, to follow CIP/KYC processes that conform to the US regulations, varies. I assume, for example, that the cooperation of US with UK or Australian authorities is much smoother compared to, say, Chinese authorities. So, if you live outside the US, the probability of having your account approved depends on how robust is the ability of Amazon to verify individual identities in your country.

Given that Amazon gets paid by requesters, I assume their focus is to establish CIP processes first in regions where potential requesters reside, which is not always the place where workers reside. This also means that you are more likely to be approved if you first register as a requester (assuming this is an option for you), and then try to create the worker account.

Sunday, March 13, 2016

AlphaGo, Beat the Machine, and the Unknown Unknowns

In Game 4, of the 5-game series between AlphaGo and Lee Sedol, the human Go champion, Lee Sedol managed to get his first win. According to the NY Times article:

Lee had said earlier in the series, which began last week, that he was unable to beat AlphaGo because he could not find any weaknesses in the software's strategy. But after Sunday's match, the 33­ year­ old South Korean Go grandmaster, who has won 18 international championships, said he found two weaknesses in the artificial intelligence program. Lee said that when he made an unexpected move, AlphaGo responded with a move as if the program had a bug, indicating that the machine lacked the ability to deal with surprises.

This part reminded me of one of my favorite papers:  Beat the Machine: Challenging Humans to Find a Predictive Model’s “Unknown Unknowns”

In the paper, we tried to use humans to "beat the machine" and identify vulnerabilities in a machine learning system. The key idea was to reward humans whenever they identify cases where the machine fails, while also being confident that it provides the correct answer. In other words, we encouraged humans to find "unexpected" errors, not just cases where naturally the machine was going to be uncertain.

As an example case, consider a system that detects adult content on the web. Our baseline machine learning system had an accuracy of ~99%. Then, we asked Mechanical Turk workers to do the following task: Find web pages with adult content that the machine learning system classifies as non-adult with high confidence. The humans had no information about the system, and the only thing they can do was to submit a URL and get back an answer.

The reward structure was the following: Humans get \$1 for each URL that the machine misses, otherwise they get \$0.001. In other words, we provided a strong incentive to find problematic cases.

After some probing, humans were quick to uncover underlying vulnerabilities: For example, adult pages in Japanese, Arabic, etc., were classified by our system as non-adult, despite their obvious adult content. Similarly for other categories, such as hate speech, violence, etc. Humans were quickly able to "beat the machine" and identify the "unknown unknowns".

Simply told, humans were able to figure out what are the likely cases that the system may have missed during training. At the end of the day, the training data is provided by humans, and no system has access to all possible training data. We operate in an "open world" while training data implicitly assume a "closed world".

As we see from the AlphaGo example, since most machine learning systems rely on existence of training data (or some immediate feedback for their actions), machines may get into problems when they have to face examples that are unlike any examples they have processed their training data.

We designed our Beat The Machine system to encourage humans to discover such vulnerabilities early.

In a sense, our BTM system is s like hiring hackers to break into your network, to identify security vulnerabilities before they become a real problem. The BTM system applies this principle for machine learning systems, encouraging a period of intense probing for vulnerabilities, before deploying the system in practice.

Well, perhaps Google hired Lee Sedol with the same idea: Get the human to identify cases where the machine will fail, and reward the human for doing so. Only in that case, AlphaGo managed to eat its cake (figure out a vulnerability) and have it too (beat Lee Sedol, and not pay the \$1M prize) :-)

Monday, February 29, 2016

A Cohort Analysis of Mechanical Turk Requesters

In my last post, I examined the number of "active requesters" on Mechanical Turk, and concluded that there is a significant decline in the numbers over the last year. The definition of "active requester" was: "A requester is active at time X if he has a HIT running at time X". A potential issue with this definition is that an improvement in the speed of HIT completion (e.g., due to increased labor supply) could drive down that number.

For this reason, I decided to perform a proper cohort analysis for the requesters on Mechanical Turk.  In the cohort analysis that follows, we will examine how many requesters that have first appeared in the platform on a given month (say September 2015), are still posting tasks in the subsequent months.

Here is the resulting "layer cake plot" that indicates that happens in each cohort. Each of the layers corresponds to requesters that were first seen on a given month. (code, data) (Read this post, if want a  little bit more background on how the plot should "look like".)

For example, the bottom layer corresponds to all the requesters that were first seen on May 2014 (the first month that the new version of MTurk Tracker started collecting data). We can see that we had ~2700 "new" requesters on that month. (The May-2014 cohort obviously contains all prior cohorts in our dataset, as we do not know when these requesters really started posting.) Out of these requesters, approximately 1700 also posted a task on June 2014 or later, approximately 1000 of these have posted a task on March 2015 or later, and approximately 500 have posted a task on February 2016.

The layer on top (slightly darker blue) illustrates the evolution of the June 2014 cohort. By stacking them on top of each other, we can see the composition of the requesters that have been active in every single month.

As the plot makes obvious, until March 2015, the acquisition of new requesters every month was compensating for the requesters that were lost from the prior cohorts. However, starting March 2015, we start seeing a decline in the overall numbers, as the total decline in requesters from prior cohorts dominates the acquisition of new requesters. So, the cohort analysis supports the conclusions of the prior post, as the trends and conclusions are very similar (always good to have a few robustness checks).

Of course, a more comprehensive cohort analysis would also analyze the revenue generated by each cohort, and not just the number of active users. That requires a little bit more digging in the data, but I will do that in a subsequent post.

Friday, February 26, 2016

The Decline of Amazon Mechanical Turk

It seems that after years of neglect, Mechanical Turk starts losing its appeal. In our latest measurement, we see Mechanical Turk losing 50% of its requesters in a YoY measurement.

A few days ago,  Kristy Milland (aka SpamGirl) asked me if there is a way to see the active requesters on Mechanical Turk over time. I did not have this dashboard on Mechanical Turk tracker, but it was an important metric, so I decided to add it in the MTurk Tracker website.

So, now MTurk Tracker has a tab called "Active Requesters" which shows how many requesters are "active" on Mechanical Turk at any given time. The definition of "Active at time X" means "had a task that was running on MTurk before time X and after time X".

Here is the chart for the active requesters between Jan 1, 2015 and February 28, 2016: 

As you can see, starting March 2016 (that is before the announcement of price increases), we see a decline in the number of active requesters. Interestingly, when the fee increases are announced, we see a small "valley" around the period of fee increases. The numbers remain stable until November, but after that we see a steady decline.

Overall, we observe a YoY decline of almost 50% in terms of active requesters.

What is driving the decline? Hard to tell. Perhaps requesters abandon crowdsourcing in favor of more automated solutions, such as deep learning. Perhaps requesters with long running jobs build their own workforce (eg using UpWork). Perhaps they use alternative platforms, such as Crowdflower. Or perhaps my own metric is flawed, and I need to revise it.

But, unless we have a bug in the code, the future does not seem promising for Mechanical Turk. And this is a shame.

Wednesday, June 10, 2015

An API for MTurk Demographics

A few months back, I launched demographics.mturk-tracker.com, a tool that runs continuously surveys of the Mechanical Turk worker population and displays live statistics about gender, age, income, country of origin, etc.

Of course, there are many other reports and analyses that can be presented using the data. In order to make easier for other people to use and analyze the data, we now offer a simple API for retrieving the raw survey data.

Here is a quick example: We first call the API and get back the raw responses:

In [1]:
import requests
import json
import pprint
import pandas as pd
from datetime import datetime
import time

# The API call that returns the last 10K survey responses
url = "https://mturk-surveys.appspot.com/" + \
resp = requests.get(url)
json = json.loads(resp.text)

Then we need to reformat the returned JSON object and transform the responses into a flat table

In [2]:
# This function takes as input the response for a single survey, and transforms it into a flat dictionary
def flatten(item):
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    hit_answer_date = datetime.strptime(item["date"], fmt)
    hit_creation_str = item.get("hitCreationDate")
    if hit_creation_str is None: 
        hit_creation_date = None 
        diff = None
        hit_creation_date = datetime.strptime(hit_creation_str, fmt)
        # convert to unix timestamp
        hit_date_ts = time.mktime(hit_creation_date.timetuple())
        answer_date_ts = time.mktime(hit_answer_date.timetuple())
        diff = int(answer_date_ts-hit_date_ts)
    result = {
        "worker_id": str(item["workerId"]),
        "gender": str(item["answers"]["gender"]),
        "household_income": str(item["answers"]["householdIncome"]),
        "household_size": str(item["answers"]["householdSize"]),
        "marital_status": str(item["answers"]["maritalStatus"]),
        "year_of_birth": int(item["answers"]["yearOfBirth"]),
        "location_city": str(item.get("locationCity")),
        "location_region": str(item.get("locationRegion")),
        "location_country": str(item["locationCountry"]),
        "hit_answered_date": hit_answer_date,
        "hit_creation_date": hit_creation_date,
        "post_to_completion_secs": diff
    return result

# We now transform our API answer into a flat table (Pandas dataframe)
responses = [flatten(item) for item in json["items"]]
df = pd.DataFrame(responses)

We can then save the data to a vanilla CSV file, and see how the raw data looks like:

In [3]:
# Let's save the file as a CSV

!head -5 data/mturk_surveys.csv

0,male,2015-06-10 15:57:23.072000,2015-06-10 15:50:23,"$25,000-$39,999",5+,kochi,IN,kl,single,420.0,4ce5dfeb7ab9edb7f3b95b630e2ad0de,1992
1,male,2015-06-10 15:57:01.022000,2015-06-10 15:35:22,"Less than $10,000",4,?,IN,?,single,1299.0,cd6ce60cff5e120f3c006504bbf2eb86,1987
2,male,2015-06-10 15:21:53.070000,2015-06-10 15:20:08,"$60,000-$74,999",2,?,US,?,married,105.0,73980a1be9fca00947c59b93557651c8,1971
3,female,2015-06-10 15:16:50.111000,2015-06-10 14:50:06,"Less than $10,000",2,jacksonville,US,fl,married,1604.0,a4cdbe00c93728aefea6cdfb53b8c489,1992

Or we can take a peek at the top countries:

In [4]:
# Let's see the top countries
country = df['location_country'].value_counts()
US    5748
IN    1281
CA      30
PH      22
GB      16
ZZ      15
DE      14
AE      11
BR      10
RO      10
TH       7
AU       7
PE       7
MK       7
FR       6
IT       6
NZ       6
SG       6
RS       5
PK       5
dtype: int64

I hope that the examples are sufficient to get people started using the API, and I am looking forward to see what analyses people will perform.