Of course, there are many other reports and analyses that can be presented using the data. To make it easier for other people to use and analyze the data, we now offer a simple API for retrieving the raw survey data.
Here is a quick example. We first call the API and get back the raw responses:
In [1]:
import requests
import json
import pprint
import pandas as pd
from datetime import datetime
import time
# The API call that returns the last 10K survey responses
url = "https://mturk-surveys.appspot.com/" + \
    "_ah/api/survey/v1/survey/demographics/answers?limit=10000"
resp = requests.get(url)
json = json.loads(resp.text)
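Before flattening anything, it can help to check that the response body really has the shape we expect. The snippet below is only a minimal sketch: the helper name `extract_items` and the sample payload are made up for illustration, and the only thing taken from the real API is that the responses live under an `"items"` key.

```python
import json


def extract_items(payload_text):
    """Parse a response body and return the list stored under "items".

    Hypothetical helper, not part of the API: it raises ValueError when the
    body is not a JSON object or has no "items" list.
    """
    payload = json.loads(payload_text)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    items = payload.get("items")
    if not isinstance(items, list):
        raise ValueError("response has no 'items' list")
    return items


# Tiny made-up payload with the same top-level shape as the API response
sample_text = '{"items": [{"workerId": "abc", "locationCountry": "US"}]}'
items = extract_items(sample_text)
```

With the real call, the same helper would be applied to `resp.text`.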
Then we need to reformat the returned JSON object and transform the responses into a flat table:
In [2]:
# This function takes as input the response for a single survey, and transforms it into a flat dictionary
def flatten(item):
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    hit_answer_date = datetime.strptime(item["date"], fmt)
    hit_creation_str = item.get("hitCreationDate")
    if hit_creation_str is None:
        hit_creation_date = None
        diff = None
    else:
        hit_creation_date = datetime.strptime(hit_creation_str, fmt)
        # convert to unix timestamp
        hit_date_ts = time.mktime(hit_creation_date.timetuple())
        answer_date_ts = time.mktime(hit_answer_date.timetuple())
        diff = int(answer_date_ts-hit_date_ts)
    result = {
        "worker_id": str(item["workerId"]),
        "gender": str(item["answers"]["gender"]),
        "household_income": str(item["answers"]["householdIncome"]),
        "household_size": str(item["answers"]["householdSize"]),
        "marital_status": str(item["answers"]["maritalStatus"]),
        "year_of_birth": int(item["answers"]["yearOfBirth"]),
        "location_city": str(item.get("locationCity")),
        "location_region": str(item.get("locationRegion")),
        "location_country": str(item["locationCountry"]),
        "hit_answered_date": hit_answer_date,
        "hit_creation_date": hit_creation_date,
        "post_to_completion_secs": diff
    }
    return result
# We now transform our API answer into a flat table (Pandas dataframe)
responses = [flatten(item) for item in json["items"]]
df = pd.DataFrame(responses)
df["gender"] = df["gender"].astype("category")
df["household_income"] = df["household_income"].astype("category")
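One detail in the time computation is worth a note. `time.mktime` interprets its argument as local time, while the API timestamps end in `Z` (UTC), so the difference can be off around a daylight-saving change on the machine running the code. A possible alternative, sketched below, is to subtract the two `datetime` objects directly. The helper name `seconds_between` and the sample timestamps are made up for illustration; the format string is the one used in `flatten`.

```python
from datetime import datetime

# Same timestamp format as in flatten()
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_between(created_str, answered_str, fmt=FMT):
    """Whole seconds from HIT creation to answer (hypothetical helper)."""
    created = datetime.strptime(created_str, fmt)
    answered = datetime.strptime(answered_str, fmt)
    # Subtracting datetimes gives a timedelta; no local-time conversion involved
    return int((answered - created).total_seconds())


# Made-up timestamps, 90.5 seconds apart
elapsed = seconds_between("2015-06-01T12:00:00.000Z", "2015-06-01T12:01:30.500Z")
```

Because this version keeps the fractional seconds until the final truncation, its result can differ by one second from the `time.mktime` version.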
We can then save the data to a vanilla CSV file, and see what the raw data looks like:
In [3]:
# Let's save the file as a CSV
df.to_csv("data/mturk_surveys.csv")
!head -5 data/mturk_surveys.csv
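Note that `to_csv` does not create missing folders, so the call above assumes that the `data/` directory already exists. Here is a small sketch that creates the parent directory first. The helper name `save_csv`, the toy dataframe, and the output path are hypothetical and only there to make the example self-contained.

```python
from pathlib import Path

import pandas as pd


def save_csv(df, path):
    """Write df to path, creating parent directories if needed (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # index=False drops the row-number column; leave it out to keep the index
    df.to_csv(path, index=False)
    return path


# Toy dataframe standing in for the survey table
toy = pd.DataFrame({"gender": ["male", "female"], "year_of_birth": [1985, 1990]})
out_path = save_csv(toy, "tmp_out/data/mturk_surveys_sample.csv")
```

Whether to write the index is a matter of taste. The cell above keeps it, which adds an unnamed first column to the file.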
Or we can take a peek at the top countries:
In [4]:
# Let's see the top countries
country = df['location_country'].value_counts()
country.head(20)
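The same pattern extends to other quick summaries. As a sketch, here are country shares and the median year of birth per country, computed on a handful of made-up rows rather than on the real survey data. The column names match the flattened table above.

```python
import pandas as pd

# Made-up rows standing in for the flattened survey table
sample = pd.DataFrame({
    "location_country": ["US", "US", "IN", "US", "IN", "CA"],
    "year_of_birth": [1985, 1990, 1988, 1979, 1992, 1983],
})

# Fraction of responses per country (the fractions sum to 1)
country_share = sample["location_country"].value_counts(normalize=True)

# Median year of birth within each country
median_birth_year = sample.groupby("location_country")["year_of_birth"].median()
```

On the real data, `sample` would simply be replaced by `df`.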
I hope that the examples are sufficient to get people started using the API, and I am looking forward to seeing what analyses people will perform.