## Wednesday, June 10, 2015

### An API for MTurk Demographics

A few months back, I launched demographics.mturk-tracker.com, a tool that runs continuously surveys of the Mechanical Turk worker population and displays live statistics about gender, age, income, country of origin, etc.

Of course, there are many other reports and analyses that can be presented using the data. In order to make easier for other people to use and analyze the data, we now offer a simple API for retrieving the raw survey data.

Here is a quick example: We first call the API and get back the raw responses:

In [1]:
import requests
import json
import pprint
import pandas as pd
from datetime import datetime
import time

# The API call that returns the last 10K survey responses
url = "https://mturk-surveys.appspot.com/" + \
resp = requests.get(url)
json = json.loads(resp.text)

Then we need to reformat the returned JSON object and transform the responses into a flat table

In [2]:
# This function takes as input the response for a single survey, and transforms it into a flat dictionary
def flatten(item):
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"

hit_creation_str = item.get("hitCreationDate")

if hit_creation_str is None:
hit_creation_date = None
diff = None
else:
hit_creation_date = datetime.strptime(hit_creation_str, fmt)
# convert to unix timestamp
hit_date_ts = time.mktime(hit_creation_date.timetuple())

result = {
"worker_id": str(item["workerId"]),
"location_city": str(item.get("locationCity")),
"location_region": str(item.get("locationRegion")),
"location_country": str(item["locationCountry"]),
"hit_creation_date": hit_creation_date,
"post_to_completion_secs": diff
}
return result

# We now transform our API answer into a flat table (Pandas dataframe)
responses = [flatten(item) for item in json["items"]]
df = pd.DataFrame(responses)
df["gender"]=df["gender"].astype("category")
df["household_income"]=df["household_income"].astype("category")

We can then save the data to a vanilla CSV file, and see how the raw data looks like:

In [3]:
# Let's save the file as a CSV
df.to_csv("data/mturk_surveys.csv")



,gender,hit_answered_date,hit_creation_date,household_income,household_size,location_city,location_country,location_region,marital_status,post_to_completion_secs,worker_id,year_of_birth
0,male,2015-06-10 15:57:23.072000,2015-06-10 15:50:23,"$25,000-$39,999",5+,kochi,IN,kl,single,420.0,4ce5dfeb7ab9edb7f3b95b630e2ad0de,1992
1,male,2015-06-10 15:57:01.022000,2015-06-10 15:35:22,"Less than $10,000",4,?,IN,?,single,1299.0,cd6ce60cff5e120f3c006504bbf2eb86,1987 2,male,2015-06-10 15:21:53.070000,2015-06-10 15:20:08,"$60,000-$74,999",2,?,US,?,married,105.0,73980a1be9fca00947c59b93557651c8,1971 3,female,2015-06-10 15:16:50.111000,2015-06-10 14:50:06,"Less than$10,000",2,jacksonville,US,fl,married,1604.0,a4cdbe00c93728aefea6cdfb53b8c489,1992

Or we can take a peek at the top countries:

In [4]:
# Let's see the top countries
country = df['location_country'].value_counts()

Out[4]:
US    5748
IN    1281
CA      30
PH      22
GB      16
ZZ      15
DE      14
AE      11
BR      10
RO      10
TH       7
AU       7
PE       7
MK       7
FR       6
IT       6
NZ       6
SG       6
RS       5
PK       5
dtype: int64

I hope that the examples are sufficient to get people started using the API, and I am looking forward to see what analyses people will perform.

## Monday, June 8, 2015

### Postdoc Position for Quality Control in Crowdsourcing

The Center for Data Science at NYU invites applications for a post-doctoral fellowship in statistical methodology relating to evaluating rater quality for a new research program in the application of crowdsourcing ratings of human speech production.

Duties and Responsibilities: This is a two-year postdoctoral position in the affiliated with the NYU Center for Data Science. The successful candidate will join a dynamic group of researchers in several NYU Centers including PRIISM, MAGNET, the Stern School of Business, the NYU Medical School and the Department of Communicative Sciences and Disorders. We are seeking highly motivated individuals to develop and test novel statistical and computational methods for evaluating rater quality in crowdsourced tasks. Responsibilities will include development, testing and implementation of statistical algorithms, as well as preparation of manuscripts for academic publication. Advanced knowledge of R is preferred.

Position Qualifications: Candidates will ideally have a doctoral degree in Statistics, Biostatistics, Data Science, Computer Science, or a related field, as well as genuine interests and experiences in interdisciplinary research that integrates study of human speech, citizen science games and computational statistics. Candidates will ideally have expertise in the following areas: Bayesian statistics, numerical methods and techniques, psychometrics and/or knowledge of programming languages. Outstanding computing and communication skills are required.

Please send CV, letter of intent, and three reference letters to Daphna Harel  (daphna dot harel at nyu dot edu) by July 31, 2015.

The position is for 2 years (subject to good research progress). The successful candidate will be based at the NYU Center for Data Science, under the primary supervision of NYU faculty members Panos Ipeirotis and Daphna Harel, and will closely work with a multidisciplinary team including NYU faculty members Tara McAllister Byun, R. Luke DuBois, and Mario Svirsky. The position will preferably start by September 2015 (start date negotiable).

## Friday, May 29, 2015

### The World Bank Report on Online Labor

I am often asked about statistics and data about the global population of "crowdsourcing" workers, going beyond Mechanical Turk. I am happy to say that from now on I will be able to point everyone to a study from The World Bank, which I was fortunate to participate. The reports examines the global landscape of online labor, identifying the opportunities, and providing statistics about the global landscape.

The study will be officially released on Wednesday June 3rd, and for those of you willing to attend the launch event through Webex, here is the information:

---
When
Wednesday, June 3, 2015, 9:00AM - 11:30AM EDT

Where:
Webex URL
Meeting number: 730 125 194
Audio connection: 1-650-479-3207 Call-in toll number (US/Canada)
Access code: 730 125 194

Title:
The New Online Outsourcing Approach for Jobs, Youth and Women's Empowerment and Services Exports

Abstract
This event will discuss the new online outsourcing (OO) phenomena in the world today, its implications for developing countries, and how your clients can leverage it as an innovative approach for jobs, youth employment and women's empowerment.

OO refers to the contracting of third-party workers and providers (often overseas) to supply services or perform tasks via Internet-based marketplaces or platforms. Also known as paid crowdsourcing, online work, microwork and other names - these technology-mediated channels allow clients to outsource their paid work to a large, distributed, global labor pool of remote workers, to enable performance, coordination, quality control, delivery, and payment of such services online.

The global OO marketplace today includes numerous emerging and growing platforms; such as Upwork (formerly Elance-oDesk), Crowdflower, CloudFactory, Amazon Mechanical Turk, etc. There are also wide variety of services that can be performed online - such as data entry, digitization, graphics rendering and design, programming and apps development, accounting and legal services, etc. Workers in developing countries can have access and perform jobs from all over the world - as long as they have computer and Internet access. In addition to jobs and income - OO offers workers flexible time and working environment, develop skills for professional, and drive positive social change for youth and women.

The event will share with participants the OO study that covers comprehensively the definition and segments, trends and market size, economic and non-financial impact on workers, and the implications and policy recommendations. In addition the event will show how u can apply the online toolkit to assess the readiness of your client countries for OO.

The World Bank's ICT Unit is excited to share this new global study and toolkit, which was developed in partnership with the Rockefeller Foundation and Dalberg Global Development Advisors.

Who:
• Chair: Mavis Ampah, Lead ICT Policy Specialist and Practice Lead on Jobs, GTIDR
• Siou Chew Kuek, Senior ICT Specialist and TTL, GTIDR
• Cecilia Paradi-Guilford, ICT Innovation Specialist and Co-TTL, GTIDR
• Saori Imaizumi, ICT Innovation and Education Consultant, GTIDR

## Monday, April 6, 2015

### Demographics of Mechanical Turk: Now Live! (April 2015 edition)

One of the most common question that I receive is whether I have new data about the demographics of Mechanical Turk workers. The latest data that I had collected were back in 2010, and it was not clear how things have changed since then. The key problem was not that I could not run additional surveys; that would have been trivial. However, the results of the surveys were always changing over time: the aggregate data varied too much across surveys, so I refrained from publishing data that seemed to be unreliable.

So, I thought of how I tackle two problems at once:
• Make it easy for people to see current data about the demographics of Mechanical Turk workers
• Make it easy to understand the inherent variability of the collected data, and potentially understand the source of the variability
For that reason, we built a new site:

The site displays live data about the demographics of the workers, based on a small 5-question survey that users are asked to answer (paying 5 cents for each). To be able to capture the time variability, we post one survey every 15 mins, allowing us to observe changes in the answers over time. We also restrict each worker to be able to answer the survey only once per month.

A few key results:

Country

Overall, we see that approximately 80% of the Mechanical Turk workers are from the US and 20% are from India.

However, this mix is not stable during the day. Around 8-10am UTC (ie 3am NYC time, 1.30pm India time), there is much higher number of workers from India (~50%), which then goes down to 5% at 8-10pm UTC.

The gender participation seems to be balanced, with roughly 50% males and 50%. The charts that examine variability based on hour of day and day of the week do not show any change in this pattern.

Roughly 50% of the workers are born in the 1980's and are around 30 yrs old. Approximately 20% of the workers are born in the 1990's, and another 20% are born in the 1970's.

Marital Status

Approximately 40% of the workers are single, 40% are married, and 10% are cohabitating.

Household Size

Approximately 15% live alone. Then 25% have a household size of two and 25% have a household size of three. Around 25% live in a household of four, and around 10% have five or more members in their household.

Income level

The median household income is around \$50K per year for US Turkers, which is on par with the median US household income. Indian workers have considerably lower household income, with most of them being around \$10K/yr.

Next steps

In our next steps, we plan on making the (anonymized) survey responses available through an API, and potentially add a few more graphs of interest. If you have any idea or suggestion, please send it my way.

## Monday, June 9, 2014

One of the components that I use in my class is student presentations.

While I like having students present, I had always a hard time grading the presentations. Plus, many students seemed to target the presentation to me, trying to sound too technical and advanced, leaving the audience in the class bored and uninterested.

For that reason, I adopted a peer-grading scheme. Students have to present to the class, and get rated by the class, and not me. (Although, I still reserve a small degree of editorial judgement for assigning the grades.) Here is how my scheme works, after a few years of experience.
1. Rating scale: Students assign a grade from 0 to 10 to the presentations.
2. No self-grading: Students do not grade their own presentations. (Early on, there were students that were assigning 10 to themselves, and lower grade to everyone else. Now they can still grade themselves if they want but the grade is ignored.)
3. Normalization: All assigned grades are normalized, to have a zero mean and one standard deviation. (This normalization was introduced to fight the problem where a student would try to game the system by assigning low grades to everyone else, hoping to lower the average rating of all other students.)
4. Grade assignment: The presentation grade is the average of the assigned normalized scores. Formally, each student $s_i$ assigns to presentation $t$ a grade $z(s,t)$. The overall grade of the presentation is the mean value $E[z(*,t)]$ of the $z(s_i,t)$ grades.
5. Ensuring careful grading by asking students to estimate class rating: One problem with the peer grading scheme was that many students did not take it seriously enough, and assigned random grades (typically, the same grade to everyone). To avoid indifferent grading, I decided to give credit (~10%) based on the correlation of the assigned grades $z(s,t)$ against the mean value $E[z(*,t)]$ (across all presentations $t$). This ensured that students will at least try to figure out what other students will assign to the presentation, and will not assign random grades.
6. Separate assigned and estimated grades: The problem with introducing the requirement to agree with the class was that some students believed to be better assessors than the rest of the class. So, they felt that their own grade was the correct one, and did not like losing credit for assigning their own "true" grade. To address that issue, I now ask students to assign two grades: their own grade $z_p(s,t)$, and an estimate of the class grade $z_c(s,t)$. The personal grade $z_p$ is used to compute $E(z(*,t)]$ in Step 4, and I use the $z_c$ to compute the correlation in Step 5.
7. Examine self-grading: Given that the class-estimate grades are not directly used to grade a presentation, students are also asked to provide an estimate of their own grade as part of Step 6. Effectively, students are encouraged to estimate properly their own grade.
The only thing that I have not tried to far is to modify Step 4 to take into consideration the different correlations from Step 5, effectively weighting each student's grades based on their correlation with the rest of the class. However, most students tend to exhibit the same, moderate agreement with the class (typical correlation values are in the 0.4-0.6 range, after rating 15-20 presentations), so in practice I do not expect to see a difference.

Overall, I am pretty happy with the scheme. Students indeed try to impress the class (and not me), and many presentations are interesting, interactive, and engaging. The grades are also very consistent with the overall feeling that I get for each presentation, so I did not have to practice my "editorial oversight" and adjust the grade very often (only in a couple of cases, where the students ran into technical problems during the presentation). I would be really interested to try this scheme in one of the big MOOC classes that use peer grading, and see if it can instill the same sense of responsibility in peer grading.