Saturday, September 22, 2007

The Price of Privacy (comScore edition)

The data from comScore are used extensively to analyze trends about internet companies, and are also used by academic researchers who need user behavior data for web research. The breadth of information that comScore captures about its users is breathtaking. Almost all the clicks, URLs, and queries submitted by a user are available to researchers who get access to the collected data. Even though comScore does not release any personally identifiable information about its panelists, it is still possible to infer a lot about each individual user by looking at their Internet behavior. (The AOL query log provides plenty of examples.)

So, I have been wondering what comScore offers users to convince them to participate in its panel. According to comScore, the benefits that panelists get are:
  • Security software applications such as server-based virus protection, remote data storage, encrypted local storage, Internet history removal
  • Attractive sweepstakes prizes
  • Opportunity to impact and improve the Internet
Still, I thought that this could not be everything. The offerings were not nearly lucrative enough to convince someone to release so much personal information. Today, though, I got my answer. Someone called me from comScore and, after asking me a long list of demographic questions (to which I lied systematically), invited me to become one of the comScore panelists and have all my internet behavior tracked and recorded, so that I can "have the opportunity to impact and improve the Internet" (yeah!). Then the coveted lucrative prize was revealed: $25 when I download the software and $5 per month afterwards (plus the benefits listed above).

So, now you know. Your privacy is worth a couple of double espressos per month. You have already surrendered everything about you to Google researchers for free. Now you can get paid $5/month to have the rest of the world look at your Internet habits.

Ambiguous First Names and Disambiguation

I was preparing an assignment for my class, trying to introduce students to issues of data quality, and I was using Facebook data for this.

As a simple example, I wanted students to automatically infer the gender of a person, given only the first name, since 1/3 of Facebook users do not list their gender. (The "homework motivation" was the need to send letters to customers and to decide whether to put "Dear Mr." or "Dear Ms." as the greeting.) In general, the task is relatively easy and the majority of the names are not ambiguous (a small sketch of the basic lookup approach appears after the list below). However, there is a set of highly ambiguous names for which inference based on the first name alone is problematic. For your viewing pleasure, the most ambiguous first names, together with the confidence that the name belongs to a male:

Ariel 50.00%
Yang 50.00%
Kiran 50.00%
Nikita 50.00%
Casey 49.30%
Min 46.67%
Paris 53.85%
Dorian 53.85%
Adi 45.45%
Kendall 45.45%
Quinn 54.55%
Aubrey 54.55%
Sunny 44.83%
Angel 55.32%
Yan 41.67%
Yi 41.67%
Yu 58.33%
Devon 59.46%
Nana 40.00%
Jin 38.89%
Ji 38.46%
Ming 61.54%
Taylor 37.80%
Rory 62.50%
Carey 36.36%
Sami 63.64%
Robin 34.55%
Ali 34.45%
Jean 34.09%
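
For anyone who wants to replicate the basic idea, here is a minimal sketch (not the actual homework code) that builds a name-to-gender lookup table from labeled profiles and then classifies a first name, flagging it as ambiguous when the split is close to 50/50. The sample `profiles` data and the 0.9 confidence threshold are just illustrative assumptions.

```python
from collections import defaultdict

def build_name_stats(profiles):
    """profiles: iterable of (first_name, gender) pairs, with gender 'male' or 'female'.
    Returns a dict mapping each first name to (fraction_male, number_of_profiles)."""
    counts = defaultdict(lambda: [0, 0])  # name -> [male count, female count]
    for name, gender in profiles:
        key = name.strip().title()
        if gender == 'male':
            counts[key][0] += 1
        elif gender == 'female':
            counts[key][1] += 1
    return {name: (m / (m + f), m + f) for name, (m, f) in counts.items() if m + f > 0}

def guess_gender(name, stats, threshold=0.9):
    """Return 'male', 'female', 'ambiguous', or 'unknown' for a first name."""
    entry = stats.get(name.strip().title())
    if entry is None:
        return 'unknown'
    p_male, _count = entry
    if p_male >= threshold:
        return 'male'
    if p_male <= 1 - threshold:
        return 'female'
    return 'ambiguous'

# Made-up example profiles, just to show the calling convention:
profiles = [('Maria', 'female'), ('John', 'male'), ('Casey', 'male'), ('Casey', 'female')]
stats = build_name_stats(profiles)
print(guess_gender('Casey', stats))   # -> 'ambiguous'
print(guess_gender('Maria', stats))   # -> 'female'
```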

The next part of the homework, motivated by the ambiguity of some first names, asks students to guess the gender of a person based on the other preferences stated on the Facebook profile, such as movies, books, TV shows, and so on.

Based on the analysis of these features, women overwhelmingly favor the books "Something Borrowed," "Flyy Girl," "Good In Bed," "The Other Boleyn Girl," and "Anne Of Green Gables," the movie "Dirty Dancing," and dancing as an activity.

On the other hand, characteristics that are unique to men are movies like "Terminator 2," "Wall Street," "Unforgiven," "The Good the Bad and the Ugly," and "Seven Samurai"; the book "Moneyball"; sports-related activities (baseball, lifting); and sports-related TV shows (e.g., PTI, SportsCenter, Around the Horn). Another distinguishing feature of men is that they list "women" and "girls" as their interests (and in this case they should perhaps also think about taking some dancing lessons :-)
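
One simple way to attack this part of the homework (not necessarily what the students did) is a Naive Bayes classifier that treats each profile as a bag of preference tokens. The sketch below is self-contained; the training data and feature names are purely illustrative.

```python
import math
from collections import defaultdict

class NaiveBayesGender:
    """Toy Naive Bayes over profile preference tokens (books, movies, interests)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.token_counts = {'male': defaultdict(int), 'female': defaultdict(int)}
        self.class_counts = {'male': 0, 'female': 0}
        self.vocab = set()

    def train(self, labeled_profiles):
        # labeled_profiles: iterable of (tokens, gender) pairs
        for tokens, gender in labeled_profiles:
            self.class_counts[gender] += 1
            for t in tokens:
                self.token_counts[gender][t] += 1
                self.vocab.add(t)

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        scores = {}
        for g in ('male', 'female'):
            score = math.log((self.class_counts[g] + 1) / (total + 2))  # smoothed log prior
            denom = sum(self.token_counts[g].values()) + self.smoothing * len(self.vocab)
            for t in tokens:
                score += math.log((self.token_counts[g][t] + self.smoothing) / denom)
            scores[g] = score
        return max(scores, key=scores.get)

# Illustrative (made-up) training profiles and a test profile:
train_data = [
    (['dirty dancing', 'the other boleyn girl', 'dancing'], 'female'),
    (['terminator 2', 'sportscenter', 'baseball'], 'male'),
]
clf = NaiveBayesGender()
clf.train(train_data)
print(clf.predict(['moneyball', 'baseball']))  # -> 'male'
```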

Friday, September 7, 2007

Experiences using Amazon Mechanical Turk

Everyone who has ever done web-related research knows that running user studies and obtaining user evaluations is one of the most frustrating parts of the process. We often need to create annotated data sets, have users look at the results of the proposed systems and evaluate their quality, and so on. Commonly, we rely on "volunteers" for such tasks, i.e., we ask friends and colleagues to participate in the experiment and do mundane, repetitive, and tedious work. Typically, after a "volunteer" has participated in one experiment, it is extremely difficult to convince them to participate in another one. (PhD students are the exception --- they typically have no choice.) The whole process can easily take 2-3 weeks just to gather enough data, and it is unclear how accurate the results gathered from such overused human subjects are.

When I joined Stern, I was glad to find out about the "behavioral lab," which made recruiting users much easier. We just specify the target demographics on a web form, and then a set of 1,000 registered (real) volunteers is notified. This greatly facilitates the process and ensures that the participating users are indeed willing to take part in the experiments. Still, the process is tiring, as someone has to wait in the lab for the users, give directions, pay the participants, and so on.

One interesting alternative is Amazon Mechanical Turk, a service introduced by Amazon in November 2005. Mechanical Turk allows requesters to post a large number of small tasks and pay a small amount to the person who completes each task. Examples of such tasks include:
  • can you see a person in the photo?
  • is the document relevant to a query?
  • is the review of this product positive or negative?
We started experimenting with Mechanical Turk in early 2007 to complete various tasks. I was truly surprised by the speed with which "Turkers" complete these tasks. Asking users to annotate 150 articles took less than 2 hours. Clearly, much faster than traditional techniques.

One of the problems that we faced was uncertainty about the validity of the submitted answers. We had no way to ensure that an answer submitted by a "Turker" was a carefully thought-out answer rather than a random response.

To avoid this problem, we decided to get multiple, redundant answers for each question. To be able to decide about statistical significance, we asked for 5 answers to each question and marked an answer as correct only if at least 4 of the 5 responses agreed. Furthermore, to discourage users from submitting random responses, we clarified in the instructions that we would pay only for submissions that agree with the responses submitted by other annotators. This follows the spirit of the ESP game by Luis von Ahn and ensures that the Turkers have the appropriate incentives to submit correct answers. Even though this approach increases the cost, it ensures that the received answers are consistent and the level of noise is low.
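
As an illustration of the aggregation step (a minimal sketch, not our actual scripts), the following collects the redundant responses per question and accepts a label only when at least 4 of the 5 answers agree; the question IDs and answer values are hypothetical.

```python
from collections import Counter, defaultdict

def aggregate_answers(responses, min_agreement=4, answers_per_question=5):
    """responses: iterable of (question_id, answer) pairs, one per Turker submission.
    Returns {question_id: answer} for questions where at least `min_agreement`
    of the `answers_per_question` redundant responses agree; the rest are left out."""
    by_question = defaultdict(list)
    for qid, answer in responses:
        by_question[qid].append(answer)

    accepted = {}
    for qid, answers in by_question.items():
        if len(answers) < answers_per_question:
            continue  # still waiting for more responses
        top_answer, votes = Counter(answers).most_common(1)[0]
        if votes >= min_agreement:
            accepted[qid] = top_answer
    return accepted

# Hypothetical responses for two questions:
responses = [
    ('q1', 'relevant'), ('q1', 'relevant'), ('q1', 'relevant'),
    ('q1', 'relevant'), ('q1', 'not relevant'),
    ('q2', 'relevant'), ('q2', 'not relevant'), ('q2', 'relevant'),
    ('q2', 'not relevant'), ('q2', 'relevant'),
]
print(aggregate_answers(responses))  # {'q1': 'relevant'}; q2 has only 3/5 agreement
```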

A second approach for minimizing noise in the answers is the use of "qualification tests." Instead of letting users submit answers directly, we wanted to see whether they were competent enough to participate in these experiments. For example, we had a task where we were soliciting relevance feedback for sets of queries and documents. To make sure that the users followed the instructions, we asked them to first submit answers for already-labeled query-document pairs (in this case, the pairs came from TREC collections). We also required annotators to retake the qualification tests if they wanted to label a large number of query-document pairs, which ensured that annotators who submit a large number of evaluations also pass a proportionally larger number of qualification tests. However, the presence of qualification tests slows down the process by a factor of 3 to 4. (Nothing comes for free :-)
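
For concreteness, here is a minimal sketch of how such a qualification check could be scored against pre-judged pairs. The gold labels and the 80% passing threshold are illustrative assumptions, not the exact settings we used.

```python
def grade_qualification(worker_answers, gold_labels, pass_ratio=0.8):
    """worker_answers, gold_labels: dicts mapping (query, doc) pairs to relevance labels.
    A worker qualifies if they answer at least `pass_ratio` of the gold pairs correctly."""
    graded = [(pair, answer) for pair, answer in worker_answers.items() if pair in gold_labels]
    if not graded:
        return False
    correct = sum(1 for pair, answer in graded if answer == gold_labels[pair])
    return correct / len(graded) >= pass_ratio

# Hypothetical gold labels from already-judged query-document pairs:
gold = {('jaguar speed', 'doc12'): 'relevant', ('jaguar speed', 'doc47'): 'not relevant'}
answers = {('jaguar speed', 'doc12'): 'relevant', ('jaguar speed', 'doc47'): 'not relevant'}
print(grade_qualification(answers, gold))  # -> True, the worker qualifies
```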

Overall, our experience with Mechanical Turk has been very positive. The interface is very clean and easy to program against, and the answers come back very quickly. It is not uncommon to send out thousands of requests in the evening and have all the results ready by the following morning.

Now, let's see if the reviewers will like the results :-)

Update (Nov 24, 2007): Our first paper that uses Amazon Mechanical Turk for conducting the experimental evaluation has been accepted at IEEE ICDE 2008 and is now available online. Hopefully, more will follow soon.