Wednesday, March 19, 2008

Mechanical Turk: The Demographics

Update: The results in this blog post are now obsolete. Please read the results of the new survey.

One of the common misconceptions about Mechanical Turk is that it is a virtual sweatshop, essentially taking advantage of poor people in third-world countries who do tedious tasks for pennies. As a result, many people hesitate to outsource research tasks to Mechanical Turk, fearing that the results will either be of very poor quality or not be representative of the actual U.S. population.

Those who read the previous, qualitative survey about Mechanical Turk will have realized that the profile of the typical Turker is not that of a person who completes tasks for a living in a developing country. Instead, Mechanical Turk often serves as a replacement for TV, or simply a way to spend some free time and earn some spare cash in the process.

The next survey that I conducted focused more on the demographics of the Turkers. Are they uneducated, unemployed people with no income? Well, as you will see below, Turkers are a fairly representative sample of the online population, perhaps with a slight bias toward females and toward young participants. (See a detailed comparison of how the demographics of Mechanical Turk users compare to the general demographics of Internet users.)

Let's see the main results!

First, I will start with the country breakdown:

United States: 76.25%
India: 8.03%
United Kingdom: 3.34%
Canada: 2.34%

The clear result is that most of the participants come from the US, not from a third-world country, despite the common misconception. This is largely because, in order to get paid, a worker must have a US bank account or be willing to be paid in Amazon gift certificates.

Then, the gender breakdown:

As you can see, there are slightly more females than males. I do not have a definite explanation yet, but my feeling is that females are less inclined to "waste time" and, if they can use their spare time to earn a little extra income, they will do so.

Next, the age distribution:

Not surprisingly, many young people participate on Mechanical Turk, mainly as a way to earn some extra cash to drive their cars, buy items from Amazon, and so on. (Mechanical Turk payments can either be deposited in a US bank account or given as an Amazon gift certificate.)

And what about education?

Turkers are a fairly representative sample. Most of them have a college education, and some even have PhDs! In fact, the distribution seems quite similar to that of the overall US population.

Similarly, the income distribution closely follows the income distribution of the US:

Finally, why do people participate in Mechanical Turk? From the qualitative survey, you could see that most participants mention money in one way or another. However, very few participate only for the money. (See also the detailed responses.) Here is the breakdown of respondents when they had to choose (not exclusively) among the options "for money," "for fun," and "for killing time":

I hope that the results above shed some light. I have to thank my student Beibei for preparing and running the survey for me. The next steps are to present the results of the qualitative survey in a coded/tabulated manner, and to give more details about the different tasks that we have run on Mechanical Turk and the lessons that we learned.

If you have any more questions that you would like to see answered, let me know!

See also

Thursday, March 13, 2008

Why Do People Participate on Mechanical Turk?

This question comes up often when I describe the tasks that Turkers complete on Mechanical Turk. Therefore, I decided to run a set of studies aimed at answering it. Instead of imposing my own interpretation, I will let the Turkers speak for themselves. I posted the following question on Mechanical Turk:

Why do you complete tasks in Mechanical Turk? Please describe the reasons that motivate you for completing tasks on Amazon Mechanical Turk. Do you do this for the monetary awards? For killing time? Do you consider the tasks fun? What discourages you from completing tasks? What attracts you to participate in a particular task?

The Turkers were then asked simply to answer this question in an essay-like manner, and they received 10 cents for their answers. I list some of the answers below, reproduced verbatim. I will not try to tabulate and analyze them in this post. (This will be done in a separate posting later; you can check this post for the summarized results.) You can gain some further insight by reading the responses below.

I'm on Mechanical Turk for two reasons. First, it kills time when I'm bored and restless. Second, I like making money! Sure, it's just a little here and there, but the money does add up. If I'm going to be sitting around being bored, I might as well be working on HITs and making a little cash. The tasks I don't like are tasks that have many, complicated steps. I also don't like tasks that require a lot of work for only a small amount of money. In addition, I won't complete tasks that require me to install a program on my computer. That's just too risky, in my opinion. I'm most attracted to tasks that are surveys or web research. I love giving my opinion on something and participating in market research. I also like searching for answers using the Internet, so web research tasks are my favorite. Of course, I like most any task that pays well that I am able to complete.

I complete tasks primarily for the money. I don't mind doing quick penny hits, but I really like writing the larger articles. Discouraging is when I have to do many things to earn a dime - like the 2 cent category summary things. Or - when people don't pay for weeks at a time. Or - when I cannot block stupid hits such as the maps of Pakistan - like anyone has maps of those villages? They are on forever, it's hard to search past things like that, and they are a waste of my energy. Attractive? Simple quick tasks like this survey or penny hits that require quick judgement like the venue hits. High rewards for completable tasks are nice too. Thanks.

I have a lot of down time at work waiting for various things to complete. Used to play little games and such - this way I can make some money that I won't feel guilty about spending. I do hits I can either do quickly, or that I can leave and come back to with time to spare. I do HITs based on time to work vs money, but the more interesting or well organized the HIT, the less it needs to pay to get my attention.

I do it for the monetary awards, but I also think it's a good way to kill time. I have a lot of down time at my office. Some of the tasks are fun...I like word puzzles and things like that. I like the tasks that are all written out on the Turk screen, I don't like to have to do a lot of clicking around or uploading my work. I like tasks that are easy enough that I don't have to do a lot of research.

I complete tasks as a way to kill downtime at work, but also as a way to earn a little extra spending money. In order for me to undertake a task, it has to be either interesting, or very easy to do. Additionally, it has to pay a reasonable amount for the work, since others are effectively prospering off the discounted labor being provided. Even by Mechanical Turk standards, paying 1 cent for a full minute of work is not acceptable. I particularly like tasks which leverage my personal skills or knowledge, because it means that the work is more specialized and thus likely to pay better than a task that relies merely on mechanically performing a simple task.

My primary motivation is the money, with entertainment being second. I like tasks that are doable quickly, and try to estimate the amount of money I can earn per minute before accepting a task. I am attracted by high payouts and to clients known to pay quickly.

I do this for the monetary awards. Some of the tasks are fun and some are boring. Sometimes I don't even know why I'm doing what I'm doing. I like short tasks so I usually won't accept a long one. I look for tasks that look easy or a longer one that looks fun. The price attracts me also. I'm not going to sit there for 5-10 minutes tagging street signs for 3 cents.

I do it mostly for money...it is a fun and easy way to earn a little extra. I found most of the tasks fun, while some of the other ones are just a waste of time. I get discouraged from completing a task when it is too low pay for the work involved and when there is no clear instruction from the requester. I am interested in HITs that require creative thinking and the ones that require original answers as opposed to just a simple web search. Thanks.

I am a retired senior citizen on a limited income. I have been Turking for a little over a year now. I have found it to be an enjoyable way to occupy some of my time, and to add a bit to my monthly income for the extras I might not have with just my normal retirement income. The extra income becomes even more important now with higher gas prices, and the grocery bill becoming more costly each week. I am not as computer wise as I should be to complete some tasks, but seem to find enough of them I can do to make my time worthwhile. Turking is much better for the mind than watching the TV for hours at a time.

I complete tasks on Amazon Mechanical Turk for the money. I don't accept the hits that make you do a lot of thinking. This is just a part time job and I have been thinking all day and want to relax. I want to do the hits that are quick.

I complete tasks mostly for the fun of it. It's also kind of a challenge to see how much money I might possibly make. I'm not expecting to make tons--but it's nice to make a little for doing pretty much nothing. I get discouraged when there are lots of specific directions. I like tasks that are quick and easy!

The reason I complete tasks on MTurk is because they are a good way to waste time while making little money in the process. For example whenever I am at the library waiting for sports practices and have some time to spare I just log onto mturk.com and make a little bit of cash. I only do hits that are easy, usually ranging from 0.01-0.20 cents and try to get as many done as I can in the time I have.

I am currently unemployed and so am almost a full-time Turker. Although the rewards are rarely great, they build up rather quickly over time. I prefer to complete tasks that pay greater than 10 cents per task. I enjoy those that involve data entry or rewriting sentences-paragraphs. Tasks that only pay 1 cent are OK if they are extremely quick, otherwise they are a waste of time.

Reasons that would motivate me for completing any tasks are: 1st - How much it would pay me 2nd - How much time I am going to spend to finish the task and 3rd if the HIT/Task is worth the money I will get. I don't really do it to earn because I'd probably go broke before I earn a lot turking but I wouldn't do any task that would cost me so much of my time but would pay like .05 or .20. I wouldn't mind doing those .01 or .05 as long as I don't need to read A LOT (because it would take so much time just reading the requirements), or go to so many linked websites. Generally, I do this for fun but earning a little extra is cool too!

The reasons that motivate me for completing tasks on Amazon Mechanical Turk include monetary awards (mainly), since it is my main source of income! Without Amazon Mechanical Turk, I wouldn't be able to drive my automobile, especially with these very high gas prices. I sometimes use it for killing time. The tasks are especially fun because you never know what to expect - every day is always different! There's always hits that are challenging but are very fun and interesting to complete. What discourages me from completing tasks is a huge task with a requester that is very picky and needs information exactly the way they want it (i.e., if you don't enter everything or miss something, you're automatically disqualified). What attracts me to participate in a particular task is if it is relatively simple, quick, and easy to complete. If the reward is one cent, I'll do it. There's other criteria, as well, but it all depends on how the hit is designed. You can almost always tell if there's going to be too much work to do the hit. On the other hand, if you're really bored, it may be suitable. That's what MTurk is all about!

I do it for the money, and killing time. And some are fun too! I am discouraged by all the "Jump to another Web Site" hits, that you just need to register (ie EMAIL), complete something, then copy something back to MTurk....that's all I need is more Spam....just not worth it. One time Hits (like this one) are usually not worth learning what to do to get paid. So a One time Hit, requiring you to retype War & Peace all for USD.03...Not me. Attractions: The shorter the better. Hits must have some time in case I get busy (at work) and can't finish right away. Should be a group of Hits to learn the requirements and get paid. I like Hits that pay well (not just USD, but rejects and take your work, or 30day time out to get paid)

I do it for some extra cash to buy books etc. I do consider most of them fun. What discourages me is the ones that have you do a lot of things for hardly anything.

The tasks are generally interesting. Although the rewards are small it is important and pays for books, magazines and goods that I otherwise would not have purchased. Personally I try to choose tasks that are short as my time is limited and I may be called away at any time during a HIT so I would not want to waste my time investment doing a task that I may not complete. It is discouraging when HITs are rejected with no reason given.

I do turking to earn some money from home. I am a housewife and I love to turk because I can make money without going out. Besides this I also learn some new things and gain knowledge. Some tasks are very time consuming and reward amount associated with them is very meager. So this is a great discouraging factor. I love to complete tasks that are easy and require less time. I don't accept hits that require lot of work and attention.

I do it for the money! I don't like .01 cent awards.

Started for fun and also for monetary awards. Easy tasks and appropriate or more money attracts me to participate. Lengthy tasks and very little money discourages me from completing tasks.

Q. Please describe the reasons that motivate you for completing tasks on Amazon Mechanical Turk A. I complete tasks on Mturk for the money and for the interesting nature of the tasks involved. Q. What discourages you from completing tasks? A. Lengthy/complex instructions, inadequate monetary rewards discourage me from attempting certain HITs. Q. What attracts you to participate in a particular task? A. Adequate compensation and the level of interest I have in doing the task.

I do the tasks just to provide a bit of a break from doing other work on the computer and because after a while the money adds up to something not so insignificant. Some of the tasks are interesting, but most are not. I like tasks that ask for opinion or to complete a survey or really easy tasks such as the amazon product ID ones. I don't want to complete tasks that take more than a few minutes or ask you to register with another site.

It seems to me an easy way to make some spare money. It's nice to be paid for using your intelligence. Usually I discard tasks that need more than 2-3 minutes to be completed and that reward less than 5 cents. I think that one would earn in one hour at least 3-4 dollars, if this seems to me not possible, I reject the task.

I like many of the tasks. They're fun to do if I have a few extra minutes. The money adds up quickly and I'm trying to pay down credit card debts. I usually pick tasks that are short, easy to finish, and pay at least .02 to .03 cents. Once in a while I do longer tasks like paraphrasing or searching the web.

I complete tasks in MTurk for the money. I try to earn USD3 per day. I now avoid the Amazon "Are These Items Different?" task because it gives too many wrong rejects. I like the transcription tasks because I am a sound technician.

I complete tasks on MTurk for the same reason I pick up a penny off the sidewalk. It doesn't take long and it is some loose change.

I participate on Amazon Mechanical Turk for several reasons: 1) the challenge - I find it interesting to try and create something useable in a finite amount of time 2) monetary - earning some change is a positive way to spend time rather than play couch/internet potato 3) I'd rather spend time productively than watching TV Discouragement: Some HITs I've worked haven't been paid although product was delivered Some HITS are restrictive enough that regardless of effort, product can't be that good Qualifying for types of work that never show up as HITS (need to clean out dead accounts) Few well paying HITS Attractions: I like working on HITS that provide a challenge (writing, translation, data research) HITS that are remunerated well attract best results. I define "well" as something that has the potential of at least returning a couple of USD per hour

I complete tasks on Mturk for the money but I only select simple tasks that I can finish while doing something else, e.g., watching TV/surfing. Even if a task pays well, if it's too complicated or requires information I don't have offhand, then I will not do it.

I mainly do Mechanical Turk tasks to earn a bit of extra money. I do consider some of the tasks fun depending upon the content. If a task seems lengthy for the money paid, I will usually be discouraged from completing the task. The things that attract me to tasks are familiarity with the subject, money paid and number of tasks available.

It's all about killing time as well as earning some pennies. I Turk especially since most of my downloads take ages to complete. This definitely beats Digging (not unless I can be paid to do that as well).

I complete tasks for the initial purpose of making a few extra bucks here and there. There are some tasks that I enjoy doing, such as reviewing old magazine covers, rewriting sentences, etc. However, most of the "hits" on mturk reek of cheap labor outsourcing and I wouldn't waste my time with them. There are hits in which you are asked to rewrite an essay for 8 cents. Seriously, such hits seem like a joke and I skip right over them as these are types of hits that are discouraging. Further, many hits lack specific instruction, fail to outline an approval process, or flat out propose mere pennies for a task that will obviously take an hour or so to complete. There are tasks that do attract my attention. These are the ones that are well formatted with clear and precise instruction. Generally, the first hit I look at is the one with the most hits available as these are normally the easiest to complete. The oddball hits that show up are often absurd in the fact that there is an hour of work involved for mere pennies.

I became aware of Mechanical Turk through my nephew. He said he enjoyed completing all sorts of tasks and as he was disabled and unable to hold a job, it at least gave him the opportunity to enjoy a little recreational activities from time to time...beer and karaoke! I picked up on Mechanical Turk when we retired and I, too, needed a little extra cash from time to time. Actually, so far I haven't cashed anything in...may need it for gas money down the road! I enjoy performing the easier tasks. They are fun just to search out...better than wasting my time playing Solitaire! I steer away from the tasks that require moving away from Mechanical Turk's site. Bouncing all over the web isn't something that interests me.

I complete mturk tasks for the money, and to kill a little bit of time. Sometimes I enjoy doing mundane tasks like some of the ones presented on Mturk. I skip tasks that pay very little for tons of reading. I choose tasks that can be completed very quickly where bits of information are needed from pictures or text.

I do it to keep myself occupied, for fun, to keep my mind sharp, and for the money, even though it is "play money" for me (since I don't live in the US, I can't have it transferred to a bank account, and so I can only spend it on Amazon through the Amazon Gift Account). What discourages me are tasks with high (and often unfair) rejection rates, slow payment, but most of all: HITs that ask you to do a shitload of work for just a few cents. Sometimes I really do wonder: what WAS this requester thinking?! No one is going to do all that for so little reward. What attracts me: simple, fun stuff for cents like the ask500people HITs, but also sometimes the more challenging stuff like Castingwords transcriptions, especially when I've seen something cool on Amazon that I want to buy, because the pay is good (more specifically the expedited transcripts). It just depends on my mood, I guess. Oh, and stuff that has to do with pictures are usually fun (Review user submitted images, for example).

Money. I like free stuff. It kills time. I would be sitting around doing nothing. I get discouraged if it is lengthy and only for a penny. If it's easy and has a nice reward I won't hesitate to do it!

I do it for the money and for fun. I consider them mostly fun. When someone wants you to do an enormous amount of work for little payment, it's discouraging. I like the surveys the best.

I do it for fun, when I'm bored (killing time) and for money sometimes.

I like to do the different tasks on Amazon's mechanical turk for a couple of reasons. The biggest one is killing time, but if by killing time I can earn a little money it is just a bonus. Some of the different things I have done have also been entertaining. I do mostly the same tasks. I have tried to branch out, but some of the descriptions are very vague and hard to follow. I don't like to do tasks that require more than 5 or 10 minutes to do. Searching the internet for hours on end trying to find maps or 150 business documents is just something I am not willing to do. I am more than happy to rewrite sentences, answer questions or provide stories though.

There are three reasons that motivate me to complete tasks. They are: for the monetary awards. It gives me something to do. They are fun hits. Overall, I only select hits that I enjoy completing. What discourages me from completing tasks is some of the 1 and 2 cent hits. They are so involved that it takes from 12 to 30 minutes to complete. Although, I will complete these but, only if it takes 1 to 10 minutes to complete and they are fun hits.

All about the money. Yes the tasks are fun!!! Longer tasks are discouraging. Quick money lures me to certain tasks.

I do the hits for the money. Some are fun but it depends on the hit. If I get discouraged with a hit I will sometimes stop doing it and come back to it before the time runs out. I say that it attracts people two different ways the pay and how hard the hit is. The easier the more people will do them.

Monetary awards. The simple polls are sometimes fun. I don't complete tasks that are too hard for too little pay. I prefer a simple task for a simple reward, not multiple questions in one task.

I typically complete tasks for the monetary awards and to kill time during slow times at work. Certain tasks are more enjoyable than others. I do get tired of doing some of the repetitive tasks. However, the tasks that I won't complete are the ones that require a great deal of work (writing stories, essays etc) for only a few cents.

Monetary rewards, pure and simple. If a task is not worth the time, it won't get done.

I do this for fun and monetary gain. I do not do tasks that require too much time or are prone to be rejected. I also avoid tasks that are lots of work for very little money. I like tasks that I can do quickly and pay enough to make them worthwhile. It also helps if the task is somewhat interesting.

I complete tasks in Mechanical Turk for a number of reasons. Since I do not enjoy watching TV, they are a way of killing time and the monetary awards are nice. Also I enjoy doing the GIS Hits because they enable me to see other locations that I have never seen before. The Unspun tasks are fun to do, and I probably enjoy them the most, but the thing that discourages me about them is getting the same question over and over. A question that I would not have skipped in the first place if I had wanted to answer it. And one more thing that discourages me is some do not give me enough time to complete the task, and do it very carefully. But all in all I appreciate being able to do some of them. Thank you for allowing me to do this.

Monetary award, to be able to buy stuff at Amazon.com. Some tasks are fun, others are a pain in the neck. Naturally, the biggest selling point of a HIT is the reward to duration ratio. Then, the time the HIT stays pending. Little HITs that pay too little are generally bad because, even if they end up paying much, we end up with a sore mouse hand.

I enjoy doing MTurk for a variety of reasons. I was first attracted to the site for the monetary rewards. However, the longer I do it, the more I enjoy doing it. I like many of the hits available to me. The tasks for the most part are fun and I have found many other useful websites while performing hits. I do not do the tasks that seem like more work than what the people are paying. One that annoys me to no end is finding maps of other countries. They are nearly impossible to do despite the payment offered. On the other hand, I love doing quizzes, tagging pics such as GIS Imaging. I also enjoy finding info for amazon.

I complete tasks for the money. Plus, it is kind of fun. I don't like the tasks that take a long time to complete. I like the quick tasks.

I do enjoy completing tasks on Mechanical Turk, but I also do it for extra money. I won't complete a task with a very low reward if it will take a lot of time. I like to do quick, simple tasks while watching tv in the evening. I also like the GIS tagging HITs from geospatial vision -- mainly because the pictures often show places I've visited and I enjoy the nostalgia and occasional nice rural views. I like tasks with a payment level that is not insulting (should be appropriate for the time and effort involved) and tasks that offer bonuses for good work and/or quantity of good work. My favorite HITs are the mathematical problems from Sarsen education. I like being able to write problems and contribute to an educational program. Combining creative writing with math is quite fun, and the bonuses are also attractive.

I am trying to pay off a credit card, so I have set a goal for myself to earn a certain amount of extra money each day on Mturk. I use my Mturk money to make payments above and beyond the minimum. I know that in the long run these small amounts will help me to get out of debt quicker. I mostly Turk during slow periods of my regular work day, but if I don't make my goal I will also Turk in the evenings from home. Sometimes I will sit in front of the TV and Turk just because I have the time, even if I have already met my daily goal. I don't think the tasks are particularly fun, but the extra money is worth it. I try to find tasks that generate the most USD for the least amount of time. It's not worth it to me to spend 10 minutes for three cents when there are many available tasks that will bring in three or more cents per minute.

I'm doing it for the little money I earn at it. I'm trying to see how much money I can accumulate through unconventional means (collecting cans, ppc, m turk, grocery bag rebate at the store) I've turned it into a game. Since I'm not depending on this income in any way I don't like difficult tasks. Let me take a one click survey, or verify search results. (I've earned USD8 + over the last few months)

I participate in mturk because I actually do enjoy the kinds of tasks that I choose to do. It allows me to keep my skills sharp in certain areas in a fun kind of way and I get paid a bit to do it as well. It's also kind of an interesting (and, quite possibly, weird!) way of sort of keeping abreast of the kinds of things other people are interested in, are doing, find important, etc. You never know who's going to post a HIT, from where, and what it's going to be about. I've seen some really interesting ones in my time here (about a year or so). What discourages me from doing HITs is if I really don't have any particular skill or knowledge in the area - I won't even accept or attempt it in the first place - or, if I have accepted it, if it turns out to be more difficult and involved than I anticipated or expected and I don't really think my time and energy is worth what I'm being paid for it (or if I really am not going to enjoy doing it even if I'm not getting paid enough), I'll return it. Hope that helps!

All my income goes to bills. Paycheck goes to wife. By turking I can make 50USD - 100USD a week for myself if the tasks are available when I am. I like quick, simple tasks. Tasks that are too involved I do not accept.

For my wife and I, this is strictly a monetary endeavor. We have our Mturk account linked up to a long term savings account and all the money we earn on it goes straight into savings. As such, we make decisions about which tasks to do or not do based on the reward versus the time investment. All things being equal, we will gravitate towards hits that are slightly more enjoyable, but in truth, cold hard cash is the main factor.

I do it for the fun of it plus it does help out on buying groceries for the house. What discourages me is taking on the task and then not being able to do it or understand it. But what really gets me is to do a lot of the task and then they don't go through because I put the wrong number in or answer. What attracts me: Can I do the job or it looks like it would be fun just to try and see if I can do it or not.

Mostly killing time and there are some interesting HITs as well. I do not do tasks with overly long instructions, blog/website pimping, etc. I am attracted to any tasks that incorporate the spirit of Mechanical Turk, that is, tasks that are easy to do by humans but impossible programmatically.

I complete tasks on Amazon Mechanical Turk to kill time and make money while I'm doing it. It really doesn't take a lot of effort or work and I get paid to do it, what a deal. What discourages me from completing tasks, there are three reasons: 1) If I have to click on a link to another website 2) If I have to sign up for something 3) If the work is complicated or if there are several steps to complete it. An example of this is diagramming street signs. What attracts me to a particular task is: 1) Is it easy 2) Are there several HITS available or will there be more available 3) The reward amount

I complete tasks on Mechanical Turk for the monetary rewards. I don't have a regular job, so this really helps to bring in some extra income. What discourages me from completing tasks is when the instructions are too complicated, or you are being asked to do too much for too little, especially when there are simultaneously similarly priced tasks to be completed. I also won't review user submitted images of people's genitalia anymore. What attracts me to a particular task is ease of the task and whether or not it is in great supply.

I complete tasks for monetary reward. If the hit is too long for very little money, I won't complete it. Good money attracts me.

I do it for fun and pocket change. Long directions and multiple steps for really low pay will be skipped by me every time. If the pay matches the task, it's ok by me. I decide if it's worth it by figuring out how much an hour the task would make if there were enough hits for an hour, then asking myself, am I willing to accept (USD3.00 or whatever it works out to be) that amount to sit at home in my PJs and take whatever breaks I want for that pay.

I complete tasks on Mturk for a few reasons. The main reason being it is a great help in earning some extra cash which I desperately need at the moment. Aside from that I do find a lot of the tasks to be fun and interesting and it is also good experience for future jobs I might apply for such as transcription. Usually what attracts me to a task is if the topic really interests me or it is in an area I am trying to improve such as typing. If the task looks to be easy and short I will generally do it no matter how low the payout. What would discourage me though is if it was a task that would take a while and payout was very low. If I think I am going to average any less than $5 an hour completing any HIT or set of HITS I won't do it.

Nifty way to make money in spare time. What attracts is a nice time to money ratio. HITs in the pennies range should be able to be completed in less than a minute. Some tasks are genuinely fun. What discourages? Anything that results in spamming another site, or forum. HITs that require you to go to another site to complete the task. Also, for transcription based HITs, media software that doesn't let you freely scroll through the content. I shouldn't have to keep listening from the beginning of a video or audio whenever I need to go backwards.

I almost always MTurk for the money. Most of the tasks I do are not fun at all. I only do tasks that take less than a minute. I'm attracted to participate in surveys which require no real work.

Motive: For money and even though it's not much but since I can learn something from the task given it seems an OK thing to do. Some of the tasks can be fun but some can be very boring too. What discourages me from completing the task are: 1) if I have to download something suspicious 2) if I have to wait for quite some time for a download 3) the instruction given is not clear Attraction to MT: alternative better way to surf the net!

Honestly, I do it for the money. Some of the tasks I find fun but for me it comes down to the monetary compensation. Obviously, the higher the reward the more likely I am to do a task. If a task requires an hour of time but pays less than $5 then I won't do it. My favorite tasks have been forum posts looking for good deals online then posting them on the forum. I love bargain hunting and I could average 6 or 7 dollars per hour. My other favorite

Liveblogging from Dagstuhl: Day 4 (March 13)

  • Stefan Klinger: PathfinderFT (full text) and how to propagate scores in the engine. Based on the XML2relational XQuery Pathfinder engine.

  • Martin Theobald: TopX 2.0. Object store for top-k query processing. Supports BM25 for full text, and employs many IR optimization techniques for speeding up query execution. The 2.0 version implements inverted indexes for XML and various optimizations.

  • Ralf Schenkel: Extended the discussion about TopX.

Break

  • Mariano Consens: Why retrieval effectiveness measures matter. In DB we measure efficiency, scalability, simplicity, elegance, but rarely effectiveness (yours truly begs to differ, but I was not even classified in DB to start with :-). In INEX you need to retrieve a ranked list of *non-overlapping* elements. Therefore, in the results it makes sense to eliminate overlaps. Since we assign a "monotonic" measure of relevance in the atomic elements, the parent container models will have a relevance that depends on the relevance of the leaf items.

  • Harald Schöning: Discussion on implementing full-text search in Tamino and other interesting topics. Need to check in more detail.

Break

  • Pierre Senellart: Using CRFs for generating wrappers automatically for hidden web databases. Using a tree-based probabilistic model to model dependencies between annotations and assumes conditional independence. Using an iterative approach for enriching the description of the wrappers. Identifies types of important entities, learns how they are connected and constructs a wrapper.

  • Ihab Ilyas: Uncertainty-aware top-K. Generate possible worlds for the instantiation of each relation, and compute the probability of each world. For example, in an information extraction scenario, we can define probability of existence for each tuple, and define the possible "world instantiations" of the relation, together with some "world probability". Now when we want to generate the most probable world, we can take either a Maximum Likelihood approach and return the most probable world, or take a Bayesian approach and integrate across worlds. One issue is how to do the integration efficiently and the presented research describes a few algorithms under different scenarios.

  • Thomas Rölleke: Described the different retrieval layers as a set of abstractions, and how to build a probabilistic retrieval system. Nice overview of the literature on probabilistic approaches in the DB and IR communities, plus an overview of approaches that try to connect the two. Discussion on how to implement all the different retrieval models of IR (log-likelihood, vector space, language models, etc.) in SQL.

  • Ingo Frommholz: The POLAR framework. How to use annotations in a principled, probabilistic manner.
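The possible-worlds construction from Ihab Ilyas's talk above can be made concrete with a toy sketch (the tuples and probabilities below are made up): enumerate every subset of the extracted tuples, assign each subset the product of the corresponding existence/non-existence probabilities, and then either take the most probable world (the maximum-likelihood answer) or integrate across worlds (the Bayesian answer):

```python
from itertools import product

# Extracted tuples with independent existence probabilities (made-up numbers)
tuples = {"t1": 0.9, "t2": 0.6, "t3": 0.3}

# Enumerate every possible world (subset of tuples) with its probability
worlds = []
for included in product([True, False], repeat=len(tuples)):
    prob = 1.0
    world = set()
    for (name, p), present in zip(tuples.items(), included):
        prob *= p if present else 1 - p
        if present:
            world.add(name)
    worlds.append((world, prob))

# Maximum-likelihood answer: the single most probable world
best_world, best_prob = max(worlds, key=lambda w: w[1])   # {t1, t2}, prob 0.378

# Bayesian answer: integrate across worlds, e.g., P(t2 exists)
p_t2 = sum(prob for world, prob in worlds if "t2" in world)  # 0.6 by construction
```

The brute-force enumeration is exponential in the number of tuples, which is exactly why the talk's algorithms for doing this efficiently matter.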

Break

  • Yours truly: How to structure and rank opinions using econometrics. Essentially, instead of relying on semantics, just associate opinion phrases with some measurable economic variable and discover correlations. Most of the time you need the correct econometric model (aka correct statistical techniques) to get proper results.

  • Ranking Wikipedia using the structural (graph) connections. Personalized PageRank applied for Wikipedia retrieval.

Wednesday, March 12, 2008

The "Good Movie" Talk and the "Anytime" Talk

One of the great things about being in Dagstuhl is the fact that every evening the only thing that we can do is to gather together over cheese and wine and talk. As part of one of these wine conversations, Gerhard Weikum told us about the definition of the "anytime" talk: a talk that can be interrupted at any minute and still allow the attendees to get the main message of the talk. (For the non-initiated: an "anytime algorithm" is an algorithm that can be stopped at any point and return the best possible outcome that would be possible to get within the time limit.) This is in contrast to the "good movie" talk, which, as in a good movie, is intriguing throughout but makes sense only at the very end :-)

I think that I naturally prefer the "anytime" talk, but (a) it needs much more preparation, and (b) may be bad for job candidates (how good can it be if you can explain the solution in 2 minutes?). Furthermore, I do not know how well it will work for teaching. It does work great for research talks when interaction is expected and encouraged. The "good movie" talk works best when the time allocation is prespecified, and there is no interaction with the audience. For example, TED talks tend to be "good movie" talks, where the message comes across strongly at the end of each talk.

Tuesday, March 11, 2008

Liveblogging from Dagstuhl: Day 2 (Mar 11)

  • Holger Bast: The CompleteSearch Engine http://search.mpi-inf.mpg.de/. IR vs. DB: IR indexes are compressible, have high locality of access, and rank well, but cannot do even simple selects. DB vs. IR: databases can query nicely, but have no locality of access. CompleteSearch performs prefix search and range search on the IR index, with the ability to perform joins through a keyword-search interface. Locality of reference is claimed to be the main advantage of using IR-based indexes instead of "traditional" database indexes. The whole architecture seems similar (in terms of benefits) to column-store database systems.

  • Arjen de Vries: Flexible and Efficient IR using Array Databases. Many standalone retrieval prototypes, no clean separation of the different aspects of the experiments, and things are typically monolithic and tied to specific datasets. Goal is to have flexibility and efficiency. The idea is to specify the type of documents, as a set of matrices (make sure to compress them). Then define a set of metrics using the matrix data. Then combine the metrics and matrices into database queries and be able to have an engine to run experiments efficiently in a data-independent manner. (So that we do not have to reinvent the wheel every time that we want to do something new.)

  • Yosi Mass: Adaptive XML Retrieval System. Given a query in free text, retrieve XML components that satisfy the query. One approach is first to retrieve documents and then score the fragments within. Second approach: index only XML leaves (need to perform aggregations for retrieving more complex elements). Third approach: index every possible subtree (overlapping of items, an issue when computing frequencies). Solution: split elements into multiple indexes, making sure that we have complete coverage and no overlap of elements within the same index. (Comments indicate that it is a good idea when the number of tags is small, to group all similar tags to the same index, instead of mixing apples and oranges, or "chapter" and "section" tags. This becomes a problem with Wikipedia, when we start having too many tags, and it is not possible to generate that many indexes --- what about grouping tags together to populate.)

  • Djoerd Hiemstra: Sound Ranking Algorithms for XML Search. Pathfinder: an XQuery-to-relational compiler. Tijah: an XML search system for NEXI (Narrowed Extended XPath I). NEXI is being used as a sublanguage of XQuery. Need to devise metrics that will allow consistent rankings.

Break

  • Amelie Marian: Filesystem search. Keyword search for ranking, and filters on metadata. Pure IR model not sufficient, due to the need for "fuzzy predicates" (e.g., "get me my proposals from around March 2006"). Needs to accommodate approximate predicates naturally, going beyond "binary" in the features. Proposed a multidimensional approach, scoring each "field" independently, and aggregating the scores afterwards. Contributions in query processing: multiple indexes and DAG-based approaches. Used relaxation hierarchies for allowing relaxation of predicates (day to month to year...).

  • Kostas Stefanidis: Get best results based on contextual user preferences. Give the best contextual results by inferring the context from the query itself. Implementation: use of a profile tree. Relaxation using hierarchies.

  • Irini Fundulaki: Personalized XML (Pimento). XML queries are both on structure and content. Therefore customize query content with this in mind, and customize the results appropriately. Add scoping rules in the user profiles (for what the query should contain -- with some relaxation) and ordering rules on how the results should be preferably ordered. Described how to achieve efficiently and effectively the relaxation. Deriving rules from narratives.

Break

  • Maarten Marx: Talked about the use of named entity recognizers to create a graph that can help in various tasks, plus makes it possible to generate concise summaries of a topic.

  • Benny Kimelfeld: Keyword proximity on XML graphs

  • Emiran Curtmola: XML Distributed Retrieval. When we have XML documents distributed and we want to run queries over them, we can have one centralized model where a central server gets the queries and asks for all documents to be aggregated in a single location. Emiran described instead a distributed system using an overlay network.

Monday, March 10, 2008

Liveblogging from Dagstuhl: Day 1 (Mar 10)

For some strange reason I was invited to Dagstuhl for a seminar on "Ranked XML Querying". Why strange? Because I have done nothing on ranked XML querying, or on XML at all. I have done some work in ranking (the EconoMining project is all about economically-induced ranking at its very core), but no XML. Oh well, you cannot say no to an invitation!

So, after a very, very bumpy flight from New York to Paris, a train trip from Paris to Saarbrücken, and a cab ride from Saarbrücken to Dagstuhl, here I am. Dagstuhl is a very interesting place: everything is based on an honor system (you keep track yourself of the snacks, beers, wines, etc. that you consume), and the rooms do not even have keys.

We started today with a set of tutorials and short 5-minute introductions to the research that each of us is doing. There were three sets: the "database" people, the "IR" people, and the "web" people. I was classified in the "web" track. :-)

The "database" people.

  • Ihab Ilyas: RankSQL, uncertain databases, especially interesting a bullet on "probabilistic data cleaning" that uses model lineage.
  • David Toman: Description logics and query combination. Description logic for physical design. How to create a query optimizer using fine-grained information about physical design.
  • Harald Schöning: Talked about Tamino, "the first native XML database", and its applications in police information systems, airport logistics, financial derivatives, fleet management, and newsroom systems.
  • Peter Apers: Described early-1990s efforts to bring together the database and IR communities. In retrospect, he considered it a problem that we, as a database community, forgot about the hierarchical and network models; now we need to talk to the IR and web communities, which really do use such models for information representation.
  • Benny Kimelfeld: Keyword search over structured databases and how to do flexible and inexact queries over structured databases and how to query probabilistic data
  • Emiran Curtmola: Optimization of XML queries and XQFT (XML full text) queries. How to rank and evaluate the quality of search results; how to summarize such results.
  • Stefan Klinger: Graph theory and XML schema validation. Started working on the Pathfinder compiler that converts XQuery to relational expressions; extends the Pathfinder compiler to full text.
  • Kostas Stefanidis: Personalized systems with application in personalized search, how to manage context-dependent preferences, database selection based on contextual preferences.
  • Irini Fundulaki: Personalized XML full-text search and experiments with INEX data. XML access control, how to formalize its semantics and apply it; security for provenance data.
  • Amelie Marian: Data corroboration: large amount of low-quality data, and use of corroboration can improve the quality; Understanding user reviewing patterns (structure and query reviews); Multi-dimensional search for file systems.
  • Gerhard Weikum: How to turn the web into a semantic database: Harvest and combine data (a) hand-crafted data, (b) automatic knowledge extraction, (c) social networks and human computing. The Yago system, NAGA queries. Plus: p2p search, personalized search, social search, time-travel search on web archives.
  • Ralf Schenkel: TopX, bridging the DB and IR gap. XML query languages for real users.

Ranked XML Querying: The DB Tutorial (Weikum)

Started with a 2×2 quadrant (structured vs. unstructured, for both search and data):

  • Both structured: Databases
  • Both unstructured: IR
  • Structured search, unstructured data: information extraction and text mining workflows.
  • Unstructured search, structured data: keyword search over relational and XML data.
  • Motivation 1: Text matching. Add keyword search for searching relational and XML data. We need principled ranking approaches for result ranking, and we also need probabilistic integration of different relations. Question: what makes a ranking function "principled"? Answer: tf.idf is not "principled" but performs well in an ad hoc way; language models, BM25, and so on are built on theoretical models and can be reused in different contexts. XML and searching: XPath and similar languages add multiple predicates. Typically we cannot satisfy them all (plus, they are difficult to write in a semantically correct manner); therefore we need relaxation.
  • Motivation 2: Too-many-answers. We resort to "top-K" or skyline (Pareto optimal). Probabilistic ranking for SQL and how to adopt the likelihood model for SQL ranking. How to fit together deterministic predicates with "soft" predicates.
  • Motivation 3: Schema relaxation. We can relax not only content queries but schema as well.
  • Motivation 4: Information Extraction and Entity Search. We can extract our data and build (uncertain) tables from the data. How can we extract and query, and rank such results in an efficient and effective manner? If we take a graph-based approach, with multiple link types, how can we effectively exploit the generated network? We can rank by confidence, by informativeness, or even by compactness (Steiner tree).
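Since BM25 comes up above as an example of a "principled" ranking function, here is a hedged sketch of the score that a single query term contributes to a document (the idf smoothing and the defaults k1 = 1.2, b = 0.75 are common choices from the literature, not anything presented at the seminar):

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """BM25 contribution of a single query term to a document's score."""
    # Smoothed inverse document frequency: rare terms score higher
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    # Saturating term-frequency component, normalized by document length
    tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# Toy numbers: the term occurs 3 times in a 100-word document, and
# appears in 10 of the 1,000 documents in the collection
score = bm25_term_score(tf=3, doc_len=100, avg_doc_len=120, df=10, n_docs=1000)
```

The theoretical grounding is in the saturation (doubling tf does not double the score) and the length normalization, both of which transfer to new collections without retuning, unlike ad hoc tf.idf variants.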

Lunch: We were seated at prearranged tables, with our names randomly assigned to prespecified seats, to encourage/force interaction.

The "IR" people.

We continued with the introduction of the IR people.

  • Holger Bast: All data is text (In the beginning was the word...). All text is semi-structured. Make fancy searches fast and easy to use. Demo of CompleteSearch of DBLP (impressive!) and of FacetedDBLP.
  • Maarten Marx: NEXI query language, doing XML retrieval IR-first.
  • Martin Theobald: Probabilistic databases (uncertainty and lineage, Trio project). Efficient XML-IR. TopX system, plus call for INEX.
  • Djoerd Hiemstra: IR language models, multimedia and XML & Entity Search. PathFinder/Tijah.
  • Yosi Mass: XML Query and XML fragments. Vector space model for XML ranking and relevance feedback for XML. Desktop search and UIMA annotations.
  • Arjen de Vries: Improve search system engineering efficiency. Given a declarative specification of the collection, background, context, and of a retrieval model, generate a "Parameterized Search System" (PSS).
  • Thomas Rölleke: Seamless DB+IR, HySpirit retrieval engine.
  • Ingo Frommholz: Annotations and meta-annotations (annotations on annotations). Searching documents with annotations, or doing discussion search (finding documents that get positive comments(?)).

IR Tutorial by Djoerd Hiemstra: History of IR developments: STAIRS, introduction of GML (separation of content from formatting), Codd's relational model,... Discussion of INEX plus some experimental results. Discussion of LM, BM25, etc.

The "Web" people.

  • Sihem Amer-Yahia: Her story from monolithic to atheist to agnostic, all in terms of data management. Serving socially relevant content to users (e.g., what I enjoy watching, depending on the company). She plagiarized the "show me the money" slogan!
  • Pierre Senellart: Research on the hidden web. Discovery of web services, probing and wrapper induction.
  • Sebastian Michel: P2P web search, distributed indexing, social search.
  • Debora Donato: NLP applied to IR, Usage and Link Analysis. Mining social networks, web spam, reputation management.
  • Panos Ipeirotis: Yours truly. SQoUT, EconoMining, Noisy multilabeling, faceted interfaces etc.

Tutorial from Sihem: Making DB&IR socially meaningful. Talked about recommendations: why, when, dealing with long tails, time-awareness, diversity-awareness.

Wednesday, March 5, 2008

Mechanical Turk: The Foundations

Over the last year or so, I have been using Mechanical Turk as a very useful tool for my research. While it works well in practice, there is high uncertainty about the quality of the answers that someone can get back from such a system. Some Turkers will be lazy and submit random answers, some will have good intentions but still submit an incorrect answer, and some will do a good job. However, since we do not know the actual answers to the questions beforehand, we need methods for extracting the signal from the noise, evaluating the quality of the individual Turkers, and deciding how much effort to spend annotating our data.

So, here are a few questions, and pointers to related research:

Question 1

We have a set of labelers and we ask them to label a set of examples, using a predefined set of labels. We do not know the correct labels of the examples. Can we identify the correct labels of the examples and estimate the error rates for each of the labelers?

A seminal paper that answers this question is "Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm" by Dawid and Skene, published in 1979. The authors show how to use an Expectation-Maximization algorithm to estimate the true labels of the examples, together with a "confusion matrix" for each labeler. Smyth et al. and Wiebe built on the work of Dawid and Skene to infer ground truth in such noisy environments.
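Dawid and Skene estimate a full confusion matrix per labeler and per class; the sketch below is a heavily simplified illustration of the same EM loop, collapsing each labeler's confusion matrix into a single accuracy parameter for binary labels (all data is made up, and this is not their exact formulation):

```python
import numpy as np

# labels[i, j] = binary label assigned by labeler j to example i (made-up data)
labels = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
], dtype=float)
n_items, n_workers = labels.shape

# Initialize the posterior P(true label = 1) for each example by majority vote
p = labels.mean(axis=1)

for _ in range(100):
    pi = np.clip(p.mean(), 1e-6, 1 - 1e-6)  # class prior for label 1
    # M-step: each labeler's accuracy = expected agreement with the true labels
    acc = (p @ labels + (1 - p) @ (1 - labels)) / n_items
    acc = np.clip(acc, 1e-6, 1 - 1e-6)
    # E-step: recompute posteriors, weighting each labeler by its accuracy
    log_acc, log_err = np.log(acc), np.log(1 - acc)
    s1 = np.log(pi) + labels @ log_acc + (1 - labels) @ log_err
    s0 = np.log(1 - pi) + (1 - labels) @ log_acc + labels @ log_err
    p = 1 / (1 + np.exp(s0 - s1))

inferred = (p > 0.5).astype(int)  # estimated true labels
```

The key feedback loop is the same as in the paper: better guesses about the true labels sharpen the per-labeler error estimates, which in turn reweight the votes for the next round.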

Question 2

When we get labels that are noisy, there are two approaches for dealing with the noise: a "frequentist" approach, in which we use the majority of the votes (potentially weighted according to the noise of each labeler), and a "Bayesian" approach, in which we consider the labels to be inherently uncertain. In the first approach, we essentially treat the data as if they were noise-free and let existing learning algorithms work without any modification. In the second, uncertainty-based approach, we need to modify the learning algorithms to deal with the label uncertainty.
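A minimal sketch of the two strategies for a single example (the votes are made up):

```python
from collections import Counter

# Noisy labels collected for a single example (made-up data)
votes = ["+", "+", "-", "+"]

# "Frequentist" approach: take a hard majority vote and train as if noise-free
majority = Counter(votes).most_common(1)[0][0]   # "+"

# Uncertainty-preserving alternative: keep a soft label (fraction of "+" votes)
# that a suitably modified learning algorithm can consume directly
soft_label = votes.count("+") / len(votes)       # 0.75
```

The hard vote discards the 25% dissent entirely; the soft label carries it into training, which is exactly the modification the Bayesian view requires of the learner.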

Question 3

We know that the labeled examples that we get back are not perfect. We can compensate for the noise in the data by asking the annotators to be more careful (potentially by paying them more for each example), we can simply get more training data hoping that the signal will emerge from the noise, or we can get multiple labels for the existing examples hoping to improve the quality of the labels.

Hal Daumé discussed the tradeoff of quality vs. quantity a while back, assuming that we can "tune" the individual annotators to be slower but of higher quality, or faster but of lower quality.

In Mechanical Turk, we can decide whether it makes sense to allocate the budget to annotating the same example multiple times, or whether it is preferable to label more examples. If we decide to label more examples, should we do this uniformly and ask for N labels per example, or does it make sense to carefully select the examples for which we want to acquire more labels? This was a problem that we examined with Victor Sheng and Foster Provost, and we have now produced a first draft of our paper "Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers", which outlines our approach. (Update: The paper was accepted to KDD 2008, and the reviewers thought that using the term "multilabeling" would only confuse people, since the term is also used to describe cases where an example has multiple true labels, as opposed to multiple noisy ones with a single underlying true label.)

In short, for those of you who are bored with reading the paper: If there is noise in the data, it is often preferable to multilabel existing examples than to acquire new ones. Furthermore, it pays off to selectively multilabel examples that have a high degree of uncertainty. How can you measure the uncertainty? There are multiple approaches:

  • Entropy-based: The most intuitive way to measure uncertainty is to measure the entropy of the set of assigned labels. Intuitively, the label set {+,+,+} has lower entropy than {+,+,-,+,+,-,+,+,-}, so it makes sense to label the latter. This approach is appealing, but it does not work. Why? Because in a noisy environment the entropy of the label set does not decrease as labels accumulate. The latter set, in an environment of high noise, is almost certain to be a "+" example, while the former is likely to be "+" but with comparatively lower confidence.
  • Label uncertainty: Based on the example above, we can now try to estimate the confidence that we have for the "true" label. Using a Bayesian estimation approach, we can get quite good results (see the paper for details).
  • Model uncertainty: Instead of using the label uncertainty, we can use active-learning approaches and multilabel the "hard" and "ambiguous" examples, as judged by the machine learning model that uses the generated training data. Intuitively, this technique works well, as it quickly identifies the examples for which the classifier cannot make confident decisions and which, in turn, are probably labeled incorrectly.
  • Label and Model uncertainty: The best approach. Model uncertainty directs to examples that are likely to be incorrectly labeled, and label uncertainty picks the ones for which we are uncertain about their true labels. We have observed that this approach works best and consistently outperforms simpler baselines.
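To make the entropy-based bullet concrete, here is a small sketch using the two label sets from the text:

```python
import math

def label_entropy(labels):
    """Entropy (in bits) of the empirical distribution of a label set."""
    n = len(labels)
    probs = [labels.count(v) / n for v in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

small = ["+", "+", "+"]
large = ["+", "+", "-", "+", "+", "-", "+", "+", "-"]

e_small = label_entropy(small)   # 0.0 bits: the set looks perfectly certain
e_large = label_entropy(large)   # ~0.918 bits: the set looks uncertain
```

By entropy alone, the nine-label set is the one to relabel; the point of the bullet is that, once labeler noise is taken into account, its majority is actually at least as well supported as that of the three-label set.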

Future Directions

I am very excited about this line of work. Of course, there are quite a few other questions and interesting directions for future research. Here are a few:

  • If we identify the quality of each labeler (e.g., using the framework of Dawid and Skene), can we improve the quality of the labeling by assigning the difficult examples to the most capable annotators?
  • What are the tradeoffs of quality vs. payment? Does it make sense to pay higher prices to increase quality? What if the annotators become strategic knowing that lower quality work will result in resubmissions of the same work for higher payoffs?
  • What if feature acquisition is also noisy? Can we build frameworks that unify noisy feature acquisition and noisy label acquisition, and take the cost and expected utility of the generated examples into account?

Applications for Conference and Journal Reviewing

I am also thinking that this line of research has applications to conference and journal reviewing. There, again, we have a finite set of noisy annotators, each with their own noise level, and we are trying to estimate the "true label" of a paper. How should we optimally manage the resources? The current SIGMOD mode of operation (two reviewers initially, extra reviewers for potential acceptances) seems to come close to the "label and model uncertainty" approach discussed above. I guess we can model this operation mathematically and see how we can optimize it. Of course, the (imho classic, for its own reasons) "Reviewing the Reviewers" by Ken Church, combined with the references above, is a good starting point.

Tuesday, March 4, 2008

Course Evaluations and Prediction Markets: The Results

A couple of weeks back, I described my attempt to use prediction markets to predict my final course evaluation. The final result of the market was:

PREDICTIONS (closing values)
6.0-6.5: $49.95
6.5-7.0: $36.86
5.0-5.5: $4.60
5.5-6.0: $4.34
1.0-5.0: $4.23

Taking the weighted average of these predictions, the market shows a predicted outcome of:

\[
\frac{49.95 \cdot 6.25 + 36.86 \cdot 6.75 + 4.60 \cdot 5.25 + 4.34 \cdot 5.75 + 4.23 \cdot 3.0}{49.95 + 36.86 + 4.60 + 4.34 + 4.23} \approx 6.227
\]
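In code, the weighted average is simply the contract prices applied as weights to the midpoints of their ranges:

```python
# Closing prices of the five contracts and the midpoints of their ranges
prices = [49.95, 36.86, 4.60, 4.34, 4.23]
midpoints = [6.25, 6.75, 5.25, 5.75, 3.0]

# Price-weighted average of the range midpoints, i.e., the market's prediction
prediction = sum(p * m for p, m in zip(prices, midpoints)) / sum(prices)  # ~6.23
```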

And what was the final course evaluation? Did the market work? Well, the final course evaluation was 6.212, with 35 student ratings: a relative error of 0.002, or 0.2%. I cannot imagine how I could have gotten a more accurate prediction!

Interestingly enough, by observing the market, I can see that very few people actually picked the 6.0-6.5 range. Most of the players bought contracts in the 6.5-7.0 range. These contracts played their role of counterbalancing the few players that bought contracts in the 1.0-5.0 and in the 5.0-6.0 ranges. Therefore, while most of the action was in contracts that did not predict the correct range, this activity was crucial for the market to balance and give the correct prediction.