A Computer Scientist in a Business School

Saturday, May 29, 2010

Accepted Papers for HCOMP 2010

The list of accepted papers for HCOMP 2010 is now available. We received a total of 35 submissions (18 long paper submissions, 11 short paper submissions and 6 demo submissions). This was a healthy increase compared to last years numbers (12 long, 9 short, 9 demos).

Unfortunately, due to the half-day duration of the workshop, we had space for only 8 papers (4 long and 4 short). We really had to make very hard decisions to fit as many papers as possible in the program. We reduced the length of the talks, "downgraded" some papers to short, some others to posters, and at the end of the day we even had to reject papers that had no reject ratings! Quite a few of the rejected papers were interesting and I would love the opportunity to talk to the authors about their work. Let's hope that next time we will manage to get a full day for the workshop in order to accommodate the demand. Having acceptance rates of ~25% for a workshop is simply too low.

I expect to see a very exciting program! If you can be in DC on July 25th, you should make the effort to come to the workshop!

Special thanks go to Chandra for running the show this year!

Below, I list the accepted papers, with links to the unofficial versions that the authors posted on their websites. Once we have the camera-ready versions available, I will put the proper links.

Long Papers

Toward Automatic Task Design: A Progress Report
Eric Huang, Haoqi Zhang, David Parkes, Krzysztof Gajos, Yiling Chen
Abstract: A central challenge in human computation is in understanding how to design task environments that effectively attract participants and coordinate the problem solving process. In this paper, we consider a common problem that requesters face on Amazon Mechanical Turk: how should a task be designed so as to induce good output from workers? In posting a task, a requester decides how to break down the task into unit tasks, how much to pay for each unit task, and how many workers to assign to a unit task. These design decisions affect the rate at which workers complete unit tasks, as well as the quality of the work that results. Using image labeling as an example task, we consider the problem of designing the task to maximize the number of quality tags received within given time and budget constraints. We consider two different measures of work quality, and construct models for predicting the rate and quality of work based on observations of output to various designs. Preliminary results show that simple models can accurately predict the quality of output per unit task, but are less accurate in predicting the rate at which unit tasks complete. At a fixed rate of pay, our models generate different designs depending on the quality metric, and optimized designs obtain significantly more quality tags than baseline comparisons.
Exploring Iterative and Parallel Human Computation Processes
Greg Little, Lydia Chilton, Max Goldman, Robert Miller
Abstract: Services like Amazon's Mechanical Turk have opened the door for exploration of processes that outsource computation to humans. These human computation processes hold tremendous potential to solve a variety of problems in novel and interesting ways. However, we are only just beginning to understand how to design such processes. This paper explores two basic approaches: one where workers work alone in parallel and one where workers iteratively build on each other's work. We present a series of experiments exploring tradeoffs between each approach in several problem domains: writing, brainstorming, and transcription. In each of our experiments, iteration increases the average quality of responses. The increase is statistically significant in writing and brainstorming. However, in brainstorming and transcription, it is not clear that iteration is the best overall approach, in part because both of these tasks benefit from a high variability of responses, which is more prevalent in the parallel process. Also, poor guesses in the transcription task can lead subsequent workers astray.
Task Search in a Human Computation Market
Lydia Chilton, John Horton, Robert Miller, Shiri Azenkot
Abstract: "In order to understand how a labor market for human computation functions, it is important to know how workers search for tasks. This paper uses two complementary methods to gain insight into how workers search for tasks on Mechanical Turk. First, we perform a high frequency scrape of 36 pages of search results and analyze it by looking at the rate of disappearance of tasks across key ways Mechanical Turk allows workers to sort tasks. Second, we present the results of a survey in which we paid workers for self-reported information about how they search for tasks. Our main findings are that on a large scale, workers sort by which tasks are most recently posted and which have the largest number of tasks available. Furthermore, we find that workers look mostly at the first page of the most recently posted tasks and the first two pages of the tasks with the most available instances but in both categories the position on the result page is unimportant to workers. We observe that at least some employers try to manipulate the position of their task in the search results to exploit the tendency to search for recently posted tasks. On an individual level, we observed workers searching by almost all the possible categories and looking more than 10 pages deep. For a task we posted to Mechanical Turk, we confirmed that a favorable position in the search results do matter: our task with favorable positioning was completed 30 times faster and for less money than when its position was unfavorable. "
The Anatomy of a Large-Scale Human Computation Engine
Shailesh Kochhar, Stefano Mazzocchi, Praveen Paritosh
Abstrat: In this paper we describe RABJ, an engine designed to simplify collecting human input. We have used RABJ to collect over 2.3 million human judgments to augment data mining, data entry, data validation and curation problems at Freebase over the course of a year. We illustrate several successful applications that have used RABJ to collect human judgment. We describe how the architecture and design decisions of RABJ are affected by the constraints of content agnosticity, data freshness, latency and visibility. We present work aimed at increasing the yield and reliability of human computation efforts. Finally, we discuss empirical observations and lessons learned in the course of a year of operating the service.

Short Papers

Sellers' problems in human computation markets
M. Six Silberman, Joel Ross, Lilly Irani, Bill Tomlinson
Abstract: "Tools for human computers" is an underexplored design space in human computation research, which has focused on techniques for buyers of human computation rather than sellers. We characterize the sellers in one human computation market, Mechanical Turk, and describe some of the challenges they face. We list several projects developed to approach these problems, and conclude with a list of open questions relevant to sellers, buyers, and researchers.
Sentence Recall Game: A Novel Tool for Collecting Data to Discover Language Usage Patterns
Jun Wang, Bei Yu
Abstract: Recently we ran a simple memory test experiment, called sentence recall, in which participants were asked to recall sentences that they had just seen on the screen. Many participants, especially non-native English speakers, made various deviations in their recalled sentences. Some deviations represent alternative ways to express the same meaning, but others suggest that there are missing pieces in the participants' language knowledge. The deviation data, on the one hand, can provide individual users valuable feedback on their language usage patterns that they may never notice, on the other hand, can be used as training data for automatically discovering language usage patterns in a subpopulation of language learners. This paper presents our attempts to create an enjoyable sentence recall game for collecting a large amount of deviation data. Our results show that the game is fun to play and the collected deviation data can reveal common language usage patterns among non-native speakers.
Quality Management on Amazon Mechanical Turk
Panagiotis Ipeirotis, Jing Wang, Foster Provost
Abstract: Crowdsourcing services, such as Amazon Mechanical Turk, allow for easy distribution of small tasks to a large number of workers. Unfortunately, since manually verifying the quality of the submitted results is hard, malicious workers often take advantage of the verification difficulty and submit answers of low quality. Currently, most requesters rely on redundancy to identify the correct answers. However, redundancy is not a panacea. Massive redundancy is expensive, increasing significantly the cost of crowdsourced solutions. Therefore, we need techniques that will accurately estimate the quality of the workers, allowing for the rejection and blocking of the low-performing workers and spammers. However, existing techniques cannot separate the true (unrecoverable) error rate from the (recoverable) biases that some workers exhibit. This lack of separation leads to incorrect assessments of a worker's quality. We present algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error. Our algorithm generates a scalar score representing the inherent quality of each worker. We illustrate how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers. We present experimental results demonstrating the performance of the proposed algorithm under a variety of settings.
Human Computation for Word Sense Disambiguation
Nitin Seemakurty, Jonathan Chu, Luis von Ahn, Anthony Tomasic
Abstract: One formidable problem in language technology is the word sense disambiguation (WSD) problem: disambiguating the true sense of a word as it occurs in a sentence (e.g., recognizing whether the word "bank" refers to a river bank or to a financial institution). This paper explores a strategy for harnessing the linguistic abilities of human beings to develop datasets that can be used to train machine learning algorithms for WSD. To create such datasets, we introduce a new interactive system: a fun game designed to produce valuable output by engaging human players in what they perceive to be a cooperative task of guessing the same word as another player. Our system makes a valuable contribution by tackling the knowledge acquisition bottleneck in the WSD problem domain. Rather than using conventional and costly techniques of paying lexicographers to generate training data for machine learning algorithms, we delegate the work to people who are looking to be entertained.

Poster/Demos

Frontiers of a Paradigm: Exploring Human Computation with Digital Games
Markus Krause
GiveALink Tagging Game: An Incentive for Social Annotation
Li Weng, Filippo Menczer
Crowdsourcing Participation Inequality; A SCOUT Model for the Enterprise Domain
Osamuyimen Stewart, David Lubensky, Juan Huerta, Julie Marcotte, Cheng Wu, Andrzej Sakrajda
Mutually Reinforcing Systems: A Method For The Acquisition Of Specific Data From Games With By-Products
John Ferguson, Marek Bell, Matthew Chalmers
A Note on Human Computation Limits
Paul Rohwer

Demos

Reconstructing the World in 3D: Bringing Games with a Purpose Outdoors
Kathleen Tuite, Noah Snavely, Zoran Popovic, Dun-Yu Hsiao, Adam Smith
Improving Music Emotion Labeling Using Human Computation
Brandon Morton, Jacquelin Speck, Erik Schmidt, Youngmoo Kim
Webpardy: Harvesting QA by HC
Hidir Aras, Markus Krause, Andreas Haller, Rainer Malaka
Measuring Utility of Human-Computer Interactions
Michael Toomim, James Landay
Translation by Iterative Collaboration between Monolingual Users
Chang Hu, Benjamin Bederson, Philip Resnik

Friday, May 21, 2010

Prediction Optimizers

The announcement of the Google Prediction API created quite a lot of discussion about the future of predictive modeling. The reactions were mainly positive, but there was one common concern: Google Predict API operates as a black box. You give the data, you train, you get predictions. No selection of the training algorithm, no parameter tuning, no nothing. Black box. Data in, predictions out.

So, the natural question arises: Is it possible to do machine learning like that? Can we trust something if we do not understand the internals of the prediction mechanism?

Declarative query execution and the role of query optimizers

For me this approach, being trained as a database researcher, directly corresponds to the approach of a query optimizer. In relational databases, you upload your data, and issue declarative SQL queries to get answers. The internal query optimizer selects how to evaluate the SQL query in order to return the results in the fastest possible manner. Most of the users of relational databases today do not even know how a query is executed. Is the join executed as a hash-join or as a sort-merge join? In which order do we join the tables? How is the GROUP BY aggregation computed? I bet that 99% of the users have no idea, and they do not want to know.

Definitely, knowing how a database works can help. If you know that you will perform mainly lookup queries on a given column, you want to build an index. Or create a histogram with the distribution statistics for another column. Or create a materialized view for frequently executed queries. But even these tasks today are increasingly automated. The database tuning advisor on SQL Server routinely suggests indexes and partitionings for my databases that I would never thought of building. (And I have a PhD in databases!)

Declarative predictive modeling

I see an exact parallel of this approach in the Google Prediction API. You upload the data and Google selects for you the most promising model. (Or, more likely, they build many models and do some form of meta-learning on top of these predictions.) I would call this a "prediction optimizer" making the parallel with the query optimize that is built within every relational database system today. My prediction (pun intended) is that such prediction optimizers will be an integral part of every predictive analytics suite in the future.

Someone may argue that it would be better to have the ability to hand tune some parts. For example, if you know something about the structure of the data, you may pass hints to the "prediction optimizer", indicating that a particular learning strategy is better. This has a direct correspondence in the query optimization world: if you know that a particular execution strategy is better, you can pass a HINT to the optimizer as part of the SQL query.

Can we do better? Yes (but 99% of the people do not care)

The obvious question is, can we do better than relying on a prediction optimizer to build a model? The answer is pretty straightforward: Yes!

In fact, if you build any custom solution for handling your data, it will be significantly better than any commercial database. Databases have a lot of extra baggage (e.g., transactions) that are not useful in all applications but are slowing down execution considerably. I will not even go to the discussion about web crawlers, financial trading systems, etc. However, these solutions come at a cost (time, people, money...). Many people just want to store and manage their data. For them, existing database systems and their behind-the-scenes optimizers are good enough!

Similarly, you can expect to see many people building predictive models "blindly" using a blackbox approach, relying on a prediction optimizer to handle the details. If people do not care about hash joins vs. sort-merge joins, I do not think that anyone will care if the prediction came from a support vector machine with a radial basis function, from a random forest, or from a Mechanical Turk worker (Yes, I had to put MTurk in the post).

The future

I know that predictions, especially about the future, are hard, but here is my take: We are going to see a market for predictive modeling suites, similar to the market for databases. Multiple vendors will built (or have already built) similar suites. In the same way that today Oracle, Microsoft, IBM, Teradata and so on, compete for the best SQL engine, we will see competition for such turnkey solutions for predictive modeling. You upload the data and then the engines complete for scalability, speed of training, and for the best ROC curve.

Let's see: Upload-train-predict. Waiting for an answer...

Thursday, May 20, 2010

Google Prediction API: Commoditization of Large-Scale Machine Learning?

Today Google has announced the availability of the Google Prediction API: In brief, it allows users to upload massive data sets into the Google Datastore and then let Google built a supervised machine learning model (aka classifier) from the data. This is simply big news!

Google seems to promise great simplicity: Upload data in CSV format and Google takes care of the rest. They select the appropriate model for the data, train the model, report accuracy statistics, and let you classify new instances. Building classifiers from large-scale datasets becomes trivial.

While I have not had the chance to access the API, this seems to be a game changer. The ability to scale models to work with massive datasets was beyond the reach of many, and now suddenly becomes a commodity. Research labs that wanted to built classifiers as tools (and not as the focus of their research) will be able to do so without requiring much expertise. Similarly, startups will be able to use a scalable machine learning infrastructure, without having access to an inhouse expert.

In a sense, it seems to bring machine learning to the masses, bringing the performance baseline to very high level. If Google Predict is "good enough", will people seek for more advanced solutions? The optimizer of MySQL pretty much sucks but it is "good enough" for many.

Will Google Predict make large-scale machine learning a commodity? Does it mean that the value is now in having the data and in feature engineering? Unclear, but definitely a plausible scenario.

I will withhold further commentary until I manage to get access to the API. But I am excited!

Sunday, May 9, 2010

Google Scholar now Supports Email Alerts

While searching Google Scholar, I noticed a new icon:

By clicking this button, you can create an email alert, which notifies you for any new papers that may appear for the given query. You can use it:

For normal queries, getting notifications for queries such as [author:lastname] or [intitle:titleword], or by any other query on Google Scholar
For citation queries, getting information about new papers that cite a given paper. For that, you need to first go to the "Cited by X" page, and then click the alert icon.

At the end, you get a nice list of alerts that can notify you when new papers of a particular author appear on Google Scholar, when new papers about a topic get indexed, or when a paper gets cited (the almighty citation!). Here is how the alert list looks like:

I tries to find some official announcement for this feature and I could not find anything. Not sure if this is rolled out to everyone but it is certainly very very useful!

My Citation Tracker tool becomes less useful now, but I am glad that we will stop trying to implement this feature on top of Google Scholar and Google will start doing it natively. Next step Google: RSS alerts!

Tuesday, May 4, 2010

Crowdsourcing: Not just a cost saver

Whenever I am discussing crowdsourcing, people tend to ask about the "killer app" for crowdsourcing. In my mind, a new technology is being used first for cost reduction, then due to the afforded flexibility, and finally to achieve tasks that were previously infeasible.

As with many technologies (e.g., VoIP), the first applications of crowdsourcing tend to focus on cost reduction. Label images, moderate comments, verify addresses, collect emails... All tasks that were traditionally done by low-paid interns, are now crowdsourced, for significant cost savings. Although these applications tend to be useful, it is hard to see applications focused on cost cutting to be the ones that will drive the wide adoption of crowdsourcing. In any case, if someone has big tasks like that, they can always contact an outsourcing shop in a developing country, and achieve similar rates.

So, flexibility of deployment tends to be another angle that tends to drive the adoption of crowdsourcing. Pretty much as in the case of cloud computing, crowdsourcing can come pretty handy for companies that need significant resources for just a small period of time. LiveOps was a pioneer of this model, applying "crowdsourcing" before the terms was invented, to dynamically staff call centers for telemarketing companies, handling sudden spikes in demand, and so on.

However, the really interesting applications are those that are truly being enabled by such technology. One such idea was given to me by Amanda Michel of Propublica, last September after a Mechanical Turk Meetup: Propublica wanted to monitor the spending of stimulus money, and examine if the projects assigned to different contractors were being completed properly. By analyzing the collected data, it would be possible to identify systematic problems or outright corruption.

Of course, collecting such data would have been impossible for a journalist, or even for a team of journalists. However, using crowdsourcing it would be pretty easy for local residents to check if a bridge has been fixed, if a road was properly paved, and so on. I loved the idea! It was a clear demonstration of how to take an infeasible task, and use crowdsourcing to make it feasible. Even better: it was encouraging the involvement of citizens and their active collaboration with the government. A win-win project!

Interestingly enough, a recent article on BusinessWeek, indicates that the Obama administration is also employing crowdsourcing for the same task! From the article: "It's the surest way to prevent, say, a convicted contractor from reincorporating a new company under his wife's name and applying for stimulus money, explains Earl Devaney, the special inspector general who oversees stimulus spending. "Only local folks can connect those kinds of dots," he says.

Indeed! If only we had such ideas in Greece...

Monday, May 3, 2010

KDD Accepted Papers, Deadlines for HCOMP 2010 and SNAKDD 2010 workshops

The accepted papers for KDD 2010 are now posted and available at http://www.kdd.org/kdd2010/papers.shtml

As a reminder, the KDD this year will take place in Washington DC, from July 25th to July 28th.

I would also like to draw your attention to two KDD workshops that I am involved with, the Human Computation Workshop (HCOMP 2010) and the Workshop on Social Network Mining and Analysis (SNAKDD 2010). Both have submission deadlines on May 7th. It is pretty easy to travel to DC, so if you have any cool idea, or demo that would be appropriate for these workshops, please submit!

For those too bored to visit the respective websites, here are the call for papers:

Human Computation Workshop (HCOMP 2010) - Call for Papers

Most research in data mining and knowledge discovery relies heavily on the availability of datasets. With the rapid growth of user generated content on the internet, there is now an abundance of sources from which data can be drawn. Compared to the amount of work in the field on techniques for pattern discovery and knowledge extraction, there has been little effort directed at the study of effective methods for collecting and evaluating the quality of data.

Human computation is a relatively new research area that studies the process of channeling the vast internet population to perform tasks or provide data towards solving difficult problems that no known efficient computer algorithms can yet solve. There are various genres of human computation applications available today. Games with a purpose (e.g., the ESP Game) specifically target online gamers who, in the process of playing an enjoyable game, generate useful data (e.g., image tags). Crowdsourcing marketplaces (e.g. Amazon Mechanical Turk) are human computation applications that coordinate workers to perform tasks in exchange for monetary rewards. In identity verification tasks, users need to perform some computation in order to access some online content; one example of such a human computation application is reCAPTCHA, which leverages millions of users who solve CAPTCHAs every day to correct words in books that optical character recognition (OCR) programs fail to recognize with certainty.

Human computation is an area with significant research challenges and increasing business interest, making this doubly relevant to KDD. KDD provides an ideal forum for a workshop on human computation as a form of cost-sensitive data acquisition. The workshop also offers a chance to bring in practitioners with complementary real-world expertise in gaming and mechanism design who might not otherwise attend this academic conference.

The first Human Computation Workshop (HComp 2009) was held on June 28th, 2009, in Paris, France, collocated with KDD 2009. The overall themes that emerged from this workshop were very clear: on the one hand, there is the experimental side of human computation, with research on new incentives for users to participate, new types of actions, and new modes of interaction. This includes work on new programming paradigms and game templates designed to enable rapid prototyping, allow partial completion of tasks, and aid in reusability of game design. On the more theoretic side, we have research modeling these actions and incentives to examine what theory predicts about these designs. Finally, there is work on noisy results generated by such games and systems: how can we best handle noise, identify labeler expertise, and use the generated data for data mining purposes?

Learning from HComp 2009, we have expanded the topics of relevance to the workshop. The goal of HComp 2010 is to bring together academic and industry researchers in a stimulating discussion of existing human computation applications and future directions of this new subject area. We solicit papers related to various aspects of both general human computation techniques and specific applications, e.g. general design principles; implementation; cost-benefit analysis; theoretical approaches; privacy and security concerns; and incorporation of machine learning / artificial intelligence techniques. An integral part of this workshop will be a demo session where participants can showcase their human computation applications. Specifically, topics of interests include, but are not limited to:

Abstraction of human computation tasks into taxonomies of mechanisms

Theories about what makes some human computation tasks fun and addictive

Differences between collaborative vs. competitive tasks

Programming languages, tools and platforms to support human computation

Domain-specific implementation challenges in human computation games

Cost, reliability, and skill of labelers

Benefits of one-time versus repeated labeling

Game-theoretic mechanism design of incentives for motivation and honest reporting

Design of manipulation-resistance mechanisms in human computation

Effectiveness of CAPTCHAs

Concerns regarding the protection of labeler identities

Active learning from imperfect human labelers

Creation of intelligent bots in human computation games

Utility of social networks and social credit in garnering data

Optimality in the context of human computation

Focus on tasks where crowds, not individuals, have the answers

Limitations of human computation

Workshop on Social Network Mining and Analysis - Call for Papers

Social networks research has come a long way since the notable “six-degree separation” experiment. In recent years, social network research has advanced significantly, thanks to the prevalence of the online social websites and the availability of a variety of offline large-scale social network systems such as collaboration networks. These social network systems are usually characterized by the complex network structures and rich accompanying contextual information. Researchers are increasingly interested in addressing a wide range of challenges residing in these disparate social network systems, including identifying common static topological properties and dynamic properties during the formation and evolution of these social networks, and how contextual information can help in analyzing the pertaining social networks. These issues have important implications on community discovery, anomaly detection, trend prediction and can enhance applications in multiple domains such as information retrieval, recommendation systems, security and so on.

The fourth SNA-KDD '2010 aims to bring together practitioners and researchers with a specific focus on the emerging trends and industry needs associated with the traditional Web, the social Web, and other forms of social networking systems. Both theoretical and experimental submissions are encouraged. The interesting topics include (1) data mining advances on the discovery and analysis of communities, on personalization for solitary activities (like search) and social activities (like discovery of potential friends), on the analysis of user behavior in open fora (like conventional sites, blogs and fora) and in commercial platforms (like e-auctions) and on the associated security and privacy-preservation challenges; (2) social network modeling, scalable, customizable social network infrastructure construction, dynamic growth and evolution patterns identification and discovery using machine learning approaches or multi-agent based simulation.

The fourth SNA-KDD '2010 solicits contributions on social network analysis and graph mining, including the emerging applications of the Web as a social medium. Papers should elaborate on data mining methods, issues associated to data preparation and pattern interpretation, both for conventional data (usage logs, query logs, document collections) and for multimedia data (pictures and their annotations, multi-channel usage data). Topics of interest include but are not limited to:

Communities discovery and analysis in large scale online and offline social networks

Personalization for search and for social interaction

Recommendations for product purchase, information acquisition and establishment of social relations

Data protection inside communities

Misbehavior detection in communities

Web mining algorithms for clickstreams, documents and search streams

Preparing data for web mining

Pattern presentation for end-users and experts

Evolution of patterns in the Web

Evolution of communities in the Web

Dynamics and evolution patterns of social networks, trend prediction

Contextual social network analysis

Temporal analysis on social networks topologies

Search algorithms on social networks

Multi-agent based social network modeling and analysis

Application of social network analysis

Anomaly detection in social network evolution