A Computer Scientist in a Business School

Friday, July 31, 2009

Workshops: Official or Unofficial Proceedings?

In the process of organizing DBRank 2010, we had to answer the following question: Should the proceedings for the workshop be "official" or "unofficial"?

Official workshop proceedings go through the same process as conference papers: a specific camera-ready format, submission by a given date to the proceedings chair, and official hosting in the publisher's digital library, with all the metadata, digital object identifiers (DOIs), and so on. (For DBRank 2010, that would be IEEE Xplore.) For bureaucratic purposes, these papers are considered "publications."

Unofficial proceedings are, well, unofficial. Typically the workshop chair posts the papers on the workshop website, and potentially brings printed copies for distribution at the workshop. There is no official publisher, there is no DOI assigned to the papers, and in principle this is no more of a publication than a paper posted to a website.

So, should workshops have official or unofficial proceedings?

There are some arguments against official proceedings:

Increasingly, there is a significant conflict between workshop and conference publications. With some workshops allowing 8- or even 10-page papers, it becomes hard for the authors to publish the same work in a conference, as there is typically significant overlap. Most database conferences consider any past paper that is 3 pages or longer to be a prior publication, and the conference version must have significant new content in order to be considered a "new" paper.
As conferences become increasingly competitive, many authors submit to workshops the papers that could not "make it" into a conference. A workshop is typically easier to get into, and in the end "you get a paper" out of it. Needless to say, this pretty much violates the spirit of workshops, which are supposed to be venues for new, relatively immature research, not archival publications.

On the other hand, there are advantages in having official proceedings:

It makes the workshop more attractive in the eyes of many authors. Authors get an official timestamp for their work and can point to a paper that has at least been lightly refereed, instead of pointing to a technical report or working paper.
It makes it easier for someone to locate the papers that were presented in the workshop. Workshop websites are not always hosted on "stable" servers, and they disappear for various reasons. (For example, the websites for WebDB'99, WebDB 2000, WebDB 2001, and WebDB 2003 are not available any more, because the organizers have moved to different institutions.)

So, what to do? Official or unofficial?

Thursday, July 30, 2009

Is Amazon Mechanical Turk a black market?

According to Wikipedia, a black market is: "a market where all commerce is conducted without regard to taxation, law or regulations of trade". How is this related to Mechanical Turk?

Today, I received an email, asking about the tax and employment issues regarding Amazon Mechanical Turk. What are the rules about posting tasks on Mechanical Turk? How should these tasks be handled by accounting and human resources departments?

Unfortunately, Amazon did not design Mechanical Turk in a requester-friendly way. In an effort to relieve its accounting and HR departments of a big overhead, Amazon transferred to the requesters the risk of violating the US tax code and engaging in illegal employment activities.

How can this happen? The key issue is whether there is an employer-employee relationship between the requesters and the workers on Mechanical Turk. The crucial question is:

When you submit funds to your Mechanical Turk account, who are you paying? Amazon.com or the worker?

If it is Amazon, then you are simply letting Amazon deal with all the tax and employment issues associated with the worker: Amazon needs to verify that the worker is eligible for employment, takes care of tax issues, and so on. In this case, hiring someone for a micro-task on Amazon is the same as getting an agency to provide cleaning services to your home: you do not need to care if the person coming to clean your place is eligible for employment, whether the taxes are properly withheld from the paycheck and so on. It is the agency's task to take care of that.

However, Amazon does not follow this route. According to the terms and conditions, paragraph 6.a:

In addition to the disclosures described in our Privacy Notice, we will disclose to Requesters [....] Provider Tax Information. "Provider Tax Information" means tax identification information of Providers, such as a Social Security Number or Employer Identification Number. Requesters use Provider Tax Information to fill out an IRS Form 1099 and send it to Providers. If you are a Requester and want Provider Tax Information from us to complete IRS Form 1099s for Providers you have paid, you must provide us with your company name and employer identification number ("Requester Tax Information"). You hereby consent to disclosure of Provider Tax Information, Requester Tax Information, and other data as described in this Section 6 and our Privacy Notice.

This provision is there because a requester who pays a worker more than 600 USD in a year is required to send a 1099-MISC tax form to that worker. In other words, this tiny provision means that the employer-employee relationship is not between Amazon and the worker but between the requester and the worker. This is in contrast to other marketplaces (e.g., Rent-A-Coder), where the requester pays the marketplace provider, and the marketplace provider then contracts with the workers individually, taking care of tax issues, employment authorization, and so on.

What are the implications of this policy?

Requesters may be open to the risk of violating employment laws. It is possible that a requester is illegally employing US-based workers that do not have the right to work in the US.
Requesters may be open to the risk of violating the US tax code. The requester needs to keep track of how much they paid each individual worker (out of potentially thousands of workers), and send 1099-MISC tax forms to the workers that did more than 600USD worth of HITs over the year for the requester.

OK, these are the risks. What are the potential counterarguments, and how can someone avoid these issues?

The employment-eligibility issue: Amazon pays in cash only people that have US bank accounts. This means that the person, if US-based, is legally in the US. I do not know if Amazon checks for employment eligibility (they should). If the person is not US-based, then Amazon pays through gift cards: From what I know, gift cards are not considered compensation, as we regularly give gift cards as awards to students, without worrying about their eligibility to work, and our accounting department never worried about this practice. So, the issue of illegal employment seems to be rather controlled, but it would be nice if Amazon explicitly took care of that. Yes, it is a big headache for the HR department of Amazon to handle thousands of micro-contractors, but this is the price to pay for running this service.

The tax issue: At the very least, Amazon should have an automatic service to take care of this issue, rather than leaving requesters to scramble to track all the micro-payments and send the paperwork. It is trivial: If a given requester-worker pair generated more than 600 USD worth of HITs over the year, request the tax information and send the 1099-MISC forms on the requester's behalf.
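
As a rough sketch of how simple the bookkeeping for the tax issue could be, the following Python snippet sums the payments per worker and lists the workers above the threshold. The worker IDs, the record layout, and the function name are hypothetical. The input is assumed to hold a single requester's payments for a single tax year, and the "more than 600 USD" cutoff simply follows the wording above; it is not a statement of the actual IRS rules.

```python
from collections import defaultdict
from decimal import Decimal

# Threshold taken from the post's wording ("more than 600 USD"); an assumption,
# not tax advice.
REPORTING_THRESHOLD = Decimal("600")


def workers_needing_1099(payments, threshold=REPORTING_THRESHOLD):
    """Return {worker_id: yearly_total} for workers paid more than `threshold`.

    `payments` is an iterable of (worker_id, amount) pairs for ONE requester
    and ONE tax year. Amounts are converted to Decimal so that thousands of
    micro-payments add up without floating-point drift.
    """
    totals = defaultdict(Decimal)  # Decimal() is 0, so missing workers start at 0
    for worker_id, amount in payments:
        totals[worker_id] += Decimal(str(amount))
    return {worker: total for worker, total in totals.items() if total > threshold}


if __name__ == "__main__":
    # Hypothetical data: W1 completed 13,000 HITs at 5 cents each,
    # W2 received two payments of 300 USD.
    sample = [("W1", "0.05")] * 13000 + [("W2", "300.00"), ("W2", "300.00")]
    for worker, total in workers_needing_1099(sample).items():
        print(worker, total)
```

With this cutoff, a worker who earned exactly 600 USD is not flagged; a real implementation would need to check which comparison the tax rules actually require.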

A better solution: Request tax and employment-eligibility information from workers BEFORE they can work on the MTurk marketplace. Also, request tax information from all the requesters BEFORE they can post any tasks on MTurk. Then submit the tax forms automatically at the end of the year.

An even better solution: Adopt the Rent-A-Coder model, and consider the MTurk workers as Amazon contractors. Then requesters buy services from Amazon, in the same way they buy computing power on EC2, storage on S3, and so on. In this case, it is very simple to add the MTurk expense under the "software services" line in the accounting report.

Tuesday, July 14, 2009

How Do Prices Evolve in Prediction Markets?

Last week, Nikolay presented our paper on "Modeling Volatility in Prediction Markets" at the EC'09 conference. One of the questions that we answer in this paper is, "what is the most likely price of a prediction market contract at some point in the future?"

Let's start with the expected price. If we assume that the markets are efficient, then the current price of the contract is the best possible estimate of the expected future price. However, the current price is NOT the most likely price in the future. In fact, the probability that the contract will have the same price in the future decreases with time. Why? Because the price of the contract moves toward 0 or 1 as we get closer to the expiration and the uncertainty about the outcome decreases. So, while the expected price remains equal to the current price, most of the future prices will be closer to 0 or 1.

Below you can see some 3d plots of the "future price density" as a function of the future price P and the future time t. We assume that "now" is t=0 and the contract expires at t=1.

If the current price is 0.5, then the future price density, as a function of P and t, is:

As you can see, the possible prices, when we are close to t=0, are clustered around the current price (in this case 0.5). Then, as we move closer to the expiration, the probability density moves closer to 0 and 1. As this contract had price 0.5, the plot is completely symmetric around the axis P=0.5.

If we have a current contract price at 0.4, then the density becomes more skewed towards 0:

And here is an even more skewed plot, with the current contract price at 0.25:

Just in case you want to create your own plots, here is the Maple code:

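# Load the stats package, which provides statevalf.
with(stats);

# PDF of a normal distribution with mean mu and standard deviation sigma.
normpdf:=(x,mu,sigma)->statevalf[pdf,normald[mu,sigma]](x);

# PDF of the standard normal distribution.
spdf:=x -> normpdf(x,0,1);

# Inverse CDF (quantile function) of a normal distribution.
normicdf:=(p,mu,sigma)->statevalf[icdf,normald[mu,sigma]](p);

# Inverse CDF of the standard normal distribution.
sicdf:=x->normicdf(x,0,1);

# Density of the future price pfuture at time lambda, given the current price pnow.
f:= (pnow,pfuture,lambda) -> spdf ( sqrt(1/lambda) * sicdf(pnow) - sqrt(1/lambda-1) * sicdf(pfuture))*sqrt(1/lambda-1)/spdf(sicdf(pfuture));

# Plot the density for a current price of 0.5.
plot3d(eval((f(p,P,t)), {p=0.5}), P=0..1, t=0.1..0.75, axes=boxed, shading=zhue, orientation=[-120, 50]);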
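
For readers without Maple, here is a minimal Python port of the function f above, using only the standard library. The Python names (future_price_density, integrate_midpoint, lam) are my own and do not come from the paper. The small midpoint-rule integrator is there only as a sanity check: the density should integrate to roughly 1 over P, and its mean should be roughly the current price, as argued earlier.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mean 0, standard deviation 1


def future_price_density(p_now: float, p_future: float, lam: float) -> float:
    """Term-by-term port of the Maple function f(pnow, pfuture, lambda)."""
    if not (0.0 < p_now < 1.0 and 0.0 < p_future < 1.0 and 0.0 < lam < 1.0):
        raise ValueError("p_now, p_future and lam must lie strictly between 0 and 1")
    a = _STD.inv_cdf(p_now)        # sicdf(pnow)
    b = _STD.inv_cdf(p_future)     # sicdf(pfuture)
    scale = sqrt(1.0 / lam - 1.0)  # sqrt(1/lambda - 1)
    return _STD.pdf(sqrt(1.0 / lam) * a - scale * b) * scale / _STD.pdf(b)


def integrate_midpoint(func, steps: int = 20000) -> float:
    """Midpoint-rule integral of func over the open interval (0, 1)."""
    h = 1.0 / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    p_now, lam = 0.4, 0.25
    total = integrate_midpoint(lambda p: future_price_density(p_now, p, lam))
    mean = integrate_midpoint(lambda p: p * future_price_density(p_now, p, lam))
    # Both values are numerical approximations: total should be close to 1,
    # and mean should be close to p_now.
    print(f"integral of density: {total:.4f}")
    print(f"expected future price: {mean:.4f}")
```

The check uses lam=0.25 on purpose: for lam above 0.5 the density grows without bound near P=0 and P=1, so a naive integrator like this one becomes unreliable there.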

So, what can we do with these results? One application is to price the X contracts on Intrade: In the "X" contracts, the question is about the future price movement of a prediction market contract (e.g., "will the contract for Democrats winning the 2012 election be above 0.75 on December 31st, 2010?").

These X contracts are similar to the existing "call" and "put" options on the stock market, where people try to guess where the price of a stock will be in the future. There is a significant difference, though: When a trader prices and trades a call/put option for a share (e.g., using the vanilla Black-Scholes formula), the trader needs to guess the future volatility of the share price. Through this process, the trade gives the public valuable information about the future volatility of the share price. For prediction markets, trading an X contract does not reveal the same information. Our work shows the exact form of the future price distribution, without the need to provide any volatility estimates. (Volatility can be largely determined by the current price and time to expiration; see the past blog post and the EC'09 paper for details.) So, pricing an X contract just requires plugging in the current price, time to expiration, and strike price (information that is already public) to find the "correct" price for the X contract.
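
To make the "plug in the numbers" step concrete, here is a hedged Python sketch. Integrating the density f from the Maple code above over all prices above a strike K gives, by a change of variables, the closed form Phi((Phi^-1(p) - sqrt(1-lambda) * Phi^-1(K)) / sqrt(lambda)), where p is the current price and Phi is the standard normal CDF. This is my own derivation from that density, not a formula quoted from the paper, and it ignores fees, discounting, and risk premia. The function name and the example numbers are hypothetical, and lam is assumed to be the fraction of the contract's remaining lifetime that will have elapsed at the X contract's date.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_price_above(p_now: float, strike: float, lam: float) -> float:
    """Probability that the future price exceeds `strike` at time `lam`.

    Obtained by integrating the future price density from `strike` to 1;
    all three arguments must lie strictly between 0 and 1.
    """
    if not (0.0 < p_now < 1.0 and 0.0 < strike < 1.0 and 0.0 < lam < 1.0):
        raise ValueError("p_now, strike and lam must lie strictly between 0 and 1")
    a = _STD.inv_cdf(p_now)
    k = _STD.inv_cdf(strike)
    return _STD.cdf((a - sqrt(1.0 - lam) * k) / sqrt(lam))


if __name__ == "__main__":
    # Hypothetical inputs: underlying contract trading at 0.60, strike 0.75,
    # X contract settling halfway to the underlying contract's expiration.
    print(prob_price_above(p_now=0.60, strike=0.75, lam=0.5))
```

Note that no volatility estimate appears among the inputs, which is exactly the point made above: the three public quantities are enough under this model.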

So, am I saying that the X contracts are completely useless? No. But the information revealed by trading these contracts is significantly less compared to the information revealed by trading options on stocks.

Saturday, July 4, 2009

Books, Journals, Conferences, Blogs

I was reading the Open Access Overview by Peter Suber, and I ran into the following paragraph:

Scholarly journals do not pay authors for their articles, and have not done so since the first journals were launched in London and Paris in 1665. Journals took off because they were more timely than books. For readers, journals were better than books for learning quickly about the recent work of others, and for authors they were better than books for sharing new work quickly with the wider world and, above all, for establishing priority over other scientists working on the same problem. They gave authors the benefit of a fast, public time-stamp on their work. Because authors were rewarded in these strong, intangible ways, they accepted the fact that journals couldn't afford to pay them. Over time, journal revenue grew but authors continued in the tradition of writing articles for impact, not for money.

It was amusing to see that there was this transition from books to journals, for pretty much the same reason that in computer science we have seen a transition from journals to conferences. I am wondering if the senior scholars of the day were commenting on this transition in the same way that Mike Trick commented on the similar tension between journal and conference publications:

if a subgroup chooses a method of evaluation antithetical to the mores of the rest of academe, don’t be surprised if the group gets little respect outside their narrow group

So maybe, a few years from now, we will see a similar problem, as people start leaving "traditional" peer review behind and opt for new modes of publication, such as self-publishing. Michael Nielsen has an excellent article on the disruption of scientific publishing. Michael points to the high-quality blog posts from high-quality researchers:

Look at Terry Tao’s wonderful series of posts explaining one of the biggest breakthroughs in recent mathematical history, the proof of the Poincare conjecture. Or Tim Gowers recent experiment in “massively collaborative mathematics”, using open source principles to successfully attack a significant mathematical problem. Or Richard Lipton’s excellent series of posts exploring his ideas for solving a major problem in computer science, namely, finding a fast algorithm for factoring large numbers.

So, does the future of publication rely on self-publishing? Daniel Lemire may be right saying:

To me, the single most important recent event in academic publishing has been the publication by Perelman of his solution to the Poincarré conjecture on arxiv. This is truly a historical event.

Will this change fundamentally alter the way academia works? I do not think so. It will simply mean that every scholar will be very careful about the quality of the work that they self-publish. When everyone can speak, people will listen only to those who generate content of high quality, effectively ignoring those who publish for the sake of publishing.