In crowdsourcing, redundancy is a common approach to ensure quality. One of the questions that arises in this setting is the question of equivalence. Let's assume that a worker has a known probability $q$ of giving the correct answer when presented with a choice of $n$ possible answers. If we want to simulate one high-quality worker of quality $q$, how many workers of quality $q' < q$ do we need?
Information Theory and the Noisy Channel Theorem
Information theory, and the noisy channel theorem, can give an answer to the problem: treat each worker as a noisy channel, and measure the "capacity" of each worker. Then, the sum of the capacities of the different workers should give us the equivalent capacity of a high-quality worker.
The capacity $C(q,n)$ of a worker of quality $q$, i.e., one who returns the correct answer with probability $q$ when presented with $n$ choices, is:
$C(q,n) = H(\frac{1}{n}, n) - H(q, n)$
where $H(q,n) = -q \cdot \log(q) - (1-q) \cdot \log(\frac{1-q}{n-1})$ is the entropy (aka uncertainty) of the worker's answer, assuming that errors are spread uniformly over the $n-1$ incorrect choices.
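For concreteness, here is a minimal Python sketch of these two quantities (the function names H and C are mine, purely for illustration):

```python
from math import log2

def H(q, n):
    """Entropy (in bits) of a worker who is correct with probability q
    and spreads errors uniformly over the remaining n-1 choices."""
    if q == 1.0:
        return 0.0  # a perfect worker has zero uncertainty
    return -q * log2(q) - (1 - q) * log2((1 - q) / (n - 1))

def C(q, n):
    """Capacity (in bits per answer) of a worker of quality q on n choices."""
    return H(1.0 / n, n) - H(q, n)
```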
Examples
The value $H(\frac{1}{n}, n) = \log(n)$ is the initial entropy that we have for the question, when no answers are given. Intuitively, when $n=2$, the initial uncertainty is equal to $\log(2)=1$ bit, since we need one bit to describe the correct answer out of the 2 available. When $n=4$, the uncertainty is equal to $\log(4)=2$ bits, as we need 2 bits to describe the correct answer out of the 4 available.
Therefore, a perfect worker with quality $q=1$ has entropy $H(1,n)=0$, and so the capacity of a perfect worker is $\log(n)$.
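Using the sketch above, we can check these values numerically:

```python
print(C(1.0, 2))   # 1.0: a perfect worker on 2 choices carries 1 bit per answer
print(C(1.0, 4))   # 2.0: a perfect worker on 4 choices carries 2 bits per answer
print(C(0.85, 4))  # ~1.15: an 85%-accurate worker on 4 choices (used in the puzzle below)
```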
Can Two Imperfect Workers Simulate a Perfect One?
Now, here comes the puzzle. Assume that we have $n=4$, and the workers have to choose among 4 possible answers. We also have two workers with $q=0.85$, who select the correct answer out of the 4 available with 85% probability. Each of these workers has a capacity equal to $C(0.85, 4) = 1.15$ bits. At the same time, we have one perfect worker with $q=1$. This worker has a capacity of $C(1,4)=2$ bits. So, in principle, the two noisy workers are sufficient to simulate a perfect worker (and would even leave 0.3 bits to spare :-)
What am I missing?
My problem is that I do not get how to reach this theoretical limit. I cannot figure out how to use these two workers with $q=0.85$ in order to reconstruct the correct answer. Asking the two workers to work in parallel will not cut it (it is still possible for both workers to agree and be incorrect). Sequential processing (first have one worker narrow the four answers down to two, then have the second worker pick the correct one of the two) seems more powerful, but again I do not understand how to operationalize this.
According to information theory, these two $q=0.85$ workers are equivalent, on average, to one perfect $q=1.0$ worker. (Actually, they seem to carry more information.) And even if we give up on perfection and set the target quality at $q=0.99$, we have $C(0.99,4) \approx 1.9$ bits. I still cannot see how I can combine two workers with 85% accuracy to simulate a 99% accurate worker.
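As a sanity check on the naive "parallel" strategy, here is a small simulation sketch (my own toy model, assuming errors are uniform over the wrong answers and ties are broken at random):

```python
import random
from collections import Counter

def worker_answer(truth, q=0.85, n=4):
    """One simulated worker: correct with probability q, otherwise a
    uniformly random wrong answer among the remaining n-1 choices."""
    if random.random() < q:
        return truth
    return random.choice([a for a in range(n) if a != truth])

def majority_vote(truth, k, q=0.85, n=4):
    """Ask k workers in parallel and return the most common answer,
    breaking ties uniformly at random."""
    counts = Counter(worker_answer(truth, q, n) for _ in range(k))
    best = max(counts.values())
    return random.choice([a for a, c in counts.items() if c == best])

trials = 100_000
for k in (1, 2, 3):
    acc = sum(majority_vote(0, k) == 0 for _ in range(trials)) / trials
    print(k, round(acc, 3))
# Roughly 0.85 for k=1, ~0.86 for k=2, ~0.95 for k=3: plain voting stays
# far below the 0.99 target, even though 3 workers carry ~3.5 bits per question.
```

This is exactly the "send it multiple times and vote" scheme that the Wikipedia excerpt below calls inefficient.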
- Update 1 (thanks to Michael Nielsen): Information theory operates over a large amount of transmitted information, so posing the problem as "answering a single question" makes it sound harder than it actually is.
We need 2 bits of information to transfer the answer for a multiple-choice question with $n=4$ choices. Say that we have a total of $N$ such questions. So, we need $2N$ bits to transfer all the answers perfectly. If we have perfect workers, with $q=1$, we have $C(1,4)=2$, and we need $2N$ bits / 2 bits per answer $= N$ answers from these workers.
Now, let's say that we have workers with $q'=0.85$. In that case, $C(0.85, 4) = 1.15$ bits per answer. Therefore, we need $2N$ bits / 1.15 bits per answer $\approx 1.74N$ answers from these 85%-accurate workers in order to perfectly reconstruct the answers to these $N$ questions.
So, if we get a total of 100 answers from these 85% workers (each one correct with probability 85%), we should be able to reconstruct the 100% correct answers for ~57 (= 100/1.74) questions.
Of course, we have to be intelligent about what exactly to ask in order to get these 100 answers.
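The arithmetic, as a quick sketch (reusing the C function from the earlier snippet):

```python
N = 100                      # number of 4-choice questions (illustrative)
bits_needed = 2 * N          # log2(4) = 2 bits per question
per_answer = C(0.85, 4)      # ~1.15 bits carried by each 85%-accurate answer

print(bits_needed / per_answer)  # ~174 answers needed to recover all 100 questions
print(100 * per_answer / 2)      # ~57 questions recoverable from 100 such answers
```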
I see in Wikipedia, in the article about the noisy channel theorem, that "Simple schemes such as 'send the message 3 times and use a best 2 out of 3 voting scheme if the copies differ' are inefficient error-correction methods" and that "Advanced techniques such as Reed–Solomon codes and, more recently, turbo codes come much closer to reaching the theoretical Shannon limit". Unfortunately, my familiarity with such coding schemes is minimal (i.e., I have no clue), so I cannot understand their applicability in a crowdsourcing setting.
So, here is my question: What coding schemes should we use in crowdsourcing in order to get closer to the theoretical limits given by Shannon? Or what is the fundamental thing that I am missing? Because I do feel that I am missing something...
Any help would be appreciated.
- Update 2 (thanks to the comments by stucchio and syrnik): Information theory predicts that we can always recover the perfect answers from noisy workers, given sufficient worker capacity. For anyone who has worked in crowdsourcing, this sounds very strange, and seems practically infeasible. The problem does not seem to be in the assumptions of the analysis; instead, it seems to lie in the feasibility of implementing a proper encoding scheme on top of human computation.
The key concept in information theory is the coding scheme used to encode the information, so as to make its transmission robust to errors. The noisy channel theorem itself does not say how to recover the information perfectly over a noisy channel; it only says that it is possible. Over time, researchers came up with encoding schemes that approach the theoretical maximum very closely (see above: Reed–Solomon codes, turbo codes, etc.). However, it is not clear whether these schemes are translatable into a human computation setting.
Consider this gross simplification (which, I think, is good enough to illustrate the concept): In telecommunications, we put a "checksum" together with each message, to capture cases of incorrect information transmission. When the message gets transmitted erroneously, the checksum does not match the message content. This may be the result of corruption in the message content, or the result of corruption in the checksum (or both). In such cases, we re-transmit the message. Based on the noise characteristics of the channel, we can decide how long the message should be, how long the checksum should be, etc., to achieve maximum communication efficiency.
For example, consider using a parity bit, the simplest possible checksum computation. We count the number of 1 bits in the message: if the number of 1's is odd, we set the parity bit to 1; if the number of 1's is even, we set the parity bit to 0. The extra parity bit increases the size of the message, but it can be used to detect errors when the message gets transmitted over a noisy channel, and thus reduce the error rate. By increasing the number of parity bits we can reduce the error rate to arbitrarily low levels.
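For readers as unfamiliar with coding as I am, here is a minimal sketch of the parity-bit idea for ordinary bit strings (not, yet, for crowdsourced answers):

```python
def add_parity(bits):
    """Append a parity bit so that the transmitted message always
    contains an even number of 1's."""
    return bits + [sum(bits) % 2]

def check_parity(received):
    """Return True if the received message (including its parity bit)
    is consistent; any single flipped bit makes this check fail."""
    return sum(received) % 2 == 0

msg = [1, 0, 1, 1]
sent = add_parity(msg)          # [1, 0, 1, 1, 1]
print(check_parity(sent))       # True: nothing corrupted
corrupted = sent[:]
corrupted[2] ^= 1               # flip one bit "in transit"
print(check_parity(corrupted))  # False: the error is detected, so we retransmit
```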
In a human computation setting, computing such a checksum is highly non-trivial. Since we do not really know the original message, we cannot compute an error-free checksum at the source. We can, of course, try to create "meta"-questions that attempt to compute the "checksum", or even try to modify all the original questions to have an error-correcting component built into them.
Here is the key difference: in information theory, the message to be transmitted, with its built-in error correction, is computed error-free at the source. Consider now the same implementation in a human computation setting: we ask the user to inspect the previous $k$ questions and report some statistics about the previously submitted answers. The user now operates on the noisy message (i.e., the given, noisy answers), so even a flawlessly executed checksum computation works on noisy inputs and is itself going to be noisy, defeating the purpose of an error-correcting code.
Alternatively, we can try to take the original questions and ask them in a way that enforces some error-correcting capability. However, it is not clear that these "meta-questions" are going to have the same level of complexity for humans, even if, in the information-theoretic sense, they carry the same amount of information.
It seems that in a human computation setting we have noisy computation, as opposed to noisy transmission. Since the computation is noisy, there is a good chance that the computation of these "checksums" is going to be correlated with the original errors. Therefore, it is not clear whether we can actually implement the proper encoding schemes on top of human computers to achieve the theoretical maximums predicted by information theory.
Or, at least, this seems like a very interesting, and challenging research problem.