### Future of Education: Fighting Obesity or Fighting Hunger?

I have been following with interest the discussion about the future of education.

Some people criticize existing educational institutions, indicating that they offer little in terms of real training, and that real learning occurs outside the classroom, by actually doing. "Nobody learns how to build a system in a computer science class." "Nobody learns how to build a company in an entrepreneurship program."

Others are lamenting that by shifting to training-oriented schemes, we are losing the ability to offer deeper education, on topics that are not marketable. Who is going to study poetry if it has no return on investment? Who is going to teach literature if there is no demand for it?

These two criticisms seem to be pushing in two different directions.

In reality, we need to address two different needs:

One need is to democratize education: to take the content of the top courses and make it accessible and available to everyone. People who want to learn machine learning can now take courses from top professors, instead of having to read a book. People can now advance their careers easily, without having to enroll in expensive degree programs.

The other need is to preserve the breadth of education, shielding it from market forces. The goal here is to preserve the structure in which students get exposed to diverse fields during their education, regardless of whether there is market demand for these fields.

Mass production of food pretty much solved the problem of world hunger. A few decades ago, famine was a real problem in many areas of the world, due to the inability to produce enough food to feed the growing population: floods, droughts, and diseases were disrupting production, resulting in shortages. Today, advances in agriculture allow the abundant production of grains and food: wheat and rice varieties are now robust, resistant to diseases, adaptable to many different climates, and allow us to feed the world.

The advances that solved the problem of world hunger ended up creating other problems. Processed carbohydrates are causing obesity, diabetes, gout, and many other "luxury" diseases in the developed world. The poor in the developed world are not dying because they are hungry. They are dying by starving themselves of essential ingredients in their diet.

The parallels are striking. The MOOCs, Khan Academies, and Code Academies of the world are the genetically modified foods for those living in the "third world of education". These courses may not be the most nutritious, and they may not provide all the "nutrition" for their education. However, the choice for many of these people in the "third world of education" is not Stanford vs. a Coursera MOOC. It is nothing vs. a Coursera MOOC. Given the choice, take the MOOC at any time.

Those that live in the "developed world of education" can be pickier. They may have access to the genetically modified MOOCs, but if they can afford it, the organic, artisanal, locally sourced education can be potentially better than the mass produced MOOC.

Horses for courses (pun intended).

### Crowdsourcing research: What is really new?

A common question that comes up when discussing research in crowdsourcing is how it compares with similar efforts in other fields. Having discussed these a few times, I thought it would be good to collect all these in a single place.
• Ensemble learning: In machine learning, you can generate a large number of "weak classifiers" and then build a stronger classifier on top. In crowdsourcing, you can treat each human as a weak classifier and then learn on top. What is the difference? In crowdsourcing, each judgment has a cost. With ensembles, you can trivially create 100 weak classifiers, classify each object, and then learn on top. In crowdsourcing, you pay for every classification decision. Furthermore, you cannot force every person to participate, and participation is often heavy-tailed: a few humans participate a lot, but from many of them we get only a few judgments.
• Quality assurance in manufacturing: When factories create batches of products, they also have a sampling process where they examine the quality of the manufactured products. For example, a factory creates light bulbs, and wants 99% of them to be operating. The typical process involves setting aside a sample and testing whether it meets the quality requirement. In crowdsourcing, this would be equivalent to verifying, with gold testing or with post-verification, the quality of each worker. Two key differences: The heavy-tailed participation of workers means that gold-testing each person is not always efficient, as you may end up testing a user a lot, and then the user may leave. Furthermore, it is often the case that a sub-par worker can still generate somewhat useful information, while for tangible products, the product is either acceptable or not.
• Active learning: Active learning assumes that humans can provide input to a machine learning model (e.g., disambiguate an ambiguous example) and the answers are assumed to be perfect. In crowdsourcing this is not the case, and we need to explicitly take the noise into account.
• Test theory and Item Response Theory: Test theory focuses on how to infer the skill of a person through a set of questions. For example, to create a SAT or GRE test, we need to have a mix of questions of different difficulties, and we need to know whether these questions really separate the persons that have different abilities. Item Response Theory studies exactly these questions, and based on the answers that users give to the tests, IRT calculates various metrics for the questions, such as the probability that a user of a given ability will answer the question correctly, the average difficulty of a question, etc. Two things make IRT inapplicable directly to a crowdsourcing setting: First, IRT assumes that we know the correct answer to each question; second, IRT often requires 100-200 answers to provide robust estimates of the model parameters, a cost that is typically too high for many crowdsourcing applications (except perhaps the citizen science and other volunteer-based projects).
• Theory of distributed systems: This part of CS theory is actually much closer to many crowdsourcing problems than many people realize, especially the work on asynchronous distributed systems, which attempts to solve many coordination problems that appear in crowdsourcing (e.g., agreeing on an answer). The work on analysis of Byzantine systems, which explicitly acknowledges the existence of malicious agents, provides significant theoretical foundations for defending systems against spam attacks, etc. What I am not aware of is any explicit treatment of noisy agents (as opposed to malicious ones), or any study of incentives within that context that would affect the way that people answer a given question.
• Database systems and User-defined-functions (UDFs): In databases, a query optimizer tries to identify the best way to execute a given query, trying to return the correct results as fast as possible. An interesting part of database research that is applicable to crowdsourcing is the inclusion of user-defined-functions in the optimization process. A User-Defined-Function is typically a slow, manually-coded function that the query optimizer tries to invoke as little as possible. The ideas from UDFs are typically applicable when trying to optimize in a human-in-the-loop-as-UDF approach, with the following caveats: (a) UDFs were considered to return perfect information, and (b) the UDFs were assumed to have a deterministic, or a stochastic but normally distributed, execution time. The existence of noisy results and the fact that execution times with humans can often be long-tailed make the immediate applicability of UDF research in optimizing crowdsourcing operations rather challenging. However, it is worth reading the related chapters about UDF optimization in the database textbooks.
• (Update) Information Theory and Error Correcting Codes: We can model the workers as noisy channels that get as input the true signal and return a noisy representation. The idea of using advanced error correcting codes to improve crowdsourcing is rather underexplored, imho. Instead we rely too much on redundancy-based solutions, although pure redundancy has been theoretically proven to be a suboptimal technique for error correction. (See an earlier, related blog post.) Here are a couple of potential challenges: (a) the errors of the humans are very rarely independent of the "message" and (b) it is not clear if we can get humans to properly compute the functions that are commonly required for the implementation of error correcting codes. See a related e
• (Update) Information Retrieval and Interannotator Agreement: In information retrieval, it is very common to examine the agreement of the annotators when labeling the same set of items. My own experience from reading the literature is that the related metrics implicitly assume that all workers have the same level of noise, an assumption that is often violated in crowdsourcing.
Are there any other fields, or any other caveats, that should be included in the list?
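To make the point about unequal noise levels concrete, here is a minimal sketch, assuming binary questions and workers whose errors are independent (an assumption that, as noted above, rarely holds for humans). The function name `majority_vote_accuracy` and the accuracy values are hypothetical, chosen only for illustration.

```python
from itertools import product


def majority_vote_accuracy(accuracies):
    """Probability that a strict majority of independent workers is correct.

    `accuracies` holds one probability of a correct answer per worker.
    An odd number of workers is required so that ties cannot occur.
    """
    n = len(accuracies)
    if n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    total = 0.0
    # Enumerate every pattern of correct (True) / wrong (False) answers.
    for outcome in product([True, False], repeat=n):
        if sum(outcome) * 2 > n:  # more than half of the workers are correct
            p = 1.0
            for correct, acc in zip(outcome, accuracies):
                p *= acc if correct else (1.0 - acc)
            total += p
    return total


if __name__ == "__main__":
    # Hypothetical accuracies, for illustration only.
    print(majority_vote_accuracy([0.7, 0.7, 0.7]))  # three equally noisy workers
    print(majority_vote_accuracy([0.9, 0.6, 0.6]))  # one good worker, two noisy ones
```

Under these assumptions, three workers at 70% accuracy give a majority that is correct about 78.4% of the time. With one worker at 90% and two at 60%, the majority is correct about 79.2% of the time, which is worse than simply trusting the best worker. This is the kind of case that equal-noise assumptions hide.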

### Badges and the Lake Wobegon effect

For those not familiar with the term, the Lake Wobegon effect is the case when all or nearly all of a group claim to be above average. The name comes from the fictional town where "all the women are strong, all the men are good looking, and all the children are above average."

Interestingly enough, as Wikipedia states, this effect of the majority of the group thinking that they are performing above-average "has been observed among drivers, CEOs, hedge fund managers, presidents, coaches, radio show hosts, late night comedians, stock market analysts, college students, parents, and state education officials, among others."

So, a natural question was whether this effect also appears in an online labor setting. We took some data from an online certification company, similar to Smarterer, where people take tests to show how well they know a particular skill (e.g., Excel, Audio Editing, etc.) The tests are not pass/fail but more like a GRE/SAT score: there is no "passing" score, only a percentile indicator that shows what percentage of other participants have a lower score.

Interestingly enough, we noticed a Lake Wobegon effect there as well: Most of the workers that displayed the badge of achievement have scores above average, giving yet another point for the Lake Wobegon effect.

Of course, this does not mean that all users that took the test performed above average. Test takers have the choice to make their final score public to the world, or keep it private. Given that the user's profile is also used in a site where employers look for potential hires, there is some form of strategic choice in whether the test score is visible or not. Having a low score is often worse than having no score at all.

So, we wanted to see what scores make users comfortable with their performance and incentivize them to display their badge of achievement. Marios analyzed the data and compared the distribution of scores for workers that decided to keep their score private against that of the workers that made their performance public. Here is the outcome:

It becomes clear that scores below 50% are not posted often, while scores that exceed 60% have significantly higher odds of being posted online for the world to see. This becomes clearer if we take the log-odds of a worker deciding to make the score public, given the achieved percentile:
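For readers unfamiliar with the log-odds transformation, here is a minimal sketch of the computation. The counts below are invented placeholders, not the data from the analysis, and the function name `log_odds` is hypothetical.

```python
import math


def log_odds(public, private):
    """Log-odds of posting a score, from counts of public and private scores."""
    if public <= 0 or private <= 0:
        raise ValueError("both counts must be positive")
    return math.log(public / private)


# Hypothetical counts per percentile bucket: (public, private).
# These numbers are made up purely for illustration.
buckets = {
    "0-20": (10, 90),
    "40-60": (50, 50),
    "80-100": (90, 10),
}

for bucket, (public, private) in buckets.items():
    print(bucket, round(log_odds(public, private), 2))
```

A value of zero means that a score is equally likely to be public or private; positive values mean that posting is more likely than hiding, and negative values mean the opposite.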

So, in the world of online labor, if you ever hire someone who chose to display a certification, you know that there are good chances that you picked a worker that is better than average, at least in the test. (We have some other results on the predictive power of tests in terms of work performance, but this is a topic that cannot fit into the margins of this blog post :-)

Needless to say, this effect illustrates a direction that will take crowdsourcing, and labor markets in general, out of the race-to-the-bottom, market-for-lemons-style pricing, where only price can separate the various workers. Just as education history serves in an offline setting as a signal of the potential quality of the employee, we are going to see more and more globally recognized certifications replacing educational history for many online workers.

### CrowdScale workshop at HCOMP 2013

• Few people know how to hire: Ask any startup CEO how easy it is to hire an employee. It is a pain. The art and craft of inferring the match of an individual to a given task is a very hard problem. Few people know how to do it right. Even within Google and Microsoft, with their legendary interviewing processes, interviewing is seen by many as a hard, time-consuming, and unrewarding experience.
• Few people know how to manage a project: Even fewer people know how to manage a project. The harrowing fact is that most people believe that they can. Most people hire someone, hoping that the employee will be in their head, will understand what these vague specifications mean, will know everything that is not documented in a project, and will be able to do a great job. Very few people realize that outsourcing a project means that you will need to spend a significant amount of time managing the project.
The result of the combination of these factors? Online labor does not scale through manual hiring. (Of course, this is not unique to online outsourcing. Offline hiring has the same problem.) There are simply not enough qualified employers that can hire effectively, who will be able to create demand for jobs for the online labor markets to continue to grow.

Online hiring vs online shopping

The counter-argument is that labor was always like that. Since the market for labor operates "manually," the transition to electronic hiring will allow for growth. In the same way that people were initially afraid of shopping online but eventually started buying things online, they are going to switch to hiring online.

I do not buy this argument. When people buy an item online, they buy a standardized product. They are not ordering a bespoke item, which is created according to the customer's specifications. Customization is typically limited and allowed on a specific set of dimensions. You can customize your Mac to have a better processor, more memory, and a larger hard disk. But you cannot order a laptop with a 19-inch screen, and cannot ask for 96Gb of memory.

But in online markets this is what happens. The random customer comes and asks for a web application ("just the functionality of the X website"), and wants this app to be built for $500. It is the same as if someone goes to a computer store and asks for a laptop with a 19-inch screen, with 128Gb of memory, and a 10Tb disk. And, since 1Gb of memory costs 7 dollars, it is reasonable to just pay $1000 for 128Gb, right?

Lessons from online shopping

Based on the experience for the transition of shopping from offline to online, let's see how online labor can move forward.
1. Standardize and productize: Currently, in online markets, most people ask for a specific set of tasks: content generation, website authoring, transcriptions, translations, etc. Many of these can be "productized" and be offered as standardized packages, perhaps with a few pre-set customizations available. (Instead of "select the hard disk size", you have a "select blog post length".) This vertical-oriented strategy is followed by many crowdsourcing companies and offers the client a clean separation from the process of hiring and managing a task. This vertical strategy works well to create small offerings, but it is not clear if there is sufficient demand within each vertical to fuel the growth expected for a startup. This is a topic for a new blog post.
2. Productize the project creation/management: When a standardized offering is not sufficient, the client is directed into hiring a product manager that will spec the requirements, examine if there is sufficient supply of skills in the market, hire individual contractors, manage the overall process, etc. This is similar to renovating a house. The delivered product is often completely customized, but the client does not seek to hire electricians, carpenters, painters, etc. separately. Instead, the owner hires a "general contractor" who creates the master plan for the renovation, procures the materials, hires subcontractors, etc. While it eases some of the problems, this is a process suitable only for reasonably big projects.
3. Become a staffing agency: A problem with all existing marketplaces is that they are not acting as employers, but only as matching agents. Few, if any, marketplaces are guaranteeing quality. Every transaction is a transaction between "consenting adults." Unfortunately, very few potential employers understand that, and hire with the implicit assumption that the marketplace is placing a guarantee on the quality of the contractors. So, if the contractor ends up being unqualified for the task, there is very little recourse. By guaranteeing quality, the employer (who is the one spending the money) gets some minimum level of guarantee about the deliverable. Unfortunately, providing such quality guarantees is easier said than done.
4. Let contractors build offerings: By observing the emergence of marketplaces like Etsy, you can see that people are becoming more comfortable with ordering semi-bespoke, handcrafted items online, for which they have little information. A potential route is to allow the contractors in online markets to build such "labor products" and price them themselves, in the same way that Etsy sellers are putting up their handcrafted stuff online.
All these approaches are fine, and I expect most current marketplaces to adopt one or more of these strategies over time. However, all of them rely on the same assumption: that hiring, like shopping, will be a human activity.

What happens, though, if we stop assuming that hiring is a human-mediated effort?

Crowdsourcing practices to the rescue

I will not pretend that the current state of the crowdsourcing industry offers concrete solutions to the problems listed above. But today's efforts in crowdsourcing move us towards an algorithmically-mediated work environment.

Of course, like all automatic solutions, the initial environment is much worse than "traditional" approaches. We see that in all the growing pains of Mechanical Turk. It is often easier to just hire a couple of trusted virtual assistants from oDesk to do the job, instead of trying to implement the full solution stack to get things done properly on MTurk.

However, the initial learning curve starts paying off later. Production environments that rely on a "crowd" need to automate as much as possible the hiring and management of workers. This automation makes the tasks much more scalable than traditional hiring and project management: high startup costs, then lower marginal costs of adding workers to a process.

This leads to easier scalability. Of course, the moment the benefits of easier scalability start becoming obvious, it will be too late for players that rely on manual hiring to catch up. It is one of the reasons that I believe that Mechanical Turk has the potential to be the major labor platform, even if this seems a laughable proposition at this point.

I will make a prediction: Crowdsourcing is currently at the forefront of defining the methods and practices in the workplace for the next few decades. Assembly lines and integration of machines in the work environment led to the mass production revolution of the 20th century. The current crowdsourcing practices will define how the majority of people are going to work on knowledge tasks in the future. A computer process will monitor and manage the working process, and hiring manually will soon be a thing of the past for many "basic" knowledge tasks.

Some will find this prospect frightening. I do not find it any more frightening than having traffic lights regulate traffic at intersections, or having the auto-pilot take care of my flight.