Thursday, January 19, 2012

Identity Verification (and how to bypass it)

Most marketplaces, in order to function properly, require some form of identification of their users. It is well known that letting participants easily generate new identities leads to many problems.

At the most basic level, if participants can easily create new identities, then reputation systems lose part of their power: if a user gets mediocre or bad scores, it is often preferable to abandon the account and start again with a clean slate. Even more importantly, Sybil attacks, where one participant generates multiple accounts, can fool many systems that rely on peer evaluation or that assume users are independent of each other.
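To see why abandoning a bad account can be the rational move, consider a simple reputation model. The sketch below assumes (hypothetically; no specific marketplace is implied) that a score is the posterior mean of a Beta(1, 1) prior updated with positive and negative feedback, so every brand-new account starts at a neutral 0.5. Under that assumption, any account whose history drags it below 0.5 is better off being replaced by a fresh identity.

```python
# Hypothetical reputation model: score = mean of Beta(1 + pos, 1 + neg).
# A brand-new account (no feedback) therefore starts at a neutral 0.5.


def reputation(positives: int, negatives: int) -> float:
    """Posterior mean of a Beta(1, 1) prior updated with observed feedback."""
    return (1 + positives) / (2 + positives + negatives)


def should_whitewash(positives: int, negatives: int) -> bool:
    """True if a fresh identity would score higher than the current account."""
    return reputation(0, 0) > reputation(positives, negatives)


if __name__ == "__main__":
    print(reputation(0, 0))         # 0.5, the starting score of any new account
    print(reputation(3, 7))         # 4/12, about 0.333
    print(should_whitewash(3, 7))   # True: dropping the account pays off
    print(should_whitewash(40, 2))  # False: a good history is worth keeping
```

The cheaper it is to create an identity, the more often this comparison favors walking away, which is exactly why costly identities matter.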

For example, in services such as Mechanical Turk, which rely on redundancy to ensure high-quality answers, many spammers create multiple accounts and attack simple tasks by entering the same answer to every question. I also remember Luis von Ahn describing an attack against reCAPTCHA, where 4chan users tried to guess which of the two words was the known one and entered "penis" as the other word :-)
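Why redundancy alone fails against multiple accounts can be seen with plain majority voting. This is a simplified sketch: real requesters use a variety of aggregation schemes, and majority vote is assumed here only for illustration. The answers below are made up. Three honest workers agree, but one spammer controlling five accounts outvotes them by submitting the same answer from every account.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer (ties go to the answer seen first)."""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    honest = ["cat", "cat", "cat"]
    # One spammer, five accounts, the same answer everywhere.
    sybils = ["yes"] * 5

    print(majority_vote(honest))           # cat
    print(majority_vote(honest + sybils))  # yes: the Sybil accounts win
```

The aggregation treats every account as an independent voter; once one person can cheaply become five voters, that independence assumption, and the quality guarantee built on it, disappears.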

It is therefore not surprising that most marketplaces attempt to have some form of identification service. A form of identification that is considered strong is to ask registering users for a unique identifying element, e.g., the participant's SSN, place of birth, etc. Interestingly enough, it is trivially easy to bypass many such identity tests.

Go and check the website Fake Name Generator. You can specify the characteristics of the name that you want, and you get back a complete matching identity. Someone of Japanese heritage living in the US? Sure thing, here is the entry for Mr. Souma Miura:


Prefer something more exotic? Maybe a person of Icelandic origin living in Cyprus? No problem:


In fact, I was able to fool quite a few websites that supposedly vouch for the identity of their participants. All of them accepted the fake identities without problems, and some even accepted the fake credit card numbers (not for actual charges, but as legitimate credit cards for creating a profile). For obvious reasons, I will not reveal the names of the victims :-)
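One plausible reason fake card numbers get through (an assumption; the sites' actual checks are not known) is that a profile form validates only the format of the number, typically with the Luhn (mod 10) checksum. Luhn catches typos, not fraud: anyone can compute a valid check digit, so a generated number passes even though no such card exists. The helper `make_luhn_valid` below is a hypothetical illustration of how trivially that works.

```python
def luhn_valid(number: str) -> bool:
    """Check a card number with the Luhn (mod 10) checksum.

    This only verifies that the digits are internally consistent;
    it says nothing about whether the card exists or who owns it.
    """
    cleaned = number.replace(" ", "").replace("-", "")  # drop common separators
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def make_luhn_valid(prefix: str) -> str:
    """Hypothetical helper: append the one check digit that makes prefix pass."""
    if not (prefix.isascii() and prefix.isdigit()):
        raise ValueError("prefix must be a non-empty string of digits")
    for check in range(10):
        candidate = prefix + str(check)
        if luhn_valid(candidate):
            return candidate
    raise ValueError("unreachable: exactly one check digit always works")


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # True (a well-known test number)
    print(luhn_valid("4111 1111 1111 1112"))  # False (one digit changed)
    print(make_luhn_valid("411111111111111"))  # 4111111111111111
```

Passing this check only means the number is well-formed; confirming a real, owned card requires an authorization with the issuer, which a free profile signup typically does not perform.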

So, how can a marketplace better secure itself against identity attacks? Here are a few examples that I encountered:

  • On Embee Mobile, workers are paid in free talk time for their cell phones. While it is certainly possible to change the SIM card and the phone number, this is not a cheap way to generate new identities.
  • On oDesk, as part of the identification, participants are asked to send scans of their driving license and of their bank statements in order to unlock the ability to apply to many (more than 5) projects. While it is certainly possible to fake those documents, it is unclear what someone could do with the money collected in an account if the cash cannot be withdrawn to a bank.

Perhaps in the future we will see the emergence of identification services for individuals. We already have such services for websites (e.g., Verisign). It is conceivable that someone will be able to vouch for the identity of a person, but you can already see the Big Brother concerns that such a service would raise.