Tuesday, January 26, 2010

Universities and Intellectual Property: A Minefield?

One of the things that I never understood at NYU is what rights the university has over the work produced by its faculty members and students.

Under intellectual property law, there are four basic types of intellectual property:
  • Copyright
  • Patents
  • Trademarks
  • Trade Secrets
If someone works for a corporation, things are pretty clear. Any paper needs approval before being published. Any algorithms developed and any code written are the intellectual property of the company: the company owns the copyright for the code and can treat the algorithms as trade secrets. The company may also patent useful inventions and register valuable trademarks. In all these cases, everything produced within the corporation is work made for hire and is owned by the corporation. The employee typically has no ownership of the produced work and is commonly prohibited from working for another company or providing any sort of consulting services.

For the work of faculty, I always felt that everything falls into a grey area. Most of the work is made public as soon as possible. Code is often released as open source under a pretty liberal license, or even placed in the public domain. Papers are written and publicized without much, if any, vetting, and the algorithms and methods described in them are typically in the public domain. The only case where a university has some control over intellectual property is when a patent is filed and granted.

Now, the great confusion arises when a faculty member wants to work with a corporation and the university allows faculty members to enter into consulting agreements. Who owns and controls the expertise and discoveries of the faculty member?

Let's say that I, Panos, invented an algorithm in area X, wrote a paper, and published the code under an open source license. Corporation A comes to me and asks me to consult for them in area X. What control does my employer, NYU, have over my work? Yes, Corporation A wants to hire me because of the IP that I produced while at NYU. This IP, though, is publicly available, so I do not really transfer anything protected under copyright law.

I have asked our own tech transfer office this question. Unfortunately, I did not get a clear answer back. They told me that I cannot transfer code and that any patent is owned by NYU. Correct, these are indeed intellectual property assets. (Although for the case of open source code, this is again confusing.) But what about the expertise that a faculty member develops? In corporations this is often protected by non-compete clauses in the employment contract, effectively preventing employees from directly transferring know-how. In universities, there is no such provision.

I find this merging of the academic and corporate worlds particularly confusing, and a potential minefield. Who owns what? Any ideas? Any experiences? How do other universities treat the concept of tech transfer?

Monday, January 25, 2010

Did you find this helpful?

Last week, the New York Times Sunday magazine had an article titled The Reviewing Stand, starting with the following:

Here’s a challenge for students of expository writing: review a popular product on Amazon and aim to get your review chosen by readers as “most helpful.” It’s dead hard. The product review, as a literary form, is in its heyday. Polemical, evocative, witty, narrative, exhortative, furious, ironic, off the cuff....

What I found amusing was the fact that, after reading this article, I got a notification that the journal version of the paper Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics, co-authored with my frequent co-author Anindya Ghose, had been accepted for publication in the IEEE Transactions on Knowledge and Data Engineering (TKDE) journal.

As the title suggests, one of the problems that we attack in the paper is how to predict the usefulness of a product review. For example, if you go on Amazon, you will see, at the top of many reviews, how many people considered a particular product review helpful.


So, the question is: Can we predict how helpful a particular review will be?

Our first attempts to address this problem appeared in our WITS 2006 and ICEC 2007 papers. Following the scientific zeitgeist, a large number of other papers appeared in those years, all tackling the question of predicting the helpfulness of reviews. (See the actual paper for references.)

What I found rather surprising was the relative ease of the task. A few relatively straightforward features can be used to predict with good accuracy whether a review will be deemed helpful or not:
  • Check the readability of the review, as measured by one of the many readability metrics, check the number of spelling errors, and measure basic statistics of the text, such as review length. Using just the readability score and the fraction of spelling errors in the review, we can estimate with 70%-80% accuracy whether a review will be deemed helpful or not.
  • Check for spelling errors in the review and check the grammar. To get a proxy variable for the spelling errors, just compare all the words in the review against an online English dictionary. If a word does not appear in the dictionary, it is likely to be a typo. (Yes, I know about acronyms, proper names, etc. We only care about a rough proxy.) It is also possible to check the grammar (although that did not make it into the paper): just compute the log-likelihood of a particular review based on the frequencies of its unigrams, bigrams, and trigrams, using the statistics from Google N-grams. If the likelihood is very low, then the review is likely to have grammatical errors. To ensure that the log-likelihood is comparable across reviews, we compare the log-likelihood of each review with the median likelihood of other reviews with similar readability scores. A minimal sketch of these text-based proxies appears after this list. (Update: Amazingly enough, Zappos noticed the same thing and took action to improve the spelling and grammar of its reviews.)
  • Check the history of the reviewer. If the reviewer has been writing helpful reviews in the past, it is highly likely that future reviews will also be helpful. Also, if a reviewer has disclosed personal details (name, location, etc.), the reviews are more likely to be helpful. Again, using just reviewer history and disclosure details, we get 70%-80% accuracy, as measured with the AUC metric.
  • Check the "subjectivity" of the review. We call a review objective if it contains mainly information that can be found in the product description and specs. A subjective review contains information that depends on the personal experiences of the reviewer. Helpful reviews tend to contain a mix of both.
Interestingly enough, all three feature sets (text statistics, reviewer history, and subjectivity) seem to have roughly equivalent predictive power. Even using them all together does not seem to increase the predictive performance substantially.

While preparing the final version of the paper, I also checked other papers that were attacking the same problem. While many papers were trying to predict helpfulness using textual features, I noticed that a few papers were using a set of alternative and interesting features:
  • Coverage of product features. Many products can be considered an aggregation of multiple product features. For example, a digital camera has resolution, size, battery life, sensor size, etc. How many product features are discussed in the review? This feature tends to have predictive power, according to (Liu et al., EMNLP 2007).
  • Dynamics of reviews. Reviews that are posted early on get a higher fraction of helpful votes. In contrast, later reviews need to be more informative and comprehensive to attract the same fraction of helpful votes (Liu et al., EMNLP 2007).
  • Controversy. The helpfulness of a review depends not only on its own content but also on how controversial the product under consideration is (Danescu-Niculescu-Mizil et al., WWW 2009).
  • Social network of reviewers. If reviewer A trusts the reviews of reviewer B, then the reviews of B are likely to be more helpful than the reviews of A. ("Exploiting Social Context for Review Quality Prediction," by Lu, Tsaparas, Ntoulas, and Polanyi; WWW 2010)
Although I have not seen a paper combining all the above features to predict the helpfulness of a review (or to rank reviews by helpfulness), I would guess that this set of features would bring the predictive accuracy pretty close to its limit for this task.
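For what it's worth, here is a rough sketch of how one might combine several such feature groups into a single helpfulness predictor and score it with AUC, the metric mentioned above. The feature names, the 60% helpfulness threshold, and the use of a plain logistic regression are illustrative assumptions, not something taken from any of the papers.

```python
# Minimal sketch, assuming each review is a dict with precomputed features
# (text, reviewer history, product-level, ...) plus its helpful/total vote counts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

HELPFUL_THRESHOLD = 0.60  # illustrative: "helpful" if at least 60% of votes are helpful

def build_dataset(reviews):
    """reviews: list of dicts with 'features', 'helpful_votes', and 'total_votes'."""
    names = sorted({name for r in reviews for name in r["features"]})
    X = np.array([[r["features"].get(name, 0.0) for name in names] for r in reviews])
    y = np.array([1 if r["helpful_votes"] / max(1, r["total_votes"]) >= HELPFUL_THRESHOLD
                  else 0 for r in reviews])
    return X, y, names

def evaluate_auc(reviews):
    """Train on a random split and report AUC on the held-out reviews."""
    X, y, _ = build_dataset(reviews)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)

def rank_by_helpfulness(model, X, reviews):
    """Order reviews by predicted probability of being voted helpful."""
    scores = model.predict_proba(X)[:, 1]
    order = np.argsort(-scores)
    return [reviews[i] for i in order]
```

Dropping one feature group at a time from such a model is also a simple way to check whether any single group carries most of the signal, which is what the "roughly equivalent predictive power" observation above suggests.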

What is next? I guess personalized recommendations are going to appear sooner or later, matching users with reviews that are more likely to benefit them. (Update: See Eugene's comment below for related papers.) For example, a beginner in photography will be interested in a different type of review when buying an SLR, compared to a seasoned professional. We already know that reviews from similar users can be used for recommending products (see Netflix) so it is not unlikely that different types of reviews will be deemed helpful by different types of users.

So, did you find this blog post useful?