Monday, December 13, 2010

Sharing code, API's, and a Readability API

Yesterday, I received an email from a student that wanted to have access to some code that we used in our recent TKDE paper "Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics".

Specifically, the student wanted to estimate the readability test scores for the reviews. For those not familiar with readability tests, they are simple formulas that examine the text and estimate what is the necessary level education required in order to read and understand a particular piece of text.

I tried to send the code, but then I realized that it had some dependencies to some old libraries, which have been deprecated. At that point, I realized that it would be a pain to send the code to the student, then give instructions about all the dependencies etc. On the other hand, not sending the code is simply unacceptable.

Sharing code as an API

This got me thinking: How can we make the code to be robust to changes? How can we share the code in a way that it can be easily used by others? Given that all software packages today have web API's, why not creating API's for our own (research) code?

Since I have never tried in the past to do some serious web programming, I decided that I can spend a few hours to familiarize myself with the basics and make my library to be a set of RESTful API calls.

Apparently, it was not that difficult. I uploaded the code to the Google App Engine, and I wrote a small servlet that was taking as input the text, and was returning the readability metric of choice. Almost an assignment for a first-year student learning about programming.

Readability API


After a few hours of coding, I managed to generate a first version of the demo at http://ipeirotis-hrd.appspot.com/. I also created a basic API which can be easily used to estimate the readability scores of various texts.

I followed the example of bit.ly and I allowed the API calls to return simple txt format, so that it can be possible to embed the Readability API calls in many places. For example, I really enjoy calling bit.ly within Excel or within R, in order to shorten URLs. Now, it is possible to do the same in order to compute readability scores.

For example, if we want to compute the SMOG score for for the text "I do not like them in a box. I do not like them with a fox" and get back the score in simple text, you just need to call:

http://ipeirotis.appspot.com/readability/GetReadabilityScores?output=txt&metric=SMOG&text=I%20do%20not%20like%20them%20in%20a%20box.%20I%20do%20not%20like%20them%20with%20a%20fox.

The result is the SMOG score for the text, which in this case is 3.129. You can play with the demo and type whatever text you want, and see the documentation if you want to use the code. Of course, the source code is also available.

Future Plans


I actually like this idea and the result. I will be trying to port more of my code online, and make it available as an API. With the availability of sites such as Google App Engine, we do not have to worry about servers being taken down, or upgrades in OS, etc. The code can remain online and functioning. Now, let's see how easy it will be to port some non-trivial code.