As a simple example, I wanted students to automatically infer the gender of a person given only the first name, since a third of Facebook users do not list their gender. (The "homework motivation" was the need to send letters to customers, where we have to decide whether to use "Dear Mr." or "Dear Ms." as the greeting.) In general, the task is relatively easy, as the majority of names are not ambiguous. However, there is a set of highly ambiguous names for which inference based on the first name alone is problematic. For your viewing pleasure, here are the most ambiguous first names, together with the confidence that the name belongs to a male:
The next part of the homework, motivated by the ambiguity of some first names, asks students to guess the gender of a person from the other preferences stated in their Facebook profile: movies, books, TV shows, and so on.
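One common way to attack this part is a naive Bayes classifier that treats each liked item as a feature. The sketch below is only an illustration and assumes a multinomial naive Bayes with Laplace smoothing (`alpha`); it is not necessarily what the students used. The function names `train_nb` and `predict_male_prob` and the toy profiles are hypothetical, not the homework data.

```python
import math
from collections import Counter


def train_nb(profiles, alpha=1.0):
    """Train a multinomial naive Bayes model.

    profiles: list of (set_of_liked_items, label), label in {"male", "female"}.
    Hypothetical helper; alpha is the Laplace smoothing constant (an assumption).
    """
    label_counts = Counter(label for _, label in profiles)
    item_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for items, label in profiles:
        item_counts[label].update(items)
        vocab.update(items)
    return label_counts, item_counts, vocab, alpha


def predict_male_prob(model, items):
    """Return the posterior probability that a profile listing `items` is male."""
    label_counts, item_counts, vocab, alpha = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(item_counts[label].values()) + alpha * len(vocab)
        for item in items:
            if item in vocab:  # items never seen in training are ignored
                score += math.log((item_counts[label][item] + alpha) / denom)
        log_scores[label] = score
    # Normalize the log scores into probabilities (subtract max for stability).
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp_scores.get("male", 0.0) / sum(exp_scores.values())


# Hypothetical toy profiles, for illustration only.
profiles = [
    ({"Dirty Dancing", "dancing"}, "female"),
    ({"Anne Of Green Gables", "Dirty Dancing"}, "female"),
    ({"Moneyball", "SportsCenter"}, "male"),
    ({"Terminator 2", "baseball"}, "male"),
]
model = train_nb(profiles)
print(predict_male_prob(model, {"SportsCenter", "baseball"}))
```

On these toy profiles the call prints 0.8. An empty profile falls back to the class prior, which is one reason the name-based and preference-based signals combine naturally: a confident name can settle cases where the stated preferences say little.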
Based on the analysis of these features, women overwhelmingly favor the books "Something Borrowed," "Flyy Girl," "Good In Bed," "The Other Boleyn Girl," and "Anne Of Green Gables," and the movie "Dirty Dancing," and they like dancing as an activity.
On the other hand, the characteristics that set men apart are movies such as "Terminator 2," "Wall Street," "Unforgiven," "The Good, the Bad and the Ugly," and "Seven Samurai"; the book "Moneyball"; sports-related activities (baseball, lifting); and sports-related TV shows (e.g., PTI, SportsCenter, Around the Horn). Another distinguishing feature of men is that they list "women" and "girls" among their interests; in that case, they might also consider taking some dancing lessons. :-)
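Lists like the two above come from ranking items by how strongly they lean toward one gender. As a hedged sketch, assuming a smoothed log-odds ratio on the fraction of profiles in each class that list an item (one standard choice, not necessarily the analysis used here), the ranking could be computed as follows. The function name `rank_items_by_log_odds`, the `alpha` and `min_count` parameters, and the toy profiles are hypothetical.

```python
import math
from collections import Counter


def rank_items_by_log_odds(profiles, alpha=1.0, min_count=1):
    """Rank items from most male-leaning to most female-leaning.

    profiles: list of (set_of_liked_items, label), label in {"male", "female"}.
    Hypothetical helper: score = logit(P(item | male)) - logit(P(item | female)),
    with add-alpha smoothing on the per-class document frequencies.
    """
    n = Counter(label for _, label in profiles)
    df = {"male": Counter(), "female": Counter()}
    for items, label in profiles:
        df[label].update(set(items))  # count each item once per profile
    scores = {}
    for item in set(df["male"]) | set(df["female"]):
        if df["male"][item] + df["female"][item] < min_count:
            continue  # skip rare items, whose scores are unreliable
        p_m = (df["male"][item] + alpha) / (n["male"] + 2 * alpha)
        p_f = (df["female"][item] + alpha) / (n["female"] + 2 * alpha)
        # Positive => more typical of men; negative => more typical of women.
        scores[item] = math.log(p_m / (1 - p_m)) - math.log(p_f / (1 - p_f))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy profiles, for illustration only.
profiles = [
    ({"Dirty Dancing", "dancing"}, "female"),
    ({"Anne Of Green Gables", "Dirty Dancing"}, "female"),
    ({"Moneyball", "SportsCenter"}, "male"),
    ({"Terminator 2", "baseball"}, "male"),
]
ranking = rank_items_by_log_odds(profiles)
print(ranking[0], ranking[-1])
```

The smoothing keeps an item listed by a single profile from getting an infinite score; on real data, raising `min_count` further filters out rare items that would otherwise dominate the top of both lists.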