The different attitudes of computer scientists and economists
I was reading Noam Nisan's blog post about the different attitudes of computer scientists and economists. Noam hypothesizes that economists emphasize research on “what is” while computer scientists emphasize “what can be”, and offers the view of an algorithmic game theorist.
I have my own interpretation on this topic, mainly from the data mining point of view.
Economists are interested in suggesting policies (i.e., suggest to people, "what to do"). Therefore, it is important to build models that assign causality. Computer scientists are rarely interested in the issue of causality. Computer scientists control the system (the computer) and algorithms can be directed to perform one way or another. In contrast, economists cannot really control the system that they study. They do not even know how the system behaves.
When a computer scientist proposes an algorithm, the main focus is to examine the performance of the algorithm under different settings of incoming data. How the (computer) system will behave is controlled. When an economist suggests a policy, it is highly unclear how the underlying (rational?) agents will behave. Therefore, it is important to figure out what exactly "causes" the behavior of the agents, and figure out what policies can change this behavior.
One area that gets closer to economics in this respect is the area of data mining and machine learning. Get the data, and learn how the underlying system behaves. For example, get data about credit card transactions and learn which of them are fraudulent. However, there is a significant difference in focus: Computer scientists are mainly focused on predictive modelling. As long as the system can "predict" the outcome on unseen data, things are ok. A black box with perfect predictive performance is great. Explanatory models are rarely the focus. In the best case, someone may want to understand the internals of the predictive model but even if the model can be understood (e.g., read the rules or the decision tree), these rules are rarely causal in nature.
Let me give you an example: Suppose that you are trying to predict price per square foot for houses. As one independent variable (feature) you add average size of the house in the area. What will the predictive model find? That places that have smaller houses also have higher price per square foot. Unexpected? Not really. Houses in urban areas are typically smaller and more expensive compared to their suburban and rural counterparts.
For a predictive model, this information is absolutely sufficient; the average house size is a valuable feature for predictive purposes. Think, however, about what would happen if someone were devising policy based on this feature. A house builder would try to build smaller houses in rural areas, hoping that the resulting prices would be higher. Or a politician in Manhattan would encourage construction of bigger apartments, since the model has shown that if average house size is increased, the prices will drop. Absurd? Yes.
Even funnier things can come up if someone uses country-wide data to predict demand for apartments using apartment prices. The result will show that increasing prices actually increases demand, even though we would expect the opposite. (Just the effect of prices increasing in places where there is higher demand.)
Predictive modeling can survive (or even thrive) by exploiting such strange correlations. A causal model that captures correlations and presents them as causes can wreak havoc.
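The housing example can be sketched as a small simulation. This is only an illustration under made-up assumptions: the function names (`simulate_areas`, `pearson`) and all numbers (average sizes, prices per square foot, noise levels) are hypothetical and not taken from any real data. Location (urban vs. rural) drives both average house size and price per square foot, while house size itself has no effect on price inside either group.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_areas(n=2000, seed=0):
    """Return (urban, avg_size_sqft, price_per_sqft) tuples.

    Assumption: being urban lowers the average size and raises the price.
    Within a location type, size and price are drawn independently.
    """
    rng = random.Random(seed)
    areas = []
    for _ in range(n):
        urban = rng.random() < 0.5
        avg_size = rng.gauss(900 if urban else 2200, 150)
        price = rng.gauss(600 if urban else 150, 40)
        areas.append((urban, avg_size, price))
    return areas


areas = simulate_areas()
urban_areas = [a for a in areas if a[0]]
rural_areas = [a for a in areas if not a[0]]

# Pooled data: strong negative correlation, useful for prediction.
overall = pearson([a[1] for a in areas], [a[2] for a in areas])
# Inside each location type the correlation is close to zero.
within_urban = pearson([a[1] for a in urban_areas], [a[2] for a in urban_areas])
within_rural = pearson([a[1] for a in rural_areas], [a[2] for a in rural_areas])

print(f"overall: {overall:.2f}")
print(f"urban only: {within_urban:.2f}, rural only: {within_rural:.2f}")
```

In the pooled data, average size looks like a strong (negative) predictor of price. Once location is held fixed, the relationship is roughly zero, which is why building smaller houses in a rural area would not raise prices in this toy world.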
So, an economist will try to build a model that will generate causal relationships. In the case above, a model based on supply and demand is more likely to result in a model that captures the true "causes" of increased apartment prices. A house builder can see these effects and make a more informed decision on how to build. Similarly, for a politician that is trying to encourage building more affordable housing.
Often, causal models are called "structural" in economics [not sure where the term comes from; I have seen a few different interpretations]. They typically start by modelling the micro-behavior of agents, and then proceed to explain the behavior of a large system comprising the interactions of such agents. A benefit of such models is that assumptions are easier to check, test, and challenge. In contrast to "statistical" models, such models tend to generate relationships that are easier to consider "causal".
An advantage of causal models over predictive models is that causal models are valid even if the underlying data distribution changes. Causal models are supposed to be robust, as long as the behavior of the agents remains the same. A predictive model works under the assumption that the "unseen" data follow the same distribution as the "training" data. Change the distribution of the unseen data, and any performance guarantee for the predictive models disappears.
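A rough sketch of what happens to a predictive model when the distribution changes, again with entirely hypothetical numbers and helper names (`make_areas`, `fit_line`, `mean_abs_error`). A one-variable least-squares line is fit on areas where urban houses are small, then evaluated on a world where urban houses have become as large as rural ones while prices still depend only on location.

```python
import random


def make_areas(n, urban_size_mean, seed):
    """Return (avg_size_sqft, price_per_sqft) pairs.

    Assumption: price depends only on location (urban or rural),
    never on size. Only the typical urban size is configurable.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        urban = rng.random() < 0.5
        size = rng.gauss(urban_size_mean if urban else 2200, 150)
        price = rng.gauss(600 if urban else 150, 40)
        rows.append((size, price))
    return rows


def fit_line(rows):
    """Ordinary least squares for price = intercept + slope * size."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Average absolute prediction error of the fitted line on rows."""
    slope, intercept = model
    return sum(abs(intercept + slope * x - y) for x, y in rows) / len(rows)


train = make_areas(2000, urban_size_mean=900, seed=1)
same_world = make_areas(2000, urban_size_mean=900, seed=2)
# Shifted world: urban houses are now as large as rural ones.
shifted_world = make_areas(2000, urban_size_mean=2200, seed=3)

model = fit_line(train)
mae_same = mean_abs_error(model, same_world)
mae_shifted = mean_abs_error(model, shifted_world)

print(f"slope: {model[0]:.3f}")
print(f"MAE, same distribution: {mae_same:.1f}")
print(f"MAE, shifted distribution: {mae_shifted:.1f}")
```

The fitted slope is negative, and the line predicts well as long as new data look like the training data. After the shift, the error grows sharply, because the model learned "small means expensive" rather than "urban means expensive".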
Update 1: This is not an attempt to downgrade the importance of predictive models. Most of the results presented by Google after a query are generated using predictive modeling algorithms. You get recommendations from Amazon and Netflix as the outcome of predictive algorithms. Your inbox remains spam-free due to the existence of the spam filter, again a system built using predictive modeling techniques. It is too hard, if not impossible, to build "causal" models for these applications.
Update 2: An interesting example of a company deriving policy based on their predictive model is American Express. They realized that the feature "customer buys in a 99c store" is correlated with higher delinquency rates. So, AmEx decided to decrease the credit limit for such customers. Of course, the result will be that potentially affected customers will stop visiting such stores, decreasing the value of this policy for AmEx. Furthermore, this action may cause even more economic stress to these customers that are now "forced" to buy from more expensive stores, and this may result in a much higher default rate for AmEx. This "unexpected" outcome is the effect of devising policy based on non-causal variables.
If AmEx had a variable "customer in economic distress", which arguably has a causal effect on default rates, then it would be possible to perform this action, without the ability of customers to game the system. However, since AmEx relied on a variable "customer buys in a 99c store" that is the outcome of the variable "customer in economic distress" it is possible for consumers to simply change their behavior in the face of economic distress.
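The AmEx story can be sketched the same way. Everything here is a hypothetical toy: the probabilities, the function names (`simulate`, `default_rate`), and the assumption that default depends only on economic distress. The sketch also deliberately leaves out the possible extra stress caused by the policy. Distress causes both shopping at 99c stores and defaulting; after the policy becomes known, distressed customers visit those stores no more often than anyone else.

```python
import random


def simulate(n, p_shop_if_distressed, seed):
    """Return (shops_at_99c, defaults) pairs for n customers.

    Assumptions: 20% of customers are in distress; default depends only
    on distress; non-distressed customers shop at 99c stores 10% of the time.
    """
    rng = random.Random(seed)
    customers = []
    for _ in range(n):
        distressed = rng.random() < 0.2
        shops = rng.random() < (p_shop_if_distressed if distressed else 0.1)
        defaults = rng.random() < (0.30 if distressed else 0.03)
        customers.append((shops, defaults))
    return customers


def default_rate(customers, shops_flag=None):
    """Default rate overall, or only among (non-)shoppers if a flag is given."""
    group = [d for s, d in customers if shops_flag is None or s == shops_flag]
    return sum(group) / len(group)


# Before the policy is known: distressed customers often shop at 99c stores.
before = simulate(50000, p_shop_if_distressed=0.7, seed=0)
# After: distressed customers avoid the stores, but are still distressed.
after = simulate(50000, p_shop_if_distressed=0.1, seed=1)

gap_before = default_rate(before, True) - default_rate(before, False)
gap_after = default_rate(after, True) - default_rate(after, False)

print(f"shopper vs non-shopper default gap, before: {gap_before:.3f}")
print(f"shopper vs non-shopper default gap, after:  {gap_after:.3f}")
print(f"overall default rate, before: {default_rate(before):.3f}")
print(f"overall default rate, after:  {default_rate(after):.3f}")
```

In this toy world the feature stops separating good from bad risks once customers adapt, while the overall default rate stays about the same, since the actual cause (distress) was never touched.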


This is all very interesting... just to add to what you’ve said, from an applied economist's perspective (aka an econometrician), models are thought of as revealing the effects of specific variables on phenomena that interest them (which become dependent variables in the models). Economists like to have these variables be suggested by theory (which brings about the problem of how do you even start theorizing before you see the data!). Prediction is of course something that can occur through these models, which in essence include policy lever variables (thus, a variable does not just enter due to its correlation, but because a theory (or theories) supports its presence there).
In your example on housing prices, an econometric model informed by theory could introduce average housing size in the area as a control variable, but not as a policy lever variable. So assuming that other predictors from theory are the square footage of the house, age of the house, distance to downtown, distance to the coast and other amenities etc., the average housing size in the neighborhood of the observation might be thought of as controlling for something like neighborhood heterogeneity, but it may also not be the best control for that. If you want to predict average prices at an aggregate unit then you should try to use individual house data and then aggregate up, or use a multilevel model to begin with. What I’m trying to say is, it has to be clear to the policy maker what can act as a lever and what cannot… and that depends a lot on the setup of the model.
Econometrics has of course been criticized for several “sins” (see the classic paper by Leamer for example: http://www.international.ucla.edu/media/files/Leamer_article.pdf) but since that time it has made some progress with use of IVs (http://www.economist.com/businessfinance/displaystory.cfm?story_id=14210799) and different research designs (e.g. quasi-experimental).
@maikwl: Thanks, very helpful!
There are way too many parallels between applied economists and people doing applied predictive modeling (aka data mining) in CS. But what separates the two is the perspective. And this changes the way they approach problems. It took me some time to understand the difference, so that I can talk with and better understand my econometrician colleagues.
For example, the whole distinction between "control" and "policy lever" variables is almost unheard of in CS. Why? Because the only thing that matters is the predictive performance in the "test set" (aka "out of sample data"). Nobody in CS really cares about devising policy based on the derived models.
The funny things start when people love their models so much that they START devising policy based on predictive models. The nice example is American Express: They figure out that the variable "buys in a 99c store" is a good predictor for defaulting on the credit card debt. So, AmEx starts cutting down the credit limits for those members, or cancelling the credit card altogether. Till, of course, this strategy becomes known, and people stop shopping in these stores, being afraid that their credit line will be discontinued. (And this causes even more economic stress to the distressed members, causing even higher default rates for AmEx?)
There is a time and a place for the various models. Causal models are by definition predictive. However, you can build more powerful predictive models when you do not care about interpretability or about causality. The choice is simple if you know why you build the model.
The AmEx example is great in showing how people actually adapt to models :) I did not know of the particular case.
The question of having the capacity of building more powerful predictive models through CS methods (or, through instrumentalism as your methodological approach, following economics jargonese) is something that bothers me a bit. Maybe this happens because you can never fully identify the true data generating process. In an "ideal" world you should be able to do just as well but the complexity of social phenomena makes theory-based econometrics rather weak in that respect. Well, a good model is a useful model in the end...
"Maybe this happens because you can never fully identify the true data generating process. In an "ideal" world you should be able to do just as well but the complexity of social phenomena makes theory-based econometrics rather weak in that respect. "
This "clash" is actually manifested in many different ways. See for example the article by Chris Anderson "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
As expected, for an article written to be provocative, there were many reactions, positive and negative. One of the best replies that I encountered is Peter Norvig's: http://norvig.com/fact-check.html
I remember reading the End of Theory and disagreeing :))) Fun stuff...
On the topic of the clash of model-ful vs. model-less approaches to data, there's a very nice Breiman 2001 article + discussion:
"Statistical Modeling: Two Cultures" http://projecteuclid.org/euclid.ss/1009213726
My take on this is now:
The CS approach is good if you have lots of data and new situations that arise have data with similar properties.
The CS approach is bad in messy worlds like government policy making or maybe web site design, where every situation/dataset is very different and all your inferences are going to be through qualitative reasoning by analogy. So you want to use data to support said reasoning. Predictive methods aren't helpful here; rather, data analysis is what you want -- which may or may not include modeling.
@Brendan: I do not think that it is possible to devise policy based on "model-less" data analysis. Devising policy requires not only predictive power, but also causal relationships.
The data-based modeling approach (aka the "CS approach") assumes that the world does not change. However, when we introduce model-based decisions in a world of rational agents, the agents may decide to change their behavior to adapt to the model. This often changes the "unseen data" and the model performance deteriorates. As a simple example, think of spam filtering, a classic "predictive modeling" task. As spam filters (i.e., data-driven models) evolve, the spammers change their behavior, rendering old spam filters useless. This whole interaction of models with the actual system that is being studied causes problems.
On the other hand, devising a clean causal model is extremely difficult. I have no idea of how someone could even build a "causal" model for spam filtering.
A useful analogy that I got after talking with a colleague of mine:
* If you are trying to "diagnose" something (will someone repay the loan? is this email spam?) then predictive modeling is fine.
* If you are trying to "treat" something (how can I reduce the default rate of my customers?) then you need a causal model, to tell you what is the "treatment effect" after you vary an input variable.
Huh, that analogy is opposite of how I think of those terms. "Diagnose" means "explain" or "understand causes of". But you can "treat" a spam problem by throwing enough features at it.
In any case, when I was saying data analysis with or without modeling, I just meant any sort of analysis that leads to decision-support insight. One way, the one you're talking about, could be through a model with causal interpretations of its parameters. This might make sense for understanding certain domains, say market behavior, where there's a theory (supply and demand) that seems applicable to the situation.
Or this data analysis could be through simpler descriptive statistical techniques, like comparing raw averages between groups, exploratory visualization, etc. I consider these to be "model-less". They let you understand the phenomena without precommitting yourself to the assumptions the model makes. Since it's easy to make bad models that seem good, I think it's worth worrying about this. On the other hand, your inferences are weaker since you have fewer assumptions.
Those two approaches are both for descriptive analysis.
Where does the causal model come from? If it comes from data, it too is susceptible to predictive modelling. If it isn't, then it simply isn't science - it's storytelling.
@Barry: Typically the causal model is developed based on theory that defines how the different agents behave (e.g., that all agents are greedy and are trying to maximize their own benefit). This theory results in a set of variables that should be added to the model. At that point the data collection starts and we expect reality to follow the theory. Sometimes it does, sometimes it does not. In general, developing a model that is considered "causal" is typically much harder than developing a predictive model.
To generate a predictive model, you can simply "throw" in a large number of data-derived variables that are correlated with the outcome that you are trying to predict. Although it is good to have a theory to help you select the data, it is not necessary. It may sound more like a "hack" but it really works pretty well in real life. You enjoy great recommendations from Netflix, without Netflix knowing what "causes" you to like a particular movie. They just predict that you will like a movie based on your preferences, other movies that you rated, and so on. The spam filter keeps your inbox clean, without knowing what "causes" something to be spam.
In general, do not think of predictive modeling as something bad. There are plenty of wonderful applications. It is just something that economists are not interested in doing, most of the time.
Wait, isn't the problem with the AMEX situation that they simply asked the wrong question of the computer? Or rather, they used the answer of one question and applied it to another.
If they were interested in policy, they should have asked: what is the variable whose change would minimize delinquency over the long term?
Whether or not data for this analysis exist is a separate question. You do point this out in the last paragraph of the article, but I wanted to emphasize that there is nothing fundamental about the "computer models" that precludes answering such a question.
@Anonymous: It is pretty much what you are saying.
AmEx asked the question "what features can I use to *predict* who is going to default?" and they assumed that this is the same as "what behaviors cause a consumer to default?"
I do not know how well my concern would fit the discussion, but would still put forward the same.
Learning from the near past, weren't computer scientists and theorists back then modeling a lot of such variables (MYCIN-style certainty factors, much like the hacks, etc.) in huge knowledge bases, from which, with the application of forward- and backward-chaining inference mechanisms, they would come up with the decisive factors relevant to the context?
I guess, the line of thinking, then, was pretty much similar to the attitude of the economists, as you've put it. Interestingly, approaches of predictive modeling today, comply largely to non-human interpretable sets of data to come up with the variables which influence the policies at large. Undoubtedly, the well known critical acclaim of the fact that it does turn biased towards the training set, at times serves as a boon and at times, performs poorly with novel cases.
Interestingly, the shift seems to be from knowledge bases to statistical data sets. Natural language and logic based explanatory models seemed to be moving to statistical explanatory models.