Last week we wrote in the Compass blog how we estimate the geographic footprint of an agent.
At the very core, the technique is simple: Use the addresses of the houses that an agent has bought or sold in the past; get their longitude and latitude; and then apply a 2-dimensional kernel density estimation to find what are the areas where the agent is likely to be active. Doing the kernel density estimation is easy; the fundamentals of our approach are material that you can find in tutorials for applying a KDE. There are two interesting twists that make the approach more interesting:
- How can we standardize the "geographic footprint" score to be interpretable? The density scores that come back from a kernel density application are very hard to interpret. Ideally, we want a score from 0 to 1, with 0 being "completely outside of the area of activity" and 1 being "as important as it gets". We show how to use a percentile transformation of the likelihood values to create a score that is normalized, interpretable, and very well calibrated.
- What are the metrics for evaluating such a technique? We show how we can use the concept of "recall-efficiency" curves to provide a common way to evaluate the models.