Monday, January 28, 2008

Consumer surplus in eBay

A paper that we wrote on "Consumer surplus in online auctions" was recently accepted for publication in the leading journal Information Systems Research. Reuters interviewed us about the paper (Study shows eBay buyers save billions of dollars), which is of special interest these days due to the change of CEO at eBay. Although the economic implications of the paper are interesting and important, the neat methodology is a highlight in itself. So here's what we did:

Consumer surplus is the difference between what a consumer pays and what s/he was willing to pay for an item. eBay can measure the consumer surplus generated in its auctions because it runs second-price auctions. This means that the highest bidder wins, but pays only the second-highest bid. [I'm always surprised to find out that many people, including eBay users, do not know this!]
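To make the mechanics concrete, here is a minimal Python sketch (the bid values are made up) of how the price paid and the winner's surplus fall out of a second-price auction:

```python
def second_price_outcome(bids):
    """Return (price_paid, consumer_surplus) for a sealed second-price auction."""
    top, second = sorted(bids, reverse=True)[:2]
    # The highest bidder wins but pays only the second-highest bid;
    # the winner's surplus is the gap between the two.
    return second, top - second

price, surplus = second_price_outcome([120.0, 95.0, 80.0])
print(price, surplus)  # 95.0 25.0
```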

So generally speaking, eBay has the information both on what a winner paid and on what s/he was willing to bid (if we assume that the highest bid reflects the winner's true willingness-to-pay). Adding up all the differences between the highest and second-highest bids, say over a certain year, would then (under some assumptions) give the total consumer surplus generated on eBay in that year. The catch is that eBay makes public all bids in an auction except the highest bid! This is where we came in: we used a website that places bids on eBay bidders' behalf during the last seconds of an auction (a so-called sniping agent). At the time, this website belonged to our co-author Ravi Bapna, who was the originator of this cool idea. For those users who won an eBay auction, we therefore had the highest bid!
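Once both bids are in hand, the aggregation itself is straightforward. Here is a back-of-the-envelope Python sketch with invented numbers (the paper's actual estimate also involves bias adjustments that are not shown here):

```python
import pandas as pd

# Hypothetical per-auction records: the winner's highest bid (recovered via
# the sniping agent) and the price actually paid (the second-highest bid).
auctions = pd.DataFrame({
    "auction_id": [1, 2, 3],
    "highest_bid": [120.0, 40.0, 15.5],
    "price_paid": [95.0, 38.0, 15.0],
})

# Per-auction surplus, assuming the highest bid reflects willingness to pay.
auctions["surplus"] = auctions["highest_bid"] - auctions["price_paid"]
total_surplus = auctions["surplus"].sum()  # total surplus for the sample
```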

In short, the beauty of this paper is in its novel use of technology to quantify an economic value. [Not to mention the intricate statistical modeling needed to measure and adjust for various biases.] See our paper for details.

Friday, January 25, 2008

New "predictive tools" from Fair Issac

An interesting piece in the Star Tribune, "Fair Isaac hopes its new tools lessen lenders' risk of defaults", was sent to me by former student Erik Anderson. Fair Isaac is apparently updating its method for computing FICO scores for 2008. According to the article, "in the next few weeks [Fair Isaac] will roll out a suite of tools designed to predict future default risk". The emphasis is on predicting. In other words, given a database of past credit reports, a model is developed for predicting default risk.

I would be surprised if the methodology itself is new; deciphering what really is new is very hard. Erik pointed out the following paragraph (note the huge reported improvement):

"The new tools include revamping the old credit-scoring formula so that it penalizes consumers with a high debt load more than the earlier version. The update, dubbed FICO 08, should increase predictive strength by 5 to 15 percent, according to Fair Isaac's vice president of scoring, Tom Quinn."

So what is new in the 2008 predictor? The inclusion of a new debt-load variable? A different binning of debt into categories? A different way of incorporating debt into the model? A new model altogether? Or maybe the model, now based on the most recent data, simply includes a parameter estimate for debt load that is much higher than in models based on earlier data.
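It is easy to picture what some of these alternatives would look like. Here is a purely illustrative Python sketch on simulated data (this is not Fair Isaac's model; the variables, bins, and probabilities are all invented), contrasting a raw debt-load predictor with a binned one in a logistic default-risk model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated borrowers -- not real credit data: a debt-load ratio and a
# default indicator whose probability rises with debt load.
n = 1000
debt_load = rng.uniform(0, 1, n)
default = (rng.uniform(0, 1, n) < 0.1 + 0.3 * debt_load).astype(int)

# Option 1: debt load enters as a single numeric predictor.
m1 = LogisticRegression().fit(debt_load.reshape(-1, 1), default)

# Option 2: the same variable binned into categories (dummy-coded),
# which allows the penalty to jump at high debt loads.
bins = np.digitize(debt_load, [0.25, 0.5, 0.75])  # four bins
dummies = np.eye(4)[bins][:, 1:]                  # drop the reference bin
m2 = LogisticRegression().fit(dummies, default)

print(m1.coef_, m2.coef_)  # compare the implied debt-load penalties
```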

Wednesday, January 16, 2008

Data Mining goes to Broadway!

Data mining is all about being creative. At one of the recent data mining conferences, I recall receiving a T-shirt from one of the vendors printed with "Data Mining Rocks!"

Maybe data mining does have the groove: a data mining class of undergrad business students at U Fullerton, taught by Ofir Turel, has created "Data Mining - The Musical". Check it out for some wild lyrics.

Cycle plots for time series

In his most recent newsletter, Stephen Few of Perceptual Edge presents a short and interesting article on cycle plots (by Naomi Robbins). These are plots for visualizing time series that highlight both the cyclical and the trend components of a series. Cycle plots were invented by Cleveland, Dunn, and Terpenning in 1978. Although they are useful and easy to interpret, I have not seen them integrated into any visualization tool. The closest implementation that I've seen (aside from creating them yourself or using one of the macros suggested in the article) is Spotfire DXP's hierarchies. A hierarchy enables one to define time scales embedded within other time scales, such as "day within week within month within year". One can then plot the time series at any level of the hierarchy, thereby supporting the visualization of trends and cycles at different time scales.
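For readers who want to roll their own, here is a rough matplotlib sketch of a cycle plot on simulated monthly data (the series and styling are invented; see Robbins' article for the canonical description): each month gets its own cluster showing that month's subseries across years, with a horizontal line at the monthly mean.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated monthly series over 8 years: trend + seasonality + noise.
rng = np.random.default_rng(1)
years, months = 8, 12
t = np.arange(years * months)
y = 10 + 0.2 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)
y = y.reshape(years, months)  # rows: years, columns: months

fig, ax = plt.subplots()
for m in range(months):
    x = m * years + np.arange(years)          # one cluster of x-positions per month
    ax.plot(x, y[:, m], color="steelblue")    # that month's subseries across years
    ax.hlines(y[:, m].mean(), x[0], x[-1], color="firebrick")  # monthly mean
ax.set_xticks([m * years + (years - 1) / 2 for m in range(months)])
ax.set_xticklabels(list("JFMAMJJASOND"))
ax.set_ylabel("value")
ax.set_title("Cycle plot: trend within each month, seasonality across months")
plt.show()
```

The trend shows up as the slope within each month's cluster, while the seasonal pattern shows up as the differences between the monthly mean lines.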