$\begingroup$

Like many people these days, I'm afflicted with the disease of recapitulating my work in my leisure. So it is that I find myself wanting to use data analysis to reverse-engineer the haggling mechanic in the venerable online game Neopets. As a first step, I'm trying to learn how the shopkeeper comes up with a counteroffer to the player's first offer. From the player's perspective, it works like this:

The shopkeeper advertises an item at a specific price. I call this the sticker price.

The player initiates haggling by making an offer (which presumably will be equal to or less than the sticker price).

The shopkeeper either makes a counteroffer (which will always be greater than the player's offer, and almost always lower than the sticker price) or accepts the player's price.

All three of these prices must be positive integers. (They are in the in-game currency called Neopoints, or NP.)

I've collected a whole lot of data (over 9,000 observations), by automatically making offers and recording the counteroffers. You can get it at:

http://arfer.net/downloads/neopets_haggling_firstround.csv.gz

Two important things I've learned so far are:

No shopkeeper will accept or offer something below 3/4 of the sticker price (well, maybe one or two NP below). (This is something Neopets players have already known for a while; e.g.)

The distribution of counteroffers, expressed as a fraction of the sticker price, is determined by the ratio of your offer to the sticker price. That is, if p1 is the sticker price, offer is your offer, and p2 is the counteroffer, then the distribution of p2 / p1 is determined by offer / p1. In short, differences in sticker price can be ignored for the purposes of data analysis so long as we work with ratios.
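To make the ratio idea concrete, here is a minimal sketch in R. The data frame is a made-up toy example, not the real observations; the column names p1, offer, and p2 follow the notation above, and I'm assuming the CSV uses the same names.

```r
# Toy data, invented purely for illustration (NOT taken from the real data set).
d <- data.frame(
    p1    = c(1000, 1000, 400, 400),  # sticker price
    offer = c(250, 500, 100, 200),    # player's first offer
    p2    = c(900, 870, 360, 348))    # shopkeeper's counteroffer

# Express offers and counteroffers as fractions of the sticker price,
# so that items with different sticker prices become comparable.
d$offer_ratio   <- d$offer / d$p1
d$counter_ratio <- d$p2 / d$p1

# Smallest counteroffer ratio; per the first point above, on the real
# data this should sit at or just under 3/4.
min(d$counter_ratio)
```

In this toy frame, rows with different sticker prices but the same offer ratio are treated as draws from the same distribution, which is the simplification the second point allows.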

Here's a plot of the data I have so far:

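    library(ggplot2)

    # d holds the haggling data, with columns p1, offer, and p2.
    ggplot(data = with(d, data.frame(x = offer / p1, y = p2 / p1)), aes(x, y)) +
        geom_point(alpha = .2) +
        # Dotted blue curve: LOESS fit
        geom_smooth(method = "loess", color = "blue", linetype = "dotted") +
        # Solid red curve: mean of y within bins of x that are .02 wide
        # (the argument is named fun.y in ggplot2 before 3.3.0)
        stat_summary_bin(fun = mean, binwidth = .02, geom = "line", color = "red") +
        coord_cartesian(xlim = c(0, .75), ylim = c(.74, 1), expand = FALSE) +
        xlab("offer / p1") +
        ylab("p2 / p1")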

Each point is a counteroffer. Clearly, the relationship is nonmonotonic and heteroscedastic.

The dotted blue curve is a LOESS fit. The solid red curve connects the means of p2 / p1 in each bin of offer / p1, where the bins are .02 units wide.
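For reference, the binned means can also be computed outside of ggplot2. The following is only a sketch: the data are simulated stand-ins (an assumption, not the real observations), and the bin edges chosen here need not line up exactly with the ones stat_summary_bin picks.

```r
set.seed(1)
# Simulated stand-in data (an assumption; not the real observations).
x <- runif(500, 0, .75)                   # plays the role of offer / p1
y <- .9 - .1 * x + rnorm(500, sd = .02)   # plays the role of p2 / p1

binwidth <- .02
breaks <- (0:38) * binwidth               # bin edges from 0 to .76

# Assign each observation to a bin, then average y within each bin.
bin       <- cut(x, breaks = breaks, include.lowest = TRUE)
bin_means <- tapply(y, bin, mean)         # NA for any empty bin
bin_mids  <- head(breaks, -1) + binwidth / 2

head(data.frame(bin_mids, bin_means))
```

Each bin mean uses only the observations that fall inside that bin, which is why the resulting curve tracks local means closely but jumps around where a bin has little data.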

(You'll note that the amount of data I have varies quite a bit between values of offer / p1, which is because I've used various strategies for computing offers over the course of data collection.)

It seems to me that LOESS isn't doing a good job here. In particular, we have a great deal of data at offer / p1 == .25, meaning we should have quite a good estimate of the mean, but the curve doesn't go through the mean there (where the binned-means curve is). I've played around with the parameters of R's loess, cubic-spline regression (splines::ns), and generalized additive models (mgcv::gam), but I haven't gotten anything better. And obviously, the binned-means approach is too coarse and makes a curve that's too wiggly. Any hints?
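The kind of check I have in mind at offer / p1 == .25 can be sketched like this. The data are simulated (an assumption, not my real observations): the function f is a hypothetical nonmonotonic mean curve, and a pile of extra observations is placed at x = .25 to mimic the uneven sampling.

```r
set.seed(42)
# Simulated stand-in data (an assumption; not the real observations).
# f is a hypothetical nonmonotonic mean curve chosen for illustration.
f <- function(x) .85 + .1 * sin(8 * x)
x <- c(runif(300, 0, .75), rep(.25, 200))   # 200 extra points at x = .25
y <- f(x) + rnorm(length(x), sd = .02)

fit <- loess(y ~ x)                         # defaults: span = .75, degree = 2

# Compare the smooth's value at x = .25 with the plain sample mean there.
loess_at_25 <- unname(predict(fit, newdata = data.frame(x = .25)))
mean_at_25  <- mean(y[x == .25])
c(loess = loess_at_25, mean = mean_at_25)
```

If the two numbers differ by much more than the standard error of the mean at that point, the smoother is being pulled away from the local mean by the neighbouring observations.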

Here's a plot of the univariate density at each of several offer ratios. The vertical lines indicate means.