Grouping and Standardizing Ticket Prices

A couple of weeks ago, Stromae performed at a sold-out show in Washington, DC. Several friends were scouring StubHub and Craigslist for tickets, but prices seemed to be all over the place. That got us wondering whether the market price for concert tickets exhibits any patterns that can be used to improve your chances of getting the best price (highest, if you’re the seller; lowest, if you’re the buyer).

First, we needed a dataset of market transactions for concert tickets. As it turns out, StubHub provides this data to help sellers determine a listing price. Unfortunately, StubHub would not provide us with a dataset of sold ticket prices, so we had to manually scrape their website over the course of a couple of weeks. Our dataset includes 16,562 ticket sales from 32 concerts in 17 cities by 20 different artists. We started out with more than 20,000 ticket sales but removed transactions that occurred more than 90 days before the show, and trimmed the top and bottom 5% of prices to remove outliers.

The source code for the analysis can be found here. We usually provide the dataset but, in this case, we’ll need StubHub's approval to share the data that we scraped from their website.

Since ticket prices vary by city, show, seating section, date, etc., the first challenge was to group the raw data into categories so that we could compare the 16,562 data points in a meaningful way. We’re interested in the change in ticket prices for each seating section as the date of the show approaches, so we grouped the data by section (e.g., ‘Ed Sheeran / New York / Orchestra 201’ would be one section), with the timeline being the number of days before the show.

A second challenge was to scale or standardize ticket prices, whose absolute dollar value varies significantly from section to section. After all, we are interested in the movement of prices over time and not their absolute dollar value.
To standardize the prices, we used z-score scaling, which transforms each ticket price into the number of standard deviations it lies above or below the average price in its section. In other words, the mean ticket price in each section will have a value of zero and the standard deviation of prices in each section will be one (meaning that, if prices in a section are roughly normally distributed, about 95% of them will have a z-score between -1.96 and 1.96). This makes it easier to compare the relative movement of ticket prices across sections because prices in each section now have the same scale (roughly -2.0 to 2.0, with an average of zero) instead of an absolute dollar value (which, in our case, ranges from a few dollars to more than $500, with different averages for each section).

Figure 1 shows the impact of z-score scaling on the distribution of the ticket prices in our dataset. As you can see, z-score scaling standardizes the ticket prices so that they are centered at zero with a standard deviation of one. This allows us to compare ticket prices across sections, as the average ticket price in each section will be zero and the spread of ticket prices will be the same.
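
To make the grouping and scaling steps concrete, here is a minimal sketch in Python using only the standard library. It is not the code from our analysis: the records, the second section name, and the function name `zscore_by_section` are made up for illustration, and it assumes the population standard deviation (`pstdev`) is used within each section.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up records for illustration: (section, days before show, price in dollars).
sales = [
    ("Ed Sheeran / New York / Orchestra 201", 30, 120.0),
    ("Ed Sheeran / New York / Orchestra 201", 10, 150.0),
    ("Ed Sheeran / New York / Orchestra 201", 2, 90.0),
    ("Hypothetical Artist / Some City / Balcony", 25, 40.0),
    ("Hypothetical Artist / Some City / Balcony", 5, 60.0),
    ("Hypothetical Artist / Some City / Balcony", 1, 50.0),
]


def zscore_by_section(records):
    """Return (section, days_before, z) tuples, scaling prices within each section."""
    # Group the raw prices by section.
    prices_by_section = defaultdict(list)
    for section, _, price in records:
        prices_by_section[section].append(price)

    # Mean and population standard deviation of prices in each section.
    stats = {s: (mean(p), pstdev(p)) for s, p in prices_by_section.items()}

    scaled = []
    for section, days_before, price in records:
        mu, sigma = stats[section]
        # A section where every price is identical has no spread; map it to zero.
        z = (price - mu) / sigma if sigma > 0 else 0.0
        scaled.append((section, days_before, z))
    return scaled


for section, days_before, z in zscore_by_section(sales):
    print(f"{section:45s} {days_before:3d} days out  z = {z:+.2f}")
```

After scaling, a $150 ticket in the first section and a $60 ticket in the second both get the same z-score, because each sits the same number of standard deviations above its own section's average. That is what lets price movements be compared across sections with very different dollar values.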