Climb Through the Data With Me

An Analysis of 8a.nu Logbook Data

Being in nature is such a wonderful feeling. The peace and tranquility of being on the trail is one of my greatest pleasures, but I need a reward at the end of a hike. I need a magnificent view, a beautiful lake, or an aesthetically pleasing activity to spectate. Yeah, I know, that last one doesn’t really make sense, but if you hike to Grassi Lakes, near Canmore, Alberta, you get to experience all three of the above. This is where I fell in love with rock climbing.

My wife and I had driven from our home in Calgary to enjoy an easy hike with a big reward. As we sat near the lake pictured above, eating our lunch, I watched the climbers and was mesmerized. It was so beautiful.

After returning to my everyday life I found I couldn’t stop thinking about it. The following weekend we went camping with some friends, and as we sat around the campfire getting to know each other better, I found out that they had met each other rock climbing. I don’t believe in destiny, but this felt like a strange coincidence to me.

A few days after arriving back from our camping adventure I received a WhatsApp message from my cousin in New York. He was trying to arrange a family trip for Christmas, and his first suggestion was Las Vegas, so that we could go rock climbing!

In the previous 41 years of my life rock climbing had come up in conversation exactly once, and in the span of two weeks it had come up three times. I had actually tried rock climbing once, when it first came up in conversation, but I hated it. I was too fat, afraid of heights and I had a terrible experience, but I decided to try it again with a fresh perspective. Turns out it’s wonderful and I love it. I needed an activity to help me lose weight, but it has become more than that. It’s a passion.

I’m still fat though.

But… You said data analysis?

Now that you have a better understanding of why I am interested in analyzing the Climbing Log Book Data, created by David Cohen, let’s jump right in.

Caveat: I am going to say this once, and only once: this data set is inherently biased. The data comes from a website where users log their climbs, and the people who will go to the trouble of tracking this information on a website are very serious about their hobby. Hence, the data will suffer from self-selection bias and will tend to overrepresent the serious, strong climber. It will also overrepresent the geographic areas where 8a.nu is popular. I haven’t analyzed that, but based on the name of the site it is very likely that 8a.nu is more popular in Europe than in other areas, as 8a is a reference to the French grading system.

The Data — Cleaning

The data set can be downloaded as a SQLite database, but I found Pandas to be a bit slow at reading it, so I exported each of the four tables from the database as individual CSV files for analysis.

Ascents — 4 million logged climbs. Bouldering and sport climbs are both included, but can be separated. I restricted myself to only sport climbs.

Grades — A list of each of the different climbing grades. For anyone who doesn’t know, every climb is given a grade. These grades are supposed to be consistent throughout the world and give us a means to measure our progress as we journey through the climbing world in search of our next great send.

Users — Approximately 65,000 users

Method — A helper table to describe the method used to finish the climb (we won’t get into this)
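For anyone who wants to repeat the export step, here is a minimal sketch of how it could look using Python's sqlite3 module and Pandas. The table names and the export_tables helper are my assumptions for illustration, not the exact names or code used for this analysis, and the in-memory database is only a stand-in for the real download.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Assumed table names; list the real ones with
# "SELECT name FROM sqlite_master WHERE type='table'".
TABLES = ["ascent", "grade", "user", "method"]


def export_tables(conn, out_dir, tables=TABLES):
    """Write each table to <out_dir>/<table>.csv and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for table in tables:
        # Table names cannot be bound as SQL parameters, so they are
        # interpolated here; only pass names you trust.
        df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
        paths[table] = out_dir / f"{table}.csv"
        df.to_csv(paths[table], index=False)
    return paths


# Tiny in-memory stand-in for the real database, just to show the call.
demo_conn = sqlite3.connect(":memory:")
for name in TABLES:
    demo_conn.execute(f'CREATE TABLE "{name}" (id INTEGER, label TEXT)')
    demo_conn.execute(f'INSERT INTO "{name}" VALUES (1, ?)', (name,))
demo_conn.commit()

demo_paths = export_tables(demo_conn, tempfile.mkdtemp())
demo_user = pd.read_csv(demo_paths["user"])
```

With the real file you would pass sqlite3.connect("path/to/database.sqlite") instead of the demo connection, and later runs can load the CSV files directly with pd.read_csv.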

After a thorough analysis I found that approximately half the users had never logged a climb and further found that many others had not entered the data that I was interested in. I settled on the following filters for my analysis:

Only sport climbs

User account not “deactivated”

Restricted sex to “male” or “female”

Restricted birth date to be after 1930

Weight had to be > 0 kg (lots of 0's)

Height was set to be > 122 cm (4'0") and < 213 cm (7'0") — I apologize to any extremely short or tall people that love to climb

BMI (body mass index) was calculated from weight and height and was then used to filter the data to values between 12 and 40. This seemed to have minimal effect, but helped to reduce some outliers.

During my analysis I discovered some additional data quality issues related to BMI and I had to add an additional filter. A lot of users have 100 kg (220 lb) listed as their weight, which might be legitimate so I couldn’t eliminate them, but it messed up the BMI results a little. What I had to do was filter out anyone who had a calculated BMI > 28 and had also climbed 5.12a+, as this does not feel realistic and is probably a data entry error. This decision only affected a small number of overall users, but they had really skewed one of the analyses.

That left us with ~13,700 unique climbers and 1.65 million climbs. Seems like a reasonable sample size to work with.
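The user-level filters above could be expressed in Pandas roughly as follows. This is a sketch under assumptions: the column names (deactivated, sex, birth_year, weight_kg, height_cm, max_grade_rank) and the numeric rank used for 5.12a are hypothetical, and the sample rows are made up to exercise each rule.

```python
import pandas as pd


def filter_users(users, rank_512a):
    """Apply the user-level filters described above to a users DataFrame."""
    u = users.copy()
    # BMI = weight in kg divided by height in metres, squared
    u["bmi"] = u["weight_kg"] / (u["height_cm"] / 100) ** 2
    keep = (
        ~u["deactivated"]
        & u["sex"].isin(["male", "female"])
        & (u["birth_year"] > 1930)
        & (u["weight_kg"] > 0)
        & (u["height_cm"] > 122)
        & (u["height_cm"] < 213)
        & u["bmi"].between(12, 40)
        # Likely data-entry errors: BMI > 28 combined with a max grade of 5.12a or harder
        & ~((u["bmi"] > 28) & (u["max_grade_rank"] >= rank_512a))
    )
    return u[keep]


# Made-up rows; max_grade_rank is a hypothetical numeric grade index.
sample_users = pd.DataFrame({
    "deactivated": [False, True, False, False, False],
    "sex": ["male", "female", "female", "male", "male"],
    "birth_year": [1985, 1990, 1992, 1978, 1988],
    "weight_kg": [70, 60, 0, 100, 100],
    "height_cm": [178, 165, 170, 180, 180],
    "max_grade_rank": [20, 18, 15, 10, 25],
})
kept_users = filter_users(sample_users, rank_512a=22)
```

In the sample, the deactivated account, the 0 kg entry and the 100 kg climber with a hard max grade are dropped, while the 100 kg climber with an easier max grade stays in.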

How Hard Do People Climb?

Let’s jump right in and see what the distribution of our climbers looks like.

We see that almost no one logs climbs below a grade of 5.9. We also see a problem with the 5.11c grade, which isn’t really a problem at all. Grading is a confusing subject, but in short, there are multiple grading systems and as this is a European website they default to using the French system, which then gets converted to the Yosemite Decimal System. The French system is more popular in Europe whereas in North America we favor the Yosemite Decimal System. It turns out there is no direct conversion of a French grade to 5.11c, hence there is no data above.

Now that we’ve explained the missing bar, what about the big spike out at 5.13b? This converts directly to the French grade 8a. In case you’ve forgotten, the name of the website this data was collected from is 8a.nu. Maybe it’s a little more self-selection bias, or it could be that this is the grade people push hard to attain, as it is the first 8-level grade. It’s hard to say why this spike exists.

What else can we learn from this? It would be nice to know the max grade of the median climber, as well as the 10th and 90th percentiles. A cumulative distribution histogram makes this information easy to read off.

From this we can see that the median (50th percentile) climber who logs climbs at 8a.nu has a max grade of 5.12c. This is likely a much higher max grade than the true median climber is capable of and is a product of the self-selection bias previously discussed. We can also see that the 10th percentile is around 5.10c and the 90th percentile is 5.13d. Impressive stuff.

For reference, I have only climbed outside a tiny amount but my maximum grade is 5.10b.
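If you would rather compute the percentiles than read them off a chart, a sketch like the one below would do it. The grade list is a short excerpt of the Yosemite Decimal System, and the toy data is invented purely to show the mechanics; none of it comes from the 8a.nu data.

```python
import pandas as pd

# Yosemite Decimal System grades in increasing difficulty (short excerpt).
GRADE_ORDER = ["5.9", "5.10a", "5.10b", "5.10c", "5.10d", "5.11a", "5.11b"]


def grade_percentiles(max_grades, quantiles=(0.10, 0.50, 0.90)):
    """Return the grade at which the cumulative share first reaches each quantile."""
    counts = (
        pd.Series(max_grades)
        .value_counts()
        .reindex(GRADE_ORDER, fill_value=0)  # put grades in difficulty order
    )
    cumulative = counts.cumsum() / counts.sum()
    return {q: cumulative[cumulative >= q].index[0] for q in quantiles}


# Invented max grades for 20 climbers.
toy_max_grades = (
    ["5.9"] + ["5.10a"] * 3 + ["5.10c"] * 8 + ["5.11a"] * 5 + ["5.11b"] * 3
)
toy_percentiles = grade_percentiles(toy_max_grades)
```

The important detail is the reindex step: grades are labels, not numbers, so they have to be put in difficulty order before the cumulative sum means anything.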

What country climbs the most?

Simple question: on a per capita basis, which country climbs the most? To answer this I needed population data. I’ve only included countries with at least 25 climbers in the dataset.

For Countries with > 25 Climbers

I had to extend this to include the top 20 countries before I could get Canada to show. The US was 28th overall.

I know I promised not to talk about selection bias, but the above chart does need to be discussed a little, because it is very dependent on the user base of the website. Poland has about the same population as Canada, but has 2.6x as many climbers in the dataset. So, please take the above chart with a giant grain of salt, but it’s the only data I had to work with.

I feel better getting that off my chest, but the chart stands.
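For the curious, the per capita ranking boils down to a count, a join and a division. Here is a sketch with placeholder countries and invented numbers; the country column and the climbers_per_million helper are my assumptions, and the real population figures would come from whatever external source you trust.

```python
import pandas as pd


def climbers_per_million(users, population, min_climbers=25):
    """Rank countries by climbers per million inhabitants."""
    counts = users.groupby("country").size().rename("climbers")
    table = counts.to_frame().join(population.rename("population"), how="inner")
    # Keep only countries with at least `min_climbers` climbers
    table = table[table["climbers"] >= min_climbers]
    per_million = table["climbers"] * 1_000_000 / table["population"]
    return table.assign(per_million=per_million).sort_values(
        "per_million", ascending=False
    )


# Placeholder countries with made-up climber counts and populations.
toy_users = pd.DataFrame({"country": ["A"] * 30 + ["B"] * 40 + ["C"] * 10})
toy_population = pd.Series({"A": 1_000_000, "B": 4_000_000, "C": 100_000})
toy_ranking = climbers_per_million(toy_users, toy_population)
```

Country C would top the list on a per capita basis, but with only 10 climbers it is dropped by the minimum-count rule, which is exactly the kind of small-sample noise the cutoff is meant to avoid.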

Who climbs more — males or females?

This question seemed like it would have an obvious answer, but I felt like it needed to get addressed.

I expected the proportion of male climbers to be much higher, but the degree of disparity was much higher than I expected. I mostly climb in the gym, and, without doing any analysis, my anecdotal evidence would make me guess that the proportion of female climbers in Calgary is much higher than this. Who needs data when we have anecdotal evidence anyways?

The surprising result, combined with my anecdotal evidence, made me curious about whether the proportion of new climbers that are female was increasing. If it was increasing, that might account for my feeling that there are more females climbing in Calgary than are represented in this database, as I am a relatively new climber.

OK, maybe I’m onto something. It seems that the proportion of female new climbers has been increasing steadily from about 2010. I started to climb in 2018, so this seems to provide some evidence to support my anecdotal evidence. What else can we look at?

For Countries with > 25 Female Climbers

Well, here we go. I live in Calgary and Calgary is in Canada and Canada has the fourth highest proportion of female climbers in the database and the proportion of female climbers that are starting out has been increasing. I like when things support my guesses and the evidence does seem to support my theory that the proportion of female climbers in Canada is much higher than the proportion of female climbers in the 8a.nu database.
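The "proportion of new climbers that are female" can be computed by grouping climbers by the year they started and taking the mean of an is-female flag. A sketch, assuming hypothetical sex and started_year columns and using invented rows:

```python
import pandas as pd


def female_share_by_start_year(users):
    """Fraction of climbers starting in each year who are female."""
    is_female = users["sex"].eq("female")
    # The mean of a boolean column is the share of True values
    return is_female.groupby(users["started_year"]).mean().rename("female_share")


# Invented rows: four climbers starting in 2010, four in 2018.
toy_climbers = pd.DataFrame({
    "started_year": [2010, 2010, 2010, 2010, 2018, 2018, 2018, 2018],
    "sex": ["female", "male", "male", "male", "female", "female", "male", "male"],
})
toy_share = female_share_by_start_year(toy_climbers)
```

The same function works per country if you filter the DataFrame first, although years with only a handful of new climbers will give very jumpy proportions.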

Does Weight Affect My Ability to Send?

First off, weight is an unreliable measurement. If someone is 6'0" tall they are obviously going to weigh more than someone with a similar body type who is 5'0" tall. Body Mass Index (BMI), while not infallible, is a much better metric to use to compare similar individuals. Although a lot of us know this, and it’s also somewhat intuitive, I thought it would be interesting to visualize it. Let’s use the 8a.nu data to illustrate this.

A couple of things to note from the above graph.

First, the data has some issues. Weight is in five kilogram increments! That’s roughly 11 lb increments. I’m not sure who invented the interface or what their intention was, but that seems like a rather large increment. I added jitter to reduce the effect of the bands a little, but it’s hard to compensate for the bad data.

Second, the trend lines through the data illustrate the value of using BMI when comparing individuals. In the top plots weight is plotted vs height and we can see that the two values are linearly related to one another. In the bottom plots we display BMI vs height and no obvious relationship can be seen. I suppose you could argue there is a slight downward trend, but the general point is that no clear linear relationship is visible, as the data looks more normally distributed. This is good for analysis, so we will restrict ourselves to BMI.
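To make the two points above concrete, here is a sketch of the BMI calculation, the jitter and the trend lines using NumPy. The data is synthetic, generated with a roughly constant BMI and weights rounded to 5 kg, so it only demonstrates the mechanics; the uniform jitter width and the straight-line fit are my assumptions, not necessarily what the charts used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: heights in cm, weights rounded to 5 kg steps.
height_cm = rng.normal(175, 8, size=500)
raw_weight = 22 * (height_cm / 100) ** 2 + rng.normal(0, 5, size=500)
weight_kg = np.round(raw_weight / 5) * 5

# BMI = weight in kg divided by height in metres, squared
bmi = weight_kg / (height_cm / 100) ** 2

# Jitter: spread each 5 kg band uniformly over +/- 2.5 kg, for plotting only.
weight_jittered = weight_kg + rng.uniform(-2.5, 2.5, size=weight_kg.size)

# Least-squares straight lines; polyfit returns (slope, intercept) for degree 1.
weight_slope, weight_intercept = np.polyfit(height_cm, weight_kg, 1)
bmi_slope, bmi_intercept = np.polyfit(height_cm, bmi, 1)
```

Note that the jitter is applied only to the copy used for plotting; BMI and the fitted lines are computed from the weights as recorded.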

Let’s look at the average BMI for each grade.

What I’ve done in the above chart is to find the max grade for each climber in the database and then to plot their BMI. I thought this would be interesting.

There appears to be a downward trend in BMI as the max grade of the climber increases. I think this makes sense as climbing is hard, and to get into the 5.12+ range is difficult. You need to be dedicated and you need to practice. If you are that dedicated to climbing, you are going to have enough willpower to not eat the last piece of chocolate. I am always going to eat that last piece of chocolate. C’est la vie.

After creating the above chart, I realized that it really wasn’t what I wanted. I wanted to know what the maximum grade was for a given BMI.

This clearly shows that there is a negative correlation between max grade and BMI. I’m in trouble. :(
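Flipping the chart around amounts to grouping climbers by BMI and taking the hardest max grade in each group. A sketch, assuming BMI is rounded to the nearest whole number and grades are stored as a hypothetical numeric max_grade_rank; the rows are invented.

```python
import numpy as np
import pandas as pd


def max_grade_by_bmi(climbers):
    """Hardest max grade (as a numeric rank) seen in each whole-number BMI bin."""
    bmi_bin = climbers["bmi"].round().astype(int)
    return climbers.groupby(bmi_bin)["max_grade_rank"].max()


# Invented climbers; a higher rank means a harder grade.
toy_bmi_data = pd.DataFrame({
    "bmi": [19.6, 20.4, 22.1, 21.8, 25.3, 24.9],
    "max_grade_rank": [30, 27, 26, 24, 20, 22],
})
hardest_by_bmi = max_grade_by_bmi(toy_bmi_data)

# Pearson correlation between the BMI bin and the hardest grade in that bin.
bmi_grade_corr = np.corrcoef(
    hardest_by_bmi.index.to_numpy(), hardest_by_bmi.to_numpy()
)[0, 1]
```

A negative value of bmi_grade_corr corresponds to the downward trend described above. Keep in mind that a maximum is sensitive to how many climbers fall in each bin, so sparsely populated BMI values will tend to show lower maxima for that reason alone.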

Does Height Affect My Ability to Send?

I am on the shorter side of things and often climb with someone who is above average in height, and I see how they are able to reach holds and make moves that simply aren’t possible for me to make. I bemoan their height and pout as I fail and fail and fail. Does the data tell me that my lack of height is holding me back?

Umm… no, I can’t blame my height. Well, I still can: the above graph is for outdoor climbing, and gym climbing is different as it depends on how the routes are set. Whew, I can still blame something for my inability to send.

But, what happens when we flip this chart around and look at max grade vs height?

That is much more interesting! I did not expect to see a downward trend with height. Maybe it is due to the fact that height is correlated with weight and weight is bad for climbing? Whatever the cause, the trend is clear, and as I am 168 cm I am at the ideal height (apparently), so no more excuses. Just send it!

Conclusion

Thanks for joining me on this journey through the climbing data. I hope you’ve enjoyed reading it as much as I enjoyed making it. If you have any questions or if you are curious about any aspect of this project, feel free to reach out.