{
"cells": [
{
"cell_type": "markdown",
"id": "37e8cfe9",
"metadata": {},
"source": [
"# Tension Board 2 / Tension Board 1: Data Overview and Climbing Statistics\n",
"\n",
"## Purpose\n",
"\n",
"This notebook establishes the basic statistical landscape of the dataset before we move into hold-level analysis and predictive modelling. The main goals are:\n",
"\n",
"1. to understand the size and scope of the data,\n",
"2. to compare layouts, boards, and angles at a high level,\n",
"3. to identify broad trends in grade, popularity, and quality,\n",
"4. to create a clean descriptive baseline for the later modelling notebooks.\n",
"\n",
"Throughout, I treat each climb-angle entry as a separate observation unless explicitly noted otherwise. That matters because some climbs appear at multiple angles, so a unique climb count and a climb-angle count are not always the same thing.\n",
"\n",
"## Outputs\n",
"\n",
"This notebook produces summary tables and exploratory plots that motivate the later notebooks on:\n",
"- hold usage,\n",
"- hold difficulty,\n",
"- feature engineering,\n",
"- predictive modelling,\n",
"- and deep learning.\n",
"\n",
"## Notebook Structure\n",
"1. [Setup and Imports](#setup-and-imports)\n",
"2. [Popularity and Temporal Trends](#popularity-and-temporal-trends)\n",
"3. [Climbing Statistics](#climbing-statistics-grades-angles-quality-and-matching)\n",
"4. [Prolific Statistics](#prolific-statistics)\n",
"5. [Conclusion](#conclusion)"
]
},
{
"cell_type": "markdown",
"id": "898cad20",
"metadata": {},
"source": [
"## Setup and Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e48e2d25",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Setup and imports\n",
"==================================\n",
"\"\"\"\n",
"# Imports\n",
"import sqlite3\n",
"from pathlib import Path\n",
"\n",
"import matplotlib.patches as mpatches\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"# Set some display options\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.max_rows', 100)\n",
"\n",
"# Palette for multi-bar graphs (one colour per layout)\n",
"palette = ['steelblue', 'coral', 'seagreen']\n",
"\n",
"# Make sure the folder for saved figures exists, so plt.savefig does not fail\n",
"Path('../images/01_climb_stats').mkdir(parents=True, exist_ok=True)\n",
"\n",
"# Connect to the database\n",
"DB_PATH = \"../data/tb2.db\"\n",
"conn = sqlite3.connect(DB_PATH)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e7b5862",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Query our data from the DB\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Query climb data: one row per climb-angle entry. display_difficulty is rounded\n",
"# to the nearest integer so it can be joined to the difficulty_grades table.\n",
"climbs_query = \"\"\"\n",
"SELECT\n",
" c.uuid,\n",
" c.name AS climb_name,\n",
" c.setter_username,\n",
" c.layout_id AS layout_id,\n",
" c.description,\n",
" c.is_nomatch,\n",
" c.is_listed,\n",
" l.name AS layout_name,\n",
" p.name AS board_name,\n",
" c.frames,\n",
" cs.angle,\n",
" cs.display_difficulty,\n",
" dg.boulder_name AS boulder_grade,\n",
" cs.ascensionist_count,\n",
" cs.quality_average,\n",
" cs.fa_at\n",
"FROM climbs c\n",
"JOIN layouts l ON c.layout_id = l.id\n",
"JOIN products p ON l.product_id = p.id\n",
"JOIN climb_stats cs ON c.uuid = cs.climb_uuid\n",
"JOIN difficulty_grades dg ON ROUND(cs.display_difficulty) = dg.difficulty\n",
"WHERE cs.display_difficulty IS NOT NULL;\n",
"\"\"\"\n",
"\n",
"# Load it into a DataFrame\n",
"df = pd.read_sql_query(climbs_query, conn)"
]
},
{
"cell_type": "markdown",
"id": "70a88be4",
"metadata": {},
"source": [
"The query above gathers essentially everything we need to analyze climbing statistics. Each row is one climb at one angle, and entries without a display difficulty are dropped. We leave out information about climbing holds, because holds are analyzed in a separate notebook. Let's see what our DataFrame looks like."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b0057f5",
"metadata": {},
"outputs": [],
"source": [
"df"
]
},
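{
"cell_type": "markdown",
"id": "a41c9e07",
"metadata": {},
"source": [
"Each row of `df` is one climb at one angle, so the row count can overstate the number of distinct climbs. The cell below is a quick sanity check, added here as a sketch and not part of the original analysis. For each layout it compares the number of climb-angle entries with the number of unique climb `uuid`s. `entries_per_climb` is the average number of angles per climb that have a display difficulty."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a41c9e08",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Climb-angle entries vs unique climbs\n",
"==================================\n",
"\"\"\"\n",
"# Count rows (climb-angle entries) and distinct climbs per layout\n",
"entry_counts = df.groupby('layout_name').agg(\n",
"    climb_angle_entries=('uuid', 'count'),\n",
"    unique_climbs=('uuid', 'nunique'),\n",
")\n",
"\n",
"# Average number of angles per climb within each layout\n",
"entry_counts['entries_per_climb'] = (\n",
"    entry_counts['climb_angle_entries'] / entry_counts['unique_climbs']\n",
").round(2)\n",
"\n",
"n_unique = df['uuid'].nunique()\n",
"print(f'All layouts: {len(df):,} climb-angle entries, {n_unique:,} unique climbs')\n",
"entry_counts"
]
},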
{
"cell_type": "markdown",
"id": "e7eb46cc",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "ba4fc956",
"metadata": {},
"source": [
"# Popularity and Temporal Trends\n",
"\n",
"## Popularity of the Tension Board\n",
"\n",
"We do not have access to user data, so we gauge the popularity of the Tension Boards by counting first ascents and unique setters per year. The first ascensionist is often, but not always, the setter of the climb. Nonetheless, we group first ascents by year and add the number of unique setters as an extra data point. Because `fa_at` comes from `climb_stats`, which has one row per climb-angle, a climb set at several angles can contribute several first ascents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3dff1bd7",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Popularity of the Tension Board by year:\n",
"first ascents by year + unique setters by year\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# SQLite has no native datetime type, so fa_at is read in as text. Convert it.\n",
"df['fa_at'] = pd.to_datetime(df['fa_at'])\n",
"\n",
"# Add a new column for the year\n",
"df['fa_year'] = df['fa_at'].dt.year\n",
"\n",
"# Make a new DataFrame with year, first_ascents, and unique_setters\n",
"df_growth = df.groupby('fa_year').agg(\n",
"    first_ascents=('uuid', 'count'),\n",
"    unique_setters=('setter_username', 'nunique')\n",
").reset_index()\n",
"\n",
"# Disregard 2026, since the data only covers about one month of it\n",
"df_growth = df_growth[df_growth['fa_year'] < 2026]\n",
"\n",
"## Plot (dual y-axes)\n",
"fig, ax1 = plt.subplots(figsize=(12, 6))\n",
"\n",
"# Bar chart for first ascents (primary axis)\n",
"ax1.bar(df_growth['fa_year'], df_growth['first_ascents'], label='First Ascents', color='coral')\n",
"ax1.set_xlabel('Year')\n",
"ax1.set_ylabel('First Ascents', color='coral')\n",
"ax1.set_xticks(df_growth['fa_year'])  # one tick per year\n",
"ax1.set_title('TB First Ascents & Unique Setters over Time')\n",
"\n",
"# Line chart for unique setters (secondary axis)\n",
"ax2 = ax1.twinx()\n",
"ax2.plot(df_growth['fa_year'], df_growth['unique_setters'], color='steelblue', marker='o', label='Unique Setters')\n",
"ax2.set_ylabel('Unique Setters', color='steelblue')\n",
"ax2.tick_params(axis='y', labelcolor='steelblue')\n",
"\n",
"# One legend covering both axes\n",
"fig.legend(loc='upper left', bbox_to_anchor=(0.15, 0.85))\n",
"\n",
"plt.savefig('../images/01_climb_stats/first_ascents_by_year.png')\n",
"plt.show()"
]
},
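{
"cell_type": "markdown",
"id": "b7d20f31",
"metadata": {},
"source": [
"Because `fa_at` comes from `climb_stats`, which holds one row per climb-angle, the bars above count climb-angle first ascents. A climb set at several angles can therefore contribute more than once. The cell below is a rough cross-check that counts unique climbs per year instead and puts the two series side by side. It assumes that the earliest `fa_at` across a climb's angles is a fair proxy for when the climb itself was first sent."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d20f32",
"metadata": {},
"outputs": [],
"source": [
"# Earliest first-ascent timestamp per climb, across all of its angles (assumed proxy)\n",
"first_fa_per_climb = df.groupby('uuid')['fa_at'].min()\n",
"\n",
"# Count unique climbs by the year of that earliest first ascent (2026 excluded as above)\n",
"climbs_per_year = first_fa_per_climb.dt.year.value_counts().sort_index()\n",
"climbs_per_year = climbs_per_year[climbs_per_year.index < 2026]\n",
"\n",
"# Align both yearly series on the year index\n",
"growth_comparison = pd.DataFrame({\n",
"    'climb_angle_first_ascents': df_growth.set_index('fa_year')['first_ascents'],\n",
"    'unique_climbs_first_ascended': climbs_per_year,\n",
"})\n",
"growth_comparison"
]
},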
{
"cell_type": "markdown",
"id": "e2bca15a",
"metadata": {},
"source": [
"## Seasonal analysis\n",
"\n",
"Next, we examine when in the year the Tension Board is most popular. Again, we work with what we have and use first-ascent data, plotting first ascents by month with all years combined. We exclude 2026 because only part of January is covered. The TB2 is at its most popular in 2026, so those few weeks could add a noticeable bias toward January."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8912306b",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Seasonal analysis: first ascents by month\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# First, add a column for the month\n",
"df['fa_month'] = df['fa_at'].dt.month\n",
"\n",
"# Exclude 2026, since the data only covers about one month of it\n",
"df_filter = df[df['fa_year'] < 2026]\n",
"\n",
"# Make a new DataFrame with month and first ascents\n",
"df_season = df_filter.groupby('fa_month').agg(\n",
"    first_ascents=('uuid', 'count'),\n",
").reset_index()\n",
"\n",
"# Add a column for the month name (int() guards against float month numbers)\n",
"month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',\n",
"               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']\n",
"df_season['fa_month_name'] = df_season['fa_month'].apply(lambda x: month_names[int(x) - 1])\n",
"\n",
"# Plot the data\n",
"fig, ax = plt.subplots(figsize=(12, 6))\n",
"ax.bar(df_season['fa_month_name'], df_season['first_ascents'], color='coral')\n",
"ax.set_title('First Ascents by Month (All Years Combined)')\n",
"ax.set_xlabel('Month')\n",
"ax.set_ylabel('Total First Ascents')\n",
"\n",
"# Save the file\n",
"plt.savefig('../images/01_climb_stats/first_ascents_by_month.png')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "bb802d74",
"metadata": {},
"source": [
"This matches what we might expect: the winter months (Dec/Jan) see the most first ascents. Plausibly, outdoor climbers move to the boards when they're stuck inside, while in the warmer months many strong climbers are outdoors. This reading assumes most users are in the Northern Hemisphere, where Dec/Jan is winter."
]
},
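{
"cell_type": "markdown",
"id": "c3e85a19",
"metadata": {},
"source": [
"Summing over all years weights each month by the busiest years, and a year that is only partly covered can tilt the result. The cell below is a rough robustness check, sketched here and not part of the original analysis. It converts each year's monthly counts into shares of that year's first ascents, then averages those shares over years. A year counts as full only if it has at least one first ascent in every month. That rule is a heuristic assumption, not something the data guarantees."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3e85a1a",
"metadata": {},
"outputs": [],
"source": [
"# First ascents per (year, month): one row per year, one column per month\n",
"monthly = df_filter.groupby(['fa_year', 'fa_month']).size().unstack(fill_value=0)\n",
"\n",
"# Heuristic: keep only years with at least one first ascent in every month\n",
"full_years = monthly[(monthly > 0).all(axis=1)]\n",
"\n",
"# Each month's share of its year's first ascents, averaged over the full years\n",
"monthly_share = full_years.div(full_years.sum(axis=1), axis=0).mean(axis=0)\n",
"\n",
"fig, ax = plt.subplots(figsize=(12, 6))\n",
"ax.bar([month_names[int(m) - 1] for m in monthly_share.index], 100 * monthly_share.values, color='coral')\n",
"ax.set_title('Average Share of Yearly First Ascents by Month (Full Years Only)')\n",
"ax.set_xlabel('Month')\n",
"ax.set_ylabel('Share of Year (%)')\n",
"plt.show()"
]
},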
{
"cell_type": "markdown",
"id": "def962ba",
"metadata": {},
"source": [
"## Day of Week Analysis\n",
"\n",
"We can also plot the number of first ascents by day of the week. The partial January 2026 data covers every weekday roughly equally, so it shouldn't distort the weekly pattern, and we keep it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "071cfd5f",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Day of Week analysis\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Add a column for the day of the week.\n",
"# Note that Series.dt.day_of_week has Monday = 0 and Sunday = 6.\n",
"df['fa_day_of_week'] = df['fa_at'].dt.day_of_week\n",
"\n",
"# Make a new DataFrame with day of week and first ascents\n",
"df_days = df.groupby('fa_day_of_week').agg(\n",
"    first_ascents=('uuid', 'count'),\n",
").reset_index()\n",
"\n",
"# Add a column for the day name\n",
"day_names = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']\n",
"df_days['fa_day_name'] = df_days['fa_day_of_week'].apply(lambda x: day_names[int(x)])\n",
"\n",
"# Plot the data\n",
"fig, ax = plt.subplots(figsize=(12, 6))\n",
"ax.bar(df_days['fa_day_name'], df_days['first_ascents'], color='coral')\n",
"ax.set_title('First Ascents by Day of Week (All Years Combined)')\n",
"ax.set_xlabel('Day')\n",
"ax.set_ylabel('Total First Ascents')\n",
"\n",
"# Save the file\n",
"plt.savefig('../images/01_climb_stats/first_ascents_by_day_of_week.png')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9961e92e",
"metadata": {},
"source": [
"Interestingly, Tuesday and Wednesday see the most first ascents, while Monday sees the fewest."
]
},
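{
"cell_type": "markdown",
"id": "d19b6c40",
"metadata": {},
"source": [
"One way to put a number on how uneven the weekly pattern is: a chi-square goodness-of-fit test against an equal split across the seven days. This is an addition to the original analysis and assumes SciPy is installed. With counts this large, even small differences tend to come out significant. First ascents are also not independent events, since a setter may log several in one session. The percentage shares are therefore probably the more useful output."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d19b6c41",
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import chisquare\n",
"\n",
"# Goodness of fit against equal counts on every day (chisquare's default expectation)\n",
"result = chisquare(df_days['first_ascents'])\n",
"print(f'chi-square = {result.statistic:.1f}, p-value = {result.pvalue:.3g}')\n",
"\n",
"# Each day's share of all first ascents; a perfectly even split would be about 14.3%\n",
"day_shares = df_days.set_index('fa_day_name')['first_ascents']\n",
"(100 * day_shares / day_shares.sum()).round(1)"
]
},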
{
"cell_type": "markdown",
"id": "440a0b28",
"metadata": {},
"source": [
"## Time of Day Analysis\n",
"\n",
"We can even do a time-of-day analysis. Again, we keep the 2026 data since it shouldn't change much. It is not entirely clear that this analysis makes sense, because we don't know whether the first-ascent time is recorded in the climber's local time or in the server's time. These boards are all over the world. If the timestamps are in server time, the hourly pattern mixes many time zones and may carry quite a bit of variance."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b8e3c4a",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Time of Day analysis\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Add a column for the hour of the first ascent (0-23)\n",
"df['fa_hour'] = df['fa_at'].dt.hour\n",
"\n",
"# Make a new DataFrame with hour and first ascents\n",
"df_hour = df.groupby('fa_hour').agg(\n",
"    first_ascents=('uuid', 'count'),\n",
").reset_index()\n",
"\n",
"# Plot the data\n",
"fig, ax = plt.subplots(figsize=(12, 6))\n",
"ax.bar(df_hour['fa_hour'], df_hour['first_ascents'], color='coral')\n",
"ax.set_title('First Ascents by Hour (All Years Combined)')\n",
"ax.set_xlabel('Hour')\n",
"ax.set_ylabel('Total First Ascents')\n",
"ax.set_xticks(range(24))  # one tick per hour\n",
"\n",
"# Save the file\n",
"plt.savefig('../images/01_climb_stats/first_ascents_by_hour.png')\n",
"plt.show()"
]
},
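{
"cell_type": "markdown",
"id": "e6f0a27b",
"metadata": {},
"source": [
"Whether these hours are local time or server time is still an open question. A quick check, sketched here, is to look at a few raw `fa_at` strings straight from the database. We can also see whether pandas attached a time zone when it parsed them. A UTC offset or a trailing `Z` would indicate absolute timestamps. Plain strings with no offset leave the question open."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6f0a27c",
"metadata": {},
"outputs": [],
"source": [
"# A few raw fa_at values exactly as stored in SQLite\n",
"raw_fa = pd.read_sql_query('SELECT fa_at FROM climb_stats WHERE fa_at IS NOT NULL LIMIT 5', conn)\n",
"print(raw_fa)\n",
"\n",
"# None means pandas parsed the column as naive timestamps (no time zone information)\n",
"print('Time zone attached to parsed fa_at:', df['fa_at'].dt.tz)"
]
},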
{
"cell_type": "markdown",
"id": "5ddc49b3",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "533dc4a2",
"metadata": {},
"source": [
"# Climbing Statistics: Grades, Angles, Quality, and Matching\n",
"\n",
"We begin by visualizing the grade distribution. Recall that numeric difficulties map to grades according to the following table (plus a few other grades not listed here).\n",
"\n",
"|difficulty|boulder_name|route_name|\n",
"|----------|------------|----------|\n",
"| 10|4a/V0 |5b/5.9 |\n",
"| 11|4b/V0 |5c/5.10a |\n",
"| 12|4c/V0 |6a/5.10b |\n",
"| 13|5a/V1 |6a+/5.10c |\n",
"| 14|5b/V1 |6b/5.10d |\n",
"| 15|5c/V2 |6b+/5.11a |\n",
"| 16|6a/V3 |6c/5.11b |\n",
"| 17|6a+/V3 |6c+/5.11c |\n",
"| 18|6b/V4 |7a/5.11d |\n",
"| 19|6b+/V4 |7a+/5.12a |\n",
"| 20|6c/V5 |7b/5.12b |\n",
"| 21|6c+/V5 |7b+/5.12c |\n",
"| 22|7a/V6 |7c/5.12d |\n",
"| 23|7a+/V7 |7c+/5.13a |\n",
"| 24|7b/V8 |8a/5.13b |\n",
"| 25|7b+/V8 |8a+/5.13c |\n",
"| 26|7c/V9 |8b/5.13d |\n",
"| 27|7c+/V10 |8b+/5.14a |\n",
"| 28|8a/V11 |8c/5.14b |\n",
"| 29|8a+/V12 |8c+/5.14c |\n",
"| 30|8b/V13 |9a/5.14d |\n",
"| 31|8b+/V14 |9a+/5.15a |\n",
"| 32|8c/V15 |9b/5.15b |\n",
"| 33|8c+/V16 |9b+/5.15c |\n",
"\n",
"We work with the numeric difficulty throughout and translate it into `boulder_name` for display as needed. Since `display_difficulty` is not always a whole number, the query rounds it to the nearest integer before joining it to this table."
]
},
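{
"cell_type": "markdown",
"id": "f2a4d813",
"metadata": {},
"source": [
"The rounding step can be checked directly. The sketch below is an addition, not part of the original pipeline. It loads the `difficulty_grades` table, maps the rounded `display_difficulty` back to a grade in pandas, and reports how often that matches the grade assigned by the SQL join. The two can disagree only on exact `.5` ties. SQLite's `ROUND` rounds halves away from zero, whereas pandas' `Series.round` rounds halves to the nearest even number."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2a4d814",
"metadata": {},
"outputs": [],
"source": [
"# Lookup from integer difficulty to boulder grade, straight from the DB\n",
"grades_table = pd.read_sql_query('SELECT difficulty, boulder_name FROM difficulty_grades', conn)\n",
"difficulty_to_grade = dict(zip(grades_table['difficulty'].astype(int), grades_table['boulder_name']))\n",
"\n",
"# Round in pandas (halves go to even) and look the grade up again\n",
"pandas_grade = df['display_difficulty'].round().astype(int).map(difficulty_to_grade)\n",
"\n",
"# Share of rows where the pandas lookup reproduces the grade assigned by the SQL join\n",
"agreement = (pandas_grade == df['boulder_grade']).mean()\n",
"print(f'Agreement with SQL-assigned grades: {agreement:.4%}')"
]
},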
{
"cell_type": "markdown",
"id": "4f986836",
"metadata": {},
"source": [
"## Grade distribution"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fa594ba",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Difficulty distribution by layout (with total)\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Order grades by their mean numeric difficulty\n",
"grade_counts = df['boulder_grade'].value_counts()\n",
"grade_order = df.groupby('boulder_grade')['display_difficulty'].mean().sort_values().index.tolist()\n",
"grade_counts = grade_counts.reindex(grade_order)\n",
"\n",
"# Prepare data in long format\n",
"df_long = df.groupby(['boulder_grade', 'layout_name']).size().reset_index(name='count')\n",
"\n",
"# Calculate total for background\n",
"df_total = df.groupby('boulder_grade').size().reset_index(name='count')\n",
"df_total['layout_name'] = 'All Layouts'\n",
"\n",
"# Reindex to correct grade order\n",
"df_long['grade_order'] = df_long['boulder_grade'].map(\n",
"    {g: i for i, g in enumerate(grade_order)}\n",
")\n",
"df_long = df_long.sort_values('grade_order')\n",
"\n",
"df_total['grade_order'] = df_total['boulder_grade'].map(\n",
"    {g: i for i, g in enumerate(grade_order)}\n",
")\n",
"df_total = df_total.sort_values('grade_order')\n",
"\n",
"# Plot\n",
"fig, ax = plt.subplots(figsize=(16, 8))\n",
"\n",
"# Plot \"All Layouts\" behind (light gray)\n",
"sns.barplot(\n",
"    data=df_total,\n",
"    x='boulder_grade',\n",
"    y='count',\n",
"    color='lightgray',\n",
"    ax=ax,\n",
"    zorder=1,\n",
"    width=0.6,\n",
"    order=grade_order\n",
")\n",
"\n",
"# Plot individual layouts (grouped) in front\n",
"sns.barplot(\n",
"    data=df_long,\n",
"    x='boulder_grade',\n",
"    y='count',\n",
"    hue='layout_name',\n",
"    palette=palette,\n",
"    ax=ax,\n",
"    zorder=2,\n",
"    order=grade_order\n",
")\n",
"\n",
"# Create custom legend with \"All Layouts\" included\n",
"handles, labels = ax.get_legend_handles_labels()\n",
"all_layouts_patch = mpatches.Patch(color='lightgray', label='All Layouts')\n",
"handles.insert(0, all_layouts_patch)\n",
"ax.legend(handles=handles, title='Layout', fontsize=10)\n",
"\n",
"ax.set_xlabel('Grade', fontsize=11)\n",
"ax.set_ylabel('Number of Climbs', fontsize=11)\n",
"ax.set_title('Difficulty Distribution by Board Layout', fontsize=14)\n",
"ax.tick_params(axis='x', rotation=45)\n",
"ax.grid(axis='y', alpha=0.3)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/01_climb_stats/difficulty_distribution_by_layout_with_total.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "871bb45d",
"metadata": {},
"source": [
"As a climber in North America, I tend to use just the V-grade and ignore the French grade. So let's group the boulder grades by V-grade and show the distribution that way. Note that several boulder grades share a V-grade (e.g., 6b/V4 and 6b+/V4). Elsewhere, though, we'll usually stick with the full boulder_grade (e.g., 5c/V2) rather than grouping by V-grade."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7f0b911",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"V-Grade distribution by layout (with total)\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Add a v_grade column (the part after the slash, e.g. '5c/V2' -> 'V2') and its counts\n",
"df['v_grade'] = df['boulder_grade'].str.split('/').str[1]\n",
"v_grade_order = df.groupby('v_grade')['display_difficulty'].mean().sort_values().index.tolist()\n",
"v_grade_counts = df['v_grade'].value_counts().reindex(v_grade_order)\n",
"\n",
"# Prepare data in long format\n",
"df_long = df.groupby(['v_grade', 'layout_name']).size().reset_index(name='count')\n",
"\n",
"# Calculate total for background\n",
"df_total = df.groupby('v_grade').size().reset_index(name='count')\n",
"df_total['layout_name'] = 'All Layouts'\n",
"\n",
"# Reindex to correct grade order\n",
"df_long['v_grade_order'] = df_long['v_grade'].map(\n",
"    {g: i for i, g in enumerate(v_grade_order)}\n",
")\n",
"df_long = df_long.sort_values('v_grade_order')\n",
"\n",
"df_total['v_grade_order'] = df_total['v_grade'].map(\n",
"    {g: i for i, g in enumerate(v_grade_order)}\n",
")\n",
"df_total = df_total.sort_values('v_grade_order')\n",
"\n",
"# Plot\n",
"fig, ax = plt.subplots(figsize=(16, 8))\n",
"\n",
"# Plot \"All Layouts\" behind (light gray)\n",
"sns.barplot(\n",
"    data=df_total,\n",
"    x='v_grade',\n",
"    y='count',\n",
"    color='lightgray',\n",
"    ax=ax,\n",
"    zorder=1,\n",
"    width=0.6,\n",
"    order=v_grade_order\n",
")\n",
"\n",
"# Plot individual layouts (grouped) in front\n",
"sns.barplot(\n",
"    data=df_long,\n",
"    x='v_grade',\n",
"    y='count',\n",
"    hue='layout_name',\n",
"    palette=palette,\n",
"    ax=ax,\n",
"    zorder=2,\n",
"    order=v_grade_order\n",
")\n",
"\n",
"# Create legend with \"All Layouts\" included\n",
"handles, labels = ax.get_legend_handles_labels()\n",
"all_layouts_patch = mpatches.Patch(color='lightgray', label='All Layouts')\n",
"handles.insert(0, all_layouts_patch)\n",
"ax.legend(handles=handles, title='Layout', fontsize=10)\n",
"\n",
"ax.set_xlabel('V-Grade', fontsize=11)\n",
"ax.set_ylabel('Number of Climbs', fontsize=11)\n",
"ax.set_title('V-Grade Distribution by Board Layout', fontsize=14)\n",
"ax.tick_params(axis='x', rotation=45)\n",
"ax.grid(axis='y', alpha=0.3)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/01_climb_stats/v_grade_distribution_by_layout_with_total.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
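{
"cell_type": "markdown",
"id": "0c8e5b92",
"metadata": {},
"source": [
"Several boulder grades collapse into one V-grade, so it helps to see exactly which ones were merged. The sketch below is an addition that reuses `grade_order` from the grade-distribution cell. It lists the boulder grades in each V-grade bucket, in difficulty order."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0c8e5b93",
"metadata": {},
"outputs": [],
"source": [
"def members_in_order(grades):\n",
"    \"\"\"Return the distinct boulder grades in `grades`, ordered as in grade_order.\"\"\"\n",
"    present = set(grades)\n",
"    return [g for g in grade_order if g in present]\n",
"\n",
"# One list of merged boulder grades per V-grade, with V-grades in difficulty order\n",
"v_grade_members = df.groupby('v_grade')['boulder_grade'].apply(members_in_order).reindex(v_grade_order)\n",
"v_grade_members"
]
},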
{
"cell_type": "markdown",
"id": "e27f4277",
"metadata": {},
"source": [
"So the grade distribution varies quite a bit from layout to layout. One likely driver of grade differences is the angle the climb is set at. Note that the same climb can be done at several different angles."
]
},
{
"cell_type": "markdown",
"id": "e6a34766",
"metadata": {},
"source": [
"## Angle Distribution\n",
"\n",
"What about the angle distribution? The TB1 goes from 0 to 50 degrees and the TB2 from 0 to 65 degrees (although my local board only seems to go to 60), so let's look at each layout."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d65b6cd",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Angle distribution\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# TB1 goes up to 50 degrees, TB2 up to 65. (Although my local TB2 only goes up to 60 -- brutal climbing)\n",
"\n",
"# Prepare data in long format\n",
"df_long = df.groupby(['angle', 'layout_name']).size().reset_index(name='count')\n",
"\n",
"# Calculate total for background\n",
"df_total = df.groupby('angle').size().reset_index(name='count')\n",
"df_total['layout_name'] = 'All Layouts'\n",
"\n",
"# Sorted angles give the x-axis order\n",
"angle_order = sorted(df['angle'].unique())\n",
"\n",
"# Plot\n",
"fig, ax = plt.subplots(figsize=(16, 8))\n",
"\n",
"# Plot \"All Layouts\" behind (light gray)\n",
"sns.barplot(\n",
"    data=df_total,\n",
"    x='angle',\n",
"    y='count',\n",
"    color='lightgray',\n",
"    ax=ax,\n",
"    zorder=1,\n",
"    width=0.6,\n",
"    order=angle_order\n",
")\n",
"\n",
"# Plot individual layouts in front\n",
"sns.barplot(\n",
"    data=df_long,\n",
"    x='angle',\n",
"    y='count',\n",
"    hue='layout_name',\n",
"    palette=palette,\n",
"    order=angle_order,\n",
"    ax=ax,\n",
"    zorder=2\n",
")\n",
"\n",
"# Legend with \"All Layouts\" included\n",
"handles, labels = ax.get_legend_handles_labels()\n",
"all_layouts_patch = mpatches.Patch(color='lightgray', label='All Layouts')\n",
"handles.insert(0, all_layouts_patch)\n",
"ax.legend(handles=handles, title='Layout')\n",
"\n",
"ax.set_xlabel('Angle (degrees)')\n",
"ax.set_ylabel('Number of Climbs')\n",
"ax.set_title('Angle Distribution by Board Layout')\n",
"ax.grid(axis='y', alpha=0.3)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/01_climb_stats/angle_distribution_by_layout.png')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "7a00ee20",
"metadata": {},
"source": [
"We see that 40 degrees is the most common angle on every layout."
]
},
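{
"cell_type": "markdown",
"id": "1d7f3a55",
"metadata": {},
"source": [
"Raw counts are dominated by whichever layout has the most climbs, so normalizing within each layout makes the angles easier to compare. The sketch below is an addition that uses `pd.crosstab(..., normalize='index')`. It gives the percentage of each layout's climb-angle entries at every angle, then the most common angle per layout, which should confirm (or refute) the observation above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d7f3a56",
"metadata": {},
"outputs": [],
"source": [
"# Percentage of each layout's climb-angle entries at each angle (each row sums to 100)\n",
"angle_share = pd.crosstab(df['layout_name'], df['angle'], normalize='index').mul(100).round(1)\n",
"print(angle_share)\n",
"\n",
"# Most common angle for each layout\n",
"angle_share.idxmax(axis=1)"
]
},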
{
"cell_type": "markdown",
"id": "bc164cee",
"metadata": {},
"source": [
"## Angle vs grade\n",
"\n",
"How does difficulty vary with angle? Let's look at box plots of difficulty at each angle, split by layout."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2023cb49",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Angle vs grade\n",
"==================================\n",
"\"\"\"\n",
"\n",
"fig, ax = plt.subplots(figsize=(16, 8))\n",
"\n",
"# Map integer difficulty to boulder grade straight from the difficulty_grades table.\n",
"# (Keying on df['display_difficulty'] would give float keys, which only match the\n",
"# integer ticks below when some climb happens to have a whole-number difficulty.)\n",
"grades_table = pd.read_sql_query('SELECT difficulty, boulder_name FROM difficulty_grades', conn)\n",
"grade_mapping = dict(zip(grades_table['difficulty'].astype(int), grades_table['boulder_name']))\n",
"\n",
"# Plot \"All Layouts\" as faint background boxes\n",
"sns.boxplot(\n",
"    data=df,\n",
"    x='angle',\n",
"    y='display_difficulty',\n",
"    color='lightgray',\n",
"    order=angle_order,\n",
"    showfliers=False,\n",
"    width=0.6,\n",
"    ax=ax,\n",
"    zorder=1\n",
")\n",
"\n",
"# Plot individual layouts in front. hue_order is set explicitly because seaborn\n",
"# otherwise orders hue levels by first appearance in the data.\n",
"sns.boxplot(\n",
"    data=df,\n",
"    x='angle',\n",
"    y='display_difficulty',\n",
"    hue='layout_name',\n",
"    hue_order=['Original Layout', 'Tension Board 2 Mirror', 'Tension Board 2 Spray'],\n",
"    palette=palette,\n",
"    order=angle_order,\n",
"    showfliers=False,\n",
"    ax=ax,\n",
"    width=0.5,\n",
"    zorder=2\n",
")\n",
"\n",
"# Relabel y-axis with boulder grades\n",
"yticks_rounded = sorted(set(int(round(t)) for t in df['display_difficulty'].unique() if not pd.isna(t)))\n",
"ylabels = [grade_mapping.get(t, '') for t in yticks_rounded]\n",
"ax.set_yticks(yticks_rounded)\n",
"ax.set_yticklabels(ylabels)\n",
"\n",
"# Custom legend with \"All Layouts\"\n",
"handles, labels = ax.get_legend_handles_labels()\n",
"all_patch = mpatches.Patch(color='lightgray', label='All Layouts')\n",
"handles.insert(0, all_patch)\n",
"ax.legend(handles=handles, title='Layout', fontsize=10)\n",
"\n",
"ax.set_xlabel('Angle (degrees)', fontsize=11)\n",
"ax.set_ylabel('Grade', fontsize=11)\n",
"ax.set_title('Difficulty Distribution by Angle and Layout', fontsize=14)\n",
"ax.grid(axis='y', alpha=0.3)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/01_climb_stats/difficulty_by_angle_boxplot_by_layout.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
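{
"cell_type": "markdown",
"id": "2e94b7c0",
"metadata": {},
"source": [
"The box plots can also be summarized as a table. This small addition, a sketch rather than part of the original analysis, gives the median `display_difficulty` at each angle for each layout. The numbers can be read against the difficulty table above; for example, a median near 20 corresponds to 6c/V5."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e94b7c1",
"metadata": {},
"outputs": [],
"source": [
"# Median display_difficulty for each angle (rows) and layout (columns);\n",
"# NaN means the layout has no climbs at that angle\n",
"median_difficulty = df.pivot_table(\n",
"    index='angle',\n",
"    columns='layout_name',\n",
"    values='display_difficulty',\n",
"    aggfunc='median',\n",
")\n",
"median_difficulty.round(1)"
]
},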
{
"cell_type": "markdown",
"id": "c7a5973a",
"metadata": {},
"source": [
"## The Quality of a Climb\n",
"\n",
"Next, we examine climb quality, starting with how quality relates to the number of ascents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf5bd013",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Climb quality vs popularity\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Keep climb-angle entries with a quality rating (a quality of 0 is treated as unrated)\n",
"df_quality = df[(df['quality_average'].notna()) & (df['quality_average'] > 0)]\n",
"\n",
"# Sample for plotting performance (fixed random_state so the figure is reproducible)\n",
"df_sample = df_quality.sample(min(2000, len(df_quality)), random_state=42)\n",
"\n",
"g = sns.jointplot(\n",
"    data=df_sample,\n",
"    x='quality_average',\n",
"    y='ascensionist_count',\n",
"    kind='scatter',\n",
"    color='teal',\n",
"    height=5\n",
")\n",
"\n",
"g.ax_joint.set_xlabel('Quality Rating')\n",
"g.ax_joint.set_ylabel('Ascensionist Count')\n",
"g.fig.suptitle('Quality vs Popularity', y=1.02)  # lift the title above the top marginal plot\n",
"\n",
"plt.savefig('../images/01_climb_stats/quality_popularity.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
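{
"cell_type": "markdown",
"id": "3fa6c8d2",
"metadata": {},
"source": [
"The scatter alone makes the relationship hard to judge, since a handful of very popular climbs can dominate the y-axis. A rank-based summary is more robust to that. The sketch below is an addition to the original analysis. It computes a Spearman correlation as the Pearson correlation of the ranks, with ties getting average ranks (pandas' default). It uses all rated climb-angle entries rather than only the 2,000-point sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fa6c8d3",
"metadata": {},
"outputs": [],
"source": [
"# Spearman rank correlation = Pearson correlation of the ranks (ties get average ranks)\n",
"quality_rank = df_quality['quality_average'].rank()\n",
"ascent_rank = df_quality['ascensionist_count'].rank()\n",
"spearman_rho = quality_rank.corr(ascent_rank)\n",
"print(f'Spearman rho (quality vs ascensionist count): {spearman_rho:.3f}')"
]
},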
{
"cell_type": "markdown",
"id": "bf0ca290",
"metadata": {},
"source": [
"Next, we visualize the average quality against angle and grade with a heatmap. Keep in mind that the harder the climb and the steeper the angle, the fewer people will be doing it. Ratings for harder climbs therefore come mostly from people who can actually climb them. The upshot is that, on boards, climb quality isn't always the most reliable metric. So we won't spend much time on quality and will make only one heatmap covering all layouts."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af7282b9",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Average quality by angle and grade\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Keep climb-angle entries with a quality rating (a quality of 0 is treated as unrated)\n",
"df_quality = df[(df['quality_average'].notna()) & (df['quality_average'] > 0)]\n",
"\n",
"# Create pivot table: mean quality for each grade (rows) and angle (columns)\n",
"quality_pivot = df_quality.pivot_table(\n",
"    index='boulder_grade',\n",
"    columns='angle',\n",
"    values='quality_average',\n",
"    aggfunc='mean'\n",
")\n",
"# Order rows by difficulty and columns by angle\n",
"quality_pivot = quality_pivot.reindex(grade_order)\n",
"quality_pivot = quality_pivot.reindex(columns=[a for a in angle_order if a in quality_pivot.columns])\n",
"\n",
"# Plot\n",
"fig, ax = plt.subplots(figsize=(16, 8))\n",
"\n",
"sns.heatmap(\n",
"    quality_pivot,\n",
"    cmap='RdYlGn',\n",
"    cbar_kws={'label': 'Avg Quality Rating'},\n",
"    ax=ax\n",
")\n",
"\n",
"ax.set_xlabel('Angle (°)')\n",
"ax.set_ylabel('Grade')\n",
"ax.invert_yaxis()  # easiest grades at the bottom\n",
"ax.set_title('Average Quality Rating by Grade and Angle (All Layouts)')\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/01_climb_stats/quality_heatmap_all_layouts.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
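{
"cell_type": "markdown",
"id": "4b1e9f60",
"metadata": {},
"source": [
"Following the caveat above, cells at hard grades and steep angles may average only a handful of ratings. The sketch below counts the rated entries behind each cell and hides cells below a threshold. The cut-off `MIN_RATED = 20` is a hypothetical choice, not something taken from the original analysis, and is worth tuning."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b1e9f61",
"metadata": {},
"outputs": [],
"source": [
"# Number of rated climb-angle entries behind each cell of the heatmap\n",
"count_pivot = df_quality.pivot_table(\n",
"    index='boulder_grade',\n",
"    columns='angle',\n",
"    values='quality_average',\n",
"    aggfunc='count',\n",
").reindex(index=quality_pivot.index, columns=quality_pivot.columns)\n",
"\n",
"# Hypothetical cut-off: hide cells averaged over fewer than MIN_RATED entries\n",
"MIN_RATED = 20\n",
"quality_masked = quality_pivot.where(count_pivot >= MIN_RATED)  # NaN cells render blank\n",
"\n",
"fig, ax = plt.subplots(figsize=(16, 8))\n",
"sns.heatmap(quality_masked, cmap='RdYlGn', cbar_kws={'label': 'Avg Quality Rating'}, ax=ax)\n",
"ax.set_xlabel('Angle (°)')\n",
"ax.set_ylabel('Grade')\n",
"ax.invert_yaxis()\n",
"ax.set_title(f'Average Quality Rating by Grade and Angle (cells with at least {MIN_RATED} rated entries)')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},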
{
"cell_type": "markdown",
"id": "3630b077",
"metadata": {},
"source": [
"## \"Match\" vs. \"No Match\"\n",
"\n",
"Some setters tag their climbs as \"no match\", meaning the climber is not allowed to match (place both hands on) any hold. Let's compare these climbs with regular ones: how many there are, their average difficulty, and their average number of ascensionists."
]
},
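{
"cell_type": "markdown",
"id": "5c2d8e14",
"metadata": {},
"source": [
"The status column in the next cell marks a climb as 'No Match' if either the `is_nomatch` flag is set or its description contains 'No matching'. A quick cross-tabulation, added here as a sketch, shows how much the two signals overlap and how many climbs are caught by only one of them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c2d8e15",
"metadata": {},
"outputs": [],
"source": [
"# The two signals that the Match/No Match status combines\n",
"flag_set = df['is_nomatch'] == 1\n",
"desc_says_no_match = df['description'].fillna('').astype(str).str.contains('No matching', regex=False)\n",
"\n",
"# Rows: flag value; columns: description check\n",
"pd.crosstab(\n",
"    flag_set.rename('is_nomatch == 1'),\n",
"    desc_says_no_match.rename('description mentions No matching'),\n",
")"
]
},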
{
"cell_type": "code",
"execution_count": null,
"id": "078cd6b6",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Match vs No Match analysis\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Create status column (Match vs No Match): a climb is 'No Match' if its is_nomatch\n",
"# flag is set or its description mentions 'No matching' (vectorized, no row-wise apply)\n",
"desc_no_match = df['description'].fillna('').astype(str).str.contains('No matching', regex=False)\n",
"df['status'] = np.where(desc_no_match | (df['is_nomatch'] == 1), 'No Match', 'Matched')\n",
"\n",
"# Aggregate by layout and status\n",
"df_agg = df.groupby(['layout_name', 'status']).agg(\n",
"    count=('uuid', 'count'),\n",
"    avg_ascensionists=('ascensionist_count', 'mean'),\n",
"    avg_difficulty=('display_difficulty', 'mean')\n",
").reset_index()\n",
"\n",
"# Calculate All Layouts totals\n",
"df_all = df.groupby('status').agg(\n",
"    count=('uuid', 'count'),\n",
"    avg_ascensionists=('ascensionist_count', 'mean'),\n",
"    avg_difficulty=('display_difficulty', 'mean')\n",
").reset_index()\n",
"df_all['layout_name'] = 'All Layouts'\n",
"\n",
"# Combine\n",
"df_combined = pd.concat([df_agg, df_all], ignore_index=True)\n",
"\n",
"# Fixed orders, so bar positions and the annotations below line up\n",
"status_order = ['Matched', 'No Match']\n",
"layout_order = sorted(df['layout_name'].unique())\n",
"\n",
"# Plot\n",
"fig, axes = plt.subplots(1, 3, figsize=(18, 6))\n",
"\n",
"for ax, metric, title in zip(axes, ['count', 'avg_difficulty', 'avg_ascensionists'],\n",
"                             ['Total Climbs', 'Average Difficulty', 'Avg Ascensionists']):\n",
"\n",
"    # All Layouts as background (separate bar for each status)\n",
"    for i, status in enumerate(status_order):\n",
"        df_bg = df_combined[(df_combined['layout_name'] == 'All Layouts') &\n",
"                            (df_combined['status'] == status)]\n",
"        if not df_bg.empty:\n",
"            # Bar at the category position for this status\n",
"            ax.bar(i, df_bg[metric].values[0], width=0.7, color='lightgray', zorder=1)\n",
"\n",
"    # Individual layouts in front\n",
"    df_plot = df_combined[df_combined['layout_name'] != 'All Layouts']\n",
"    sns.barplot(\n",
"        data=df_plot,\n",
"        x='status',\n",
"        y=metric,\n",
"        hue='layout_name',\n",
"        hue_order=layout_order,\n",
"        palette=palette,\n",
"        order=status_order,\n",
"        ax=ax,\n",
"        zorder=2\n",
"    )\n",
"\n",
"    ax.set_title(title, fontsize=12)\n",
"    ax.set_xlabel('')\n",
"    ax.set_ylabel({'count': title, 'avg_difficulty': 'Grade'}.get(metric, 'Count'), fontsize=11)\n",
"    ax.legend_.remove()\n",
"    ax.grid(axis='y', alpha=0.3)\n",
"\n",
"# Y-axis labels for difficulty plot (fall back to the raw difficulty if a grade is missing)\n",
"yticks = [11, 13, 15, 17, 19, 21, 23]\n",
"ylabels = [grade_mapping.get(t, str(t)) for t in yticks]\n",
"axes[1].set_yticks(yticks)\n",
"axes[1].set_yticklabels(ylabels)\n",
"axes[1].set_ylim(bottom=10)\n",
"\n",
"# Add grade annotations on the difficulty bars\n",
"for i, status in enumerate(status_order):\n",
"    for j, layout in enumerate(layout_order):\n",
"        row = df_combined[(df_combined['layout_name'] == layout) & (df_combined['status'] == status)]\n",
"        if not row.empty:\n",
"            diff = row['avg_difficulty'].values[0]\n",
"            grade_label = grade_mapping.get(round(diff), '')\n",
"            # x = status index + offset for this layout; with three hue levels seaborn\n",
"            # dodges bars by roughly 0.8 / 3 = 0.27 around each category centre\n",
"            x_pos = i + (j - 1) * 0.27\n",
"            axes[1].text(x_pos, diff + 0.3, grade_label, ha='center', fontsize=8, fontweight='bold')\n",
"\n",
"# Custom legend\n",
"handles, labels = ax.get_legend_handles_labels()\n",
"all_layouts_patch = mpatches.Patch(color='lightgray', label='All Layouts')\n",
"handles.insert(0, all_layouts_patch)\n",
"fig.legend(handles=handles, title='Layout', bbox_to_anchor=(1.08, 0.9))\n",
|
|
"\n",
|
|
"plt.suptitle('Match vs No Match Climbs by Layout', fontsize=14, y=1.02)\n",
|
|
"plt.tight_layout()\n",
|
|
"plt.savefig('../images/01_climb_stats/match_vs_nomatch_by_layout.png', dpi=150, bbox_inches='tight')\n",
|
|
"plt.show()"
|
|
]
|
|
},
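{
"cell_type": "markdown",
"id": "3b9f1a20",
"metadata": {},
"source": [
"To put a number on how much rarer no-match climbs are, the share of each layout's entries in each status can be computed with `pd.crosstab(..., normalize='index')`. A minimal sketch on hypothetical toy data (the layout labels and rows below are made up, not taken from this dataset):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: one row per climb-angle entry\n",
"toy = pd.DataFrame({\n",
"    'layout_name': ['TB1', 'TB1', 'TB1', 'TB2', 'TB2', 'TB2', 'TB2'],\n",
"    'status': ['Matched', 'Matched', 'No Match', 'Matched', 'Matched', 'Matched', 'No Match'],\n",
"})\n",
"\n",
"# normalize='index' divides each row by its total, so each layout's shares sum to 1\n",
"share = pd.crosstab(toy['layout_name'], toy['status'], normalize='index')\n",
"print(share.round(3))\n",
"```\n",
"\n",
"Applied to the real data, the same call on `df['layout_name']` and `df['status']` would give the no-match share per layout."
]
},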
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "6f51156a",
|
|
"metadata": {},
|
|
"source": [
|
|
"So we gather the following about \"no match\" climbs:\n",
|
|
"\n",
|
|
"- they are far fewer than \"match\" climbs,\n",
|
|
"- they are on average harder than \"match\" climbs,\n",
|
|
"- and that they tend to have a bit more ascensionists on the TB2, and less on the TB1 (although, much more overall)."
|
|
]
|
|
},
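{
"cell_type": "markdown",
"id": "7c4e2d91",
"metadata": {},
"source": [
"The last point can look contradictory, but the pooled 'All Layouts' value is a count-weighted average of the per-layout means, so it also depends on how many entries of each status sit on each layout. The sketch below uses made-up numbers (not from this dataset) to show how one group can average fewer ascensionists on one layout and slightly more on another, yet come out well ahead once pooled. Whether this composition effect fully explains the pattern here is an assumption to check against the summary table.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical per-layout summary: number of entries and mean ascensionists per entry\n",
"per_layout = pd.DataFrame({\n",
"    'layout_name': ['TB1', 'TB1', 'TB2', 'TB2'],\n",
"    'status': ['Matched', 'No Match', 'Matched', 'No Match'],\n",
"    'count': [100, 10, 900, 10],\n",
"    'avg_ascensionists': [10.0, 8.0, 2.0, 3.0],\n",
"})\n",
"\n",
"# Pooled mean = sum(count * mean) / sum(count) within each status\n",
"per_layout['total'] = per_layout['count'] * per_layout['avg_ascensionists']\n",
"pooled = per_layout.groupby('status')[['total', 'count']].sum()\n",
"pooled['avg_ascensionists'] = pooled['total'] / pooled['count']\n",
"print(pooled['avg_ascensionists'])\n",
"```\n",
"\n",
"Here no-match entries trail on TB1 (8 vs 10) and lead on TB2 (3 vs 2), but pool to 5.5 versus 2.8, because most matched entries sit on the lower-traffic layout."
]
},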
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "21e12faa",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"\"\"\"\n",
|
|
"==================================\n",
|
|
"Match vs No Match Summary\n",
|
|
"==================================\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"summary = df_combined.pivot_table(\n",
|
|
" index='layout_name',\n",
|
|
" columns='status',\n",
|
|
" values=['count', 'avg_difficulty', 'avg_ascensionists']\n",
|
|
").round(2)\n",
|
|
"\n",
|
|
"summary"
|
|
]
|
|
},
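{
"cell_type": "markdown",
"id": "a5d30f6e",
"metadata": {},
"source": [
"With several `values`, `pivot_table` returns columns indexed by (metric, status) pairs. Selecting one metric gives a plain layout-by-status frame, from which the no-match minus matched gap is a single subtraction. A sketch on hypothetical values shaped like `df_combined`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical long-format table shaped like df_combined\n",
"long_df = pd.DataFrame({\n",
"    'layout_name': ['A', 'A', 'B', 'B'],\n",
"    'status': ['Matched', 'No Match', 'Matched', 'No Match'],\n",
"    'count': [50, 5, 40, 4],\n",
"    'avg_difficulty': [18.0, 20.5, 17.0, 18.0],\n",
"})\n",
"\n",
"wide = long_df.pivot_table(index='layout_name', columns='status', values=['count', 'avg_difficulty'])\n",
"\n",
"# Columns form a MultiIndex of (metric, status); select one metric, then compare statuses\n",
"gap = wide['avg_difficulty']['No Match'] - wide['avg_difficulty']['Matched']\n",
"print(gap)\n",
"```"
]
},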
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "de8eb20e",
|
|
"metadata": {},
|
|
"source": [
|
|
"---"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "754abf39",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Prolific statistics\n",
|
|
"\n",
|
|
"Here we will take note of some prolific statistics: what are the most popular climbs and who are the most popular setters?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "935c0ea3",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Most popular climbs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "42f93d59",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"\"\"\"\n",
|
|
"==================================\n",
|
|
"Most popular climbs\n",
|
|
"==================================\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"# The ascensionist_count column will allow us to easily deduce the top 15 climbs. \n",
|
|
"\n",
|
|
"# Create a DataFrame with the top 15 climbs\n",
|
|
"df_popular_climbs = df.sort_values(by='ascensionist_count', ascending=False).head(15)[::-1]\n",
|
|
"\n",
|
|
"# We want the y-axis to say \"Climb @ angle\". Let's just create a new column for this. \n",
|
|
"df_popular_climbs['y_label'] = df_popular_climbs.apply(\n",
|
|
" lambda row: f\"{row['climb_name']} @ {row['angle']}°\", axis=1\n",
|
|
")\n",
|
|
"\n",
|
|
"# Plot it\n",
|
|
"fig, ax = plt.subplots(figsize=(16,8))\n",
|
|
"bars = ax.barh(df_popular_climbs['y_label'], df_popular_climbs['ascensionist_count'], color='teal')\n",
|
|
"ax.set_xlabel('Ascensionist Count')\n",
|
|
"ax.set_title('Top 15 Most Popular Climbs (at a specific angle)')\n",
|
|
"\n",
|
|
"# Add grade and angle labels\n",
|
|
"for bar, (_, row) in zip(bars, df_popular_climbs.iterrows()):\n",
|
|
" label = f\"{row['boulder_grade']} on {row['layout_name']}\"\n",
|
|
" ax.text(100,\n",
|
|
" bar.get_y() + bar.get_height()/2,\n",
|
|
" label,\n",
|
|
" va='center',\n",
|
|
" ha='left',\n",
|
|
" color='white')\n",
|
|
"\n",
|
|
"\n",
|
|
"plt.savefig('../images/01_climb_stats/top_15_climbs.png', bbox_inches='tight')\n",
|
|
"plt.show()\n"
|
|
]
|
|
},
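{
"cell_type": "markdown",
"id": "e2b71c48",
"metadata": {},
"source": [
"The sort-then-head step can also be written with `DataFrame.nlargest`, which selects the n largest rows directly. Reversing afterwards is only for the horizontal bar chart, since `barh` draws the first row at the bottom. A sketch on toy data (names and counts are hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'climb_name': ['A', 'B', 'C', 'D'],\n",
"    'angle': [40, 40, 30, 45],\n",
"    'ascensionist_count': [120, 300, 50, 210],\n",
"})\n",
"\n",
"# Top 3 rows by ascensionist_count, in descending order\n",
"top = toy.nlargest(3, 'ascensionist_count')\n",
"# Reverse so that barh draws the most popular entry at the top of the chart\n",
"top_for_barh = top[::-1]\n",
"print(top_for_barh['climb_name'].tolist())\n",
"```"
]
},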
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "9d70aa42",
|
|
"metadata": {},
|
|
"source": [
|
|
"It's unsuprising that every one of these climbs is at 40° given that 40° is the most popular angle, by a long shot.\n",
|
|
"\n",
|
|
"What about an angle-agnostic analysis? What are the top climbs amonst all angles?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "404c67aa",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"\"\"\"\n",
|
|
"==================================\n",
|
|
"Top 15 most popular climbs (angle agnostic)\n",
|
|
"==================================\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"# Aggregate by climb_name (sum counts across all angles)\n",
|
|
"df_agg = df.groupby(['climb_name', 'layout_name']).agg(\n",
|
|
" total_ascensionists=('ascensionist_count', 'sum'),\n",
|
|
" avg_difficulty=('display_difficulty', 'mean')\n",
|
|
").reset_index()\n",
|
|
"\n",
|
|
"\n",
|
|
"# Sort and select top 15\n",
|
|
"df_popular_climbs_aa = df_agg.sort_values(by='total_ascensionists', ascending=False).head(15)\n",
|
|
"\n",
|
|
"df_popular_climbs_aa\n",
|
|
"\n",
|
|
"# Plot\n",
|
|
"fig, ax = plt.subplots(figsize=(16, 8))\n",
|
|
"bars = ax.barh(df_popular_climbs_aa['climb_name'], df_popular_climbs_aa['total_ascensionists'], color='teal')\n",
|
|
"ax.set_xlabel('Total Ascensionist Count (All Angles)')\n",
|
|
"ax.set_title('Top 15 Most Popular Climbs (Angle Agnostic)')\n",
|
|
"ax.invert_yaxis()\n",
|
|
"\n",
|
|
"# Add grade labels inside bars\n",
|
|
"for bar, (_, row) in zip(bars, df_popular_climbs.iterrows()):\n",
|
|
" # Create a label like \"(Tension Board 2 Mirror)\"\n",
|
|
" label = f\"{row['layout_name']}\" \n",
|
|
" \n",
|
|
" ax.text(\n",
|
|
" 100, # Position inside bar\n",
|
|
" bar.get_y() + bar.get_height() / 2,\n",
|
|
" label,\n",
|
|
" va='center',\n",
|
|
" ha='left',\n",
|
|
" color='white'\n",
|
|
" )\n",
|
|
"\n",
|
|
"plt.tight_layout()\n",
|
|
"plt.savefig('../images/01_climb_stats/top_15_climbs_angle_agnostic.png', bbox_inches='tight')\n",
|
|
"plt.show()"
|
|
]
|
|
},
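{
"cell_type": "markdown",
"id": "4f8a9d13",
"metadata": {},
"source": [
"Summing over angles can reorder the ranking: a climb logged at several angles may overtake one that leads at a single angle. A toy illustration with made-up names and counts follows. Note that grouping by `climb_name` assumes names are unique within a layout; if they are not, grouping by a climb identifier (for example `uuid`, assuming it identifies a climb across angles) would keep distinct climbs apart.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical climb-angle entries: 'Alpha' is logged at two angles\n",
"toy = pd.DataFrame({\n",
"    'climb_name': ['Alpha', 'Alpha', 'Beta'],\n",
"    'layout_name': ['TB1', 'TB1', 'TB1'],\n",
"    'angle': [40, 45, 40],\n",
"    'ascensionist_count': [100, 30, 110],\n",
"})\n",
"\n",
"# Beta leads at 40°, but Alpha leads once its two angles are summed\n",
"agg = (toy.groupby(['climb_name', 'layout_name'], as_index=False)\n",
"          .agg(total_ascensionists=('ascensionist_count', 'sum')))\n",
"print(agg.sort_values('total_ascensionists', ascending=False))\n",
"```"
]
},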
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ff455a15",
|
|
"metadata": {},
|
|
"source": [
|
|
"What about the top climbs (with/without angles) for each of the board layouts? Let's do with angles first. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "1d50a209",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"\"\"\"\n",
|
|
"==================================\n",
|
|
"Top 15 most popular climbs by layout\n",
|
|
"==================================\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"layouts = df['layout_name'].unique()\n",
|
|
"\n",
|
|
"for layout in layouts:\n",
|
|
" # Filter data for this layout\n",
|
|
" df_layout = df[df['layout_name'] == layout]\n",
|
|
" \n",
|
|
" # Sort by popularity and take top 15\n",
|
|
" df_top = df_layout.sort_values(by='ascensionist_count', ascending=False).head(15).reset_index(drop=True)\n",
|
|
" \n",
|
|
" # Select desired columns\n",
|
|
" df_display = df_top[['climb_name', 'angle', 'boulder_grade', 'ascensionist_count']].copy()\n",
|
|
" \n",
|
|
" # Rename columns for display\n",
|
|
" df_display.columns = ['Name', 'Angle', 'Grade', 'Ascensionists']\n",
|
|
" \n",
|
|
" # Format angle as string with degree symbol\n",
|
|
" df_display['Angle'] = df_display['Angle'].astype(int).astype(str) + '°'\n",
|
|
" \n",
|
|
" # Reset index to show rank 1-15\n",
|
|
" df_display.index = df_display.index + 1\n",
|
|
" \n",
|
|
" print(f\"\\n{layout}\\n\")\n",
|
|
" display(df_display)"
|
|
]
|
|
},
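{
"cell_type": "markdown",
"id": "9d0c6b75",
"metadata": {},
"source": [
"The per-layout loop can also be expressed without an explicit loop: sort once, then take the first n rows of each layout with `groupby(...).head(n)`, which keeps rows in their sorted order. A sketch on toy data (names and counts are hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'layout_name': ['TB1', 'TB1', 'TB1', 'TB2', 'TB2'],\n",
"    'climb_name': ['A', 'B', 'C', 'D', 'E'],\n",
"    'ascensionist_count': [5, 50, 20, 7, 70],\n",
"})\n",
"\n",
"# Sort once by popularity, then keep the first n rows of each layout group\n",
"n = 2\n",
"top_per_layout = (toy.sort_values('ascensionist_count', ascending=False)\n",
"                     .groupby('layout_name')\n",
"                     .head(n))\n",
"print(top_per_layout)\n",
"```"
]
},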
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e60b4705",
|
|
"metadata": {},
|
|
"source": [
|
|
"On the TB2, it looks like 40 degrees constantly wins out. It is cool to see that, on the TB1, at least one climb made it to the top 15 twice, at two different angles! Congrats \"It's Alive.\"\n",
|
|
"\n",
|
|
"Now let us do a per-board angle agnostic analysis."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "6f99580f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"\"\"\"\n",
|
|
"==================================\n",
|
|
"Top 15 climbs by layout (angle agnostic)\n",
|
|
"==================================\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"layouts = df['layout_name'].unique()\n",
|
|
"\n",
|
|
"# Aggregate counts and collect angles\n",
|
|
"df_agg = df.groupby(['climb_name', 'layout_name']).agg(\n",
|
|
" total_ascensionists=('ascensionist_count', 'sum')\n",
|
|
").reset_index()\n",
|
|
"\n",
|
|
"for layout in layouts:\n",
|
|
" df_layout = df_agg[df_agg['layout_name'] == layout]\n",
|
|
" df_top = df_layout.sort_values(by='total_ascensionists', ascending=False).head(15).reset_index(drop=True)\n",
|
|
" \n",
|
|
" df_display = df_top[['climb_name', 'total_ascensionists']].copy()\n",
|
|
" df_display.columns = ['Name', 'Total Ascensionists']\n",
|
|
" \n",
|
|
" # Appropriate index\n",
|
|
" df_display.index = df_display.index + 1\n",
|
|
" \n",
|
|
" print(f\"\\n### {layout}\\n\")\n",
|
|
" display(df_display)"
|
|
]
|
|
},
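{
"cell_type": "markdown",
"id": "c61e8a3f",
"metadata": {},
"source": [
"The aggregation above sums ascensionists but does not record which angles each climb was logged at. If that is wanted, a named aggregation with a custom function can collect them alongside the total. A sketch on toy data, where `format_angles` is a hypothetical helper and the rows are made up:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical climb-angle entries: 'Alpha' is logged at two angles\n",
"toy = pd.DataFrame({\n",
"    'climb_name': ['Alpha', 'Alpha', 'Beta'],\n",
"    'layout_name': ['TB1', 'TB1', 'TB1'],\n",
"    'angle': [45, 40, 40],\n",
"    'ascensionist_count': [30, 100, 110],\n",
"})\n",
"\n",
"\n",
"def format_angles(angles: pd.Series) -> str:\n",
"    # Hypothetical helper: sorted, de-duplicated angles rendered like '40°, 45°'\n",
"    return ', '.join(f'{int(a)}°' for a in sorted(angles.unique()))\n",
"\n",
"\n",
"agg = toy.groupby(['climb_name', 'layout_name'], as_index=False).agg(\n",
"    total_ascensionists=('ascensionist_count', 'sum'),\n",
"    angles=('angle', format_angles),\n",
")\n",
"print(agg)\n",
"```"
]
},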
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "cab87caa",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Prolific setters\n",
|
|
"\n",
|
|
"Next, we will make a simple table of the most prolific setters by board."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "3053ddc3",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"\"\"\"\n",
|
|
"==================================\n",
|
|
"Top 10 setters by layout\n",
|
|
"==================================\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"# Make a DataFrame for the setters\n",
|
|
"df_setters = df.groupby(['setter_username', 'layout_name']).agg(\n",
|
|
" climb_count=('uuid', 'nunique')\n",
|
|
").reset_index()\n",
|
|
"\n",
|
|
"layouts = df['layout_name'].unique()\n",
|
|
"\n",
|
|
"for layout in layouts:\n",
|
|
" df_layout = df_setters[df_setters['layout_name'] == layout]\n",
|
|
" df_top = df_layout.sort_values(by='climb_count', ascending=False).head(10).reset_index(drop=True)\n",
|
|
"\n",
|
|
" df_display = df_top[['setter_username', 'climb_count']].copy()\n",
|
|
" df_display.columns = ['Username', 'Climbs']\n",
|
|
"\n",
|
|
" # Reset index to show rank\n",
|
|
" df_display.index = df_display.index + 1\n",
|
|
"\n",
|
|
" # Display\n",
|
|
" print(f\"\\n{layout}\\n\")\n",
|
|
" display(df_display)\n",
|
|
" \n"
|
|
]
|
|
},
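{
"cell_type": "markdown",
"id": "1a7f5e92",
"metadata": {},
"source": [
"Counting distinct `uuid` values, rather than rows, means a climb listed at several angles is credited to its setter once. Assuming `uuid` identifies a climb (not a climb-angle pair), the difference looks like this on toy data (usernames and ids are made up):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical rows: climb 'u1' appears at two angles\n",
"toy = pd.DataFrame({\n",
"    'setter_username': ['alice', 'alice', 'alice', 'bob'],\n",
"    'layout_name': ['TB2', 'TB2', 'TB2', 'TB2'],\n",
"    'uuid': ['u1', 'u1', 'u2', 'u3'],\n",
"    'angle': [40, 45, 40, 40],\n",
"})\n",
"\n",
"per_setter = toy.groupby(['setter_username', 'layout_name']).agg(\n",
"    entries=('uuid', 'count'),   # climb-angle rows\n",
"    climbs=('uuid', 'nunique'),  # distinct climbs\n",
").reset_index()\n",
"print(per_setter)\n",
"```\n",
"\n",
"The `climb_count` column in the cell above corresponds to `climbs` here."
]
},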
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "19dd4fad",
|
|
"metadata": {},
|
|
"source": [
|
|
"---"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "085c99ea",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Conclusion\n",
|
|
"\n",
|
|
"At this point we have a board-level and climb-level picture of the dataset. In particular, we now know:\n",
|
|
"\n",
|
|
"- how large the dataset is,\n",
|
|
"- how the grade and angle distributions vary across layouts,\n",
|
|
"- which climbs and setters appear most often,\n",
|
|
"- and where simple descriptive trends begin to show up.\n",
|
|
"\n",
|
|
"That gives us enough context to move from *global statistics* to *hold-level structure*. The next notebook focuses on hold usage patterns and board heatmaps, where we stop asking only **how many climbs there are** and start asking **which physical parts of the board are driving those climbs**."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.14.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|