{ "cells": [ { "cell_type": "markdown", "id": "f301146a", "metadata": {}, "source": [ "# Tension Board 2: Hold Difficulty Analysis\n", "\n", "We continue on with our hold analysis, except we will solely be interested in computing the difficulty of each hold.\n", "\n", "Recall some of the following findings.\n", "\n", "- TB2 Mirror has has `layout_id` 10, and has two sets: wood and plastic. These have `set_id` 12 and 13 respectively. \n", "- the `frame` feature of a climb determines the climb: it looks something like `p3r4p29r2p59r1p65r2p75r3p89r2p157r4p158r4`. A substring `pXrY` tells us the placement (`placement_id=X`) and the role (whether it is a start, finish, foot, or middle hold) comes from the `placement_role_id=Y`. The role will also tell us which color to use if we plot our climb against the board.\n", "- the `holes` table will tell us which `placement_id` goes where on the (x,y) coordinate system. It also tells us the ID of its mirror image, which let's us unravel the `placement_id` of its mirror image.\n", "\n", "## Output\n", "\n", "The final products are hold-level difficulty scores saved to CSV files. These scores encode, for each placement, the average difficulty of climbs that use that hold. The scores are computed per-angle, per-role, and also aggregated. A Bayesian smoothing step shrinks noisy estimates for rarely-used holds toward the global mean, and mirror averaging stabilizes scores across symmetric left-right hold pairs.\n", "\n", "## Notebook Structure\n", "\n", "1. [Setup and Imports](#setup-and-imports)\n", "2. [Hold Usage DataFrame](#hold-usage-dataframe)\n", "3. [Difficulty Score](#difficulty-score)\n", "4. [Visualization](#visualization)\n", "5. 
[Conclusion](#conclusion)" ] }, { "cell_type": "markdown", "id": "6e17c7da", "metadata": {}, "source": [ "# Setup and Imports" ] }, { "cell_type": "code", "execution_count": null, "id": "2cd8a53a", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Setup and Imports\n", "==================================\n", "\"\"\"\n", "\n", "\n", "# Imports\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import numpy as np\n", "import matplotlib.patches as mpatches\n", "\n", "import sqlite3\n", "\n", "import os\n", "\n", "import re\n", "from collections import defaultdict\n", "\n", "from PIL import Image\n", "\n", "# Set some display options\n", "pd.set_option('display.max_columns', None)\n", "pd.set_option('display.max_rows', 100)\n", "\n", "# Set style\n", "palette=['steelblue', 'coral', 'seagreen'] #(for multi-bar graphs)\n", "\n", "# Set board image for some visual analysis\n", "board_img = Image.open('../images/tb2_board_12x12_composite.png')\n", "\n", "# Connect to the database\n", "DB_PATH=\"../data/tb2.db\"\n", "conn = sqlite3.connect(DB_PATH)" ] }, { "cell_type": "code", "execution_count": null, "id": "c9da4ef8", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Query our data from the DB\n", "==================================\n", "\n", "This time we restrict to where `layout_id=10` for the TB2 Mirror.\n", "\"\"\"\n", "\n", "# Query climbs data\n", "climbs_query = \"\"\"\n", "SELECT\n", " c.uuid,\n", " c.name AS climb_name,\n", " c.setter_username,\n", " c.layout_id AS layout_id,\n", " c.description,\n", " c.is_nomatch,\n", " c.is_listed,\n", " l.name AS layout_name,\n", " p.name AS board_name,\n", " c.frames,\n", " cs.angle,\n", " cs.display_difficulty,\n", " dg.boulder_name AS boulder_grade,\n", " cs.ascensionist_count,\n", " cs.quality_average,\n", " cs.fa_at\n", " \n", "FROM climbs c\n", "JOIN layouts l ON c.layout_id = l.id\n", 
"JOIN products p ON l.product_id = p.id\n", "JOIN climb_stats cs ON c.uuid = cs.climb_uuid\n", "JOIN difficulty_grades dg ON ROUND(cs.display_difficulty) = dg.difficulty\n", "WHERE cs.display_difficulty IS NOT NULL AND c.is_listed=1 AND c.layout_id=10\n", "\"\"\"\n", "\n", "# Query information about placements (and their mirrors)\n", "placements_query = \"\"\"\n", "SELECT\n", " p.id AS placement_id,\n", " h.x,\n", " h.y,\n", " p.default_placement_role_id AS default_role_id,\n", " p.set_id AS set_id,\n", " s.name AS set_name,\n", " p_mirror.id AS mirror_placement_id\n", "FROM placements p\n", "JOIN holes h ON p.hole_id = h.id\n", "JOIN sets s ON p.set_id = s.id\n", "LEFT JOIN holes h_mirror ON h.mirrored_hole_id = h_mirror.id\n", "LEFT JOIN placements p_mirror ON p_mirror.hole_id = h_mirror.id AND p_mirror.layout_id = p.layout_id\n", "WHERE p.layout_id = 10\n", "\"\"\"\n", "\n", "# Load it into a DataFrame\n", "df_climbs = pd.read_sql_query(climbs_query, conn)\n", "df_placements = pd.read_sql_query(placements_query, conn)\n", "\n", "# Save placements csv in data (for other things later on)\n", "df_placements.to_csv('../data/placements.csv')" ] }, { "cell_type": "markdown", "id": "336687a9", "metadata": {}, "source": [ "We've added a column for the mirror of a hold. Let's take a look at `df_placements`." 
] }, { "cell_type": "code", "execution_count": null, "id": "b2f74d89", "metadata": {}, "outputs": [], "source": [ "display(df_placements)" ] }, { "cell_type": "code", "execution_count": null, "id": "1a4a5612", "metadata": {}, "outputs": [], "source": [ "# Role definitions\n", "ROLE_DEFINITIONS = {\n", " 'start': 5,\n", " 'middle': 6,\n", " 'finish': 7,\n", " 'foot': 8\n", "}\n", "\n", "HAND_ROLES = ['start', 'middle', 'finish']\n", "FOOT_ROLES = ['foot']\n", "ROLE_TYPES = ['start', 'middle', 'finish', 'hand', 'foot']\n", "\n", "MATERIAL_PALETTE = {'Wood': '#8B4513', 'Plastic': '#4169E1'}\n", "\n", "def get_role_type(role_id):\n", " \"\"\"Map role_id to role_type string.\"\"\"\n", " for role_type, rid in ROLE_DEFINITIONS.items():\n", " if role_id == rid:\n", " return role_type\n", " return 'unknown'" ] }, { "cell_type": "code", "execution_count": null, "id": "b395dd64", "metadata": {}, "outputs": [], "source": [ "# Placement Data\n", "# Build placement_coordinates dict\n", "placement_coordinates = {\n", " row['placement_id']: (row['x'], row['y'])\n", " for _, row in df_placements.iterrows()\n", "}\n", "\n", "# Build mirror mapping\n", "placement_to_mirror = {\n", " row['placement_id']: int(row['mirror_placement_id'])\n", " for _, row in df_placements.iterrows()\n", " if pd.notna(row['mirror_placement_id'])\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "3fee6f6b", "metadata": {}, "outputs": [], "source": [ "get_role_type(7)" ] }, { "cell_type": "code", "execution_count": null, "id": "51e0bd84", "metadata": {}, "outputs": [], "source": [ "## Boundary conditions\n", "x_min, x_max = -68, 68\n", "y_min, y_max = 0, 144" ] }, { "cell_type": "markdown", "id": "8b8d9abd", "metadata": {}, "source": [ "# Hold Usage DataFrame" ] }, { "cell_type": "code", "execution_count": null, "id": "85f7ac83", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Hold Usage DataFrame\n", "==================================\n", 
"\n", "Explodes climb frames into individual hold usages.\n", "\"\"\"\n", "\n", "records = []\n", "\n", "for _, row in df_climbs.iterrows():\n", " frames = row['frames']\n", " if not isinstance(frames, str):\n", " continue\n", " \n", " matches = re.findall(r'p(\\d+)r(\\d+)', frames)\n", " \n", " for p_str, r_str in matches:\n", " role_type = get_role_type(int(r_str))\n", " records.append({\n", " 'placement_id': int(p_str),\n", " 'role_id': int(r_str),\n", " 'role_type': role_type,\n", " 'is_hand': role_type in HAND_ROLES,\n", " 'is_foot': role_type in FOOT_ROLES,\n", " 'difficulty': row['display_difficulty'],\n", " 'angle': row['angle'],\n", " 'climb_uuid': row['uuid']\n", " })\n", "\n", "df_hold_usage = pd.DataFrame(records)\n", "\n", "print(f\"Built hold usage DataFrame: {len(df_hold_usage):,} records\")\n", "print(f\"Unique placements: {df_hold_usage['placement_id'].nunique():,}\")\n", "print(f\"Unique angles: {sorted(df_hold_usage['angle'].unique())}\")\n", "\n", "print(\"\\nRecords by role type:\")\n", "display(df_hold_usage['role_type'].value_counts().to_frame('count'))\n", "\n", "print(f\"\\nHand usages: {df_hold_usage['is_hand'].sum():,}\")\n", "print(f\"Foot usages: {df_hold_usage['is_foot'].sum():,}\")" ] }, { "cell_type": "markdown", "id": "38df6453", "metadata": {}, "source": [ "# Difficulty Score" ] }, { "cell_type": "markdown", "id": "107b223f", "metadata": {}, "source": [ "## Bayesian Smoothing of Hold Difficulty\n", "\n", "Raw hold difficulty estimates can be unstable for rarely used holds. To reduce\n", "noise, we apply Bayesian smoothing, shrinking hold-level averages toward the\n", "global mean difficulty. 
Frequently used holds remain close to their empirical\n", "means, while sparse holds are pulled more strongly toward the overall average.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f9a4e3c9", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Bayesian Smoothing\n", "==================================\n", "\"\"\"\n", "\n", "SMOOTHING_M = 20\n", "\n", "def bayesian_smooth(mean_col, count_col, global_mean, m=SMOOTHING_M):\n", " \"\"\"\n", " Bayesian smoothing toward the global mean:\n", " (n * mean + m * global_mean) / (n + m), where n is the usage count\n", " and m acts as a pseudo-count controlling the shrinkage strength.\n", " \"\"\"\n", " return (count_col * mean_col + m * global_mean) / (count_col + m)\n", "\n", "GLOBAL_DIFFICULTY_MEAN = df_hold_usage['difficulty'].mean()\n", "print(f\"Global difficulty mean: {GLOBAL_DIFFICULTY_MEAN:.3f}\")\n" ] }, { "cell_type": "markdown", "id": "d54c005d", "metadata": {}, "source": [ "## Raw Difficulty Score" ] }, { "cell_type": "code", "execution_count": null, "id": "7547d6dd", "metadata": {}, "outputs": [], "source": [ "\n", "\"\"\"\n", "==================================\n", "Raw difficulty score (averaged & smoothed)\n", "==================================\n", "\n", "\n", "Average difficulty of all climbs that use this hold, plus a Bayesian-smoothed\n", "version that is more stable for low-usage holds.\n", "\"\"\"\n", "\n", "raw_scores = df_hold_usage.groupby('placement_id').agg(\n", " raw_difficulty=('difficulty', 'mean'),\n", " usage_count=('climb_uuid', 'count'),\n", " climbs_count=('climb_uuid', 'nunique')\n", ")\n", "\n", "raw_scores['raw_difficulty_smoothed'] = bayesian_smooth(\n", " raw_scores['raw_difficulty'],\n", " raw_scores['usage_count'],\n", " GLOBAL_DIFFICULTY_MEAN\n", ")\n", "\n", "raw_scores = raw_scores.round(2)\n", "\n", "print(\"### Top 10 Hardest Holds (Raw)\\n\")\n", "display(raw_scores.sort_values('raw_difficulty', ascending=False).head(10))\n", "\n", "print(\"\\n### Top 10 Easiest Holds (Raw)\\n\")\n", "display(raw_scores.sort_values('raw_difficulty', 
ascending=True).head(10))\n", "\n", "print(\"\\n### Example of Raw vs Smoothed Difficulty\\n\")\n", "display(raw_scores[['raw_difficulty', 'raw_difficulty_smoothed', 'usage_count']].head(10))\n" ] }, { "cell_type": "markdown", "id": "df819708", "metadata": {}, "source": [ "## Per-Angle Difficulty Score" ] }, { "cell_type": "code", "execution_count": null, "id": "13a2d53f", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Per-Angle Difficulty Score\n", "==================================\n", "\n", "Computes difficulty score per angle, then aggregates with weighting.\n", "Uses Bayesian-smoothed per-angle difficulty throughout.\n", "\"\"\"\n", "\n", "# Calculate per-angle scores\n", "angle_scores = df_hold_usage.groupby(['placement_id', 'angle']).agg(\n", " avg_difficulty=('difficulty', 'mean'),\n", " usage_count=('climb_uuid', 'count')\n", ").reset_index()\n", "\n", "# Apply Bayesian smoothing\n", "angle_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n", " angle_scores['avg_difficulty'],\n", " angle_scores['usage_count'],\n", " GLOBAL_DIFFICULTY_MEAN\n", ")\n", "\n", "# Pivot to see angles side-by-side\n", "angle_pivot = angle_scores.pivot_table(\n", " index='placement_id',\n", " columns='angle',\n", " values='avg_difficulty_smoothed',\n", " aggfunc='mean'\n", ")\n", "angle_pivot.columns = [f'diff_{int(col)}deg' for col in angle_pivot.columns]\n", "\n", "# Calculate weighted average using the smoothed per-angle values\n", "weighted_scores = []\n", "\n", "for pid in angle_scores['placement_id'].unique():\n", " df_pid = angle_scores[angle_scores['placement_id'] == pid].copy()\n", "\n", " total_count = df_pid['usage_count'].sum()\n", " weighted_diff = (\n", " df_pid['avg_difficulty_smoothed'] * df_pid['usage_count']\n", " ).sum() / total_count\n", "\n", " weighted_scores.append({\n", " 'placement_id': pid,\n", " 'angle_weighted_difficulty': weighted_diff,\n", " 'angles_used': len(df_pid),\n", " 'min_angle': 
int(df_pid['angle'].min()),\n", " 'max_angle': int(df_pid['angle'].max()),\n", " 'angle_range': int(df_pid['angle'].max() - df_pid['angle'].min())\n", " })\n", "\n", "df_angle_scores = pd.DataFrame(weighted_scores).set_index('placement_id')\n", "\n", "print(\"### Per-Angle Difficulty Analysis (Sample)\\n\")\n", "display(angle_pivot.join(df_angle_scores).head(15))\n", "\n", "print(f\"\\nAngles used per hold:\")\n", "print(df_angle_scores['angles_used'].describe())\n" ] }, { "cell_type": "markdown", "id": "2164c4fe", "metadata": {}, "source": [ "## Per-Role Difficulty Score" ] }, { "cell_type": "code", "execution_count": null, "id": "f6c9dd60", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Per-Role Difficulty Score\n", "==================================\n", "\n", "Individual roles (start, middle, finish, foot) AND aggregate (hand).\n", "All exported difficulty values are Bayesian-smoothed.\n", "\"\"\"\n", "\n", "# Individual role scores\n", "role_scores = df_hold_usage.groupby(['placement_id', 'role_type']).agg(\n", " avg_difficulty=('difficulty', 'mean'),\n", " usage_count=('climb_uuid', 'count')\n", ").reset_index()\n", "\n", "# Apply Bayesian smoothing\n", "role_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n", " role_scores['avg_difficulty'],\n", " role_scores['usage_count'],\n", " GLOBAL_DIFFICULTY_MEAN\n", ")\n", "\n", "# Pivot for individual roles\n", "role_pivot = role_scores.pivot_table(\n", " index='placement_id',\n", " columns='role_type',\n", " values='avg_difficulty_smoothed',\n", " aggfunc='mean'\n", ")\n", "role_pivot.columns = [f'diff_as_{col}' for col in role_pivot.columns]\n", "\n", "# Usage counts per individual role\n", "role_counts = role_scores.pivot_table(\n", " index='placement_id',\n", " columns='role_type',\n", " values='usage_count',\n", " aggfunc='sum',\n", " fill_value=0\n", ")\n", "role_counts.columns = [f'uses_as_{col}' for col in role_counts.columns]\n", "\n", "# Aggregate hand 
difficulty\n", "hand_usage = df_hold_usage[df_hold_usage['is_hand']].groupby('placement_id').agg(\n", " diff_as_hand_raw=('difficulty', 'mean'),\n", " uses_as_hand=('climb_uuid', 'count')\n", ")\n", "\n", "hand_usage['diff_as_hand'] = bayesian_smooth(\n", " hand_usage['diff_as_hand_raw'],\n", " hand_usage['uses_as_hand'],\n", " GLOBAL_DIFFICULTY_MEAN\n", ")\n", "\n", "hand_usage = hand_usage[['diff_as_hand', 'uses_as_hand']]\n", "\n", "# Combine role tables\n", "df_role_analysis = role_pivot.join(role_counts).join(hand_usage).round(2)\n", "\n", "cols_order = [\n", " 'diff_as_start', 'uses_as_start',\n", " 'diff_as_middle', 'uses_as_middle',\n", " 'diff_as_finish', 'uses_as_finish',\n", " 'diff_as_hand', 'uses_as_hand',\n", " 'diff_as_foot', 'uses_as_foot'\n", "]\n", "cols_order = [c for c in cols_order if c in df_role_analysis.columns]\n", "df_role_analysis = df_role_analysis[cols_order]\n", "\n", "print(\"### Role-Specific Difficulty Scores (Sample)\\n\")\n", "display(df_role_analysis.head(15))\n", "\n", "print(\"\\n### Holds Used as Both Hand and Foot\\n\")\n", "dual_use = df_role_analysis[\n", " df_role_analysis['diff_as_hand'].notna() &\n", " df_role_analysis['diff_as_foot'].notna()\n", "].copy()\n", "\n", "if len(dual_use) > 0:\n", " dual_use['hand_minus_foot'] = dual_use['diff_as_hand'] - dual_use['diff_as_foot']\n", " display(\n", " dual_use[['diff_as_hand', 'diff_as_foot', 'hand_minus_foot']]\n", " .sort_values('hand_minus_foot', ascending=False)\n", " .head(15)\n", " )\n" ] }, { "cell_type": "markdown", "id": "6f0635f6", "metadata": {}, "source": [ "## Per-Role Per-Angle Difficulty Score" ] }, { "cell_type": "code", "execution_count": null, "id": "2ff53ab4", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Per-Role Per-Angle Difficulty Score\n", "==================================\n", "\n", "\n", "Granular scores: placement_id × role_type × angle\n", "Includes both individual roles AND aggregate hand.\n", "All 
downstream tables use the smoothed difficulty values.\n", "\"\"\"\n", "\n", "# Individual roles per angle\n", "role_angle_scores = df_hold_usage.groupby(['placement_id', 'role_type', 'angle']).agg(\n", " avg_difficulty=('difficulty', 'mean'),\n", " usage_count=('climb_uuid', 'count')\n", ").reset_index()\n", "\n", "role_angle_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n", " role_angle_scores['avg_difficulty'],\n", " role_angle_scores['usage_count'],\n", " GLOBAL_DIFFICULTY_MEAN\n", ")\n", "\n", "# Aggregate hand per angle\n", "hand_angle_scores = df_hold_usage[df_hold_usage['is_hand']].groupby(['placement_id', 'angle']).agg(\n", " avg_difficulty=('difficulty', 'mean'),\n", " usage_count=('climb_uuid', 'count')\n", ").reset_index()\n", "\n", "hand_angle_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n", " hand_angle_scores['avg_difficulty'],\n", " hand_angle_scores['usage_count'],\n", " GLOBAL_DIFFICULTY_MEAN\n", ")\n", "hand_angle_scores['role_type'] = 'hand'\n", "\n", "# Combine all\n", "df_role_angle = pd.concat([role_angle_scores, hand_angle_scores], ignore_index=True)\n", "\n", "print(f\"Total role-angle records: {len(df_role_angle):,}\")\n", "print(\"\\nBreakdown by role_type:\")\n", "display(df_role_angle.groupby('role_type').size().to_frame('count'))\n", "\n", "print(\"\\n### Per-Role Per-Angle Difficulty Scores (Sample)\\n\")\n", "display(df_role_angle.head(20))\n" ] }, { "cell_type": "markdown", "id": "75ed3028", "metadata": {}, "source": [ "## Creating Tables" ] }, { "cell_type": "code", "execution_count": null, "id": "5b324cd0", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Role-Specific Tables\n", "==================================\n", "\n", "Tables for: start, middle, finish, hand, foot\n", "Each with per-angle columns and overall average.\n", "Uses Bayesian-smoothed role-angle difficulty values.\n", "\"\"\"\n", "\n", "angles = sorted(df_hold_usage['angle'].unique())\n", "role_tables = 
{}\n", "\n", "for role in ROLE_TYPES:\n", " df_role = df_role_angle[df_role_angle['role_type'] == role].copy()\n", "\n", " if df_role.empty:\n", " print(f\"No data for role: {role}\")\n", " continue\n", "\n", " pivot = df_role.pivot_table(\n", " index='placement_id',\n", " columns='angle',\n", " values='avg_difficulty_smoothed',\n", " aggfunc='mean'\n", " )\n", " pivot.columns = [f'{role}_diff_{int(col)}deg' for col in pivot.columns]\n", " pivot[f'{role}_overall_avg'] = pivot.mean(axis=1).round(2)\n", "\n", " usage_pivot = df_role.pivot_table(\n", " index='placement_id',\n", " columns='angle',\n", " values='usage_count',\n", " aggfunc='sum',\n", " fill_value=0\n", " )\n", " usage_pivot.columns = [f'{role}_uses_{int(col)}deg' for col in usage_pivot.columns]\n", " pivot[f'{role}_total_uses'] = usage_pivot.sum(axis=1).astype(int)\n", "\n", " role_tables[role] = pivot.join(usage_pivot)\n", "\n", " print(f\"\\n### {role.upper()} Difficulty by Angle\\n\")\n", " display(role_tables[role].head(8))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "37428cb9", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Combined Table for Modelling\n", "==================================\n", "\n", "Build a single placement-level table used downstream in feature\n", "engineering. 
The smoothed overall difficulty is exposed under the simple\n", "name `overall_difficulty`, while the raw version is retained as\n", "`overall_difficulty_raw` for reference.\n", "\"\"\"\n", "\n", "# Start with placement info\n", "df_model_features = df_placements[['placement_id', 'x', 'y', 'set_name', 'default_role_id']].copy()\n", "df_model_features = df_model_features.set_index('placement_id')\n", "df_model_features = df_model_features.rename(columns={\n", " 'set_name': 'material',\n", " 'default_role_id': 'default_role'\n", "})\n", "\n", "# Add raw + smoothed overall scores\n", "df_model_features = df_model_features.join(\n", " raw_scores[['raw_difficulty', 'raw_difficulty_smoothed', 'usage_count', 'climbs_count']],\n", " how='left'\n", ")\n", "\n", "# Add angle scores\n", "df_model_features = df_model_features.join(\n", " df_angle_scores[['angle_weighted_difficulty', 'angles_used', 'min_angle', 'max_angle', 'angle_range']],\n", " how='left'\n", ")\n", "\n", "# Add per-role tables\n", "for role in ROLE_TYPES:\n", " if role in role_tables:\n", " df_model_features = df_model_features.join(role_tables[role], how='left')\n", "\n", "# Add aggregate hand / foot scores if missing\n", "extra_role_cols = [c for c in ['diff_as_hand', 'uses_as_hand', 'diff_as_foot', 'uses_as_foot'] if c in df_role_analysis.columns]\n", "missing_extra_cols = [c for c in extra_role_cols if c not in df_model_features.columns]\n", "if missing_extra_cols:\n", " df_model_features = df_model_features.join(df_role_analysis[missing_extra_cols], how='left')\n", "\n", "# Rename for clarity\n", "df_model_features = df_model_features.rename(columns={\n", " 'raw_difficulty': 'overall_difficulty_raw',\n", " 'raw_difficulty_smoothed': 'overall_difficulty'\n", "})\n", "\n", "print(\"### Combined Model Features Table (Before Mirror)\\n\")\n", "display(df_model_features.head(10))\n", "print(f\"\\nShape: {df_model_features.shape}\")\n" ] }, { "cell_type": "markdown", "id": "0f44005e", "metadata": {}, 
"source": [ "## Taking the Mirror Score into Account" ] }, { "cell_type": "code", "execution_count": null, "id": "adc4f2f2", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Mirror average function\n", "==================================\n", "\n", "Mirror averaging is a simple way to stabilize difficulty estimates under\n", "left-right board symmetry. For each mirror pair:\n", "- if both holds have a value, we average them\n", "- if only one side has a value, we copy it to the missing mirror hold\n", "- metadata and usage counts are left unchanged\n", "\"\"\"\n", "\n", "def average_with_mirror(df, columns, placement_to_mirror):\n", " df_result = df.copy()\n", " processed = set()\n", "\n", " for placement_id in df_result.index:\n", " if placement_id in processed:\n", " continue\n", "\n", " mirror_id = placement_to_mirror.get(placement_id)\n", "\n", " if mirror_id and mirror_id in df_result.index:\n", " for col in columns:\n", " if col not in df_result.columns:\n", " continue\n", "\n", " val1 = df_result.loc[placement_id, col]\n", " val2 = df_result.loc[mirror_id, col]\n", "\n", " if pd.notna(val1) and pd.notna(val2):\n", " avg_val = (val1 + val2) / 2\n", " df_result.loc[placement_id, col] = avg_val\n", " df_result.loc[mirror_id, col] = avg_val\n", " elif pd.isna(val1) and pd.notna(val2):\n", " df_result.loc[placement_id, col] = val2\n", " elif pd.notna(val1) and pd.isna(val2):\n", " df_result.loc[mirror_id, col] = val1\n", "\n", " processed.add(mirror_id)\n", "\n", " processed.add(placement_id)\n", "\n", " return df_result\n" ] }, { "cell_type": "code", "execution_count": null, "id": "4ec95518", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Apply mirror to all difficulty coluns\n", "==================================\n", "\n", "Averages mirror pairs for:\n", "- overall difficulty\n", "- angle-weighted difficulty\n", "- per-role overall averages\n", "- per-role per-angle 
difficulties\n", "\"\"\"\n", "\n", "overall_cols = [c for c in ['overall_difficulty', 'angle_weighted_difficulty'] if c in df_model_features.columns]\n", "role_avg_cols = [c for c in df_model_features.columns if c.endswith('_overall_avg')]\n", "angle_diff_cols = [c for c in df_model_features.columns if '_diff_' in c and c.endswith('deg')]\n", "\n", "all_difficulty_cols = sorted(set(overall_cols + role_avg_cols + angle_diff_cols))\n", "\n", "missing_before = df_model_features[all_difficulty_cols].isna().sum().sum()\n", "print(f\"Missing values before mirror: {missing_before}\")\n", "print(f\"Columns affected: {len(all_difficulty_cols)}\")\n", "\n", "df_model_features = average_with_mirror(df_model_features, all_difficulty_cols, placement_to_mirror)\n", "\n", "missing_after = df_model_features[all_difficulty_cols].isna().sum().sum()\n", "print(f\"Missing values after mirror: {missing_after}\")\n", "print(f\"Reduced by: {missing_before - missing_after}\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a733b540", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Rebuild detailed role-angle table from model features\n", "==================================\n", "\n", "Rebuild df_role_angle from the mirror-filled placement-level table so\n", "saved exports and later visualizations reflect the final mirrored values.\n", "\"\"\"\n", "\n", "records = []\n", "\n", "angle_cols = [c for c in df_model_features.columns if '_diff_' in c and c.endswith('deg')]\n", "\n", "for col in angle_cols:\n", " parts = col.split('_')\n", " role = parts[0]\n", " angle = int(parts[2].replace('deg', ''))\n", "\n", " for placement_id, val in df_model_features[col].dropna().items():\n", " records.append({\n", " 'placement_id': placement_id,\n", " 'role_type': role,\n", " 'angle': angle,\n", " 'avg_difficulty_smoothed': val,\n", " 'usage_count': 0\n", " })\n", "\n", "df_role_angle = pd.DataFrame(records)\n", "\n", "print(f\"Rebuilt 
role-angle table: {len(df_role_angle)} records\")\n", "\n", "print(\"\\n### Verify Hand @ 50° Coverage\\n\")\n", "hand_50 = df_role_angle[(df_role_angle['role_type'] == 'hand') & (df_role_angle['angle'] == 50)]\n", "print(f\"Records for Hand @ 50°: {len(hand_50)}\")\n", "\n", "if placement_to_mirror:\n", " sample_pid = list(placement_to_mirror.keys())[0]\n", " sample_mirror = placement_to_mirror[sample_pid]\n", "\n", " print(f\"\\nSample mirror pair {sample_pid} <-> {sample_mirror}:\")\n", " for role in ['hand', 'foot']:\n", " for angle in [40, 50]:\n", " check = df_role_angle[\n", " (df_role_angle['role_type'] == role) &\n", " (df_role_angle['angle'] == angle) &\n", " (df_role_angle['placement_id'].isin([sample_pid, sample_mirror]))\n", " ]\n", " if len(check) == 2:\n", " vals = check['avg_difficulty_smoothed'].values\n", " print(f\" {role} @ {angle}°: {vals[0]:.2f} <-> {vals[1]:.2f}\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ebe4ab42", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Verify mirror symmetry\n", "==================================\n", "\"\"\"\n", "\n", "print(\"### Mirror Pair Verification\\n\")\n", "\n", "sample_pairs = list(placement_to_mirror.items())[:5]\n", "\n", "for pid, mirror_pid in sample_pairs:\n", " if pid not in df_model_features.index or mirror_pid not in df_model_features.index:\n", " continue\n", " \n", " print(f\"Placement {pid} ↔ {mirror_pid}:\")\n", " \n", " for col in ['overall_difficulty', 'hand_overall_avg', 'foot_overall_avg']:\n", " if col in df_model_features.columns:\n", " val1 = df_model_features.loc[pid, col]\n", " val2 = df_model_features.loc[mirror_pid, col]\n", " if pd.notna(val1) and pd.notna(val2):\n", " match = \"okay\" if abs(val1 - val2) < 0.01 else \"x\"\n", " print(f\" {col}: {val1:.2f} ↔ {val2:.2f} {match}\")\n", " print()\n", "\n", "# Count matching pairs\n", "matching_pairs = 0\n", "total_pairs = 0\n", "\n", "for pid, mirror_pid in 
placement_to_mirror.items():\n", " if pid in df_model_features.index and mirror_pid in df_model_features.index:\n", " total_pairs += 1\n", " val1 = df_model_features.loc[pid, 'overall_difficulty']\n", " val2 = df_model_features.loc[mirror_pid, 'overall_difficulty']\n", " if pd.notna(val1) and pd.notna(val2) and abs(val1 - val2) < 0.01:\n", " matching_pairs += 1\n", "\n", "print(f\"Mirror pairs with matching values: {matching_pairs}/{total_pairs}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "4b384d0a", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Mirror Coverage Summary\n", "==================================\n", "\"\"\"\n", "\n", "print(\"### Difficulty Column Coverage (After Mirror)\\n\")\n", "\n", "scenarios = [\n", " (\"Overall (all usages)\", \"overall_difficulty\"),\n", " (\"Angle-weighted\", \"angle_weighted_difficulty\"),\n", " (\"Hand (all angles)\", \"hand_overall_avg\"),\n", " (\"Foot (all angles)\", \"foot_overall_avg\"),\n", " (\"Hand @ 40°\", \"hand_diff_40deg\"),\n", " (\"Hand @ 50°\", \"hand_diff_50deg\"),\n", " (\"Foot @ 40°\", \"foot_diff_40deg\"),\n", " (\"Start @ 40°\", \"start_diff_40deg\"),\n", " (\"Finish @ 40°\", \"finish_diff_40deg\"),\n", "]\n", "\n", "for name, col in scenarios:\n", " if col in df_model_features.columns:\n", " non_null = df_model_features[col].notna().sum()\n", " total = len(df_model_features)\n", " pct = non_null / total * 100\n", " print(f\"{name:25s}: {non_null:3d}/{total} ({pct:5.1f}%)\")" ] }, { "cell_type": "markdown", "id": "fa443f22", "metadata": {}, "source": [ "# Visualization" ] }, { "cell_type": "code", "execution_count": null, "id": "706305f9", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Visualization: difficulty heatmaps\n", "==================================\n", "\"\"\"\n", "\n", "os.makedirs('../images/03_hold_difficulty', exist_ok=True)\n", "\n", "def 
plot_difficulty_heatmap(score_column='overall_difficulty', title_suffix=\"\", save=True):\n", " \"\"\"Plot hold difficulty scores on the board.\"\"\"\n", " \n", " fig, ax = plt.subplots(figsize=(16, 14))\n", " ax.imshow(board_img, extent=[x_min, x_max, y_min, y_max], aspect='auto')\n", " \n", " df_plot = df_model_features[df_model_features['x'].notna()].copy()\n", " \n", " if score_column not in df_plot.columns:\n", " print(f\"Column '{score_column}' not found\")\n", " plt.close()\n", " return\n", " \n", " df_plot = df_plot[df_plot[score_column].notna()]\n", " \n", " if df_plot.empty:\n", " print(f\"No data for '{score_column}'\")\n", " plt.close()\n", " return\n", " \n", " max_usage = df_plot['usage_count'].max()\n", " size_scale = 20 + 150 * (df_plot['usage_count'] / max_usage)\n", " \n", " scatter = ax.scatter(\n", " df_plot['x'],\n", " df_plot['y'],\n", " c=df_plot[score_column],\n", " s=size_scale,\n", " cmap='coolwarm',\n", " alpha=0.85,\n", " edgecolors='black',\n", " linewidths=0.5\n", " )\n", " \n", " ax.set_xlabel('X Position (inches)', fontsize=12)\n", " ax.set_ylabel('Y Position (inches)', fontsize=12)\n", " ax.set_title(f'Hold Difficulty: {score_column} {title_suffix}', fontsize=14)\n", " \n", " cbar = plt.colorbar(scatter, ax=ax, shrink=0.5)\n", " cbar.set_label('Difficulty')\n", " \n", " plt.tight_layout()\n", " \n", " if save:\n", " safe_name = score_column.replace('/', '_')\n", " plt.savefig(f'../images/03_hold_difficulty/difficulty_heatmap_{safe_name}.png', dpi=150, bbox_inches='tight')\n", " \n", " plt.show()\n", "\n", "\n", "# Plot main scores (note: overall_difficulty is the smoothed column)\n", "plot_difficulty_heatmap('overall_difficulty', \"(Smoothed Average)\")\n", "plot_difficulty_heatmap('angle_weighted_difficulty', \"(Angle-Weighted)\")\n", "\n", "# Plot role scores\n", "plot_difficulty_heatmap('hand_overall_avg', \"(Hand)\")\n", "plot_difficulty_heatmap('foot_overall_avg', \"(Foot)\")" ] }, { "cell_type": "code", "execution_count": null, "id": "3eb840ec", "metadata": {}, "outputs": [], 
"source": [ "\"\"\"\n", "==================================\n", "Visualization: per-role per-angle heatmaps\n", "==================================\n", "\"\"\"\n", "\n", "def plot_role_angle_heatmap(role_type='hand', angle=40):\n", " \"\"\"Plot difficulty scores for a specific role and angle.\"\"\"\n", " \n", " df_role = df_role_angle[\n", " (df_role_angle['role_type'] == role_type) & \n", " (df_role_angle['angle'] == angle)\n", " ].copy()\n", " \n", " if df_role.empty:\n", " print(f\"No data for {role_type} at {angle}°\")\n", " return\n", " \n", " df_role['x'] = df_role['placement_id'].map(lambda p: placement_coordinates.get(p, (None, None))[0])\n", " df_role['y'] = df_role['placement_id'].map(lambda p: placement_coordinates.get(p, (None, None))[1])\n", " df_role = df_role.dropna(subset=['x', 'y'])\n", " \n", " fig, ax = plt.subplots(figsize=(16, 14))\n", " ax.imshow(board_img, extent=[x_min, x_max, y_min, y_max], aspect='auto')\n", " \n", " scatter = ax.scatter(\n", " df_role['x'],\n", " df_role['y'],\n", " c=df_role['avg_difficulty_smoothed'],\n", " s=100,\n", " cmap='coolwarm',\n", " alpha=0.85,\n", " edgecolors='black',\n", " linewidths=0.5\n", " )\n", " \n", " ax.set_xlabel('X Position (inches)', fontsize=12)\n", " ax.set_ylabel('Y Position (inches)', fontsize=12)\n", " ax.set_title(f'{role_type.capitalize()} Hold Difficulty at {angle}°', fontsize=14)\n", " \n", " cbar = plt.colorbar(scatter, ax=ax, shrink=0.5)\n", " cbar.set_label('Difficulty')\n", " \n", " plt.tight_layout()\n", " plt.savefig(f'../images/03_hold_difficulty/difficulty_{role_type}_{angle}deg.png', dpi=150, bbox_inches='tight')\n", " plt.show()\n", "\n", "\n", "# Plot for common angles\n", "common_angles = [30, 40, 45, 50]\n", "\n", "for role in ['hand', 'foot']:\n", " for angle in common_angles:\n", " print(f\"\\n{role.capitalize()} at {angle}°:\")\n", " plot_role_angle_heatmap(role, angle)" ] }, { "cell_type": "markdown", "id": "44c53251", "metadata": {}, "source": [ "# Conclusion" ] }, { 
"cell_type": "code", "execution_count": null, "id": "b4f1431c", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Summary Statistics\n", "==================================\n", "\"\"\"\n", "\n", "# Material comparison\n", "print(\"### Difficulty by Material\\n\")\n", "material_diff = df_model_features.groupby('material').agg(\n", " count=('overall_difficulty', 'count'),\n", " avg_difficulty=('overall_difficulty', 'mean'),\n", " median_difficulty=('overall_difficulty', 'median'),\n", " avg_hand=('hand_overall_avg', 'mean'),\n", " avg_foot=('foot_overall_avg', 'mean'),\n", " avg_usage=('usage_count', 'mean')\n", ").round(2)\n", "\n", "display(material_diff)\n", "\n", "# Default role comparison\n", "print(\"\\n### Difficulty by Default Role\\n\")\n", "role_diff = df_model_features.groupby('default_role').agg(\n", " count=('overall_difficulty', 'count'),\n", " avg_difficulty=('overall_difficulty', 'mean'),\n", " avg_hand=('hand_overall_avg', 'mean'),\n", " avg_foot=('foot_overall_avg', 'mean'),\n", " avg_usage=('usage_count', 'mean')\n", ").round(2)\n", "\n", "display(role_diff)\n", "\n", "# Correlation\n", "if 'hand_overall_avg' in df_model_features.columns and 'foot_overall_avg' in df_model_features.columns:\n", " valid = df_model_features.dropna(subset=['hand_overall_avg', 'foot_overall_avg'])\n", " if len(valid) > 0:\n", " corr = valid['hand_overall_avg'].corr(valid['foot_overall_avg'])\n", " print(f\"\\nCorrelation (hand vs foot difficulty): {corr:.3f}\")\n", " print(f\"(based on {len(valid)} holds used as both)\")" ] }, { "cell_type": "code", "execution_count": null, "id": "8d333751", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "==================================\n", "Save to files\n", "==================================\n", "\"\"\"\n", "\n", "import os\n", "os.makedirs('../data/03_hold_difficulty', exist_ok=True)\n", "\n", "# Main features table\n", 
"df_model_features.to_csv('../data/03_hold_difficulty/hold_difficulty_scores.csv')\n", "\n", "# Full pivot for modeling\n", "pivot_value_col = 'avg_difficulty_smoothed' if 'avg_difficulty_smoothed' in df_role_angle.columns else 'avg_difficulty'\n", "\n", "pivot_full = df_role_angle.pivot_table(\n", " index='placement_id',\n", " columns=['role_type', 'angle'],\n", " values=pivot_value_col,\n", " aggfunc='mean'\n", ")\n", "pivot_full.columns = [f'diff_{role}_{int(angle)}deg' for role, angle in pivot_full.columns]\n", "pivot_full.to_csv('../data/03_hold_difficulty/hold_role_angle_difficulty_scores.csv')\n", "\n", "# Per-role tables\n", "for role in ROLE_TYPES:\n", " if role in role_tables:\n", " role_tables[role].to_csv(f'../data/03_hold_difficulty/hold_{role}_difficulty_by_angle.csv')\n", "\n", "# Detailed records\n", "df_role_angle.to_csv('../data/03_hold_difficulty/hold_role_angle_detailed.csv', index=False)\n", "\n", "print(\"Saved files to ../data/03_hold_difficulty/:\")\n", "print(\" - hold_difficulty_scores.csv (main table)\")\n", "print(\" - hold_role_angle_difficulty_scores.csv (full pivot)\")\n", "for role in ROLE_TYPES:\n", " if role in role_tables:\n", " print(f\" - hold_{role}_difficulty_by_angle.csv\")\n", "print(\" - hold_role_angle_detailed.csv (detailed records)\")\n" ] }, { "cell_type": "markdown", "id": "443a6779", "metadata": {}, "source": [ "\n", "## Tables Produced\n", "\n", "1. `df_model_features` - Main feature table for downstream modeling\n", " - One row per `placement_id`\n", " - Includes metadata, overall scores, angle-level summaries, and role-specific scores\n", " - `overall_difficulty` is the Bayesian-smoothed overall score\n", " - `overall_difficulty_raw` is retained only as a reference column\n", "\n", "2. 
`df_role_angle` - Detailed records for visualization and export\n", " - One row per (`placement_id`, `role_type`, `angle`) combination\n", " - Rebuilt after mirror-averaging so plots and exports reflect the final mirrored values\n", "\n", "3. `role_tables[role]` - Per-role tables\n", " - One table each for the `start`, `middle`, `finish`, `hand`, and `foot` roles\n", " - Each with per-angle columns, overall averages, and usage counts\n", "\n", "## Mirror Logic\n", "- Difficulty-like columns are mirrored across symmetric hold pairs\n", "- If both mirror holds have values, their scores are averaged\n", "- If only one side has a value, that value is copied to the missing mirror side\n", "- Usage counts and metadata are left unchanged" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.14.3" } }, "nbformat": 4, "nbformat_minor": 5 }