Kilter-Board-Analysis/notebooks/03_hold_difficulty.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f301146a",
   "metadata": {},
   "source": [
    "# Kilter Board: Hold Difficulty Analysis\n",
    "\n",
    "We continue on with our hold analysis, except we will solely be interested in computing the difficulty of each hold.\n",
    "\n",
    "Recall some of the following findings.\n",
    "\n",
    "- The Kilter Board Original has `layout_id` 1, and has two sets: bolt holes and screw holes. These have `set_id` 1 and 20 respectively. \n",
    "- the `frame` feature of a climb determines the climb: it looks something like `p3r4p29r2p59r1p65r2p75r3p89r2p157r4p158r4`. A substring `pXrY` tells us the placement (`placement_id=X`) and the role (whether it is a start, finish, foot, or middle hold) comes from the `placement_role_id=Y`. The role will also tell us which color to use if we plot our climb against the board.\n",
    "- the `holes` table will tell us which `placement_id` goes where on the (x,y) coordinate system. It also tells us the ID of its mirror image, which let's us unravel the `placement_id` of its mirror image.\n",
    "\n",
    "## Output\n",
    "\n",
    "The final products are hold-level difficulty scores saved to CSV files. These scores encode, for each placement, the average difficulty of climbs that use that hold. The scores are computed per-angle, per-role, and also aggregated. A Bayesian smoothing step shrinks noisy estimates for rarely-used holds toward the global mean..\n",
    "\n",
    "## Notebook Structure\n",
    "\n",
    "1. [Setup and Imports](#setup-and-imports)\n",
    "2. [Hold Usage DataFrame](#hold-usage-dataframe)\n",
    "3. [Difficulty Score](#difficulty-score)\n",
    "4. [Visualization](#visualization)\n",
    "5. [Conclusion](#conclusion)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e17c7da",
   "metadata": {},
   "source": [
    "# Setup and Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2cd8a53a",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Setup and Imports\n",
    "==================================\n",
    "\"\"\"\n",
    "\n",
    "\n",
    "# Imports\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import numpy as np\n",
    "import matplotlib.patches as mpatches\n",
    "\n",
    "import sqlite3\n",
    "\n",
    "import os\n",
    "\n",
    "import re\n",
    "from collections import defaultdict\n",
    "\n",
    "from PIL import Image\n",
    "\n",
    "# Set some display options\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.set_option('display.max_rows', 100)\n",
    "\n",
    "# Set style\n",
    "palette=['steelblue', 'coral', 'seagreen']  #(for multi-bar graphs)\n",
    "\n",
    "# Set board image for some visual analysis\n",
    "board_img = Image.open('../images/kilter-original-16x12_compose.png')\n",
    "\n",
    "# Connect to the database\n",
    "DB_PATH=\"../data/kilter.db\"\n",
    "conn = sqlite3.connect(DB_PATH)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9da4ef8",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Query our data from the DB\n",
    "==================================\n",
    "\n",
    "This time we restrict to where `layout_id=10` for the TB2 Mirror.\n",
    "\"\"\"\n",
    "\n",
    "# Query climbs data\n",
    "climbs_query = \"\"\"\n",
    "SELECT\n",
    "    c.uuid,\n",
    "    c.name AS climb_name,\n",
    "    c.setter_username,\n",
    "    c.layout_id AS layout_id,\n",
    "    c.description,\n",
    "    c.is_nomatch,\n",
    "    c.is_listed,\n",
    "    l.name AS layout_name,\n",
    "    p.name AS board_name,\n",
    "    c.frames,\n",
    "    cs.angle,\n",
    "    cs.display_difficulty,\n",
    "    dg.boulder_name AS boulder_grade,\n",
    "    cs.ascensionist_count,\n",
    "    cs.quality_average,\n",
    "    cs.fa_at\n",
    "FROM climbs c\n",
    "JOIN layouts l ON c.layout_id = l.id\n",
    "JOIN products p ON l.product_id = p.id\n",
    "JOIN climb_stats cs ON c.uuid = cs.climb_uuid\n",
    "JOIN difficulty_grades dg ON ROUND(cs.display_difficulty) = dg.difficulty\n",
    "WHERE cs.display_difficulty IS NOT NULL AND c.is_listed=1 AND c.layout_id=1 AND cs.fa_at > '2016-01-01'\n",
    "\"\"\"\n",
    "\n",
    "# Query information about placements (and their mirrors)\n",
    "placements_query = \"\"\"\n",
    "SELECT\n",
    "    p.id AS placement_id,\n",
    "    h.x,\n",
    "    h.y,\n",
    "    p.default_placement_role_id AS default_role_id,\n",
    "    p.set_id AS set_id,\n",
    "    s.name AS set_name\n",
    "FROM placements p\n",
    "JOIN holes h ON p.hole_id = h.id\n",
    "JOIN sets s ON p.set_id = s.id\n",
    "WHERE p.layout_id = 1 AND y <=156\n",
    "\"\"\"\n",
    "\n",
    "# Load it into a DataFrame\n",
    "df_climbs = pd.read_sql_query(climbs_query, conn)\n",
    "df_placements = pd.read_sql_query(placements_query, conn)\n",
    "\n",
    "# Save placements csv in data (for other things later on)\n",
    "df_placements.to_csv('../data/placements.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "336687a9",
   "metadata": {},
   "source": [
    "We've added a column for the mirror of a hold. Let's take a look at `df_placements`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2f74d89",
   "metadata": {},
   "outputs": [],
   "source": [
    "display(df_placements)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a4a5612",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Role definitions\n",
    "ROLE_DEFINITIONS = {\n",
    "    'start': 12,\n",
    "    'middle': 13,\n",
    "    'finish': 14,\n",
    "    'foot': 15\n",
    "}\n",
    "\n",
    "HAND_ROLES = ['start', 'middle', 'finish']\n",
    "FOOT_ROLES = ['foot']\n",
    "ROLE_TYPES = ['start', 'middle', 'finish', 'hand', 'foot']\n",
    "\n",
    "MATERIAL_PALETTE = {'Wood': '#8B4513', 'Plastic': '#4169E1'}\n",
    "\n",
    "def get_role_type(role_id):\n",
    "    \"\"\"Map role_id to role_type string.\"\"\"\n",
    "    for role_type, rid in ROLE_DEFINITIONS.items():\n",
    "        if role_id == rid:\n",
    "            return role_type\n",
    "    return 'unknown'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b395dd64",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Placement Data\n",
    "# Build placement_coordinates dict\n",
    "placement_coordinates = {\n",
    "    row['placement_id']: (row['x'], row['y'])\n",
    "    for _, row in df_placements.iterrows()\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3fee6f6b",
   "metadata": {},
   "outputs": [],
   "source": [
    "get_role_type(15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51e0bd84",
   "metadata": {},
   "outputs": [],
   "source": [
    "## Boundary conditions\n",
    "x_min, x_max = -24, 168\n",
    "y_min, y_max = 0, 156"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b8d9abd",
   "metadata": {},
   "source": [
    "# Hold Usage DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85f7ac83",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Hold Usage DataFrame\n",
    "==================================\n",
    "\n",
    "Explodes climb frames into individual hold usages.\n",
    "\"\"\"\n",
    "\n",
    "records = []\n",
    "\n",
    "for _, row in df_climbs.iterrows():\n",
    "    frames = row['frames']\n",
    "    if not isinstance(frames, str):\n",
    "        continue\n",
    "    \n",
    "    matches = re.findall(r'p(\\d+)r(\\d+)', frames)\n",
    "    \n",
    "    for p_str, r_str in matches:\n",
    "        role_type = get_role_type(int(r_str))\n",
    "        records.append({\n",
    "            'placement_id': int(p_str),\n",
    "            'role_id': int(r_str),\n",
    "            'role_type': role_type,\n",
    "            'is_hand': role_type in HAND_ROLES,\n",
    "            'is_foot': role_type in FOOT_ROLES,\n",
    "            'difficulty': row['display_difficulty'],\n",
    "            'angle': row['angle'],\n",
    "            'climb_uuid': row['uuid']\n",
    "        })\n",
    "\n",
    "df_hold_usage = pd.DataFrame(records)\n",
    "\n",
    "print(f\"Built hold usage DataFrame: {len(df_hold_usage):,} records\")\n",
    "print(f\"Unique placements: {df_hold_usage['placement_id'].nunique():,}\")\n",
    "print(f\"Unique angles: {sorted(df_hold_usage['angle'].unique())}\")\n",
    "\n",
    "print(\"\\nRecords by role type:\")\n",
    "display(df_hold_usage['role_type'].value_counts().to_frame('count'))\n",
    "\n",
    "print(f\"\\nHand usages: {df_hold_usage['is_hand'].sum():,}\")\n",
    "print(f\"Foot usages: {df_hold_usage['is_foot'].sum():,}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38df6453",
   "metadata": {},
   "source": [
    "# Difficulty Score"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "107b223f",
   "metadata": {},
   "source": [
    "## Bayesian Smoothing of Hold Difficulty\n",
    "\n",
    "Raw hold difficulty estimates can be unstable for rarely used holds. To reduce\n",
    "noise, we apply Bayesian smoothing, shrinking hold-level averages toward the\n",
    "global mean difficulty. Frequently used holds remain close to their empirical\n",
    "means, while sparse holds are pulled more strongly toward the overall average.\n"
   ]
  },
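  {
   "cell_type": "markdown",
   "id": "b7e2c4a9",
   "metadata": {},
   "source": [
    "A worked form of the smoothing may help here. The formula below restates what `bayesian_smooth` in the next cell computes. The symbols $n_h$ (usage count of hold $h$), $\\bar{d}_h$ (its raw mean difficulty), $\\mu$ (the global mean), and $\\hat{d}_h$ (the smoothed score) are notation introduced for this sketch, and $m$ is the `SMOOTHING_M` pseudo-count.\n",
    "\n",
    "$$\n",
    "\\hat{d}_h = \\frac{n_h \\, \\bar{d}_h + m \\, \\mu}{n_h + m}\n",
    "$$\n",
    "\n",
    "As an illustration with made-up numbers (not taken from the data): with $m = 20$ and $\\mu = 18$, a hold with $n_h = 5$ and $\\bar{d}_h = 25$ gets $\\hat{d}_h = (5 \\cdot 25 + 20 \\cdot 18) / 25 = 19.4$, which sits much closer to the global mean than to its raw average. The same raw average with $n_h = 500$ gives $(500 \\cdot 25 + 20 \\cdot 18) / 520 \\approx 24.73$, which is almost unchanged."
   ]
  },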
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9a4e3c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Bayesian Smoothing\n",
    "==================================\n",
    "\"\"\"\n",
    "\n",
    "SMOOTHING_M = 20\n",
    "\n",
    "def bayesian_smooth(mean_col, count_col, global_mean, m=SMOOTHING_M):\n",
    "    \"\"\"\n",
    "    Bayesian smoothing toward the global mean.\n",
    "    \"\"\"\n",
    "    return (count_col * mean_col + m * global_mean) / (count_col + m)\n",
    "\n",
    "GLOBAL_DIFFICULTY_MEAN = df_hold_usage['difficulty'].mean()\n",
    "print(f\"Global difficulty mean: {GLOBAL_DIFFICULTY_MEAN:.3f}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d54c005d",
   "metadata": {},
   "source": [
    "## Raw Difficulty Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7547d6dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "\"\"\"\n",
    "==================================\n",
    "Raw difficulty score (averged & smoothed)\n",
    "==================================\n",
    "\n",
    "\n",
    "Average difficulty of all climbs that use this hold, plus a Bayesian-smoothed\n",
    "version that is more stable for low-usage holds.\n",
    "\"\"\"\n",
    "\n",
    "raw_scores = df_hold_usage.groupby('placement_id').agg(\n",
    "    raw_difficulty=('difficulty', 'mean'),\n",
    "    usage_count=('climb_uuid', 'count'),\n",
    "    climbs_count=('climb_uuid', 'nunique')\n",
    ")\n",
    "\n",
    "raw_scores['raw_difficulty_smoothed'] = bayesian_smooth(\n",
    "    raw_scores['raw_difficulty'],\n",
    "    raw_scores['usage_count'],\n",
    "    GLOBAL_DIFFICULTY_MEAN\n",
    ")\n",
    "\n",
    "raw_scores = raw_scores.round(2)\n",
    "\n",
    "print(\"### Top 10 Hardest Holds (Raw)\\n\")\n",
    "display(raw_scores.sort_values('raw_difficulty', ascending=False).head(10))\n",
    "\n",
    "print(\"\\n### Top 10 Easiest Holds (Raw)\\n\")\n",
    "display(raw_scores.sort_values('raw_difficulty', ascending=True).head(10))\n",
    "\n",
    "print(\"\\n### Example of Raw vs Smoothed Difficulty\\n\")\n",
    "display(raw_scores[['raw_difficulty', 'raw_difficulty_smoothed', 'usage_count']].head(10))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df819708",
   "metadata": {},
   "source": [
    "## Per-Angle Difficulty Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13a2d53f",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Per-Angle Difficulty Score\n",
    "==================================\n",
    "\n",
    "Computes difficulty score per angle, then aggregates with weighting.\n",
    "Uses Bayesian-smoothed per-angle difficulty throughout.\n",
    "\"\"\"\n",
    "\n",
    "# Calculate per-angle scores\n",
    "angle_scores = df_hold_usage.groupby(['placement_id', 'angle']).agg(\n",
    "    avg_difficulty=('difficulty', 'mean'),\n",
    "    usage_count=('climb_uuid', 'count')\n",
    ").reset_index()\n",
    "\n",
    "# Apply Bayesian smoothing\n",
    "angle_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n",
    "    angle_scores['avg_difficulty'],\n",
    "    angle_scores['usage_count'],\n",
    "    GLOBAL_DIFFICULTY_MEAN\n",
    ")\n",
    "\n",
    "# Pivot to see angles side-by-side\n",
    "angle_pivot = angle_scores.pivot_table(\n",
    "    index='placement_id',\n",
    "    columns='angle',\n",
    "    values='avg_difficulty_smoothed',\n",
    "    aggfunc='mean'\n",
    ")\n",
    "angle_pivot.columns = [f'diff_{int(col)}deg' for col in angle_pivot.columns]\n",
    "\n",
    "# Calculate weighted average using the smoothed per-angle values\n",
    "weighted_scores = []\n",
    "\n",
    "for pid in angle_scores['placement_id'].unique():\n",
    "    df_pid = angle_scores[angle_scores['placement_id'] == pid].copy()\n",
    "\n",
    "    total_count = df_pid['usage_count'].sum()\n",
    "    weighted_diff = (\n",
    "        df_pid['avg_difficulty_smoothed'] * df_pid['usage_count']\n",
    "    ).sum() / total_count\n",
    "\n",
    "    weighted_scores.append({\n",
    "        'placement_id': pid,\n",
    "        'angle_weighted_difficulty': weighted_diff,\n",
    "        'angles_used': len(df_pid),\n",
    "        'min_angle': int(df_pid['angle'].min()),\n",
    "        'max_angle': int(df_pid['angle'].max()),\n",
    "        'angle_range': int(df_pid['angle'].max() - df_pid['angle'].min())\n",
    "    })\n",
    "\n",
    "df_angle_scores = pd.DataFrame(weighted_scores).set_index('placement_id')\n",
    "\n",
    "print(\"### Per-Angle Difficulty Analysis (Sample)\\n\")\n",
    "display(angle_pivot.join(df_angle_scores).head(15))\n",
    "\n",
    "print(f\"\\nAngles used per hold:\")\n",
    "print(df_angle_scores['angles_used'].describe())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2164c4fe",
   "metadata": {},
   "source": [
    "## Per-Role Difficulty Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6c9dd60",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Per-Role Difficulty Score\n",
    "==================================\n",
    "\n",
    "Individual roles (start, middle, finish, foot) AND aggregate (hand).\n",
    "All exported difficulty values are Bayesian-smoothed.\n",
    "\"\"\"\n",
    "\n",
    "# Individual role scores\n",
    "role_scores = df_hold_usage.groupby(['placement_id', 'role_type']).agg(\n",
    "    avg_difficulty=('difficulty', 'mean'),\n",
    "    usage_count=('climb_uuid', 'count')\n",
    ").reset_index()\n",
    "\n",
    "# Apply Bayesian smoothing\n",
    "role_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n",
    "    role_scores['avg_difficulty'],\n",
    "    role_scores['usage_count'],\n",
    "    GLOBAL_DIFFICULTY_MEAN\n",
    ")\n",
    "\n",
    "# Pivot for individual roles\n",
    "role_pivot = role_scores.pivot_table(\n",
    "    index='placement_id',\n",
    "    columns='role_type',\n",
    "    values='avg_difficulty_smoothed',\n",
    "    aggfunc='mean'\n",
    ")\n",
    "role_pivot.columns = [f'diff_as_{col}' for col in role_pivot.columns]\n",
    "\n",
    "# Usage counts per individual role\n",
    "role_counts = role_scores.pivot_table(\n",
    "    index='placement_id',\n",
    "    columns='role_type',\n",
    "    values='usage_count',\n",
    "    aggfunc='sum',\n",
    "    fill_value=0\n",
    ")\n",
    "role_counts.columns = [f'uses_as_{col}' for col in role_counts.columns]\n",
    "\n",
    "# Aggregate hand difficulty\n",
    "hand_usage = df_hold_usage[df_hold_usage['is_hand']].groupby('placement_id').agg(\n",
    "    diff_as_hand_raw=('difficulty', 'mean'),\n",
    "    uses_as_hand=('climb_uuid', 'count')\n",
    ")\n",
    "\n",
    "hand_usage['diff_as_hand'] = bayesian_smooth(\n",
    "    hand_usage['diff_as_hand_raw'],\n",
    "    hand_usage['uses_as_hand'],\n",
    "    GLOBAL_DIFFICULTY_MEAN\n",
    ")\n",
    "\n",
    "hand_usage = hand_usage[['diff_as_hand', 'uses_as_hand']]\n",
    "\n",
    "# Combine role tables\n",
    "df_role_analysis = role_pivot.join(role_counts).join(hand_usage).round(2)\n",
    "\n",
    "cols_order = [\n",
    "    'diff_as_start', 'uses_as_start',\n",
    "    'diff_as_middle', 'uses_as_middle',\n",
    "    'diff_as_finish', 'uses_as_finish',\n",
    "    'diff_as_hand', 'uses_as_hand',\n",
    "    'diff_as_foot', 'uses_as_foot'\n",
    "]\n",
    "cols_order = [c for c in cols_order if c in df_role_analysis.columns]\n",
    "df_role_analysis = df_role_analysis[cols_order]\n",
    "\n",
    "print(\"### Role-Specific Difficulty Scores (Sample)\\n\")\n",
    "display(df_role_analysis.head(15))\n",
    "\n",
    "print(\"\\n### Holds Used as Both Hand and Foot\\n\")\n",
    "dual_use = df_role_analysis[\n",
    "    df_role_analysis['diff_as_hand'].notna() &\n",
    "    df_role_analysis['diff_as_foot'].notna()\n",
    "].copy()\n",
    "\n",
    "if len(dual_use) > 0:\n",
    "    dual_use['hand_minus_foot'] = dual_use['diff_as_hand'] - dual_use['diff_as_foot']\n",
    "    display(\n",
    "        dual_use[['diff_as_hand', 'diff_as_foot', 'hand_minus_foot']]\n",
    "        .sort_values('hand_minus_foot', ascending=False)\n",
    "        .head(15)\n",
    "    )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f0635f6",
   "metadata": {},
   "source": [
    "## Per-Role Per-Angle Difficulty Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ff53ab4",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Per-Role Per-Angle Difficulty Score\n",
    "==================================\n",
    "\n",
    "\n",
    "Granular scores: placement_id × role_type × angle\n",
    "Includes both individual roles AND aggregate hand.\n",
    "All downstream tables use the smoothed difficulty values.\n",
    "\"\"\"\n",
    "\n",
    "# Individual roles per angle\n",
    "role_angle_scores = df_hold_usage.groupby(['placement_id', 'role_type', 'angle']).agg(\n",
    "    avg_difficulty=('difficulty', 'mean'),\n",
    "    usage_count=('climb_uuid', 'count')\n",
    ").reset_index()\n",
    "\n",
    "role_angle_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n",
    "    role_angle_scores['avg_difficulty'],\n",
    "    role_angle_scores['usage_count'],\n",
    "    GLOBAL_DIFFICULTY_MEAN\n",
    ")\n",
    "\n",
    "# Aggregate hand per angle\n",
    "hand_angle_scores = df_hold_usage[df_hold_usage['is_hand']].groupby(['placement_id', 'angle']).agg(\n",
    "    avg_difficulty=('difficulty', 'mean'),\n",
    "    usage_count=('climb_uuid', 'count')\n",
    ").reset_index()\n",
    "\n",
    "hand_angle_scores['avg_difficulty_smoothed'] = bayesian_smooth(\n",
    "    hand_angle_scores['avg_difficulty'],\n",
    "    hand_angle_scores['usage_count'],\n",
    "    GLOBAL_DIFFICULTY_MEAN\n",
    ")\n",
    "hand_angle_scores['role_type'] = 'hand'\n",
    "\n",
    "# Combine all\n",
    "df_role_angle = pd.concat([role_angle_scores, hand_angle_scores], ignore_index=True)\n",
    "\n",
    "print(f\"Total role-angle records: {len(df_role_angle):,}\")\n",
    "print(\"\\nBreakdown by role_type:\")\n",
    "display(df_role_angle.groupby('role_type').size().to_frame('count'))\n",
    "\n",
    "print(\"\\n### Per-Role Per-Angle Difficulty Scores (Sample)\\n\")\n",
    "display(df_role_angle.head(20))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75ed3028",
   "metadata": {},
   "source": [
    "## Creating Tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b324cd0",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Role-Specific Tables\n",
    "==================================\n",
    "\n",
    "Tables for: start, middle, finish, hand, foot\n",
    "Each with per-angle columns and overall average.\n",
    "Uses Bayesian-smoothed role-angle difficulty values.\n",
    "\"\"\"\n",
    "\n",
    "angles = sorted(df_hold_usage['angle'].unique())\n",
    "role_tables = {}\n",
    "\n",
    "for role in ROLE_TYPES:\n",
    "    df_role = df_role_angle[df_role_angle['role_type'] == role].copy()\n",
    "\n",
    "    if df_role.empty:\n",
    "        print(f\"No data for role: {role}\")\n",
    "        continue\n",
    "\n",
    "    pivot = df_role.pivot_table(\n",
    "        index='placement_id',\n",
    "        columns='angle',\n",
    "        values='avg_difficulty_smoothed',\n",
    "        aggfunc='mean'\n",
    "    )\n",
    "    pivot.columns = [f'{role}_diff_{int(col)}deg' for col in pivot.columns]\n",
    "    pivot[f'{role}_overall_avg'] = pivot.mean(axis=1).round(2)\n",
    "\n",
    "    usage_pivot = df_role.pivot_table(\n",
    "        index='placement_id',\n",
    "        columns='angle',\n",
    "        values='usage_count',\n",
    "        aggfunc='sum',\n",
    "        fill_value=0\n",
    "    )\n",
    "    usage_pivot.columns = [f'{role}_uses_{int(col)}deg' for col in usage_pivot.columns]\n",
    "    pivot[f'{role}_total_uses'] = usage_pivot.sum(axis=1).astype(int)\n",
    "\n",
    "    role_tables[role] = pivot.join(usage_pivot)\n",
    "\n",
    "    print(f\"\\n### {role.upper()} Difficulty by Angle\\n\")\n",
    "    display(role_tables[role].head(8))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37428cb9",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Combined Table for Modelling\n",
    "==================================\n",
    "\n",
    "Build a single placement-level table used downstream in feature\n",
    "engineering. The smoothed overall difficulty is exposed under the simple\n",
    "name `overall_difficulty`, while the raw version is retained as\n",
    "`overall_difficulty_raw` for reference.\n",
    "\"\"\"\n",
    "\n",
    "# Start with placement info\n",
    "df_model_features = df_placements[['placement_id', 'x', 'y', 'set_name', 'default_role_id']].copy()\n",
    "df_model_features = df_model_features.set_index('placement_id')\n",
    "df_model_features = df_model_features.rename(columns={\n",
    "    'set_name': 'material',\n",
    "    'default_role_id': 'default_role'\n",
    "})\n",
    "\n",
    "# Add raw + smoothed overall scores\n",
    "df_model_features = df_model_features.join(\n",
    "    raw_scores[['raw_difficulty', 'raw_difficulty_smoothed', 'usage_count', 'climbs_count']],\n",
    "    how='left'\n",
    ")\n",
    "\n",
    "# Add angle scores\n",
    "df_model_features = df_model_features.join(\n",
    "    df_angle_scores[['angle_weighted_difficulty', 'angles_used', 'min_angle', 'max_angle', 'angle_range']],\n",
    "    how='left'\n",
    ")\n",
    "\n",
    "# Add per-role tables\n",
    "for role in ROLE_TYPES:\n",
    "    if role in role_tables:\n",
    "        df_model_features = df_model_features.join(role_tables[role], how='left')\n",
    "\n",
    "# Add aggregate hand / foot scores if missing\n",
    "extra_role_cols = [c for c in ['diff_as_hand', 'uses_as_hand', 'diff_as_foot', 'uses_as_foot'] if c in df_role_analysis.columns]\n",
    "missing_extra_cols = [c for c in extra_role_cols if c not in df_model_features.columns]\n",
    "if missing_extra_cols:\n",
    "    df_model_features = df_model_features.join(df_role_analysis[missing_extra_cols], how='left')\n",
    "\n",
    "# Rename for clarity\n",
    "df_model_features = df_model_features.rename(columns={\n",
    "    'raw_difficulty': 'overall_difficulty_raw',\n",
    "    'raw_difficulty_smoothed': 'overall_difficulty'\n",
    "})\n",
    "\n",
    "print(\"### Combined Model Features Table (Before Mirror)\\n\")\n",
    "display(df_model_features.head(10))\n",
    "print(f\"\\nShape: {df_model_features.shape}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa443f22",
   "metadata": {},
   "source": [
    "# Visualization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "706305f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Visualization: difficulty heatmaps\n",
    "==================================\n",
    "\"\"\"\n",
    "\n",
    "os.makedirs('../images/03_hold_difficulty', exist_ok=True)\n",
    "\n",
    "def plot_difficulty_heatmap(score_column='overall_difficulty', title_suffix=\"\", save=True):\n",
    "    \"\"\"Plot hold difficulty scores on the board.\"\"\"\n",
    "    \n",
    "    fig, ax = plt.subplots(figsize=(17, 12))\n",
    "    ax.imshow(board_img, extent=[x_min, x_max, y_min, y_max], aspect='auto')\n",
    "    \n",
    "    df_plot = df_model_features[df_model_features['x'].notna()].copy()\n",
    "    \n",
    "    if score_column not in df_plot.columns:\n",
    "        print(f\"Column '{score_column}' not found\")\n",
    "        plt.close()\n",
    "        return\n",
    "    \n",
    "    df_plot = df_plot[df_plot[score_column].notna()]\n",
    "    \n",
    "    if df_plot.empty:\n",
    "        print(f\"No data for '{score_column}'\")\n",
    "        plt.close()\n",
    "        return\n",
    "    \n",
    "    max_usage = df_plot['usage_count'].max()\n",
    "    size_scale = 20 + 150 * (df_plot['usage_count'] / max_usage)\n",
    "    \n",
    "    scatter = ax.scatter(\n",
    "        df_plot['x'],\n",
    "        df_plot['y'],\n",
    "        c=df_plot[score_column],\n",
    "        s=size_scale,\n",
    "        cmap='seismic',\n",
    "        alpha=0.85,\n",
    "        edgecolors='black',\n",
    "        linewidths=0.5\n",
    "    )\n",
    "    \n",
    "    ax.set_xlabel('X Position (inches)', fontsize=12)\n",
    "    ax.set_ylabel('Y Position (inches)', fontsize=12)\n",
    "    ax.set_title(f'Hold Difficulty: {score_column} {title_suffix}', fontsize=14)\n",
    "    \n",
    "    cbar = plt.colorbar(scatter, ax=ax, shrink=0.5)\n",
    "    cbar.set_label('Difficulty')\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    \n",
    "    if save:\n",
    "        safe_name = score_column.replace('/', '_')\n",
    "        plt.savefig(f'../images/03_hold_difficulty/difficulty_heatmap_{safe_name}.png', dpi=150, bbox_inches='tight')\n",
    "    \n",
    "    plt.show()\n",
    "\n",
    "\n",
    "# Plot main scores\n",
    "plot_difficulty_heatmap('overall_difficulty', \"(Raw Average)\")\n",
    "plot_difficulty_heatmap('angle_weighted_difficulty', \"(Angle-Weighted)\")\n",
    "\n",
    "# Plot role scores\n",
    "plot_difficulty_heatmap('hand_overall_avg', \"(Hand)\")\n",
    "plot_difficulty_heatmap('foot_overall_avg', \"(Foot)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3eb840ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Visualization: per-role per-angle heatmaps\n",
    "==================================\n",
    "\"\"\"\n",
    "\n",
    "def plot_role_angle_heatmap(role_type='hand', angle=40):\n",
    "    \"\"\"Plot difficulty scores for a specific role and angle.\"\"\"\n",
    "        \n",
    "    df_role = df_role_angle[\n",
    "        (df_role_angle['role_type'] == role_type) & \n",
    "        (df_role_angle['angle'] == angle)\n",
    "    ].copy()\n",
    "    \n",
    "    if df_role.empty:\n",
    "        print(f\"No data for {role_type} at {angle}°\")\n",
    "        return\n",
    "    \n",
    "    df_role['x'] = df_role['placement_id'].map(lambda p: placement_coordinates.get(p, (None, None))[0])\n",
    "    df_role['y'] = df_role['placement_id'].map(lambda p: placement_coordinates.get(p, (None, None))[1])\n",
    "    df_role = df_role.dropna(subset=['x', 'y'])\n",
    "    \n",
    "    fig, ax = plt.subplots(figsize=(17, 12))\n",
    "    ax.imshow(board_img, extent=[x_min, x_max, y_min, y_max], aspect='auto')\n",
    "    \n",
    "    scatter = ax.scatter(\n",
    "        df_role['x'],\n",
    "        df_role['y'],\n",
    "        c=df_role['avg_difficulty_smoothed'],\n",
    "        s=100,\n",
    "        cmap='seismic',\n",
    "        alpha=0.85,\n",
    "        edgecolors='black',\n",
    "        linewidths=0.5\n",
    "    )\n",
    "    \n",
    "    ax.set_xlabel('X Position (inches)', fontsize=12)\n",
    "    ax.set_ylabel('Y Position (inches)', fontsize=12)\n",
    "    ax.set_title(f'{role_type.capitalize()} Hold Difficulty at {angle}°', fontsize=14)\n",
    "    \n",
    "    cbar = plt.colorbar(scatter, ax=ax, shrink=0.5)\n",
    "    cbar.set_label('Difficulty')\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    plt.savefig(f'../images/03_hold_difficulty/difficulty_{role_type}_{angle}deg.png', dpi=150, bbox_inches='tight')\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "# Plot for common angles\n",
    "common_angles = [30, 40, 45, 50]\n",
    "\n",
    "for role in ['hand', 'foot']:\n",
    "    for angle in common_angles:\n",
    "        print(f\"\\n{role.capitalize()} at {angle}°:\")\n",
    "        plot_role_angle_heatmap(role, angle)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44c53251",
   "metadata": {},
   "source": [
    "# Conclusion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4f1431c",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Summary Statistics\n",
    "==================================\n",
    "\"\"\"\n",
    "\n",
    "# Material comparison\n",
    "print(\"### Difficulty by Material\\n\")\n",
    "material_diff = df_model_features.groupby('material').agg(\n",
    "    count=('overall_difficulty', 'count'),\n",
    "    avg_difficulty=('overall_difficulty', 'mean'),\n",
    "    median_difficulty=('overall_difficulty', 'median'),\n",
    "    avg_hand=('hand_overall_avg', 'mean'),\n",
    "    avg_foot=('foot_overall_avg', 'mean'),\n",
    "    avg_usage=('usage_count', 'mean')\n",
    ").round(2)\n",
    "\n",
    "display(material_diff)\n",
    "\n",
    "# Default role comparison\n",
    "print(\"\\n### Difficulty by Default Role\\n\")\n",
    "role_diff = df_model_features.groupby('default_role').agg(\n",
    "    count=('overall_difficulty', 'count'),\n",
    "    avg_difficulty=('overall_difficulty', 'mean'),\n",
    "    avg_hand=('hand_overall_avg', 'mean'),\n",
    "    avg_foot=('foot_overall_avg', 'mean'),\n",
    "    avg_usage=('usage_count', 'mean')\n",
    ").round(2)\n",
    "\n",
    "display(role_diff)\n",
    "\n",
    "# Correlation\n",
    "if 'hand_overall_avg' in df_model_features.columns and 'foot_overall_avg' in df_model_features.columns:\n",
    "    valid = df_model_features.dropna(subset=['hand_overall_avg', 'foot_overall_avg'])\n",
    "    if len(valid) > 0:\n",
    "        corr = valid['hand_overall_avg'].corr(valid['foot_overall_avg'])\n",
    "        print(f\"\\nCorrelation (hand vs foot difficulty): {corr:.3f}\")\n",
    "        print(f\"(based on {len(valid)} holds used as both)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d333751",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "==================================\n",
    "Save to  files\n",
    "==================================\n",
    "\"\"\"\n",
    "\n",
    "import os\n",
    "os.makedirs('../data/03_hold_difficulty', exist_ok=True)\n",
    "\n",
    "# Main features table\n",
    "df_model_features.to_csv('../data/03_hold_difficulty/hold_difficulty_scores.csv')\n",
    "\n",
    "# Full pivot for modeling\n",
    "pivot_value_col = 'avg_difficulty_smoothed' if 'avg_difficulty_smoothed' in df_role_angle.columns else 'avg_difficulty'\n",
    "\n",
    "pivot_full = df_role_angle.pivot_table(\n",
    "    index='placement_id',\n",
    "    columns=['role_type', 'angle'],\n",
    "    values=pivot_value_col,\n",
    "    aggfunc='mean'\n",
    ")\n",
    "pivot_full.columns = [f'diff_{role}_{int(angle)}deg' for role, angle in pivot_full.columns]\n",
    "pivot_full.to_csv('../data/03_hold_difficulty/hold_role_angle_difficulty_scores.csv')\n",
    "\n",
    "# Per-role tables\n",
    "for role in ROLE_TYPES:\n",
    "    if role in role_tables:\n",
    "        role_tables[role].to_csv(f'../data/03_hold_difficulty/hold_{role}_difficulty_by_angle.csv')\n",
    "\n",
    "# Detailed records\n",
    "df_role_angle.to_csv('../data/03_hold_difficulty/hold_role_angle_detailed.csv', index=False)\n",
    "\n",
    "print(\"Saved files to ../data/03_hold_difficulty/:\")\n",
    "print(\"  - hold_difficulty_scores.csv (main table)\")\n",
    "print(\"  - hold_role_angle_difficulty_scores.csv (full pivot)\")\n",
    "for role in ROLE_TYPES:\n",
    "    if role in role_tables:\n",
    "        print(f\"  - hold_{role}_difficulty_by_angle.csv\")\n",
    "print(\"  - hold_role_angle_detailed.csv (detailed records)\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "443a6779",
   "metadata": {},
   "source": [
    "\n",
    "## Tables produced:\n",
    "\n",
    "1. `df_model_features` - Main feature table for downstream modeling\n",
    "   - One row per `placement_id`\n",
    "   - Includes metadata, overall scores, angle-level summaries, and role-specific scores\n",
    "   - `overall_difficulty` is the Bayesian-smoothed overall score\n",
    "   - `overall_difficulty_raw` is retained only as a reference column\n",
    "\n",
    "2. `df_role_angle` - Detailed records for visualization / export\n",
    "   - One row per (`placement_id`, `role_type`, `angle`) combination\n",
    "   - Rebuilt after mirror-averaging so plots and exports reflect the final mirrored values\n",
    "\n",
    "3. `role_tables[role]` - Per-role tables\n",
    "   - start, middle, finish, hand, foot\n",
    "   - each with per-angle columns, overall averages, and usage counts"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.14.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}