fixed leakage

This commit is contained in:
Pawel Sarkowicz
2026-03-28 16:03:04 -04:00
parent 880272aaf5
commit 3ab9b77bb7
36 changed files with 2296 additions and 681 deletions

README.md

@@ -1,6 +1,6 @@
# Kilter Board: Predicting Climbing Route Difficulty from Board Data
I recently got into *board climbing*, and I enjoy climbing on the TB2 and the Kilter Board. I've been climbing on the 12ftx12ft (mirrored) that is available at my local gym, and I've never felt that the phrase "*it hurts so good*" would be so apt. As such, I've done an in depth analysis of TB2 data <a href="https://gitlab.com/psark/Tension-Board-2-Analysis">here</a>, and have decided to mimic that analysis with available Kilter Board data.
I recently got into *board climbing*, and I enjoy climbing on the TB2 and the Kilter Board. I've been climbing on 12x12ft boards that are available at my local gym, and I've never felt that the phrase "*it hurts so good*" would be so apt. As such, I've done an in-depth analysis of TB2 data <a href="https://gitlab.com/psark/Tension-Board-2-Analysis">here</a>, and have decided to mimic that analysis with available Kilter Board data.
![Hold Usage Heatmap](images/02_hold_stats/all_holds_all_grades_heatmap.png)
@@ -11,12 +11,12 @@ I recently got into *board climbing*, and I enjoy climbing on the TB2 and the Ki
## Overview
This project analyzes ~300,000 climbs from the Kilter Board in order to do the following.
This project analyzes ~300,000 climbs on the Kilter Board in order to do the following.
> 1. **Understand** hold usage patterns and difficulty distributions
> 2. **Quantify** empirical hold difficulty scores
> 3. **Predict** climb grades from hold positions and board angle
Climbing grades are inherently subjective. Different climbers use different beta, setters have different grading standards, and difficulty depends on factors not always captured in data. What makes it harder in the case of the board climbing is that the grade is displayed almost democratically -- it is determined by user input.
Climbing grades are inherently subjective. Different climbers use different beta, setters have different grading standards, and difficulty depends on factors not always captured in data. Moreover, on the boards, the displayed grade for any specific climb is based on user input.
Using a Kilter Board dataset, this project combines:
@@ -153,8 +153,7 @@ Beyond structural analysis, we can also study how board-climbers behave over tim
![Hold Difficulty](images/03_hold_difficulty/difficulty_hand_40deg.png)
* Hold difficulty is estimated from climb data
* We averaged (pre-role/per-angle) difficulty for each hold (with Bayesian smoothing)
* Took advantage of the mirrored layout to increase the amount of data per hold
* We averaged (per-role/per-angle) difficulty for each hold (with Bayesian smoothing)
### Key technique: Bayesian smoothing
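The smoothing step can be sketched as shrinking each hold's empirical mean grade toward a global prior. This is a minimal illustration, not the notebook's exact implementation; the prior strength `k` (a pseudo-count) is an assumed free parameter.

```python
import numpy as np

def smoothed_difficulty(hold_grades, global_mean, k=20):
    """Shrink a hold's mean grade toward the global mean.

    Holds with few ascents are pulled strongly toward the global
    mean; well-sampled holds keep their empirical mean. `k` acts
    as a pseudo-count controlling the strength of the prior.
    """
    n = len(hold_grades)
    if n == 0:
        return global_mean
    return (np.sum(hold_grades) + k * global_mean) / (n + k)

# A hold seen only twice stays close to the global mean:
print(smoothed_difficulty([9.0, 9.5], global_mean=5.0, k=20))
```

With only two observations at grade ~9, the estimate lands near 5.4 rather than 9.25, which is exactly the behavior that stabilizes estimates for rarely used holds.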
@@ -167,7 +166,7 @@ This significantly improves downstream feature quality.
---
## 6. Many more!
## 6. Many more
There are many other statistics; see notebooks [`01`](notebooks/01_data_overview_and_climbing_statistics.ipynb) (climbing statistics), [`02`](notebooks/02_hold_analysis_and_board_heatmaps.ipynb) (climbing hold statistics), and [`03`](notebooks/03_hold_difficulty.ipynb) (hold difficulty). Included are:
@@ -188,35 +187,58 @@ This section focuses on **building predictive models and evaluating performance*
---
## 7. Feature Engineering
Features are constructed at the climb level using only **structural and geometric information** derived from the climb definition (`angle` and `frames`).
Features are constructed at the climb level using:
We explicitly avoid using hold-difficulty-derived features in the predictive models to prevent target leakage.
* geometry (height, spread, convex hull)
* structure (number of moves, clustering)
* hold difficulty (smoothed estimates)
* interaction features
Feature categories include:
* **Geometry** — spatial footprint of the climb (height, spread, convex hull)
* **Movement** — reach distances and spatial relationships between holds
* **Density** — how tightly or sparsely holds are arranged
* **Symmetry** — left/right balance and distribution
* **Path structure** — approximations of movement flow and efficiency
* **Normalized position** — relative positioning on the board
* **Interaction features** — simple nonlinear combinations (e.g., angle × hold count)
This results in a **leakage-free feature set** that better reflects the physical structure of climbing.
| Category | Description | Examples |
| ------------- | --------------------------------- | ------------------------------------------- |
| Geometry | Shape and size of climb | bbox_area, range_x, range_y |
| Movement | Reach and movement complexity | max_hand_reach, path_efficiency |
| Difficulty | Hold-based difficulty metrics | mean_hold_difficulty, max_hold_difficulty |
| Progression | How difficulty changes over climb | difficulty_gradient, difficulty_progression |
| Symmetry | Left/right balance | symmetry_score, hand_symmetry |
| Clustering | Local density of holds | mean_neighbors_12in |
| Normalization | Relative board positioning | mean_y_normalized |
| Distribution | Vertical distribution of holds | y_q25, y_q75 |
| Category | Description | Examples |
| ------------- | ---------------------------------------- | ----------------------------------------- |
| Geometry | Shape and size of climb | bbox_area, range_x, range_y |
| Movement | Reach and movement structure | mean_hand_reach, path_efficiency |
| Density | Hold spacing and compactness | hold_density, holds_per_vertical_foot |
| Symmetry | Left/right balance | symmetry_score, left_ratio |
| Path | Approximate movement trajectory | path_length_vertical |
| Position | Relative board positioning | mean_y_normalized, start_height_normalized|
| Distribution | Vertical distribution of holds | y_q75, y_iqr |
| Interaction | Nonlinear feature combinations | angle_squared, angle_x_holds |
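A few of the geometry features above can be sketched directly from hold coordinates. This is a toy version under assumed definitions; the names match the table, but the exact formulas live in the notebooks and may differ.

```python
import numpy as np

def geometry_features(xs, ys):
    """Toy versions of a few geometric climb features.

    xs, ys: hold coordinates for one climb (e.g., in inches).
    """
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    range_x = xs.max() - xs.min()          # horizontal extent
    range_y = ys.max() - ys.min()          # vertical extent
    bbox_area = range_x * range_y          # bounding-box area
    return {
        "range_x": range_x,
        "range_y": range_y,
        "bbox_area": bbox_area,
        # holds per unit of bounding-box area
        "hold_density": len(xs) / bbox_area if bbox_area > 0 else 0.0,
    }

print(geometry_features([0, 4, 8], [0, 6, 12]))
```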
### Important design decision
The dataset is restricted to:
> **climbs with angle ≤ 50°**
> **climbs with angle ≤ 55°**
to reduce variability and improve consistency. (see [Angle vs Difficulty](#3-angle-vs-difficulty), where average climb grade seems to stabilize or get lower over 50°)
### Important: Leakage and Feature Design
Earlier iterations of this project included features derived from hold difficulty scores (computed from climb grades). While these features slightly improved predictive performance, they introduce a form of **target leakage** if computed globally.
In this version of the project:
* Hold difficulty scores are still computed in Notebook 03 for **exploratory analysis**
* Predictive models (Notebooks 04–06) use only **leakage-free features**
* No feature is derived from the target variable (`display_difficulty`)
This allows the model to learn from the **structure of climbs themselves**, rather than from aggregated statistics of the labels.
Note: Hold-difficulty-based features can still be valid in a production setting if computed strictly from historical (training) data, similar to target encoding techniques.
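The target-encoding analogy can be made concrete: statistics derived from the label are fit on the training split only, and unseen categories fall back to a global mean. This is a minimal sketch with made-up data, not the project's pipeline.

```python
import pandas as pd

# Hypothetical ascent data: each row is (hold_id, grade).
df = pd.DataFrame({
    "hold_id": [1, 1, 2, 2, 3],
    "grade":   [4.0, 6.0, 7.0, 9.0, 5.0],
})
train = df.iloc[:4]   # fit label statistics on training rows only
test = df.iloc[4:]

global_mean = train["grade"].mean()
hold_means = train.groupby("hold_id")["grade"].mean()

# Holds unseen in training (like id 3 here) fall back to the
# global mean, so no test-set labels leak into the encoding.
encoded = test["hold_id"].map(hold_means).fillna(global_mean)
print(encoded.tolist())
```

Computing `hold_means` over the full dataset instead of `train` is precisely the leakage the README describes.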
---
## 8. Feature Relationships
@@ -226,10 +248,10 @@ Here are some relationships between features and difficulty
![Correlation Heatmap](images/04_climb_features/feature_correlations.png)
* higher angles allow for harder difficulties
* hold difficulty features seem to correlate the most to difficulty
* engineered features capture non-trivial structure
* distance between holds seems to correlate with difficulty
* geometric and structural features capture non-trivial climbing patterns
We have a full feature list in [`data/04_climb_features/feature_list.txt`](data/04_climb_features/feature_list.txt). Explanations are available in [`data/04_climb_features/feature_list_explanations.txt`](data/04_climb_features/feature_explanations.txt).
We have a full feature list in [`data/04_climb_features/feature_list.txt`](data/04_climb_features/feature_list.txt). Explanations are available in [`data/04_climb_features/feature_explanations.txt`](data/04_climb_features/feature_explanations.txt).
---
@@ -253,22 +275,28 @@ Models tested:
Key drivers:
* hold difficulty
* wall angle
* structural features
* reach-based features (e.g., mean/max hand reach)
* spatial density and distribution
* geometric structure of the climb
This confirms that **difficulty is strongly tied to spatial arrangement and movement constraints**, rather than just individual hold properties.
---
## 10. Model Performance
![RF redicted vs Actual](images/05_predictive_modelling/random_forest_predictions.png)
![RF Predicted vs Actual](images/05_predictive_modelling/random_forest_predictions.png)
![NN Predicted vs Actual](images/06_deep_learning/neural_network_predictions.png)
### Results (in terms of difficulty score)
### Results (in terms of V-grade)
Both the RF and NN models performed similarly.
* **~83% within ±1 V-grade (~45% within ±1 difficulty score)**
* **~96% within ±2 V-grade (~80% within ±2 difficulty scores)**
* **~70% within ±1 V-grade (~36% within ±1 difficulty score)**
* **~90% within ±2 V-grade (~65% within ±2 difficulty scores)**
In earlier experiments, we achieved ~83% within ±1 V-grade and ~96% within ±2. However, that setup used hold difficulties from Notebook 03, which are derived from climb grades, creating leakage. This result is more realistic and more independent: the model relies purely on spatial and structural information, without access to hold-based information or beta.
This demonstrates that a substantial portion of climbing difficulty can be attributed to geometry and movement constraints.
### Interpretation
@@ -285,15 +313,15 @@ Both the RF and NN models performed similarly.
| Metric | Performance |
| ------------------ | ----------- |
| Within ±1 V-grade | ~83% |
| Within ±2 V-grades | ~96% |
| Within ±1 V-grade | ~70% |
| Within ±2 V-grades | ~90% |
The model can still predict subgrades (e.g., V3 contains 6a and 6a+), but it is not as accurate.
| Metric | Performance |
| ------------------ | ----------- |
| Within ±1 difficulty-grade | ~45% |
| Within ±2 difficulty-grades | ~80% |
| Within ±1 difficulty-grade | ~36% |
| Within ±2 difficulty-grades | ~65% |
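The "within ±k" metrics above can be computed with a small helper. This sketch assumes predictions are rounded to the nearest integer score before comparison, which is one reasonable choice, not necessarily the notebooks' exact convention.

```python
import numpy as np

def within_k(y_true, y_pred, k=1):
    """Fraction of predictions within +/- k of the true score."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    # Round continuous predictions to integer difficulty scores.
    return float(np.mean(np.abs(np.round(y_pred) - y_true) <= k))

y_true = [10, 12, 15, 18]
y_pred = [10.4, 13.2, 14.9, 20.6]
print(within_k(y_true, y_pred, k=1))  # → 0.75
```

The same function applied to V-grade buckets (after mapping scores to V-grades) yields the grouped metrics reported above.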
---
@@ -308,6 +336,7 @@ The model can still predict subgrades (e.g., V3 contains 6a and 6a+), but it is
# Future Work
* Unified grade prediction across boards
* Combined board analysis
* Test other models
* Better spatial features


@@ -1,4 +1,5 @@
angle
angle_squared
total_holds
hand_holds
foot_holds
@@ -6,7 +7,6 @@ start_holds
finish_holds
middle_holds
is_nomatch
mean_x
mean_y
std_x
std_y
@@ -14,107 +14,36 @@ range_x
range_y
min_y
max_y
start_height
start_height_min
start_height_max
finish_height
finish_height_min
finish_height_max
height_gained
height_gained_start_finish
bbox_area
bbox_aspect_ratio
bbox_normalized_area
hold_density
holds_per_vertical_foot
left_holds
right_holds
left_ratio
symmetry_score
hand_left_ratio
hand_symmetry
upper_holds
lower_holds
upper_ratio
max_hand_reach
min_hand_reach
mean_hand_reach
max_hand_reach
std_hand_reach
hand_spread_x
hand_spread_y
max_foot_spread
mean_foot_spread
foot_spread_x
foot_spread_y
max_hand_to_foot
min_hand_to_foot
mean_hand_to_foot
std_hand_to_foot
mean_hold_difficulty
max_hold_difficulty
min_hold_difficulty
std_hold_difficulty
median_hold_difficulty
difficulty_range
mean_hand_difficulty
max_hand_difficulty
std_hand_difficulty
mean_foot_difficulty
max_foot_difficulty
std_foot_difficulty
start_difficulty
finish_difficulty
hand_foot_ratio
movement_density
hold_com_x
hold_com_y
weighted_difficulty
convex_hull_area
convex_hull_perimeter
hull_area_to_bbox_ratio
min_nn_distance
mean_nn_distance
max_nn_distance
std_nn_distance
mean_neighbors_12in
max_neighbors_12in
clustering_ratio
mean_pairwise_distance
std_pairwise_distance
path_length_vertical
path_efficiency
difficulty_gradient
lower_region_difficulty
middle_region_difficulty
upper_region_difficulty
difficulty_progression
max_difficulty_jump
mean_difficulty_jump
difficulty_weighted_reach
max_weighted_reach
mean_x_normalized
mean_y_normalized
std_x_normalized
std_y_normalized
start_height_normalized
finish_height_normalized
start_offset_from_typical
finish_offset_from_typical
mean_y_relative_to_start
max_y_relative_to_start
spread_x_normalized
spread_y_normalized
bbox_coverage_x
bbox_coverage_y
y_q25
y_q50
y_q75
y_iqr
holds_bottom_quartile
holds_top_quartile
complexity_score
display_difficulty
angle_x_holds
angle_x_difficulty
angle_squared
difficulty_x_height
difficulty_x_density
complexity_score
hull_area_x_difficulty


@@ -0,0 +1,35 @@
### Model Performance Summary
| Model | MAE | RMSE | R² | Within ±1 | Within ±2 | Exact V | Within ±1 V |
|-------|-----|------|----|-----------|-----------|---------|-------------|
| Linear Regression | 2.088 | 2.670 | 0.560 | 30.1% | 55.9% | 25.9% | 64.8% |
| Ridge Regression | 2.088 | 2.670 | 0.560 | 30.0% | 55.9% | 25.9% | 64.8% |
| Lasso Regression | 2.089 | 2.672 | 0.559 | 29.9% | 55.9% | 25.9% | 64.8% |
| Random Forest (Tuned) | 1.846 | 2.375 | 0.652 | 34.8% | 62.4% | 29.6% | 69.7% |
### Key Findings
1. **Tree-based models remain strongest on this structured feature set.**
- Random Forest (Tuned) achieves the best overall balance of MAE, RMSE, and grouped V-grade performance.
- Linear models remain useful baselines but leave clear nonlinear signal unexplained.
2. **Fine-grained difficulty prediction is meaningfully harder than grouped grade prediction.**
- On the held-out test set, the best model is within ±1 fine-grained difficulty score 34.8% of the time.
- The same model is within ±1 grouped V-grade 69.7% of the time.
3. **This gap is expected and informative.**
- Small numeric errors often stay inside the same or adjacent V-grade buckets.
- The model captures broad difficulty bands more reliably than exact score distinctions.
4. **The project's main predictive takeaway is practical rather than perfect.**
- The models are not exact grade replicators.
- They are reasonably strong at placing climbs into the correct neighborhood of difficulty.
### Portfolio Interpretation
From a modelling perspective, this project shows:
- feature engineering grounded in domain structure,
- comparison of linear and nonlinear models,
- honest evaluation on a held-out test set,
- and the ability to translate raw regression performance into climbing-relevant grouped V-grade metrics.

models/feature_names.txt (new file)

@@ -0,0 +1,48 @@
angle
angle_squared
total_holds
hand_holds
foot_holds
start_holds
finish_holds
middle_holds
is_nomatch
mean_y
std_x
std_y
range_x
range_y
min_y
max_y
height_gained
height_gained_start_finish
bbox_area
hold_density
holds_per_vertical_foot
left_ratio
symmetry_score
upper_ratio
mean_hand_reach
max_hand_reach
std_hand_reach
hand_spread_x
hand_spread_y
min_hand_to_foot
mean_hand_to_foot
std_hand_to_foot
convex_hull_area
hull_area_to_bbox_ratio
mean_pairwise_distance
std_pairwise_distance
path_length_vertical
path_efficiency
mean_y_normalized
start_height_normalized
finish_height_normalized
mean_y_relative_to_start
spread_x_normalized
spread_y_normalized
y_q75
y_iqr
complexity_score
angle_x_holds

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -19,8 +19,11 @@
"2. **Route structure** \n",
" Examples: number of holds, spatial spread, height gained, move lengths, left/right balance, and other frame-derived quantities.\n",
"\n",
"3. **Hold difficulty priors** \n",
" Examples: average, maximum, and distributional summaries of the empirical hold scores built in notebook 03.\n",
"When this was initially done, we added:\n",
"\n",
"3. **Hold difficulty priors** \n",
"\n",
"However, that makes it quite circular -- we'd be using the difficulty data to create difficulty scores to then predict difficulty data. The difficulty is already baked in there, so it is not a very good independent model. Heuristically, I don't think this is a big deal if we **just** want to predict V-grades, but we'll leave it out of our analysis in order to see what sorts of features actually help determine the difficulty of a climb.\n",
"\n",
"## Output\n",
"\n",
@@ -111,8 +114,8 @@
"Query our data from the DB\n",
"==================================\n",
"\n",
"We restrict to `layout_id=1` for the Kilter Board Original\n",
"\n",
"We restrict to `layout_id=1` for the Kilter Board Original.\n",
"Again, we set the date to be past 2016 for simplicity (dates start in 2018, with the exception of one in 2006).\n",
"\"\"\"\n",
"\n",
"# Query climbs data\n",
@@ -261,7 +264,7 @@
"\n",
"\n",
"# Test\n",
"test_frames = \"p1r5p2r6p3r8p4r5\"\n",
"test_frames = \"p1r12p2r13p3r14p4r15\"\n",
"parsed = parse_frames(test_frames)\n",
"print(f\"Test parse: {parsed}\")"
]
@@ -283,564 +286,212 @@
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Feature Extraction Function\n",
"==================================\n",
"\"\"\"\n",
"\n",
"def extract_features(row, placement_coords, df_hold_difficulty):\n",
"def extract_features(row, placement_coords):\n",
" \"\"\"\n",
" Extract all features from a single climb row.\n",
" Extract a trimmed set of clean geometric/spatial features.\n",
" No hold-difficulty-derived features are used.\n",
" \"\"\"\n",
" features = {}\n",
" \n",
" # Parse frames\n",
"\n",
" holds = parse_frames(row['frames'])\n",
" angle = row['angle']\n",
" \n",
"\n",
" if not holds:\n",
" return None\n",
" \n",
" # =====================\n",
" # BASIC HOLD EXTRACTION\n",
" # =====================\n",
" \n",
"\n",
" hold_data = []\n",
" for placement_id, role_id in holds:\n",
" coords = placement_coords.get(placement_id, (None, None))\n",
" if coords[0] is None:\n",
" continue\n",
" \n",
"\n",
" role_type = get_role_type(role_id)\n",
" is_hand = role_id in HAND_ROLE_IDS\n",
" is_foot = role_id in FOOT_ROLE_IDS\n",
" \n",
" # Get difficulty scores for this hold at this angle\n",
" diff_key = f\"{role_type}_diff_{int(angle)}deg\"\n",
" hand_diff_key = f\"hand_diff_{int(angle)}deg\"\n",
" foot_diff_key = f\"foot_diff_{int(angle)}deg\"\n",
" \n",
" difficulty = None\n",
" if placement_id in df_hold_difficulty.index:\n",
" # Try role-specific first, then aggregate\n",
" if diff_key in df_hold_difficulty.columns:\n",
" difficulty = df_hold_difficulty.loc[placement_id, diff_key]\n",
" if pd.isna(difficulty):\n",
" if is_hand and hand_diff_key in df_hold_difficulty.columns:\n",
" difficulty = df_hold_difficulty.loc[placement_id, hand_diff_key]\n",
" elif is_foot and foot_diff_key in df_hold_difficulty.columns:\n",
" difficulty = df_hold_difficulty.loc[placement_id, foot_diff_key]\n",
" \n",
" # Fallback to overall\n",
" if pd.isna(difficulty) and 'overall_difficulty' in df_hold_difficulty.columns:\n",
" difficulty = df_hold_difficulty.loc[placement_id, 'overall_difficulty']\n",
" \n",
"\n",
" hold_data.append({\n",
" 'placement_id': placement_id,\n",
" 'x': coords[0],\n",
" 'y': coords[1],\n",
" 'role_id': role_id,\n",
" 'role_type': role_type,\n",
" 'is_hand': is_hand,\n",
" 'is_foot': is_foot,\n",
" 'difficulty': difficulty\n",
" })\n",
" \n",
"\n",
" if not hold_data:\n",
" return None\n",
" \n",
"\n",
" df_holds = pd.DataFrame(hold_data)\n",
" \n",
" # Separate by role\n",
"\n",
" hand_holds = df_holds[df_holds['is_hand']]\n",
" foot_holds = df_holds[df_holds['is_foot']]\n",
" start_holds = df_holds[df_holds['role_type'] == 'start']\n",
" finish_holds = df_holds[df_holds['role_type'] == 'finish']\n",
" middle_holds = df_holds[df_holds['role_type'] == 'middle']\n",
" \n",
" # =====================\n",
" # 1. ANGLE\n",
" # =====================\n",
"\n",
" xs = df_holds['x'].to_numpy()\n",
" ys = df_holds['y'].to_numpy()\n",
"\n",
" description = row.get('description', '')\n",
" if pd.isna(description):\n",
" description = ''\n",
"\n",
" center_x = (x_min + x_max) / 2\n",
"\n",
" # Basic\n",
" features['angle'] = angle\n",
" \n",
" # =====================\n",
" # 2. BASIC COUNTS\n",
" # =====================\n",
" features['angle_squared'] = angle ** 2\n",
"\n",
" features['total_holds'] = len(df_holds)\n",
" features['hand_holds'] = len(hand_holds)\n",
" features['foot_holds'] = len(foot_holds)\n",
" features['start_holds'] = len(start_holds)\n",
" features['finish_holds'] = len(finish_holds)\n",
" features['middle_holds'] = len(middle_holds)\n",
" \n",
" # =====================\n",
" # 3. MATCHING FEATURE\n",
" # =====================\n",
" # A climb is \"matching\" if you are allowed to match your hands at any hold.\n",
"    # There are slight differences in difficulty between matching and no-matching climbs, as per our analysis in 01.\n",
" features['is_nomatch'] = int((row['is_nomatch'] == 1) or bool(re.search(r'\\bno\\s*match(ing)?\\b', row['description'], flags=re.IGNORECASE)))\n",
" \n",
" # =====================\n",
" # 4. SPATIAL/POSITION\n",
" # =====================\n",
" xs = df_holds['x'].values\n",
" ys = df_holds['y'].values\n",
" \n",
" features['mean_x'] = np.mean(xs)\n",
"\n",
" features['is_nomatch'] = int(\n",
" (row['is_nomatch'] == 1) or\n",
" bool(re.search(r'\\bno\\s*match(ing)?\\b', description, flags=re.IGNORECASE))\n",
" )\n",
"\n",
" # Spatial\n",
" features['mean_y'] = np.mean(ys)\n",
" features['std_x'] = np.std(xs) if len(xs) > 1 else 0\n",
" features['std_y'] = np.std(ys) if len(ys) > 1 else 0\n",
" features['std_x'] = np.std(xs) if len(xs) > 1 else 0.0\n",
" features['std_y'] = np.std(ys) if len(ys) > 1 else 0.0\n",
" features['range_x'] = np.max(xs) - np.min(xs)\n",
" features['range_y'] = np.max(ys) - np.min(ys)\n",
" features['min_y'] = np.min(ys)\n",
" features['max_y'] = np.max(ys)\n",
" \n",
" # =====================\n",
" # 5. HEIGHT FEATURES\n",
" # =====================\n",
" if len(start_holds) > 0:\n",
" features['start_height'] = start_holds['y'].mean()\n",
" features['start_height_min'] = start_holds['y'].min()\n",
" features['start_height_max'] = start_holds['y'].max()\n",
" else:\n",
" features['start_height'] = np.nan\n",
" features['start_height_min'] = np.nan\n",
" features['start_height_max'] = np.nan\n",
" \n",
" if len(finish_holds) > 0:\n",
" features['finish_height'] = finish_holds['y'].mean()\n",
" features['finish_height_min'] = finish_holds['y'].min()\n",
" features['finish_height_max'] = finish_holds['y'].max()\n",
" else:\n",
" features['finish_height'] = np.nan\n",
" features['finish_height_min'] = np.nan\n",
" features['finish_height_max'] = np.nan\n",
" \n",
" features['height_gained'] = features['max_y'] - features['min_y']\n",
" \n",
" if pd.notna(features.get('finish_height')) and pd.notna(features.get('start_height')):\n",
" features['height_gained_start_finish'] = features['finish_height'] - features['start_height']\n",
" else:\n",
" features['height_gained_start_finish'] = np.nan\n",
" \n",
" # =====================\n",
" # 6. BBOX FEATURES\n",
" # =====================\n",
" bbox_width = features['range_x']\n",
" bbox_height = features['range_y']\n",
" features['bbox_area'] = bbox_width * bbox_height\n",
" features['bbox_aspect_ratio'] = bbox_width / bbox_height if bbox_height > 0 else 0\n",
" features['bbox_normalized_area'] = features['bbox_area'] / (board_width * board_height)\n",
" \n",
" # =====================\n",
" # 7. HOLD DENSITY\n",
" # =====================\n",
" if features['bbox_area'] > 0:\n",
" features['hold_density'] = features['total_holds'] / features['bbox_area']\n",
" else:\n",
" features['hold_density'] = 0\n",
" \n",
"\n",
" # Start / finish heights\n",
" start_height = start_holds['y'].mean() if len(start_holds) > 0 else np.nan\n",
" finish_height = finish_holds['y'].mean() if len(finish_holds) > 0 else np.nan\n",
"\n",
" features['height_gained_start_finish'] = (\n",
" finish_height - start_height\n",
" if pd.notna(start_height) and pd.notna(finish_height)\n",
" else np.nan\n",
" )\n",
"\n",
" # Density / symmetry\n",
" bbox_area = features['range_x'] * features['range_y']\n",
" features['bbox_area'] = bbox_area\n",
" features['hold_density'] = features['total_holds'] / bbox_area if bbox_area > 0 else 0.0\n",
" features['holds_per_vertical_foot'] = features['total_holds'] / max(features['range_y'], 1)\n",
" \n",
" # =====================\n",
" # 8. SYMMETRY/BALANCE\n",
" # =====================\n",
" center_x = (x_min + x_max) / 2\n",
" features['left_holds'] = (df_holds['x'] < center_x).sum()\n",
" features['right_holds'] = (df_holds['x'] >= center_x).sum()\n",
" features['left_ratio'] = features['left_holds'] / features['total_holds'] if features['total_holds'] > 0 else 0.5\n",
" \n",
" # Symmetry score (how balanced left/right)\n",
"\n",
" left_holds = (df_holds['x'] < center_x).sum()\n",
" features['left_ratio'] = left_holds / features['total_holds'] if features['total_holds'] > 0 else 0.5\n",
" features['symmetry_score'] = 1 - abs(features['left_ratio'] - 0.5) * 2\n",
" \n",
" # Hand symmetry\n",
" if len(hand_holds) > 0:\n",
" hand_left = (hand_holds['x'] < center_x).sum()\n",
" hand_right = (hand_holds['x'] >= center_x).sum()\n",
" features['hand_left_ratio'] = hand_left / len(hand_holds)\n",
" features['hand_symmetry'] = 1 - abs(features['hand_left_ratio'] - 0.5) * 2\n",
" else:\n",
" features['hand_left_ratio'] = np.nan\n",
" features['hand_symmetry'] = np.nan\n",
" \n",
" # =====================\n",
" # 9. VERTICAL DISTRIBUTION\n",
" # =====================\n",
"\n",
" y_median = np.median(ys)\n",
" features['upper_holds'] = (df_holds['y'] > y_median).sum()\n",
" features['lower_holds'] = (df_holds['y'] <= y_median).sum()\n",
" features['upper_ratio'] = features['upper_holds'] / features['total_holds']\n",
" \n",
" # =====================\n",
" # 10. HAND REACH / SPREAD\n",
" # =====================\n",
" upper_holds = (df_holds['y'] > y_median).sum()\n",
" features['upper_ratio'] = upper_holds / features['total_holds']\n",
"\n",
" # Hand reach\n",
" if len(hand_holds) >= 2:\n",
" hand_xs = hand_holds['x'].values\n",
" hand_ys = hand_holds['y'].values\n",
" \n",
" hand_distances = []\n",
" for i in range(len(hand_holds)):\n",
" for j in range(i + 1, len(hand_holds)):\n",
" dx = hand_xs[i] - hand_xs[j]\n",
" dy = hand_ys[i] - hand_ys[j]\n",
" hand_distances.append(np.sqrt(dx**2 + dy**2))\n",
" \n",
" features['max_hand_reach'] = max(hand_distances)\n",
" features['min_hand_reach'] = min(hand_distances)\n",
" features['mean_hand_reach'] = np.mean(hand_distances)\n",
" features['std_hand_reach'] = np.std(hand_distances)\n",
" features['hand_spread_x'] = hand_xs.max() - hand_xs.min()\n",
" features['hand_spread_y'] = hand_ys.max() - hand_ys.min()\n",
" hand_points = hand_holds[['x', 'y']].to_numpy()\n",
" hand_distances = pdist(hand_points)\n",
"\n",
" hand_xs = hand_holds['x'].to_numpy()\n",
" hand_ys = hand_holds['y'].to_numpy()\n",
"\n",
" features['mean_hand_reach'] = float(np.mean(hand_distances))\n",
" features['max_hand_reach'] = float(np.max(hand_distances))\n",
" features['std_hand_reach'] = float(np.std(hand_distances))\n",
" features['hand_spread_x'] = float(hand_xs.max() - hand_xs.min())\n",
" features['hand_spread_y'] = float(hand_ys.max() - hand_ys.min())\n",
" else:\n",
" features['max_hand_reach'] = 0\n",
" features['min_hand_reach'] = 0\n",
" features['mean_hand_reach'] = 0\n",
" features['std_hand_reach'] = 0\n",
" features['hand_spread_x'] = 0\n",
" features['hand_spread_y'] = 0\n",
" \n",
" # =====================\n",
" # 11. FOOT SPREAD\n",
" # =====================\n",
" if len(foot_holds) >= 2:\n",
" foot_xs = foot_holds['x'].values\n",
" foot_ys = foot_holds['y'].values\n",
" \n",
" foot_distances = []\n",
" for i in range(len(foot_holds)):\n",
" for j in range(i + 1, len(foot_holds)):\n",
" dx = foot_xs[i] - foot_xs[j]\n",
" dy = foot_ys[i] - foot_ys[j]\n",
" foot_distances.append(np.sqrt(dx**2 + dy**2))\n",
" \n",
" features['max_foot_spread'] = max(foot_distances)\n",
" features['mean_foot_spread'] = np.mean(foot_distances)\n",
" features['foot_spread_x'] = foot_xs.max() - foot_xs.min()\n",
" features['foot_spread_y'] = foot_ys.max() - foot_ys.min()\n",
" else:\n",
" features['max_foot_spread'] = 0\n",
" features['mean_foot_spread'] = 0\n",
" features['foot_spread_x'] = 0\n",
" features['foot_spread_y'] = 0\n",
" \n",
" # =====================\n",
" # 12. HAND-TO-FOOT DISTANCES\n",
" # =====================\n",
" if len(hand_holds) > 0 and len(foot_holds) > 0:\n",
" hand_points = hand_holds[['x', 'y']].to_numpy()\n",
" foot_points = foot_holds[['x', 'y']].to_numpy()\n",
"\n",
" dists = []\n",
" for hx, hy in hand_points:\n",
" for fx, fy in foot_points:\n",
" dists.append(np.sqrt((hx - fx)**2 + (hy - fy)**2))\n",
" dists = np.asarray(dists)\n",
"\n",
" features['min_hand_to_foot'] = float(np.min(dists))\n",
" features['mean_hand_to_foot'] = float(np.mean(dists))\n",
" features['std_hand_to_foot'] = float(np.std(dists))\n",
" else:\n",
" features['min_hand_to_foot'] = 0.0\n",
" features['mean_hand_to_foot'] = 0.0\n",
" features['std_hand_to_foot'] = 0.0\n",
" \n",
" # =====================================================\n",
" # 13. GEOMETRIC FEATURES\n",
" # =====================================================\n",
"\n",
" # Global geometry\n",
" points = np.column_stack([xs, ys])\n",
"\n",
" # Convex hull (2D polygon enclosing all holds)\n",
" if len(df_holds) >= 3:\n",
" try:\n",
" hull = ConvexHull(points)\n",
" features['convex_hull_area'] = float(hull.volume) # In 2D, volume = area\n",
" features['hull_area_to_bbox_ratio'] = features['convex_hull_area'] / max(bbox_area, 1)\n",
" except Exception:\n",
" features['convex_hull_area'] = np.nan\n",
" features['hull_area_to_bbox_ratio'] = np.nan\n",
" else:\n",
" features['convex_hull_area'] = 0.0\n",
" features['hull_area_to_bbox_ratio'] = 0.0\n",
"\n",
" # Pairwise hold distances\n",
" if len(df_holds) >= 2:\n",
" pairwise = pdist(points)\n",
" features['mean_pairwise_distance'] = float(np.mean(pairwise))\n",
" features['std_pairwise_distance'] = float(np.std(pairwise))\n",
" else:\n",
" features['mean_pairwise_distance'] = 0.0\n",
" features['std_pairwise_distance'] = 0.0\n",
"\n",
" # Path length (bottom-to-top tour through the holds)\n",
" if len(df_holds) >= 2:\n",
" sorted_idx = np.argsort(ys)\n",
" sorted_points = points[sorted_idx]\n",
"\n",
" path_length = 0.0\n",
" for i in range(len(sorted_points) - 1):\n",
" dx = sorted_points[i + 1, 0] - sorted_points[i, 0]\n",
" dy = sorted_points[i + 1, 1] - sorted_points[i, 1]\n",
" path_length += np.sqrt(dx**2 + dy**2)\n",
"\n",
" features['path_length_vertical'] = path_length\n",
" features['path_efficiency'] = features['height_gained'] / max(path_length, 1)\n",
" else:\n",
" features['path_length_vertical'] = 0.0\n",
" features['path_efficiency'] = 0.0\n",
"\n",
" # =====================================================\n",
" # 14. POSITION-NORMALIZED FEATURES\n",
" # =====================================================\n",
"\n",
" # Normalized / relative\n",
" features['mean_y_normalized'] = (features['mean_y'] - y_min) / board_height\n",
" features['std_x_normalized'] = features['std_x'] / board_width\n",
" features['std_y_normalized'] = features['std_y'] / board_height\n",
" \n",
" # Start/finish normalized\n",
" features['start_height_normalized'] = (\n",
" (start_height - y_min) / board_height if pd.notna(start_height) else np.nan\n",
" )\n",
" features['finish_height_normalized'] = (\n",
" (finish_height - y_min) / board_height if pd.notna(finish_height) else np.nan\n",
" )\n",
" features['mean_y_relative_to_start'] = (\n",
" features['mean_y'] - start_height if pd.notna(start_height) else np.nan\n",
" )\n",
"\n",
" # Spread normalized by board\n",
" features['spread_x_normalized'] = features['range_x'] / board_width\n",
" features['spread_y_normalized'] = features['range_y'] / board_height\n",
" \n",
" # Position quartile features\n",
" y_q75 = np.percentile(ys, 75)\n",
" y_q25 = np.percentile(ys, 25)\n",
" features['y_q75'] = y_q75\n",
" features['y_iqr'] = y_q75 - y_q25\n",
"\n",
" # Optional engineered clean feature\n",
" features['complexity_score'] = (\n",
" features['mean_hand_reach']\n",
" * np.log1p(features['total_holds'])\n",
" * (1 + features['hold_density'])\n",
" )\n",
"\n",
" return features"
]
},
@@ -851,7 +502,7 @@
"source": [
"## Sanity Check on One Example\n",
"\n",
"Before extracting features for the entire dataset, we inspect one representative climb to confirm that the parsing logic and the computed geometric summaries behave as expected. Let's do the climb \"Anna Got Me Clickin'\" from notebook two.\n",
"\n",
"![Anna Got Me Clickin](../images/02_hold_stats/Anna_Got_Me_Clickin.png)\n"
]
@@ -863,7 +514,7 @@
"metadata": {},
"outputs": [],
"source": [
"extract_features(df_climbs.iloc[10000], placement_coords)"
]
},
{
@@ -902,7 +553,7 @@
"feature_list = []\n",
"\n",
"for idx, row in tqdm(df_climbs.iterrows(), total=len(df_climbs)):\n",
" features = extract_features(row, placement_coords)\n",
" if features:\n",
" features['climb_uuid'] = row['uuid']\n",
" features['display_difficulty'] = row['display_difficulty']\n",
@@ -997,22 +648,37 @@
"fig, axes = plt.subplots(4, 4, figsize=(16, 16))\n",
"\n",
"key_features = [\n",
" # Core driver\n",
" 'angle',\n",
"\n",
" # Basic structure\n",
" 'total_holds',\n",
" 'height_gained',\n",
"\n",
" # Density / compactness\n",
" 'hold_density',\n",
" 'holds_per_vertical_foot',\n",
"\n",
" # Hand geometry (very important)\n",
" 'mean_hand_reach',\n",
" 'max_hand_reach',\n",
" 'std_hand_reach',\n",
"\n",
" # Hand-foot interaction\n",
" 'mean_hand_to_foot',\n",
" 'std_hand_to_foot',\n",
"\n",
" # Spatial layout\n",
" 'symmetry_score',\n",
" 'is_nomatch',\n",
" 'upper_ratio',\n",
"\n",
" # Global geometry\n",
" 'convex_hull_area',\n",
" 'hull_area_to_bbox_ratio',\n",
"\n",
" # Path / flow\n",
" 'path_length_vertical',\n",
" 'path_efficiency'\n",
"]\n",
"\n",
"for ax, feature in zip(axes.flat, key_features):\n",
@@ -1042,12 +708,8 @@
"\n",
"# Angle interactions\n",
"df_features['angle_x_holds'] = df_features['angle'] * df_features['total_holds']\n",
"df_features['angle_squared'] = df_features['angle'] ** 2\n",
"\n",
"# Complexity features\n",
"df_features['complexity_score'] = (\n",
@@ -1056,9 +718,6 @@
" df_features['hold_density']\n",
")\n",
"\n",
"print(f\"Added interaction features. Total columns: {len(df_features.columns)}\")"
]
},
@@ -1081,23 +740,6 @@
"print(\"### Columns with Missing Values\\n\")\n",
"display(missing_cols.to_frame('missing'))\n",
"\n",
"# Fill other NaNs with column means\n",
"for col in df_features.columns:\n",
@@ -1206,8 +848,7 @@
"print(\"\"\"\\nInterpretation:\n",
"- Each row is a climb-angle observation.\n",
"- The target is `display_difficulty`.\n",
"- The predictors combine geometry and structure.\n",
"- The next notebook tests how much predictive signal these engineered features actually contain.\n",
"\"\"\")\n"
]

File diff suppressed because it is too large

scripts/predict.py (new file, 700 lines)

@@ -0,0 +1,700 @@
import re
from pathlib import Path
import joblib
import numpy as np
import pandas as pd
from scipy.spatial import ConvexHull
from scipy.spatial.distance import pdist, squareform
try:
import torch
import torch.nn as nn
TORCH_AVAILABLE = True
except ImportError:
TORCH_AVAILABLE = False
# ============================================================
# Paths
# ============================================================
ROOT = Path(__file__).resolve().parents[1]
SCALER_PATH = ROOT / "models" / "feature_scaler.pkl"
FEATURE_NAMES_PATH = ROOT / "models" / "feature_names.txt"
PLACEMENTS_PATH = ROOT / "data" / "placements.csv" # adjust if needed
# ============================================================
# Model registry
# ============================================================
MODEL_REGISTRY = {
"linear": {
"path": ROOT / "models" / "linear_regression.pkl",
"kind": "sklearn",
"needs_scaling": True,
},
"ridge": {
"path": ROOT / "models" / "ridge_regression.pkl",
"kind": "sklearn",
"needs_scaling": True,
},
"lasso": {
"path": ROOT / "models" / "lasso_regression.pkl",
"kind": "sklearn",
"needs_scaling": True,
},
"random_forest": {
"path": ROOT / "models" / "random_forest_tuned.pkl",
"kind": "sklearn",
"needs_scaling": False,
},
"nn_best": {
"path": ROOT / "models" / "neural_network_best.pth",
"kind": "torch_checkpoint",
"needs_scaling": True,
},
}
DEFAULT_MODEL = "random_forest"
# ============================================================
# Board constants
# Adjust if your board coordinate system differs
# ============================================================
x_min, x_max = -24, 168
y_min, y_max = 0, 156
board_width = x_max - x_min
board_height = y_max - y_min
# ============================================================
# Role mappings
# ============================================================
HAND_ROLE_IDS = {12, 13, 14}
FOOT_ROLE_IDS = {15}
def get_role_type(role_id: int) -> str:
mapping = {
12: "start",
13: "middle",
14: "finish",
15: "foot",
}
return mapping.get(role_id, "middle")
# ============================================================
# Grade map
# ============================================================
grade_map = {
10: '4a/V0',
11: '4b/V0',
12: '4c/V0',
13: '5a/V1',
14: '5b/V1',
15: '5c/V2',
16: '6a/V3',
17: '6a+/V3',
18: '6b/V4',
19: '6b+/V4',
20: '6c/V5',
21: '6c+/V5',
22: '7a/V6',
23: '7a+/V7',
24: '7b/V8',
25: '7b+/V8',
26: '7c/V9',
27: '7c+/V10',
28: '8a/V11',
29: '8a+/V12',
30: '8b/V13',
31: '8b+/V14',
32: '8c/V15',
33: '8c+/V16'
}
MIN_GRADE = min(grade_map)
MAX_GRADE = max(grade_map)
# ============================================================
# Neural network architecture from Notebook 06
# ============================================================
if TORCH_AVAILABLE:
class ClimbGradePredictor(nn.Module):
def __init__(self, input_dim, hidden_layers=None, dropout_rate=0.2):
super().__init__()
if hidden_layers is None:
hidden_layers = [256, 128, 64]
layers = []
prev_dim = input_dim
for hidden_dim in hidden_layers:
layers.append(nn.Linear(prev_dim, hidden_dim))
layers.append(nn.BatchNorm1d(hidden_dim))
layers.append(nn.ReLU())
layers.append(nn.Dropout(dropout_rate))
prev_dim = hidden_dim
layers.append(nn.Linear(prev_dim, 1))
self.network = nn.Sequential(*layers)
def forward(self, x):
return self.network(x)
# ============================================================
# Load shared artifacts
# ============================================================
scaler = joblib.load(SCALER_PATH)
with open(FEATURE_NAMES_PATH, "r") as f:
FEATURE_NAMES = [line.strip() for line in f if line.strip()]
df_placements = pd.read_csv(PLACEMENTS_PATH)
placement_coords = {
int(row["placement_id"]): (row["x"], row["y"])
for _, row in df_placements.iterrows()
}
# ============================================================
# Model loading
# ============================================================
_MODEL_CACHE = {}
def normalize_model_name(model_name: str) -> str:
if model_name == "nn":
return "nn_best"
return model_name
def load_model(model_name=DEFAULT_MODEL):
model_name = normalize_model_name(model_name)
if model_name not in MODEL_REGISTRY:
raise ValueError(
f"Unknown model '{model_name}'. Choose from: {list(MODEL_REGISTRY.keys()) + ['nn']}"
)
if model_name in _MODEL_CACHE:
return _MODEL_CACHE[model_name]
info = MODEL_REGISTRY[model_name]
path = info["path"]
if info["kind"] == "sklearn":
model = joblib.load(path)
elif info["kind"] == "torch_checkpoint":
if not TORCH_AVAILABLE:
raise ImportError("PyTorch is not installed, so the neural network model cannot be used.")
checkpoint = torch.load(path, map_location="cpu")
if hasattr(checkpoint, "eval"):
model = checkpoint
model.eval()
elif isinstance(checkpoint, dict):
input_dim = checkpoint.get("input_dim", len(FEATURE_NAMES))
hidden_layers = checkpoint.get("hidden_layers", [256, 128, 64])
dropout_rate = checkpoint.get("dropout_rate", 0.2)
model = ClimbGradePredictor(
input_dim=input_dim,
hidden_layers=hidden_layers,
dropout_rate=dropout_rate,
)
if "model_state_dict" in checkpoint:
model.load_state_dict(checkpoint["model_state_dict"])
else:
model.load_state_dict(checkpoint)
model.eval()
else:
raise RuntimeError(
f"Unsupported checkpoint type for {model_name}: {type(checkpoint)}"
)
else:
raise ValueError(f"Unsupported model kind: {info['kind']}")
_MODEL_CACHE[model_name] = model
return model
# ============================================================
# Helpers
# ============================================================
def parse_frames(frames: str):
"""
Parse strings like:
p304r8p378r6p552r6
into:
[(304, 8), (378, 6), (552, 6)]
"""
if not isinstance(frames, str) or not frames.strip():
return []
matches = re.findall(r"p(\d+)r(\d+)", frames)
return [(int(p), int(r)) for p, r in matches]
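As a quick sanity check, the parser can be exercised standalone (same regex as in `parse_frames` above):

```python
import re

def parse_frames(frames):
    # Each hold is encoded as p<placement_id>r<role_id>, concatenated with no separator.
    if not isinstance(frames, str) or not frames.strip():
        return []
    return [(int(p), int(r)) for p, r in re.findall(r"p(\d+)r(\d+)", frames)]

print(parse_frames("p304r8p378r6p552r6"))  # [(304, 8), (378, 6), (552, 6)]
print(parse_frames(""))                    # []
```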
# ============================================================
# Feature extraction
# ============================================================
def extract_features_from_raw(angle, frames, is_nomatch=0, description=""):
"""
Extract the clean, leakage-free feature set used by the updated models.
"""
holds = parse_frames(frames)
if not holds:
raise ValueError("Could not parse any holds from frames.")
hold_data = []
for placement_id, role_id in holds:
coords = placement_coords.get(placement_id, (None, None))
if coords[0] is None:
continue
role_type = get_role_type(role_id)
is_hand_role = role_id in HAND_ROLE_IDS
is_foot_role = role_id in FOOT_ROLE_IDS
hold_data.append({
"placement_id": placement_id,
"x": coords[0],
"y": coords[1],
"role_type": role_type,
"is_hand": is_hand_role,
"is_foot": is_foot_role,
})
if not hold_data:
raise ValueError("No valid holds found after parsing frames.")
df_holds = pd.DataFrame(hold_data)
hand_holds = df_holds[df_holds["is_hand"]]
foot_holds = df_holds[df_holds["is_foot"]]
start_holds = df_holds[df_holds["role_type"] == "start"]
finish_holds = df_holds[df_holds["role_type"] == "finish"]
middle_holds = df_holds[df_holds["role_type"] == "middle"]
xs = df_holds["x"].to_numpy()
ys = df_holds["y"].to_numpy()
desc = "" if pd.isna(description) else str(description)

center_x = (x_min + x_max) / 2
features = {}
# Core / counts
features["angle"] = float(angle)
features["angle_squared"] = float(angle) ** 2
features["total_holds"] = int(len(df_holds))
features["hand_holds"] = int(len(hand_holds))
features["foot_holds"] = int(len(foot_holds))
features["start_holds"] = int(len(start_holds))
features["finish_holds"] = int(len(finish_holds))
features["middle_holds"] = int(len(middle_holds))
features["is_nomatch"] = int(
(is_nomatch == 1) or
bool(re.search(r"\bno\s*match(ing)?\b", desc, flags=re.IGNORECASE))
)
# Spatial
features["mean_y"] = float(np.mean(ys))
features["std_x"] = float(np.std(xs)) if len(xs) > 1 else 0.0
features["std_y"] = float(np.std(ys)) if len(ys) > 1 else 0.0
features["range_x"] = float(np.max(xs) - np.min(xs))
features["range_y"] = float(np.max(ys) - np.min(ys))
features["min_y"] = float(np.min(ys))
features["max_y"] = float(np.max(ys))
features["height_gained"] = features["max_y"] - features["min_y"]
start_height = float(start_holds["y"].mean()) if len(start_holds) > 0 else np.nan
finish_height = float(finish_holds["y"].mean()) if len(finish_holds) > 0 else np.nan
features["height_gained_start_finish"] = (
finish_height - start_height
if pd.notna(start_height) and pd.notna(finish_height)
else np.nan
)
# Density / symmetry
bbox_area = features["range_x"] * features["range_y"]
features["bbox_area"] = float(bbox_area)
features["hold_density"] = float(features["total_holds"] / bbox_area) if bbox_area > 0 else 0.0
features["holds_per_vertical_foot"] = float(features["total_holds"] / max(features["range_y"], 1))
left_holds = int((df_holds["x"] < center_x).sum())
features["left_ratio"] = left_holds / features["total_holds"] if features["total_holds"] > 0 else 0.5
features["symmetry_score"] = 1 - abs(features["left_ratio"] - 0.5) * 2
y_median = np.median(ys)
upper_holds = int((df_holds["y"] > y_median).sum())
features["upper_ratio"] = upper_holds / features["total_holds"]
# Hand reach
if len(hand_holds) >= 2:
hand_points = hand_holds[["x", "y"]].to_numpy()
hand_distances = pdist(hand_points)
hand_xs = hand_holds["x"].to_numpy()
hand_ys = hand_holds["y"].to_numpy()
features["mean_hand_reach"] = float(np.mean(hand_distances))
features["max_hand_reach"] = float(np.max(hand_distances))
features["std_hand_reach"] = float(np.std(hand_distances))
features["hand_spread_x"] = float(hand_xs.max() - hand_xs.min())
features["hand_spread_y"] = float(hand_ys.max() - hand_ys.min())
else:
features["mean_hand_reach"] = 0.0
features["max_hand_reach"] = 0.0
features["std_hand_reach"] = 0.0
features["hand_spread_x"] = 0.0
features["hand_spread_y"] = 0.0
# Hand-foot distances
if len(hand_holds) > 0 and len(foot_holds) > 0:
hand_points = hand_holds[["x", "y"]].to_numpy()
foot_points = foot_holds[["x", "y"]].to_numpy()
dists = []
for hx, hy in hand_points:
for fx, fy in foot_points:
dists.append(np.sqrt((hx - fx) ** 2 + (hy - fy) ** 2))
dists = np.asarray(dists, dtype=float)
features["min_hand_to_foot"] = float(np.min(dists))
features["mean_hand_to_foot"] = float(np.mean(dists))
features["std_hand_to_foot"] = float(np.std(dists))
else:
features["min_hand_to_foot"] = 0.0
features["mean_hand_to_foot"] = 0.0
features["std_hand_to_foot"] = 0.0
# Global geometry
points = np.column_stack([xs, ys])
if len(df_holds) >= 3:
try:
hull = ConvexHull(points)
features["convex_hull_area"] = float(hull.volume)
features["hull_area_to_bbox_ratio"] = float(features["convex_hull_area"] / max(bbox_area, 1))
except Exception:
features["convex_hull_area"] = np.nan
features["hull_area_to_bbox_ratio"] = np.nan
else:
features["convex_hull_area"] = 0.0
features["hull_area_to_bbox_ratio"] = 0.0
if len(df_holds) >= 2:
pairwise = pdist(points)
features["mean_pairwise_distance"] = float(np.mean(pairwise))
features["std_pairwise_distance"] = float(np.std(pairwise))
else:
features["mean_pairwise_distance"] = 0.0
features["std_pairwise_distance"] = 0.0
if len(df_holds) >= 2:
sorted_idx = np.argsort(ys)
sorted_points = points[sorted_idx]
path_length = 0.0
for i in range(len(sorted_points) - 1):
dx = sorted_points[i + 1, 0] - sorted_points[i, 0]
dy = sorted_points[i + 1, 1] - sorted_points[i, 1]
path_length += np.sqrt(dx ** 2 + dy ** 2)
features["path_length_vertical"] = float(path_length)
features["path_efficiency"] = float(features["height_gained"] / max(path_length, 1))
else:
features["path_length_vertical"] = 0.0
features["path_efficiency"] = 0.0
# Normalized / relative
features["mean_y_normalized"] = float((features["mean_y"] - y_min) / board_height)
features["start_height_normalized"] = float((start_height - y_min) / board_height) if pd.notna(start_height) else np.nan
features["finish_height_normalized"] = float((finish_height - y_min) / board_height) if pd.notna(finish_height) else np.nan
features["mean_y_relative_to_start"] = float(features["mean_y"] - start_height) if pd.notna(start_height) else np.nan
features["spread_x_normalized"] = float(features["range_x"] / board_width)
features["spread_y_normalized"] = float(features["range_y"] / board_height)
y_q75 = np.percentile(ys, 75)
y_q25 = np.percentile(ys, 25)
features["y_q75"] = float(y_q75)
features["y_iqr"] = float(y_q75 - y_q25)
# Engineered clean features
features["complexity_score"] = float(
features["mean_hand_reach"]
* np.log1p(features["total_holds"])
* (1 + features["hold_density"])
)
features["angle_x_holds"] = float(features["angle"] * features["total_holds"])
return features
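# The engineered complexity_score above multiplies mean hand reach by a
# log-damped hold count and a density factor. A tiny worked example with
# made-up numbers (not real board data) illustrates the scale of the result:
def _complexity_score_example():
    # Hypothetical climb: mean_hand_reach=10.0, total_holds=12, hold_density=0.05
    mean_hand_reach = 10.0
    total_holds = 12
    hold_density = 0.05
    # 10.0 * log1p(12) * 1.05 ~= 26.9
    return float(mean_hand_reach * np.log1p(total_holds) * (1 + hold_density))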
# ============================================================
# Model input preparation
# ============================================================
def prepare_feature_vector(features: dict) -> pd.DataFrame:
row = {}
for col in FEATURE_NAMES:
value = features.get(col, 0.0)
row[col] = 0.0 if pd.isna(value) else value
return pd.DataFrame([row], columns=FEATURE_NAMES)
# ============================================================
# Prediction helpers
# ============================================================
def format_prediction(pred: float):
    """Round a raw model output and clamp it to the supported grade range."""
    rounded = int(round(pred))
rounded = max(min(rounded, MAX_GRADE), MIN_GRADE)
return {
"predicted_numeric": float(pred),
"predicted_display_difficulty": rounded,
"predicted_boulder_grade": grade_map[rounded],
}
def predict_with_model(model, X: pd.DataFrame, model_name: str):
    """Run a single-row prediction with a scikit-learn or PyTorch model."""
model_name = normalize_model_name(model_name)
info = MODEL_REGISTRY[model_name]
if info["kind"] == "sklearn":
X_input = scaler.transform(X) if info["needs_scaling"] else X
pred = model.predict(X_input)[0]
return float(pred)
if info["kind"] == "torch_checkpoint":
if not TORCH_AVAILABLE:
raise ImportError("PyTorch is not installed.")
X_input = scaler.transform(X) if info["needs_scaling"] else X
X_tensor = torch.tensor(np.asarray(X_input), dtype=torch.float32)
with torch.no_grad():
out = model(X_tensor)
if isinstance(out, tuple):
out = out[0]
pred = np.asarray(out).reshape(-1)[0]
return float(pred)
raise ValueError(f"Unsupported model kind: {info['kind']}")
# ============================================================
# Public API
# ============================================================
def predict(
angle,
frames,
is_nomatch=0,
description="",
model_name=DEFAULT_MODEL,
return_numeric=False,
debug=False,
):
    """
    Predict the difficulty of a single climb from its board angle and frames
    (hold placement) string.

    Returns a dict with the numeric prediction, clamped display difficulty,
    and boulder grade, or a bare float when return_numeric=True.
    """
model_name = normalize_model_name(model_name)
model = load_model(model_name)
features = extract_features_from_raw(
angle=angle,
frames=frames,
is_nomatch=is_nomatch,
description=description,
)
X = prepare_feature_vector(features)
if debug:
print("\nNonzero / non-null feature values:")
for col, val in X.iloc[0].items():
if pd.notna(val) and val != 0:
print(f"{col}: {val}")
pred = predict_with_model(model, X, model_name=model_name)
if return_numeric:
return float(pred)
result = format_prediction(pred)
result["model"] = model_name
return result
def predict_csv(
input_csv,
output_csv=None,
model_name=DEFAULT_MODEL,
angle_col="angle",
frames_col="frames",
is_nomatch_col="is_nomatch",
description_col="description",
):
"""
Batch prediction over a CSV file.
Required columns:
- angle
- frames
Optional columns:
- is_nomatch
- description
"""
model_name = normalize_model_name(model_name)
df = pd.read_csv(input_csv)
if angle_col not in df.columns:
raise ValueError(f"Missing required column: '{angle_col}'")
if frames_col not in df.columns:
raise ValueError(f"Missing required column: '{frames_col}'")
results = []
for _, row in df.iterrows():
angle = row[angle_col]
frames = row[frames_col]
is_nomatch = row[is_nomatch_col] if is_nomatch_col in df.columns and pd.notna(row[is_nomatch_col]) else 0
description = row[description_col] if description_col in df.columns and pd.notna(row[description_col]) else ""
pred = predict(
angle=angle,
frames=frames,
is_nomatch=is_nomatch,
description=description,
model_name=model_name,
return_numeric=False,
debug=False,
)
results.append(pred)
pred_df = pd.DataFrame(results)
out = pd.concat([df.reset_index(drop=True), pred_df.reset_index(drop=True)], axis=1)
if output_csv is not None:
out.to_csv(output_csv, index=False)
return out
def evaluate_predictions(df, true_col="display_difficulty", pred_col="predicted_numeric"):
"""
Simple evaluation summary for labeled batch predictions.
"""
if true_col not in df.columns:
raise ValueError(f"Missing true target column: '{true_col}'")
if pred_col not in df.columns:
raise ValueError(f"Missing prediction column: '{pred_col}'")
y_true = df[true_col].astype(float)
y_pred = df[pred_col].astype(float)
mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
within_1 = np.mean(np.abs(y_true - y_pred) <= 1)
within_2 = np.mean(np.abs(y_true - y_pred) <= 2)
return {
"mae": float(mae),
"rmse": float(rmse),
"within_1": float(within_1),
"within_2": float(within_2),
}
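# Worked example with made-up numbers (not project data), showing how the
# four metrics above behave on a tiny labeled sample:
def _metrics_walkthrough():
    y_true = np.asarray([10.0, 12.0, 15.0])
    y_pred = np.asarray([10.5, 14.0, 15.0])
    err = np.abs(y_true - y_pred)  # [0.5, 2.0, 0.0]
    return {
        "mae": float(err.mean()),                   # 2.5 / 3 ~= 0.833
        "rmse": float(np.sqrt((err ** 2).mean())),  # sqrt(4.25 / 3) ~= 1.190
        "within_1": float((err <= 1).mean()),       # 2 of 3 errors <= 1 grade
        "within_2": float((err <= 2).mean()),       # all 3 errors <= 2 grades
    }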
# ============================================================
# CLI
# ============================================================
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser()
# Single prediction mode
parser.add_argument("--angle", type=int)
parser.add_argument("--frames", type=str)
parser.add_argument("--is_nomatch", type=int, default=0)
parser.add_argument("--description", type=str, default="")
# Batch mode
parser.add_argument("--input_csv", type=str)
parser.add_argument("--output_csv", type=str)
parser.add_argument(
"--model",
type=str,
default=DEFAULT_MODEL,
choices=list(MODEL_REGISTRY.keys()) + ["nn"],
help="Which trained model to use",
)
parser.add_argument("--numeric", action="store_true")
parser.add_argument("--debug", action="store_true")
parser.add_argument("--evaluate", action="store_true")
args = parser.parse_args()
if args.input_csv:
df_out = predict_csv(
input_csv=args.input_csv,
output_csv=args.output_csv,
model_name=args.model,
)
print(df_out.head())
if args.evaluate:
try:
metrics = evaluate_predictions(df_out)
print("\nEvaluation:")
for k, v in metrics.items():
print(f"{k}: {v:.4f}")
except Exception as e:
print(f"\nCould not evaluate predictions: {e}")
else:
if args.angle is None or args.frames is None:
raise ValueError("For single prediction, you must provide --angle and --frames")
pred = predict(
angle=args.angle,
frames=args.frames,
is_nomatch=args.is_nomatch,
description=args.description,
model_name=args.model,
return_numeric=args.numeric,
debug=args.debug,
)
print(pred)