fixed leakage

2026-03-28 16:03:04 -04:00
parent 880272aaf5
commit 3ab9b77bb7
36 changed files with 2296 additions and 681 deletions
@@ -1,6 +1,6 @@
 # Kilter Board: Predicting Climbing Route Difficulty from Board Data

-I recently got into *board climbing*, and I enjoy climbing on the TB2 and the Kilter Board. I've been climbing on the 12ftx12ft (mirrored) that is available at my local gym, and I've never felt that the phrase "*it hurts so good*" would be so apt. As such, I've done an in depth analysis of TB2 data <a href="https://gitlab.com/psark/Tension-Board-2-Analysis">here</a>, and have decided to mimic that analysis with available Kilter Board data.
+I recently got into *board climbing*, and I enjoy climbing on the TB2 and the Kilter Board. I've been climbing on the 12x12ft boards available at my local gym, and I never expected the phrase "*it hurts so good*" to be so apt. As such, I've done an in-depth analysis of TB2 data <a href="https://gitlab.com/psark/Tension-Board-2-Analysis">here</a>, and have decided to mimic that analysis with available Kilter Board data.

 ![Hold Usage Heatmap](images/02_hold_stats/all_holds_all_grades_heatmap.png)

@@ -11,12 +11,12 @@ I recently got into *board climbing*, and I enjoy climbing on the TB2 and the Ki

 ## Overview

-This project analyzes ~300,000 climbs from the Kilter Board in order to do the following.
+This project analyzes ~300,000 climbs on the Kilter Board in order to do the following.
 > 1. **Understand** hold usage patterns and difficulty distributions
 > 2. **Quantify** empirical hold difficulty scores
 > 3. **Predict** climb grades from hold positions and board angle

-Climbing grades are inherently subjective. Different climbers use different beta, setters have different grading standards, and difficulty depends on factors not always captured in data. What makes it harder in the case of the board climbing is that the grade is displayed almost democratically -- it is determined by user input. 
+Climbing grades are inherently subjective. Different climbers use different beta, setters have different grading standards, and difficulty depends on factors not always captured in data. Moreover, on the boards, the displayed grade for any specific climb is based on user input.

 Using a Kilter Board dataset, this project combines:

@@ -153,8 +153,7 @@ Beyond structural analysis, we can also study how board-climbers behave over tim
 ![Hold Difficulty](images/03_hold_difficulty/difficulty_hand_40deg.png)

 * Hold difficulty is estimated from climb data
-* We averaged (pre-role/per-angle) difficulty for each hold (with Bayesian smoothing)
-* Took advantage of the mirrored layout to increase the amount of data per hold
+* We averaged (per-role/per-angle) difficulty for each hold (with Bayesian smoothing)

 ### Key technique: Bayesian smoothing

@@ -167,7 +166,7 @@ This significantly improves downstream feature quality.
 
 ---

-## 6. Many more!
+## 6. Many more

 There are many other statistics, see notebooks [`01`](notebooks/01_data_overview_and_climbing_statistics.ipynb) (climbing statistics), [`02`](notebooks/02_hold_analysis_and_board_heatmaps.ipynb) (climbing hold statistics), and [`03`](notebooks/03_hold_difficulty.ipynb) (hold difficulty). Included are:

@@ -188,35 +187,58 @@ This section focuses on **building predictive models and evaluating performance*

 ---

-## 7. Feature Engineering
+Features are constructed at the climb level using only **structural and geometric information** derived from the climb definition (`angle` and `frames`).

-Features are constructed at the climb level using:
+We explicitly avoid using hold-difficulty-derived features in the predictive models to prevent target leakage.

-* geometry (height, spread, convex hull)
-* structure (number of moves, clustering)
-* hold difficulty (smoothed estimates)
-* interaction features
+Feature categories include:
+
+* **Geometry** — spatial footprint of the climb (height, spread, convex hull)
+* **Movement** — reach distances and spatial relationships between holds
+* **Density** — how tightly or sparsely holds are arranged
+* **Symmetry** — left/right balance and distribution
+* **Path structure** — approximations of movement flow and efficiency
+* **Normalized position** — relative positioning on the board
+* **Interaction features** — simple nonlinear combinations (e.g., angle × hold count)
+
+This results in a **leakage-free feature set** that better reflects the physical structure of climbing.


-| Category      | Description                       | Examples                                    |
-| ------------- | --------------------------------- | ------------------------------------------- |
-| Geometry      | Shape and size of climb           | bbox_area, range_x, range_y                 |
-| Movement      | Reach and movement complexity     | max_hand_reach, path_efficiency             |
-| Difficulty    | Hold-based difficulty metrics     | mean_hold_difficulty, max_hold_difficulty   |
-| Progression   | How difficulty changes over climb | difficulty_gradient, difficulty_progression |
-| Symmetry      | Left/right balance                | symmetry_score, hand_symmetry               |
-| Clustering    | Local density of holds            | mean_neighbors_12in                         |
-| Normalization | Relative board positioning        | mean_y_normalized                           |
-| Distribution  | Vertical distribution of holds    | y_q25, y_q75                                |
+| Category      | Description                              | Examples                                   |
+| ------------- | ---------------------------------------- | ------------------------------------------ |
+| Geometry      | Shape and size of climb                  | bbox_area, range_x, range_y                |
+| Movement      | Reach and movement structure             | mean_hand_reach, path_efficiency           |
+| Density       | Hold spacing and compactness             | hold_density, holds_per_vertical_foot      |
+| Symmetry      | Left/right balance                       | symmetry_score, left_ratio                 |
+| Path          | Approximate movement trajectory          | path_length_vertical                       |
+| Position      | Relative board positioning               | mean_y_normalized, start_height_normalized |
+| Distribution  | Vertical distribution of holds           | y_q75, y_iqr                               |
+| Interaction   | Nonlinear feature combinations           | angle_squared, angle_x_holds               |
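
A few of the geometric features in the table can be sketched as follows. This is only an illustration: the feature names come from the table, but the formulas, the `(x, y)` hold representation, and the `board_height` parameter are assumptions and may differ from what the notebooks compute.

```python
from statistics import mean


def geometry_features(holds, board_height):
    """Compute a few structural features from hold positions.

    holds: list of (x, y) positions (hypothetical representation).
    board_height: height used to normalize y (assumed parameter).
    """
    xs = [x for x, _ in holds]
    ys = [y for _, y in holds]
    range_x = max(xs) - min(xs)
    range_y = max(ys) - min(ys)
    bbox_area = range_x * range_y
    return {
        "range_x": range_x,
        "range_y": range_y,
        "bbox_area": bbox_area,
        # Assumed definition: holds per unit of bounding-box area.
        "hold_density": len(holds) / bbox_area if bbox_area > 0 else 0.0,
        "mean_y_normalized": mean(ys) / board_height,
    }


feats = geometry_features([(0, 0), (4, 3), (2, 6)], board_height=12)
```

None of these values depend on the grade, which is what keeps this kind of feature leakage-free.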

 ### Important design decision

 The dataset is restricted to:

-> **climbs with angle ≤ 50°**
+> **climbs with angle ≤ 55°**

 to reduce variability and improve consistency (see [Angle vs Difficulty](#3-angle-vs-difficulty), where the average climb grade seems to plateau or drop above 50°).

+### Important: Leakage and Feature Design
+
+Earlier iterations of this project included features derived from hold difficulty scores (computed from climb grades). While these features improved predictive performance, they introduce a form of **target leakage** when computed over the full dataset, because each climb's own grade contributes to the scores of the holds it uses.
+
+In this version of the project:
+
+* Hold difficulty scores are still computed in Notebook 03 for **exploratory analysis**
+* Predictive models (Notebooks 04–06) use only **leakage-free features**
+* No feature is derived from the target variable (`display_difficulty`)
+
+This allows the model to learn from the **structure of climbs themselves**, rather than from aggregated statistics of the labels.
+
+Note: Hold-difficulty-based features can still be valid in a production setting if computed strictly from historical (training) data, similar to target encoding techniques.
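
The note above can be illustrated with a minimal sketch in which hold scores are fit on training climbs only and then looked up for held-out climbs. The function names, the `(hold_ids, difficulty)` input format, and the shrinkage formula with `prior_weight` are assumptions for illustration, not the code from Notebook 03.

```python
from collections import defaultdict


def fit_hold_difficulty(train_climbs, prior_weight=10.0):
    """Fit smoothed per-hold difficulty from training climbs only.

    train_climbs: iterable of (hold_ids, difficulty) pairs (assumed format).
    Each hold's mean is shrunk toward the global mean; prior_weight acts
    like that many pseudo-observations at the global mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    total, n = 0.0, 0
    for hold_ids, difficulty in train_climbs:
        total += difficulty
        n += 1
        for h in hold_ids:
            sums[h] += difficulty
            counts[h] += 1
    global_mean = total / n
    scores = {
        h: (sums[h] + prior_weight * global_mean) / (counts[h] + prior_weight)
        for h in sums
    }
    return scores, global_mean


def mean_hold_difficulty(hold_ids, scores, global_mean):
    """Feature for one climb; holds unseen in training use the global mean."""
    return sum(scores.get(h, global_mean) for h in hold_ids) / len(hold_ids)


train = [({1, 2}, 10.0), ({2, 3}, 20.0)]
scores, global_mean = fit_hold_difficulty(train, prior_weight=1.0)
test_feature = mean_hold_difficulty({2, 99}, scores, global_mean)
```

In a cross-validated setup the same fit would be repeated inside each fold, so that no climb's grade contributes to its own features.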
+
 ---

 ## 8. Feature Relationships
@@ -226,10 +248,10 @@ Here are some relationships between features and difficulty
 ![Correlation Heatmap](images/04_climb_features/feature_correlations.png)

 * higher angles allow for harder difficulties
-* hold difficulty features seem to correlate the most to difficulty
-* engineered features capture non-trivial structure
+* distance between holds seems to correlate with difficulty
+* geometric and structural features capture non-trivial climbing patterns

-We have a full feature list in [`data/04_climb_features/feature_list.txt`](data/04_climb_features/feature_list.txt). Explanations are available in [`data/04_climb_features/feature_list_explanations.txt`](data/04_climb_features/feature_explanations.txt).
+We have a full feature list in [`data/04_climb_features/feature_list.txt`](data/04_climb_features/feature_list.txt). Explanations are available in [`data/04_climb_features/feature_explanations.txt`](data/04_climb_features/feature_explanations.txt).

 ---

@@ -253,22 +275,28 @@ Models tested:

 Key drivers:

-* hold difficulty
 * wall angle
-* structural features
+* reach-based features (e.g., mean/max hand reach)
+* spatial density and distribution
+* geometric structure of the climb
+
+This suggests that **difficulty is strongly tied to spatial arrangement and movement constraints**, rather than just individual hold properties.

 ---

 ## 10. Model Performance

-![RF redicted vs Actual](images/05_predictive_modelling/random_forest_predictions.png)
+![RF Predicted vs Actual](images/05_predictive_modelling/random_forest_predictions.png)
 ![NN Predicted vs Actual](images/06_deep_learning/neural_network_predictions.png)

-### Results (in terms of difficulty score)
+### Results (in terms of V-grade)
 Both the RF and NN models performed similarly.
-* **~83% within ±1 V-grade (~45% within ±1 difficulty score)**
-* **~96% within ±2 V-grade (~80% within ±2 difficulty scores)**
+* **~70% within ±1 V-grade (~36% within ±1 difficulty score)**
+* **~90% within ±2 V-grade (~65% within ±2 difficulty scores)**

+In earlier experiments, we achieved ~83% within ±1 V-grade and ~96% within ±2. However, that setup used the hold difficulty scores from Notebook 03, which are derived from climb grades, creating leakage. The current result is more realistic: the model relies purely on spatial and structural information, without access to hold-based information or beta.
+
+This demonstrates that a substantial portion of climbing difficulty can be attributed to geometry and movement constraints. 
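
For reference, a "within ±k" score like the ones reported above can be computed along these lines. This is a sketch that assumes integer grade labels and predictions rounded to the nearest grade; the notebooks may map difficulty scores to V-grades differently.

```python
def within_k(y_true, y_pred, k):
    """Fraction of predictions that land within k grades of the true grade.

    Predictions are rounded to the nearest integer grade first (assumption).
    """
    hits = sum(abs(round(p) - t) <= k for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy values, not project data.
y_true = [3, 5, 7, 4]
y_pred = [3.4, 6.6, 7.2, 1.0]
acc1 = within_k(y_true, y_pred, 1)
acc2 = within_k(y_true, y_pred, 2)
```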

 ### Interpretation

@@ -285,15 +313,15 @@ Both the RF and NN models performed similarly.

 | Metric             | Performance |
 | ------------------ | ----------- |
-| Within ±1 V-grade  | ~83%        |
-| Within ±2 V-grades | ~96%        |
+| Within ±1 V-grade  | ~70%        |
+| Within ±2 V-grades | ~90%        |

 The model can still predict subgrades (e.g., V3 contains 6a and 6a+), but it is not as accurate.

 | Metric             | Performance |
 | ------------------ | ----------- |
-| Within ±1 difficulty-grade  | ~45%        |
-| Within ±2 difficulty-grades | ~80%        |
+| Within ±1 difficulty-grade  | ~36%        |
+| Within ±2 difficulty-grades | ~65%        |

 ---

@@ -308,6 +336,7 @@ The model can still predict subgrades (e.g., V3 contains 6a and 6a+), but it is

 # Future Work

+* Unified grade prediction across boards
 * Combined board analysis
 * Test other models
 * Better spatial features