fixed leakage

Pawel Sarkowicz
2026-03-28 12:19:09 -04:00
parent 321fe78105
commit 1530c02961
24 changed files with 8224 additions and 1086 deletions


@@ -1,324 +1,201 @@
Tension Board 2 Feature Engineering Explanation
# Feature Explanations
This document explains the engineered features used for climb difficulty prediction.
## Core Climb Attributes
--------------------------------------------------
1. BASIC STRUCTURE FEATURES
--------------------------------------------------
**angle**
Board angle in degrees. Higher angles correspond to steeper (more overhanging) climbs, which generally increase difficulty.
angle
Wall angle in degrees.
**angle_squared**
Square of the board angle. Captures nonlinear effects of steepness (difficulty often increases faster at higher angles).
total_holds
Total number of holds in the climb.
**display_difficulty**
Target variable: the climb's difficulty rating provided by the dataset.
hand_holds / foot_holds
Number of hand vs foot holds.
**angle_x_holds**
Interaction between angle and number of holds. Captures how steepness and hold count jointly affect difficulty (e.g., many holds on steep terrain vs few holds on slab).
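A minimal sketch of the two derived angle features. It assumes `angle_x_holds` is a plain product of the angle and `total_holds` (the exact form of the interaction is not stated here); the helper name `angle_features` is hypothetical.

```python
def angle_features(angle_deg: float, total_holds: int) -> dict:
    """Derived angle features (illustrative sketch, not the project's code)."""
    return {
        "angle": angle_deg,
        # Quadratic term lets a linear model bend with steepness.
        "angle_squared": angle_deg ** 2,
        # ASSUMPTION: the interaction is a simple product.
        "angle_x_holds": angle_deg * total_holds,
    }


features = angle_features(40, 10)
print(features)
```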
start_holds / finish_holds / middle_holds
Counts of hold roles.
---
## Hold Counts / Composition
--------------------------------------------------
2. MATCHING FEATURE
--------------------------------------------------
**total_holds**
Total number of holds used in the climb.
is_nomatch
Binary feature indicating whether matching is disallowed.
Derived from:
- explicit flag OR
- description text (e.g. “no match”, “no matching”)
**hand_holds**
Number of holds intended for hands.
**foot_holds**
Number of holds intended for feet.
--------------------------------------------------
3. SPATIAL FEATURES
--------------------------------------------------
**start_holds**
Number of starting holds.
mean_x, mean_y
Center of mass of all holds.
**finish_holds**
Number of finishing holds.
std_x, std_y
Spread of holds.
**middle_holds**
Number of intermediate (non-start, non-finish) holds.
range_x, range_y
Width and height of climb.
**is_nomatch**
Binary indicator for whether matching hands on holds is disallowed. No-match climbs tend to be more difficult due to restricted movement options.
min_y, max_y
Lowest and highest holds.
---
height_gained
Total vertical gain.
## Spatial Position (Raw Coordinates)
height_gained_start_finish
Vertical gain from start to finish.
**mean_y**
Average vertical position of all holds. Higher values indicate climbs concentrated toward the top of the board.
**std_x**
Standard deviation of horizontal hold positions. Measures left-right spread.
--------------------------------------------------
4. START / FINISH FEATURES
--------------------------------------------------
**std_y**
Standard deviation of vertical hold positions. Measures vertical dispersion.
start_height, finish_height
Average height of start/finish holds.
**range_x**
Horizontal range (max x - min x). Indicates how wide the climb is.
start_height_min/max, finish_height_min/max
Range of start/finish positions.
**range_y**
Vertical range (max y - min y). Indicates how tall the climb is.
**min_y**
Lowest hold position.
--------------------------------------------------
5. BOUNDING BOX FEATURES
--------------------------------------------------
**max_y**
Highest hold position.
bbox_area
Area covered by climb.
**height_gained**
Total vertical distance covered by the climb.
bbox_aspect_ratio
Horizontal vs vertical shape.
**height_gained_start_finish**
Vertical distance between average start and finish holds.
bbox_normalized_area
Relative coverage of board.
---
hold_density
Holds per unit area.
## Density / Coverage
holds_per_vertical_foot
Vertical density.
**bbox_area**
Area of the bounding box containing all holds.
**hold_density**
Number of holds per unit area. Higher density often means more options and potentially easier climbing.
--------------------------------------------------
6. SYMMETRY FEATURES
--------------------------------------------------
**holds_per_vertical_foot**
Number of holds per unit vertical distance. Captures how “ladder-like” a climb is.
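The density features can be sketched as follows. This assumes holds are given as `(x, y)` pairs in inches (so 12 units per vertical foot) and that degenerate boxes fall back to 0; both choices, and the helper name `density_features`, are assumptions rather than the project's definitions.

```python
def density_features(holds, units_per_foot=12.0):
    """Bounding-box density features for a list of (x, y) hold positions."""
    xs = [x for x, _ in holds]
    ys = [y for _, y in holds]
    range_x = max(xs) - min(xs)
    range_y = max(ys) - min(ys)
    bbox_area = range_x * range_y
    n = len(holds)
    return {
        "bbox_area": bbox_area,
        # ASSUMPTION: a zero-area box (all holds on one line) yields 0.
        "hold_density": n / bbox_area if bbox_area > 0 else 0.0,
        # ASSUMPTION: coordinates are in inches, hence units_per_foot=12.
        "holds_per_vertical_foot": (
            n / (range_y / units_per_foot) if range_y > 0 else 0.0
        ),
    }


print(density_features([(0, 0), (24, 0), (24, 48), (0, 48)]))
```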
left_holds, right_holds
Distribution across board center.
---
left_ratio
Fraction of holds on left.
## Symmetry / Balance
symmetry_score
Symmetry measure (1 = perfectly balanced).
**left_ratio**
Proportion of holds on the left side of the board.
hand_left_ratio, hand_symmetry
Same but for hand holds.
**symmetry_score**
How balanced the climb is left-to-right (1 = perfectly balanced, 0 = fully one-sided).
**upper_ratio**
Fraction of holds located above the median vertical position. Indicates whether the climb is top-heavy.
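One formula consistent with the stated endpoints (1 = perfectly balanced, 0 = fully one-sided) is `1 - |2 * left_ratio - 1|`. The sketch below uses that formula as an assumption, along with a hypothetical `center_x` parameter for the board's vertical midline; holds exactly on the midline are counted as right-side here.

```python
import statistics


def balance_features(holds, center_x=0.0):
    """Left/right and upper/lower balance for (x, y) hold positions."""
    n = len(holds)
    left = sum(1 for x, _ in holds if x < center_x)
    left_ratio = left / n
    # ASSUMPTION: symmetry is 1 at a 50/50 split and 0 when one-sided.
    symmetry_score = 1.0 - abs(2.0 * left_ratio - 1.0)
    median_y = statistics.median(y for _, y in holds)
    upper_ratio = sum(1 for _, y in holds if y > median_y) / n
    return {
        "left_ratio": left_ratio,
        "symmetry_score": symmetry_score,
        "upper_ratio": upper_ratio,
    }


print(balance_features([(-2, 0), (-1, 10), (1, 20), (2, 30)]))
```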
--------------------------------------------------
7. VERTICAL DISTRIBUTION
--------------------------------------------------
---
upper_holds, lower_holds
Split around median height.
## Hand Geometry (Reach / Movement)
upper_ratio
Proportion of upper holds.
**mean_hand_reach**
Average distance between pairs of hand holds. Proxy for typical move size.
**max_hand_reach**
Maximum distance between hand holds. Captures hardest reach or span.
--------------------------------------------------
8. REACH / DISTANCE FEATURES
--------------------------------------------------
**std_hand_reach**
Variation in hand distances. Measures consistency vs variability of moves.
max_hand_reach, mean_hand_reach, std_hand_reach
Distances between hand holds.
**hand_spread_x**
Horizontal spread of hand holds.
hand_spread_x, hand_spread_y
Spatial extent of hand holds.
**hand_spread_y**
Vertical spread of hand holds.
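A minimal sketch of the hand-geometry features, assuming Euclidean distance over all unordered pairs of hand holds, population standard deviation, and spread defined as max minus min. The function name and the zero fallbacks for climbs with fewer than two hand holds are illustrative choices.

```python
import math
import statistics
from itertools import combinations


def hand_reach_features(hand_holds):
    """Pairwise reach statistics for a list of (x, y) hand-hold positions."""
    distances = [math.dist(a, b) for a, b in combinations(hand_holds, 2)]
    xs = [x for x, _ in hand_holds]
    ys = [y for _, y in hand_holds]
    return {
        "mean_hand_reach": statistics.fmean(distances) if distances else 0.0,
        "max_hand_reach": max(distances, default=0.0),
        # ASSUMPTION: population (not sample) standard deviation.
        "std_hand_reach": statistics.pstdev(distances) if distances else 0.0,
        "hand_spread_x": max(xs) - min(xs) if xs else 0.0,
        "hand_spread_y": max(ys) - min(ys) if ys else 0.0,
    }


print(hand_reach_features([(0, 0), (3, 4), (0, 8)]))
```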
max_foot_spread, mean_foot_spread
Foot hold spacing.
---
max_hand_to_foot, mean_hand_to_foot
Hand-foot distances.
## Hand-Foot Interaction
**min_hand_to_foot**
Minimum distance between any hand and foot hold. Indicates tight body positioning.
--------------------------------------------------
9. HOLD DIFFICULTY FEATURES
--------------------------------------------------
**mean_hand_to_foot**
Average distance between hands and feet. Proxy for body extension requirements.
mean_hold_difficulty
Average difficulty of holds.
**std_hand_to_foot**
Variation in hand-foot distances. Measures consistency of body positioning.
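The hand-foot features can be sketched the same way, this time over every (hand, foot) pair. Euclidean distance, population standard deviation, and the zero fallback when either list is empty are assumptions; `hand_to_foot_features` is a hypothetical name.

```python
import math
import statistics
from itertools import product


def hand_to_foot_features(hand_holds, foot_holds):
    """Distance statistics over all (hand, foot) pairs of (x, y) positions."""
    distances = [math.dist(h, f) for h, f in product(hand_holds, foot_holds)]
    if not distances:
        # ASSUMPTION: climbs with no explicit foot holds get zeros.
        return {
            "min_hand_to_foot": 0.0,
            "mean_hand_to_foot": 0.0,
            "std_hand_to_foot": 0.0,
        }
    return {
        "min_hand_to_foot": min(distances),
        "mean_hand_to_foot": statistics.fmean(distances),
        "std_hand_to_foot": statistics.pstdev(distances),
    }


print(hand_to_foot_features([(0, 10), (0, 20)], [(0, 0)]))
```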
max_hold_difficulty / min_hold_difficulty
Extremes.
---
std_hold_difficulty
Variation.
## Global Geometry
median_hold_difficulty
Central tendency.
**convex_hull_area**
Area of the convex hull enclosing all holds. Measures overall spatial footprint.
difficulty_range
Spread.
**hull_area_to_bbox_ratio**
Ratio of convex hull area to bounding box area. Indicates how “filled” or “sparse” the hold distribution is.
mean_hand_difficulty / mean_foot_difficulty
Role-specific difficulty.
**mean_pairwise_distance**
Average distance between all pairs of holds. Global spacing measure.
start_difficulty / finish_difficulty
Entry and exit difficulty.
**std_pairwise_distance**
Variation in distances between holds. Captures clustering vs spread.
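To make `convex_hull_area` and `hull_area_to_bbox_ratio` concrete, here is a dependency-free sketch using Andrew's monotone chain for the hull and the shoelace formula for the area. The project may well use a library routine instead; the function names and the zero fallback for degenerate boxes are assumptions.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the other chain's start.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def hull_features(holds):
    xs = [x for x, _ in holds]
    ys = [y for _, y in holds]
    bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    hull_area = polygon_area(convex_hull(holds))
    return {
        "convex_hull_area": hull_area,
        # ASSUMPTION: a zero-area bounding box yields a ratio of 0.
        "hull_area_to_bbox_ratio": hull_area / bbox_area if bbox_area > 0 else 0.0,
    }


print(hull_features([(0, 0), (4, 0), (0, 4)]))
```

A triangular layout fills half of its bounding box, while a layout with holds in all four corners fills it completely, which is the "filled" versus "sparse" distinction the ratio is meant to capture.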
---
--------------------------------------------------
10. COMBINED / INTERACTION FEATURES
--------------------------------------------------
## Path / Flow
hand_foot_ratio
Balance of hands vs feet.
**path_length_vertical**
Approximate total path length when moving from bottom to top (based on sorted vertical positions).
movement_density
Holds per vertical distance.
**path_efficiency**
Ratio of vertical gain to path length. Higher values indicate more direct movement; lower values indicate more traversing or inefficiency.
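A minimal sketch of the two path features, assuming the path visits holds in order of increasing height and that its length is the sum of Euclidean distances between consecutive holds. The tie-breaking by x and the name `path_features` are illustrative assumptions.

```python
import math


def path_features(holds):
    """Approximate bottom-to-top path length and its vertical efficiency."""
    # ASSUMPTION: holds are visited in order of height, ties broken by x.
    ordered = sorted(holds, key=lambda h: (h[1], h[0]))
    path_length = sum(math.dist(a, b) for a, b in zip(ordered, ordered[1:]))
    vertical_gain = ordered[-1][1] - ordered[0][1]
    return {
        "path_length_vertical": path_length,
        "path_efficiency": vertical_gain / path_length if path_length > 0 else 0.0,
    }


print(path_features([(0, 0), (3, 4), (0, 8)]))   # zig-zag climb
print(path_features([(0, 0), (0, 5), (0, 10)]))  # straight ladder
```

Under these assumptions a perfectly vertical ladder has an efficiency of 1, and any sideways movement pushes the value below 1.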
weighted_difficulty
Height-weighted difficulty.
---
difficulty_gradient
Difference between start and finish difficulty.
## Normalized / Relative Position
**mean_y_normalized**
Average vertical position scaled to board height (0 to 1).
--------------------------------------------------
11. SHAPE / GEOMETRY FEATURES
--------------------------------------------------
**start_height_normalized**
Start hold height relative to board height.
convex_hull_area
Area of convex hull around holds.
**finish_height_normalized**
Finish hold height relative to board height.
convex_hull_perimeter
Perimeter.
**mean_y_relative_to_start**
Average hold height relative to starting position.
hull_area_to_bbox_ratio
Compactness.
**spread_x_normalized**
Horizontal spread normalized by board width.
**spread_y_normalized**
Vertical spread normalized by board height.
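The normalized features can be sketched as below. The board dimensions (`board_width`, `board_height`), the use of range for "spread", and the averaging of start holds are all assumptions made for illustration; the actual board size and definitions are not given here.

```python
import statistics


def normalized_features(holds, start_holds, finish_holds,
                        board_width, board_height):
    """Position features scaled by (hypothetical) board dimensions."""
    xs = [x for x, _ in holds]
    ys = [y for _, y in holds]
    mean_y = statistics.fmean(ys)
    start_height = statistics.fmean(y for _, y in start_holds)
    finish_height = statistics.fmean(y for _, y in finish_holds)
    return {
        "mean_y_normalized": mean_y / board_height,
        "start_height_normalized": start_height / board_height,
        "finish_height_normalized": finish_height / board_height,
        "mean_y_relative_to_start": mean_y - start_height,
        # ASSUMPTION: "spread" means range (max - min), not std.
        "spread_x_normalized": (max(xs) - min(xs)) / board_width,
        "spread_y_normalized": (max(ys) - min(ys)) / board_height,
    }


holds = [(10, 10), (20, 20), (30, 30)]
print(normalized_features(holds, [(10, 10)], [(30, 30)], 50, 100))
```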
--------------------------------------------------
12. NEAREST-NEIGHBOR FEATURES
--------------------------------------------------
---
min_nn_distance / mean_nn_distance
Spacing between holds.
## Distribution Features
max_nn_distance
Maximum separation.
**y_q75**
75th percentile of hold heights. Indicates where upper holds are concentrated.
std_nn_distance
Spread.
**y_iqr**
Interquartile range (75th - 25th percentile) of hold heights. Measures vertical spread excluding extremes.
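A short sketch of the two distribution features using the standard library. It assumes linearly interpolated quartiles (the `"inclusive"` method of `statistics.quantiles`); other quartile conventions give slightly different values on small inputs, and the convention used by the project is not stated.

```python
import statistics


def height_distribution_features(ys):
    """Upper quartile and interquartile range of hold heights."""
    # ASSUMPTION: inclusive (linear interpolation) quartile convention.
    q25, _q50, q75 = statistics.quantiles(ys, n=4, method="inclusive")
    return {"y_q75": q75, "y_iqr": q75 - q25}


print(height_distribution_features([0, 10, 20, 30, 40]))
```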
---
--------------------------------------------------
13. CLUSTERING FEATURES
--------------------------------------------------
## Engineered Feature
mean_neighbors_12in
Average nearby holds within 12 inches.
**complexity_score**
Composite feature combining:
max_neighbors_12in
Max clustering.
* hand reach (movement difficulty),
* number of holds (sequence length),
* hold density (spacing).
clustering_ratio
Normalized clustering.
Designed to capture overall climb complexity in a single metric.
--------------------------------------------------
14. PATH FEATURES
--------------------------------------------------
path_length_vertical
Estimated movement path length.
path_efficiency
Vertical gain vs path length.
--------------------------------------------------
15. REGIONAL DIFFICULTY FEATURES
--------------------------------------------------
lower_region_difficulty
Bottom third difficulty.
middle_region_difficulty
Middle third difficulty.
upper_region_difficulty
Top third difficulty.
difficulty_progression
Change in difficulty from bottom to top.
--------------------------------------------------
16. DIFFICULTY TRANSITIONS
--------------------------------------------------
max_difficulty_jump
Largest jump between moves.
mean_difficulty_jump
Average jump.
difficulty_weighted_reach
Distance weighted by difficulty.
--------------------------------------------------
17. NORMALIZED FEATURES
--------------------------------------------------
mean_x_normalized, mean_y_normalized
Relative board position.
std_x_normalized, std_y_normalized
Normalized spread.
start_height_normalized, finish_height_normalized
Relative heights.
spread_x_normalized, spread_y_normalized
Coverage.
--------------------------------------------------
18. RELATIVE POSITION FEATURES
--------------------------------------------------
start_offset_from_typical
Deviation from typical start height.
finish_offset_from_typical
Deviation from typical finish height.
mean_y_relative_to_start
Average height relative to start.
max_y_relative_to_start
Highest point relative to start.
--------------------------------------------------
19. DISTRIBUTION FEATURES
--------------------------------------------------
y_q25, y_q50, y_q75
Height quartiles.
y_iqr
Spread.
holds_bottom_quartile
Lower density.
holds_top_quartile
Upper density.
--------------------------------------------------
SUMMARY
--------------------------------------------------
These features capture:
- Geometry (shape, spread)
- Movement (reach, density, path)
- Difficulty (hold-based + progression)
- Symmetry and balance
- Spatial distribution
Together they allow the model to approximate both:
- physical movement complexity
- and hold difficulty structure of a climb.


@@ -1,4 +1,5 @@
angle
angle_squared
total_holds
hand_holds
foot_holds
@@ -6,7 +7,6 @@ start_holds
finish_holds
middle_holds
is_nomatch
mean_x
mean_y
std_x
std_y
@@ -14,107 +14,36 @@ range_x
range_y
min_y
max_y
start_height
start_height_min
start_height_max
finish_height
finish_height_min
finish_height_max
height_gained
height_gained_start_finish
bbox_area
bbox_aspect_ratio
bbox_normalized_area
hold_density
holds_per_vertical_foot
left_holds
right_holds
left_ratio
symmetry_score
hand_left_ratio
hand_symmetry
upper_holds
lower_holds
upper_ratio
max_hand_reach
min_hand_reach
mean_hand_reach
max_hand_reach
std_hand_reach
hand_spread_x
hand_spread_y
max_foot_spread
mean_foot_spread
foot_spread_x
foot_spread_y
max_hand_to_foot
min_hand_to_foot
mean_hand_to_foot
std_hand_to_foot
mean_hold_difficulty
max_hold_difficulty
min_hold_difficulty
std_hold_difficulty
median_hold_difficulty
difficulty_range
mean_hand_difficulty
max_hand_difficulty
std_hand_difficulty
mean_foot_difficulty
max_foot_difficulty
std_foot_difficulty
start_difficulty
finish_difficulty
hand_foot_ratio
movement_density
hold_com_x
hold_com_y
weighted_difficulty
convex_hull_area
convex_hull_perimeter
hull_area_to_bbox_ratio
min_nn_distance
mean_nn_distance
max_nn_distance
std_nn_distance
mean_neighbors_12in
max_neighbors_12in
clustering_ratio
mean_pairwise_distance
std_pairwise_distance
path_length_vertical
path_efficiency
difficulty_gradient
lower_region_difficulty
middle_region_difficulty
upper_region_difficulty
difficulty_progression
max_difficulty_jump
mean_difficulty_jump
difficulty_weighted_reach
max_weighted_reach
mean_x_normalized
mean_y_normalized
std_x_normalized
std_y_normalized
start_height_normalized
finish_height_normalized
start_offset_from_typical
finish_offset_from_typical
mean_y_relative_to_start
max_y_relative_to_start
spread_x_normalized
spread_y_normalized
bbox_coverage_x
bbox_coverage_y
y_q25
y_q50
y_q75
y_iqr
holds_bottom_quartile
holds_top_quartile
complexity_score
display_difficulty
angle_x_holds
angle_x_difficulty
angle_squared
difficulty_x_height
difficulty_x_density
complexity_score
hull_area_x_difficulty


@@ -3,10 +3,10 @@
| Model | MAE | RMSE | R² | Within ±1 | Within ±2 | Exact V | Within ±1 V |
|-------|-----|------|----|-----------|-----------|---------|-------------|
| Linear Regression | 1.467 | 1.882 | 0.782 | 42.6% | 73.3% | 34.9% | 79.4% |
| Ridge Regression | 1.467 | 1.882 | 0.782 | 42.6% | 73.3% | 34.9% | 79.4% |
| Lasso Regression | 1.475 | 1.891 | 0.780 | 42.2% | 73.0% | 34.6% | 79.3% |
| Random Forest (Tuned) | 1.325 | 1.718 | 0.818 | 47.0% | 77.7% | 38.6% | 83.0% |
| Linear Regression | 2.191 | 2.742 | 0.537 | 28.1% | 53.1% | 23.9% | 61.3% |
| Ridge Regression | 2.191 | 2.742 | 0.537 | 28.1% | 53.1% | 23.9% | 61.3% |
| Lasso Regression | 2.192 | 2.741 | 0.538 | 27.9% | 53.1% | 23.8% | 61.3% |
| Random Forest (Tuned) | 1.788 | 2.293 | 0.676 | 36.1% | 64.3% | 30.2% | 70.8% |
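For reference, the numeric columns in this table can be computed along the following lines. This is a sketch under assumptions: "Within ±k" is taken to mean an absolute error of at most k on the raw (unrounded) prediction, and the grouped V-grade columns are omitted because the score-to-V-grade mapping is not given here.

```python
import math


def regression_report(y_true, y_pred):
    """MAE, RMSE, R² and within-±k accuracy for difficulty predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        # ASSUMPTION: tolerance applies to the raw absolute error.
        "within_1": sum(a <= 1 for a in abs_errors) / n,
        "within_2": sum(a <= 2 for a in abs_errors) / n,
    }


print(regression_report([10, 12, 14, 16], [10, 13, 17, 16]))
```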
### Key Findings
@@ -15,8 +15,8 @@
- Linear models remain useful baselines but leave clear nonlinear signal unexplained.
2. **Fine-grained difficulty prediction is meaningfully harder than grouped grade prediction.**
- On the held-out test set, the best model is within ±1 fine-grained difficulty score 47.0% of the time.
- The same model is within ±1 grouped V-grade 83.0% of the time.
- On the held-out test set, the best model is within ±1 fine-grained difficulty score 36.1% of the time.
- The same model is within ±1 grouped V-grade 70.8% of the time.
3. **This gap is expected and informative.**
- Small numeric errors often stay inside the same or adjacent V-grade buckets.


@@ -2,25 +2,25 @@
### Neural Network Model Summary
**Architecture:**
- Input: 119 features
- Input: 48 features
- Hidden layers: [256, 128, 64]
- Dropout rate: 0.2
- Total parameters: 72,833
- Total parameters: 54,657
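The parameter totals can be cross-checked by hand. Dense layers alone (weights plus biases, with one scalar output) give 53,761 parameters for 48 inputs, which is 896 short of the reported 54,657. The sketch below assumes each hidden layer is followed by a batch-normalization layer with a learnable scale and shift (2 parameters per unit, 896 in total). That layer is not listed in the architecture summary, but this accounting reproduces both the 119-feature and the 48-feature totals.

```python
def count_parameters(n_inputs, hidden, batch_norm=True):
    """Trainable parameters of an MLP with one scalar output."""
    sizes = [n_inputs, *hidden, 1]
    # Dense layers: fan_in * fan_out weights plus fan_out biases.
    total = sum(fan_in * fan_out + fan_out
                for fan_in, fan_out in zip(sizes, sizes[1:]))
    if batch_norm:
        # ASSUMPTION: one batch-norm layer per hidden layer (scale + shift).
        total += sum(2 * width for width in hidden)
    return total


print(count_parameters(119, [256, 128, 64]))
print(count_parameters(48, [256, 128, 64]))
print(count_parameters(48, [256, 128, 64], batch_norm=False))
```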
**Training:**
- Optimizer: Adam (lr=0.001)
- Early stopping: 25 epochs patience
- Best epoch: 121
- Best epoch: 153
**Test Set Performance:**
- MAE: 1.270
- RMSE: 1.643
- R²: 0.834
- Accuracy within ±1 grade: 49.0%
- Accuracy within ±2 grades: 80.2%
- Exact grouped V-grade accuracy: 39.2%
- Accuracy within ±1 V-grade: 84.3%
- Accuracy within ±2 V-grades: 96.8%
- MAE: 1.893
- RMSE: 2.398
- R²: 0.646
- Accuracy within ±1 grade: 33.8%
- Accuracy within ±2 grades: 60.5%
- Exact grouped V-grade accuracy: 27.8%
- Accuracy within ±1 V-grade: 67.9%
- Accuracy within ±2 V-grades: 88.4%
**Key Findings:**
1. The neural network is competitive, but not clearly stronger than the best tree-based baseline.