fixed leakage
@@ -1,4 +1,5 @@
angle
angle_squared
total_holds
hand_holds
foot_holds
@@ -6,7 +7,6 @@ start_holds
finish_holds
middle_holds
is_nomatch
mean_x
mean_y
std_x
std_y
@@ -14,107 +14,36 @@ range_x
range_y
min_y
max_y
start_height
start_height_min
start_height_max
finish_height
finish_height_min
finish_height_max
height_gained
height_gained_start_finish
bbox_area
bbox_aspect_ratio
bbox_normalized_area
hold_density
holds_per_vertical_foot
left_holds
right_holds
left_ratio
symmetry_score
hand_left_ratio
hand_symmetry
upper_holds
lower_holds
upper_ratio
max_hand_reach
min_hand_reach
mean_hand_reach
max_hand_reach
std_hand_reach
hand_spread_x
hand_spread_y
max_foot_spread
mean_foot_spread
foot_spread_x
foot_spread_y
max_hand_to_foot
min_hand_to_foot
mean_hand_to_foot
std_hand_to_foot
mean_hold_difficulty
max_hold_difficulty
min_hold_difficulty
std_hold_difficulty
median_hold_difficulty
difficulty_range
mean_hand_difficulty
max_hand_difficulty
std_hand_difficulty
mean_foot_difficulty
max_foot_difficulty
std_foot_difficulty
start_difficulty
finish_difficulty
hand_foot_ratio
movement_density
hold_com_x
hold_com_y
weighted_difficulty
convex_hull_area
convex_hull_perimeter
hull_area_to_bbox_ratio
min_nn_distance
mean_nn_distance
max_nn_distance
std_nn_distance
mean_neighbors_12in
max_neighbors_12in
clustering_ratio
mean_pairwise_distance
std_pairwise_distance
path_length_vertical
path_efficiency
difficulty_gradient
lower_region_difficulty
middle_region_difficulty
upper_region_difficulty
difficulty_progression
max_difficulty_jump
mean_difficulty_jump
difficulty_weighted_reach
max_weighted_reach
mean_x_normalized
mean_y_normalized
std_x_normalized
std_y_normalized
start_height_normalized
finish_height_normalized
start_offset_from_typical
finish_offset_from_typical
mean_y_relative_to_start
max_y_relative_to_start
spread_x_normalized
spread_y_normalized
bbox_coverage_x
bbox_coverage_y
y_q25
y_q50
y_q75
y_iqr
holds_bottom_quartile
holds_top_quartile
complexity_score
display_difficulty
angle_x_holds
angle_x_difficulty
angle_squared
difficulty_x_height
difficulty_x_density
complexity_score
hull_area_x_difficulty
35
data/05_predictive_modelling/model_summary.txt
Normal file
@@ -0,0 +1,35 @@
### Model Performance Summary

| Model | MAE | RMSE | R² | Within ±1 | Within ±2 | Exact V | Within ±1 V |
|-------|-----|------|----|-----------|-----------|---------|-------------|
| Linear Regression | 2.088 | 2.670 | 0.560 | 30.1% | 55.9% | 25.9% | 64.8% |
| Ridge Regression | 2.088 | 2.670 | 0.560 | 30.0% | 55.9% | 25.9% | 64.8% |
| Lasso Regression | 2.089 | 2.672 | 0.559 | 29.9% | 55.9% | 25.9% | 64.8% |
| Random Forest (Tuned) | 1.846 | 2.375 | 0.652 | 34.8% | 62.4% | 29.6% | 69.7% |

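The fine-grained columns in the table (MAE, RMSE, R², Within ±1, Within ±2) can be reproduced from paired true and predicted scores. The following is a minimal sketch in plain Python, not the project's evaluation code: the function name `regression_metrics` is hypothetical, and it assumes the tolerance is applied to the raw (unrounded) prediction error, which the summary does not state.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R^2 and within-tolerance rates for paired scores.

    Assumption: "within +/-k" means |prediction - truth| <= k on raw values.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {
        "mae": mae,
        "rmse": rmse,
        "r2": r2,
        "within_1": sum(abs(e) <= 1 for e in errors) / n,
        "within_2": sum(abs(e) <= 2 for e in errors) / n,
    }


# Toy data for illustration only; these are not values from the project.
metrics = regression_metrics([10, 12, 14, 16], [10.5, 14.5, 13, 16])
print(metrics)
```

On real data, the same quantities would be computed once on the held-out test set for each model, giving one row of the table per model.
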
### Key Findings

1. **Tree-based models remain strongest on this structured feature set.**
- Random Forest (Tuned) achieves the best overall balance of MAE, RMSE, and grouped V-grade performance.
- Linear models remain useful baselines but leave clear nonlinear signal unexplained.

2. **Fine-grained difficulty prediction is meaningfully harder than grouped grade prediction.**
- On the held-out test set, the best model is within ±1 fine-grained difficulty score 34.8% of the time.
- The same model is within ±1 grouped V-grade 69.7% of the time.

3. **This gap is expected and informative.**
- Small numeric errors often stay inside the same or adjacent V-grade buckets.
- The model captures broad difficulty bands more reliably than exact score distinctions.

4. **The project’s main predictive takeaway is practical rather than perfect.**
- The models are not exact grade replicators.
- They are reasonably strong at placing climbs into the correct neighborhood of difficulty.

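The gap between fine-grained and grouped accuracy described in findings 2 and 3 can be illustrated with a toy bucketing. The sketch below is an assumption-heavy illustration: the summary does not give the mapping from difficulty score to V-grade, so `to_v_grade` uses a purely hypothetical fixed bucket width of 3 score points, and `grouped_metrics` is a hypothetical helper name.

```python
def to_v_grade(score, width=3.0):
    """Map a fine-grained score to a grade bucket.

    HYPOTHETICAL mapping: fixed-width buckets. The project's real
    score-to-V-grade table is not given in the summary.
    """
    return int(score // width)


def grouped_metrics(y_true, y_pred, width=3.0):
    """Return exact-bucket and within-one-bucket rates."""
    n = len(y_true)
    diffs = [
        abs(to_v_grade(p, width) - to_v_grade(t, width))
        for t, p in zip(y_true, y_pred)
    ]
    return {
        "exact_v": sum(d == 0 for d in diffs) / n,
        "within_1_v": sum(d <= 1 for d in diffs) / n,
    }


# Toy data for illustration only; these are not values from the project.
y_true = [10, 12, 14, 16]
y_pred = [11.9, 14.5, 13, 20]

fine_within_1 = sum(abs(p - t) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)
print(fine_within_1)                    # share within +/-1 fine-grained score
print(grouped_metrics(y_true, y_pred))  # share in the same / adjacent bucket
```

In this toy case only one of the four predictions is within ±1 on the fine-grained scale, yet every prediction lands in the same or an adjacent bucket, which is the qualitative effect the findings describe.
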
### Portfolio Interpretation

From a modelling perspective, this project shows:
- feature engineering grounded in domain structure,
- comparison of linear and nonlinear models,
- honest evaluation on a held-out test set,
- and the ability to translate raw regression performance into climbing-relevant grouped V-grade metrics.