notebooks, images, scripts

This commit is contained in:
Pawel Sarkowicz
2026-03-26 18:01:52 -04:00
parent 53f31c0f77
commit 09454ba38b
83 changed files with 8681 additions and 375 deletions

LICENSE Normal file
The MIT License (MIT)
Copyright © 2026 Pawel Sarkowicz
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md Normal file
# Tension Board 2: Predicting Climbing Route Difficulty from Board Data
I recently got into *board climbing*, and have been enjoying using the <a href="https://tensionclimbing.com/products/tension-board-2">Tension Board 2</a>. I've been climbing on the 12ft × 12ft (mirrored) board at my local gym, and never has the phrase "*it hurts so good*" felt so apt. So I decided to do an in-depth analysis of the available data.
![Hold Usage Heatmap](images/02_hold_stats/all_holds_all_grades_heatmap.png)
1. [Setup and Reproducibility](#setup-and-reproducibility)
2. [Part I — Data Analysis (Notebooks 01–03)](#part-i--data-analysis-notebooks-0103)
3. [Part II — Predictive Modelling (Notebooks 04–06)](#part-ii--predictive-modelling-notebooks-0406)
4. [Using the Trained Model](#using-the-trained-model)
## Overview
This project analyzes ~130,000 climbs from the Tension Boards in order to do the following:
> 1. **Understand** hold usage patterns and difficulty distributions
> 2. **Quantify** empirical hold difficulty scores
> 3. **Predict** climb grades from hold positions and board angle
Climbing grades are inherently subjective: different climbers use different beta, setters have different grading standards, and difficulty depends on factors not always captured in data. Board climbing adds a further wrinkle: the displayed grade is determined almost democratically, by user input.
Using the Tension Board 2 dataset, this project combines:
* SQL-based data exploration
* statistical analysis and visualization
* feature engineering
* machine learning and deep learning
The project is intentionally structured in two parts:
* **Part I — Data Analysis**
* **Part II — Predictive Modelling**
---
## Project Structure
```text
data/ # processed datasets and feature tables
images/ # saved visualizations used in README and analysis
models/ # trained models and scalers
notebooks/ # full pipeline (01–06)
scripts/ # utility + prediction scripts
sql/ # SQL exploration
README.md
```
---
# Setup and Reproducibility
## Requirements
```bash
pip install -r requirements.txt
```
---
## Retrieving the Data
The utility [`BoardLib`](https://github.com/lemeryfertitta/BoardLib) is used for interacting with climbing board APIs, and works with all Aurora Climbing boards.
We'll work with the Tension Board 2. I downloaded the TB2 database to `data/tb2.db`, along with the board images.
```bash
# install boardlib
pip install boardlib
# download the database
boardlib database tension data/tb2.db
# download the images
# this puts the images into images/product_sizes_layouts_sets
boardlib images tension data/tb2.db images
```
**Note**. I downloaded the database in March 2026; the data was last updated on 2026-01-22, and it contains no user data. The image I overlay the heatmaps on is `images/tb2_board_12x12_composite.png`, which is just the following two images stitched together:
* `images/product_sizes_layouts_sets/21-2.png`
* `images/product_sizes_layouts_sets/22-2.png`
---
## Running the project
Go to your working directory and run notebooks in order:
```text
01 -> 02 -> 03 -> 04 -> 05 -> 06
```
Note:
* Notebooks 01–03 are uploaded with all of their cells run, so that one can see the data analysis. Notebooks 04–06 are uploaded without having been run.
* Notebook 03 generates the hold difficulty tables
* Notebook 04 generates the feature matrix
* Notebook 05 trains the classical ML models
* Notebook 06 trains the neural network
---
# Part I — Data Analysis (Notebooks 01–03)
This section focuses on **understanding the data**, identifying patterns, and forming hypotheses. Note up front that we don't have any user data. We can still infer some user trends from climb-level fields like `fa_at` (when a climb was first ascended) and `ascensionist_count` (how many people have logged an ascent) in the `climbs` and `climb_stats` tables, but that's about it.
---
## 1. Data Overview and Climbing Statistics
There are about 30 tables in this database, about half of which contain useful information. See [`sql/01_data_exploration.sql`](sql/01_data_exploration.sql) for the full exploration of tables. We examine many climbing statistics, starting off with grade distribution.
![Grade Distribution](images/01_climb_stats/difficulty_distribution_by_layout_with_total.png)
* Grade distribution is skewed toward mid-range climbs
* Extreme difficulties are relatively rare
* Multiple entries per climb reflect angle variations
---
## 2. Climbing Popularity and Temporal Patterns
Beyond structural analysis, we can also study how board-climbers behave over time (despite the lack of user data).
![Popularity by year](images/01_climb_stats/first_ascents_by_year.png)
* General upward trend in popularity over the years, both in terms of first ascents and unique setters
---
## 3. Angle vs Difficulty
![Angle vs Grade](images/01_climb_stats/difficulty_by_angle_boxplot_by_layout.png)
* Wall angle is one of the strongest predictors of difficulty
* Steeper climbs tend to be harder
* Significant variability remains within each angle
* Things tend to stabilize past 50 degrees
---
## 4. Board Structure and Hold Usage
![Board Heatmap](images/02_hold_stats/all_holds_7a_V6_heatmap.png)
* Hold usage is highly non-uniform
* Certain board regions are heavily overrepresented
* Spatial structure plays a key role in difficulty
---
## 5. Hold Difficulty Estimation
![Hold Difficulty](images/03_hold_difficulty/difficulty_hand_40deg.png)
* Hold difficulty is estimated from climb data
* We averaged per-role, per-angle difficulty for each hold (with Bayesian smoothing)
* We took advantage of the mirrored layout to increase the amount of data per hold
### Key technique: Bayesian smoothing
Raw averages are noisy due to uneven usage. To stabilize estimates:
* frequently used holds retain their empirical difficulty
* rarely used holds are pulled toward the global mean
This significantly improves downstream feature quality.
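As a minimal sketch of the idea (the function name and prior strength `k` below are illustrative, not the notebook's actual API), the shrinkage is a usage-weighted blend of each hold's empirical mean and the global mean:

```python
import pandas as pd

def bayesian_smooth(df: pd.DataFrame, k: float = 20.0) -> pd.Series:
    """Shrink per-hold mean difficulty toward the global mean.

    df has one row per hold usage, with columns 'hold_id' and the
    climb's numeric 'difficulty'. k is the prior strength: a hold
    used n times gets weight n / (n + k) on its own empirical mean.
    """
    global_mean = df["difficulty"].mean()
    stats = df.groupby("hold_id")["difficulty"].agg(["mean", "count"])
    return (stats["count"] * stats["mean"] + k * global_mean) / (stats["count"] + k)
```

A frequently used hold keeps essentially its own average, while a hold seen only a handful of times is pulled most of the way toward the global mean.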
---
## 6. Many more!
There are many other statistics; see notebooks [`01`](notebooks/01_data_overview_and_climbing_statistics.ipynb) (climbing statistics), [`02`](notebooks/02_hold_analysis_and_board_heatmaps.ipynb) (climbing hold statistics), and [`03`](notebooks/03_hold_difficulty.ipynb) (hold difficulty). Included are:
* **Time-Date analysis** based on `fa_at`. We include month, day of week, and time analysis based on first ascent log data. Winter months are the most popular, and Tuesday and Wednesday are the most popular days of the week.
* **Distribution of climbs per angle**, with 40 degrees being the most common.
* **Distribution of climb quality**, along with the relationship between quality & angle + grade.
* **"Match" vs "No Match"** analysis (whether or not you can match your hands on a hold). "No match" climbs are fewer, but harder and have more ascensionists
* **Prolific statistics**: most popular routes & setters
* **Plastic vs wood** hold analysis
* **Per-Angle, Per-Grade** hold frequency & difficulty analyses
---
# Part II — Predictive Modelling (Notebooks 04–06)
This section focuses on **building predictive models and evaluating performance**. We build features from the `angle` and `frames` of a climb (the `frames` field of a climb tells us which holds are used and which role each plays).
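A `frames` string is a concatenation of `p<placement_id>r<role_id>` pairs; a minimal parser (with the role-ID mapping used in `scripts/predict.py`) might look like:

```python
import re

# Role IDs as mapped in scripts/predict.py
ROLE_NAMES = {5: "start", 6: "middle", 7: "finish", 8: "foot"}

def parse_frames(frames: str) -> list[tuple[int, int]]:
    """Split a frames string like 'p304r8p378r6' into
    (placement_id, role_id) pairs."""
    return [(int(p), int(r)) for p, r in re.findall(r"p(\d+)r(\d+)", frames)]
```

For example, `parse_frames("p304r8p378r6")` yields `[(304, 8), (378, 6)]`: hold 304 used as a foot hold and hold 378 as a middle hand hold.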
---
## 7. Feature Engineering
Features are constructed at the climb level using:
* geometry (height, spread, convex hull)
* structure (number of moves, clustering)
* hold difficulty (smoothed estimates)
* interaction features
| Category | Description | Examples |
| ------------- | --------------------------------- | ------------------------------------------- |
| Geometry | Shape and size of climb | bbox_area, range_x, range_y |
| Movement | Reach and movement complexity | max_hand_reach, path_efficiency |
| Difficulty | Hold-based difficulty metrics | mean_hold_difficulty, max_hold_difficulty |
| Progression | How difficulty changes over climb | difficulty_gradient, difficulty_progression |
| Symmetry | Left/right balance | symmetry_score, hand_symmetry |
| Clustering | Local density of holds | mean_neighbors_12in |
| Normalization | Relative board positioning | mean_y_normalized |
| Distribution | Vertical distribution of holds | y_q25, y_q75 |
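To illustrate the geometry category (the helper below is a hypothetical sketch, not the notebook's code), several of these features fall straight out of the hold coordinates:

```python
import numpy as np

def geometry_features(xs, ys) -> dict:
    """Illustrative geometry features for one climb, given hold coordinates."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    range_x = xs.max() - xs.min()   # horizontal extent of the climb
    range_y = ys.max() - ys.min()   # vertical extent of the climb
    return {
        "range_x": range_x,
        "range_y": range_y,
        "bbox_area": range_x * range_y,  # bounding-box area
        "mean_x": xs.mean(),             # center of mass
        "mean_y": ys.mean(),
    }
```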
### Important design decision
The dataset is restricted to:
> **climbs with angle ≤ 50°**
to reduce variability and improve grading consistency (see [Angle vs Difficulty](#3-angle-vs-difficulty), where the average climb grade seems to stabilize or even drop past 50°).
---
## 8. Feature Relationships
Here are some relationships between features and difficulty:
![Correlation Heatmap](images/04_climb_features/feature_correlations.png)
* higher angles allow for harder difficulties
* hold difficulty features seem to correlate the most to difficulty
* engineered features capture non-trivial structure
We have a full feature list in [`data/04_climb_features/feature_list.txt`](data/04_climb_features/feature_list.txt). Explanations are available in [`data/04_climb_features/feature_explanations.txt`](data/04_climb_features/feature_explanations.txt).
---
## 9. Predictive Models
Models tested:
* Linear Regression
* Ridge
* Lasso
* **Random Forest**
* Gradient Boosting
* **Neural Networks**
### Feature importance
![RF Feature Importance](images/05_predictive_modelling/random_forest_feature_importance.png)
![Neural Network Feature Importance](images/06_deep_learning/neural_network_feature_importance.png)
Key drivers:
* hold difficulty
* wall angle
* structural features
---
## 10. Model Performance
![RF Predicted vs Actual](images/05_predictive_modelling/random_forest_predictions.png)
![NN Predicted vs Actual](images/06_deep_learning/neural_network_predictions.png)
### Results (in terms of difficulty score)
Both the RF and NN models performed similarly.
* **~83% within ±1 V-grade (~45% within ±1 difficulty score)**
* **~96% within ±2 V-grade (~80% within ±2 difficulty scores)**
### Interpretation
* Models capture meaningful trends
* Exact prediction is difficult due to:
* subjective grading
* missing beta (movement sequences)
* climber variability
---
# Results Summary
| Metric | Performance |
| ------------------ | ----------- |
| Within ±1 V-grade | ~83% |
| Within ±2 V-grades | ~96% |
The model can also predict subgrades (e.g., V3 contains both 6a and 6a+), but it is less accurate at that resolution.
| Metric | Performance |
| ------------------ | ----------- |
| Within ±1 difficulty-grade | ~45% |
| Within ±2 difficulty-grades | ~80% |
---
# Limitations
* No explicit movement / beta information
* Grading inconsistency
* No climber-specific features
* Dataset noise
* Some nonsensical grades. For example, `Wayward Son` is graded easier at 45 degrees than at 35 degrees, even though the steeper version is the more difficult climb:
> | name        | angle | display_difficulty   |
> | ----------- | ----- | -------------------- |
> | Wayward Son | 35    | 14.0 (5b/V1)         |
> | Wayward Son | 45    | 11.9929 ~ 12 (4c/V0) |
---
# Future Work
* Kilter Board analysis
* Test other models
* Better spatial features
* A GUI to create a climb and instantly see its predicted difficulty
---
# Using the Trained Model
## Load model in Python
```python
import joblib
model = joblib.load('models/random_forest_tuned.pkl')
```
---
## Predict from feature matrix
```python
import pandas as pd
df = pd.read_csv('data/04_climb_features/climb_features.csv')
X = df.drop(columns=['climb_uuid', 'display_difficulty'])
predictions = model.predict(X)
```
---
## Model files
* `models/random_forest_tuned.pkl` — trained Random Forest
* `models/scaler.joblib` — feature scaler (if applicable)
---
## Using the Prediction Script
The repository includes a prediction script that can estimate climb difficulty directly from:
* wall angle
* `frames` string
* optional metadata such as `is_nomatch` and `description`
The script reconstructs the engineered feature vector used during training, applies the selected model, and returns:
* predicted numeric difficulty
* rounded display difficulty
* mapped boulder grade
### Supported models
The script supports the following trained models:
* `random_forest` — default and recommended
* `linear`
* `ridge`
* `lasso`
* `nn` — alias for the best neural network checkpoint
* `nn_best`
### Single-climb prediction
Example:
```bash
python scripts/predict.py --angle 35 --frames 'p304r8p378r6p552r6p564r7p582r5p683r8p686r7' --model random_forest
```
Example output:
```python
{
'predicted_numeric': 14.27,
'predicted_display_difficulty': 14,
'predicted_boulder_grade': '5b/V1',
'model': 'random_forest'
}
```
You can also use the neural network:
```bash
python scripts/predict.py --angle 40 --frames 'p344r5p348r8p352r5p362r6p366r8p367r8p369r6p371r6p372r7p379r8p382r6p386r8p388r8p403r8p603r7p615r6p617r6' --model nn
```
### Batch prediction from CSV
The same script can run predictions for an entire CSV file.
#### Required columns
* `angle`
* `frames`
#### Optional columns
* `is_nomatch`
* `description`
#### Example input CSV
```csv
angle,frames,is_nomatch,description
40,p344r5p348r8p352r5p362r6,0,
35,p304r8p378r6p552r6p564r7,1,no matching
```
#### Run batch prediction
```bash
python scripts/predict.py --input_csv data/new_climbs.csv --output_csv data/new_climbs_with_predictions.csv --model random_forest
```
This appends prediction columns to the original CSV, including:
* `predicted_numeric`
* `predicted_display_difficulty`
* `predicted_boulder_grade`
* `model`
### Evaluate predictions on labeled data
If your CSV also contains a true difficulty column named `display_difficulty`, the script can compute simple evaluation metrics:
```bash
python scripts/predict.py --input_csv data/test_climbs.csv --output_csv data/test_preds.csv --model random_forest --evaluate
```
Reported metrics include:
* mean absolute error
* RMSE
* fraction within ±1 grade
* fraction within ±2 grades
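These metrics are straightforward to compute from the prediction columns; a sketch (the helper name is illustrative):

```python
import numpy as np

def grade_metrics(y_true, y_pred) -> dict:
    """MAE, RMSE, and fraction of predictions within ±1 / ±2 grades."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return {
        "mae": float(err.mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "within_1": float((err <= 1).mean()),
        "within_2": float((err <= 2).mean()),
    }
```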
### Python usage
You can also call the prediction function directly:
```python
from scripts.predict import predict
result = predict(
angle=40,
frames="p344r5p348r8p352r5p362r6",
model_name="random_forest"
)
print(result)
```
### Notes
* `random_forest` is the recommended default model for practical use.
* Linear, ridge, lasso, and neural network models are included for comparison.
* The prediction pipeline depends on the same engineered features used during model training, so the script internally reconstructs these from raw route input.
* The neural network checkpoints are loaded from saved PyTorch state dictionaries using the architecture defined in the project.
---
# License
This project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.
The project is for educational purposes. Climb data belongs to Tension Climbing.

Tension Board 2 Feature Engineering Explanation
This document explains the engineered features used for climb difficulty prediction.
--------------------------------------------------
1. BASIC STRUCTURE FEATURES
--------------------------------------------------
angle
Wall angle in degrees.
total_holds
Total number of holds in the climb.
hand_holds / foot_holds
Number of hand vs foot holds.
start_holds / finish_holds / middle_holds
Counts of hold roles.
--------------------------------------------------
2. MATCHING FEATURE
--------------------------------------------------
is_nomatch
Binary feature indicating whether matching is disallowed.
Derived from:
- explicit flag OR
- description text (e.g. “no match”, “no matching”)
--------------------------------------------------
3. SPATIAL FEATURES
--------------------------------------------------
mean_x, mean_y
Center of mass of all holds.
std_x, std_y
Spread of holds.
range_x, range_y
Width and height of climb.
min_y, max_y
Lowest and highest holds.
height_gained
Total vertical gain.
height_gained_start_finish
Vertical gain from start to finish.
--------------------------------------------------
4. START / FINISH FEATURES
--------------------------------------------------
start_height, finish_height
Average height of start/finish holds.
start_height_min/max, finish_height_min/max
Range of start/finish positions.
--------------------------------------------------
5. BOUNDING BOX FEATURES
--------------------------------------------------
bbox_area
Area covered by climb.
bbox_aspect_ratio
Horizontal vs vertical shape.
bbox_normalized_area
Relative coverage of board.
hold_density
Holds per unit area.
holds_per_vertical_foot
Vertical density.
--------------------------------------------------
6. SYMMETRY FEATURES
--------------------------------------------------
left_holds, right_holds
Distribution across board center.
left_ratio
Fraction of holds on left.
symmetry_score
Symmetry measure (1 = perfectly balanced).
hand_left_ratio, hand_symmetry
Same but for hand holds.
--------------------------------------------------
7. VERTICAL DISTRIBUTION
--------------------------------------------------
upper_holds, lower_holds
Split around median height.
upper_ratio
Proportion of upper holds.
--------------------------------------------------
8. REACH / DISTANCE FEATURES
--------------------------------------------------
max_hand_reach, mean_hand_reach, std_hand_reach
Distances between hand holds.
hand_spread_x, hand_spread_y
Spatial extent of hand holds.
max_foot_spread, mean_foot_spread
Foot hold spacing.
max_hand_to_foot, mean_hand_to_foot
Hand-foot distances.
--------------------------------------------------
9. HOLD DIFFICULTY FEATURES
--------------------------------------------------
mean_hold_difficulty
Average difficulty of holds.
max_hold_difficulty / min_hold_difficulty
Extremes.
std_hold_difficulty
Variation.
median_hold_difficulty
Central tendency.
difficulty_range
Spread.
mean_hand_difficulty / mean_foot_difficulty
Role-specific difficulty.
start_difficulty / finish_difficulty
Entry and exit difficulty.
--------------------------------------------------
10. COMBINED / INTERACTION FEATURES
--------------------------------------------------
hand_foot_ratio
Balance of hands vs feet.
movement_density
Holds per vertical distance.
weighted_difficulty
Height-weighted difficulty.
difficulty_gradient
Difference between start and finish difficulty.
--------------------------------------------------
11. SHAPE / GEOMETRY FEATURES
--------------------------------------------------
convex_hull_area
Area of convex hull around holds.
convex_hull_perimeter
Perimeter.
hull_area_to_bbox_ratio
Compactness.
--------------------------------------------------
12. NEAREST-NEIGHBOR FEATURES
--------------------------------------------------
min_nn_distance / mean_nn_distance
Spacing between holds.
max_nn_distance
Maximum separation.
std_nn_distance
Spread.
--------------------------------------------------
13. CLUSTERING FEATURES
--------------------------------------------------
mean_neighbors_12in
Average nearby holds within 12 inches.
max_neighbors_12in
Max clustering.
clustering_ratio
Normalized clustering.
--------------------------------------------------
14. PATH FEATURES
--------------------------------------------------
path_length_vertical
Estimated movement path length.
path_efficiency
Vertical gain vs path length.
--------------------------------------------------
15. REGIONAL DIFFICULTY FEATURES
--------------------------------------------------
lower_region_difficulty
Bottom third difficulty.
middle_region_difficulty
Middle third difficulty.
upper_region_difficulty
Top third difficulty.
difficulty_progression
Change in difficulty from bottom to top.
--------------------------------------------------
16. DIFFICULTY TRANSITIONS
--------------------------------------------------
max_difficulty_jump
Largest jump between moves.
mean_difficulty_jump
Average jump.
difficulty_weighted_reach
Distance weighted by difficulty.
--------------------------------------------------
17. NORMALIZED FEATURES
--------------------------------------------------
mean_x_normalized, mean_y_normalized
Relative board position.
std_x_normalized, std_y_normalized
Normalized spread.
start_height_normalized, finish_height_normalized
Relative heights.
spread_x_normalized, spread_y_normalized
Coverage.
--------------------------------------------------
18. RELATIVE POSITION FEATURES
--------------------------------------------------
start_offset_from_typical
Deviation from typical start height.
finish_offset_from_typical
Deviation from typical finish height.
mean_y_relative_to_start
Average height relative to start.
max_y_relative_to_start
Highest point relative to start.
--------------------------------------------------
19. DISTRIBUTION FEATURES
--------------------------------------------------
y_q25, y_q50, y_q75
Height quartiles.
y_iqr
Spread.
holds_bottom_quartile
Lower density.
holds_top_quartile
Upper density.
--------------------------------------------------
SUMMARY
--------------------------------------------------
These features capture:
- Geometry (shape, spread)
- Movement (reach, density, path)
- Difficulty (hold-based + progression)
- Symmetry and balance
- Spatial distribution
Together they allow the model to approximate both:
- the physical movement complexity of a climb
- the hold-difficulty structure of a climb

angle
total_holds
hand_holds
foot_holds
start_holds
finish_holds
middle_holds
is_nomatch
mean_x
mean_y
std_x
std_y
range_x
range_y
min_y
max_y
start_height
start_height_min
start_height_max
finish_height
finish_height_min
finish_height_max
height_gained
height_gained_start_finish
bbox_area
bbox_aspect_ratio
bbox_normalized_area
hold_density
holds_per_vertical_foot
left_holds
right_holds
left_ratio
symmetry_score
hand_left_ratio
hand_symmetry
upper_holds
lower_holds
upper_ratio
max_hand_reach
min_hand_reach
mean_hand_reach
std_hand_reach
hand_spread_x
hand_spread_y
max_foot_spread
mean_foot_spread
foot_spread_x
foot_spread_y
max_hand_to_foot
min_hand_to_foot
mean_hand_to_foot
std_hand_to_foot
mean_hold_difficulty
max_hold_difficulty
min_hold_difficulty
std_hold_difficulty
median_hold_difficulty
difficulty_range
mean_hand_difficulty
max_hand_difficulty
std_hand_difficulty
mean_foot_difficulty
max_foot_difficulty
std_foot_difficulty
start_difficulty
finish_difficulty
hand_foot_ratio
movement_density
hold_com_x
hold_com_y
weighted_difficulty
convex_hull_area
convex_hull_perimeter
hull_area_to_bbox_ratio
min_nn_distance
mean_nn_distance
max_nn_distance
std_nn_distance
mean_neighbors_12in
max_neighbors_12in
clustering_ratio
path_length_vertical
path_efficiency
difficulty_gradient
lower_region_difficulty
middle_region_difficulty
upper_region_difficulty
difficulty_progression
max_difficulty_jump
mean_difficulty_jump
difficulty_weighted_reach
max_weighted_reach
mean_x_normalized
mean_y_normalized
std_x_normalized
std_y_normalized
start_height_normalized
finish_height_normalized
start_offset_from_typical
finish_offset_from_typical
mean_y_relative_to_start
max_y_relative_to_start
spread_x_normalized
spread_y_normalized
bbox_coverage_x
bbox_coverage_y
y_q25
y_q50
y_q75
y_iqr
holds_bottom_quartile
holds_top_quartile
display_difficulty
angle_x_holds
angle_x_difficulty
angle_squared
difficulty_x_height
difficulty_x_density
complexity_score
hull_area_x_difficulty


requirements.txt Normal file
pandas
numpy
matplotlib
seaborn
scikit-learn
jupyter
notebook
torch
scipy
boardlib

scripts/predict.py Normal file
import re
from pathlib import Path
import joblib
import numpy as np
import pandas as pd
from scipy.spatial import ConvexHull
from scipy.spatial.distance import pdist, squareform
try:
import torch
import torch.nn as nn
TORCH_AVAILABLE = True
except ImportError:
TORCH_AVAILABLE = False
# ============================================================
# Paths
# ============================================================
ROOT = Path(__file__).resolve().parents[1]
SCALER_PATH = ROOT / "models" / "feature_scaler.pkl"
FEATURE_NAMES_PATH = ROOT / "models" / "feature_names.txt"
HOLD_DIFFICULTY_PATH = ROOT / "data" / "03_hold_difficulty" / "hold_difficulty_scores.csv"
PLACEMENTS_PATH = ROOT / "data" / "placements.csv" # adjust if needed
# ============================================================
# Model registry
# ============================================================
MODEL_REGISTRY = {
"linear": {
"path": ROOT / "models" / "linear_regression.pkl",
"kind": "sklearn",
"needs_scaling": True,
},
"ridge": {
"path": ROOT / "models" / "ridge_regression.pkl",
"kind": "sklearn",
"needs_scaling": True,
},
"lasso": {
"path": ROOT / "models" / "lasso_regression.pkl",
"kind": "sklearn",
"needs_scaling": True,
},
"random_forest": {
"path": ROOT / "models" / "random_forest_tuned.pkl",
"kind": "sklearn",
"needs_scaling": False,
},
"nn_best": {
"path": ROOT / "models" / "neural_network_best.pth",
"kind": "torch_checkpoint",
"needs_scaling": True,
},
}
DEFAULT_MODEL = "random_forest"
# ============================================================
# Board constants
# Adjust if your board coordinate system differs
# ============================================================
x_min, x_max = 0.0, 144.0
y_min, y_max = 0.0, 144.0
board_width = x_max - x_min
board_height = y_max - y_min
# ============================================================
# Role mappings
# ============================================================
HAND_ROLE_IDS = {5, 6, 7}
FOOT_ROLE_IDS = {8}
def get_role_type(role_id: int) -> str:
mapping = {
5: "start",
6: "middle",
7: "finish",
8: "foot",
}
return mapping.get(role_id, "middle")
# ============================================================
# Grade map
# ============================================================
grade_map = {
10: '4a/V0',
11: '4b/V0',
12: '4c/V0',
13: '5a/V1',
14: '5b/V1',
15: '5c/V2',
16: '6a/V3',
17: '6a+/V3',
18: '6b/V4',
19: '6b+/V4',
20: '6c/V5',
21: '6c+/V5',
22: '7a/V6',
23: '7a+/V7',
24: '7b/V8',
25: '7b+/V8',
26: '7c/V9',
27: '7c+/V10',
28: '8a/V11',
29: '8a+/V12',
30: '8b/V13',
31: '8b+/V14',
32: '8c/V15',
33: '8c+/V16'
}
MIN_GRADE = min(grade_map)
MAX_GRADE = max(grade_map)
# ============================================================
# Neural network architecture from Notebook 06
# ============================================================
if TORCH_AVAILABLE:
class ClimbGradePredictor(nn.Module):
def __init__(self, input_dim, hidden_layers=None, dropout_rate=0.2):
super().__init__()
if hidden_layers is None:
hidden_layers = [256, 128, 64]
layers = []
prev_dim = input_dim
for hidden_dim in hidden_layers:
layers.append(nn.Linear(prev_dim, hidden_dim))
layers.append(nn.BatchNorm1d(hidden_dim))
layers.append(nn.ReLU())
layers.append(nn.Dropout(dropout_rate))
prev_dim = hidden_dim
layers.append(nn.Linear(prev_dim, 1))
self.network = nn.Sequential(*layers)
def forward(self, x):
return self.network(x)
# ============================================================
# Load shared artifacts
# ============================================================
scaler = joblib.load(SCALER_PATH)
with open(FEATURE_NAMES_PATH, "r") as f:
FEATURE_NAMES = [line.strip() for line in f if line.strip()]
df_hold_difficulty = pd.read_csv(HOLD_DIFFICULTY_PATH, index_col="placement_id")
df_placements = pd.read_csv(PLACEMENTS_PATH)
placement_coords = {
int(row["placement_id"]): (row["x"], row["y"])
for _, row in df_placements.iterrows()
}
# ============================================================
# Model loading
# ============================================================
_MODEL_CACHE = {}
def normalize_model_name(model_name: str) -> str:
if model_name == "nn":
return "nn_best"
return model_name
def load_model(model_name=DEFAULT_MODEL):
model_name = normalize_model_name(model_name)
if model_name not in MODEL_REGISTRY:
raise ValueError(
f"Unknown model '{model_name}'. Choose from: {list(MODEL_REGISTRY.keys()) + ['nn']}"
)
if model_name in _MODEL_CACHE:
return _MODEL_CACHE[model_name]
info = MODEL_REGISTRY[model_name]
path = info["path"]
if info["kind"] == "sklearn":
model = joblib.load(path)
elif info["kind"] == "torch_checkpoint":
if not TORCH_AVAILABLE:
raise ImportError("PyTorch is not installed, so the neural network model cannot be used.")
checkpoint = torch.load(path, map_location="cpu")
if hasattr(checkpoint, "eval"):
model = checkpoint
model.eval()
elif isinstance(checkpoint, dict):
input_dim = checkpoint.get("input_dim", len(FEATURE_NAMES))
hidden_layers = checkpoint.get("hidden_layers", [256, 128, 64])
dropout_rate = checkpoint.get("dropout_rate", 0.2)
model = ClimbGradePredictor(
input_dim=input_dim,
hidden_layers=hidden_layers,
dropout_rate=dropout_rate,
)
if "model_state_dict" in checkpoint:
model.load_state_dict(checkpoint["model_state_dict"])
else:
model.load_state_dict(checkpoint)
model.eval()
else:
raise RuntimeError(
f"Unsupported checkpoint type for {model_name}: {type(checkpoint)}"
)
else:
raise ValueError(f"Unsupported model kind: {info['kind']}")
_MODEL_CACHE[model_name] = model
return model
# ============================================================
# Helpers
# ============================================================
def parse_frames(frames: str):
"""
Parse strings like:
p304r8p378r6p552r6
into:
[(304, 8), (378, 6), (552, 6)]
"""
if not isinstance(frames, str) or not frames.strip():
return []
matches = re.findall(r"p(\d+)r(\d+)", frames)
return [(int(p), int(r)) for p, r in matches]
def lookup_hold_difficulty(placement_id, angle, role_type, is_hand, is_foot):
"""
Preference order:
1. role-specific per-angle
2. aggregate hand/foot per-angle
3. overall_difficulty fallback
"""
if placement_id not in df_hold_difficulty.index:
return np.nan
row = df_hold_difficulty.loc[placement_id]
diff_key = f"{role_type}_diff_{int(angle)}deg"
hand_diff_key = f"hand_diff_{int(angle)}deg"
foot_diff_key = f"foot_diff_{int(angle)}deg"
difficulty = np.nan
if diff_key in row.index:
difficulty = row[diff_key]
if pd.isna(difficulty):
if is_hand and hand_diff_key in row.index:
difficulty = row[hand_diff_key]
elif is_foot and foot_diff_key in row.index:
difficulty = row[foot_diff_key]
if pd.isna(difficulty) and "overall_difficulty" in row.index:
difficulty = row["overall_difficulty"]
return difficulty
# ============================================================
# Feature extraction
# ============================================================
def extract_features_from_raw(angle, frames, is_nomatch=0, description=""):
features = {}
holds = parse_frames(frames)
if not holds:
raise ValueError("Could not parse any holds from frames.")
hold_data = []
for placement_id, role_id in holds:
coords = placement_coords.get(placement_id, (None, None))
if coords[0] is None:
continue
role_type = get_role_type(role_id)
is_hand = role_id in HAND_ROLE_IDS
is_foot = role_id in FOOT_ROLE_IDS
difficulty = lookup_hold_difficulty(
placement_id=placement_id,
angle=angle,
role_type=role_type,
is_hand=is_hand,
is_foot=is_foot,
)
hold_data.append({
"placement_id": placement_id,
"x": coords[0],
"y": coords[1],
"role_id": role_id,
"role_type": role_type,
"is_hand": is_hand,
"is_foot": is_foot,
"difficulty": difficulty,
})
if not hold_data:
raise ValueError("No valid holds found after parsing frames.")
df_holds = pd.DataFrame(hold_data)
hand_holds = df_holds[df_holds["is_hand"]]
foot_holds = df_holds[df_holds["is_foot"]]
start_holds = df_holds[df_holds["role_type"] == "start"]
finish_holds = df_holds[df_holds["role_type"] == "finish"]
middle_holds = df_holds[df_holds["role_type"] == "middle"]
xs = df_holds["x"].values
ys = df_holds["y"].values
features["angle"] = angle
features["total_holds"] = len(df_holds)
features["hand_holds"] = len(hand_holds)
features["foot_holds"] = len(foot_holds)
features["start_holds"] = len(start_holds)
features["finish_holds"] = len(finish_holds)
features["middle_holds"] = len(middle_holds)
desc = str(description) if description is not None else ""
features["is_nomatch"] = int(
(is_nomatch == 1) or
bool(re.search(r"\bno\s*match(ing)?\b", desc, flags=re.IGNORECASE))
)
features["mean_x"] = np.mean(xs)
features["mean_y"] = np.mean(ys)
features["std_x"] = np.std(xs) if len(xs) > 1 else 0
features["std_y"] = np.std(ys) if len(ys) > 1 else 0
features["range_x"] = np.max(xs) - np.min(xs)
features["range_y"] = np.max(ys) - np.min(ys)
features["min_y"] = np.min(ys)
features["max_y"] = np.max(ys)
if len(start_holds) > 0:
features["start_height"] = start_holds["y"].mean()
features["start_height_min"] = start_holds["y"].min()
features["start_height_max"] = start_holds["y"].max()
else:
features["start_height"] = np.nan
features["start_height_min"] = np.nan
features["start_height_max"] = np.nan
if len(finish_holds) > 0:
features["finish_height"] = finish_holds["y"].mean()
features["finish_height_min"] = finish_holds["y"].min()
features["finish_height_max"] = finish_holds["y"].max()
else:
features["finish_height"] = np.nan
features["finish_height_min"] = np.nan
features["finish_height_max"] = np.nan
features["height_gained"] = features["max_y"] - features["min_y"]
if pd.notna(features["finish_height"]) and pd.notna(features["start_height"]):
features["height_gained_start_finish"] = features["finish_height"] - features["start_height"]
else:
features["height_gained_start_finish"] = np.nan
bbox_width = features["range_x"]
bbox_height = features["range_y"]
features["bbox_area"] = bbox_width * bbox_height
features["bbox_aspect_ratio"] = bbox_width / bbox_height if bbox_height > 0 else 0
features["bbox_normalized_area"] = features["bbox_area"] / (board_width * board_height)
features["hold_density"] = features["total_holds"] / features["bbox_area"] if features["bbox_area"] > 0 else 0
# NOTE: range_y is in board units (inches, given the 144-unit 12 ft board), so this is
# really holds per vertical unit; the name is kept for compatibility with the trained models.
features["holds_per_vertical_foot"] = features["total_holds"] / max(features["range_y"], 1)
center_x = (x_min + x_max) / 2
features["left_holds"] = (df_holds["x"] < center_x).sum()
features["right_holds"] = (df_holds["x"] >= center_x).sum()
features["left_ratio"] = features["left_holds"] / features["total_holds"] if features["total_holds"] > 0 else 0.5
features["symmetry_score"] = 1 - abs(features["left_ratio"] - 0.5) * 2
if len(hand_holds) > 0:
hand_left = (hand_holds["x"] < center_x).sum()
features["hand_left_ratio"] = hand_left / len(hand_holds)
features["hand_symmetry"] = 1 - abs(features["hand_left_ratio"] - 0.5) * 2
else:
features["hand_left_ratio"] = np.nan
features["hand_symmetry"] = np.nan
y_median = np.median(ys)
features["upper_holds"] = (df_holds["y"] > y_median).sum()
features["lower_holds"] = (df_holds["y"] <= y_median).sum()
features["upper_ratio"] = features["upper_holds"] / features["total_holds"]
if len(hand_holds) >= 2:
hand_xs = hand_holds["x"].values
hand_ys = hand_holds["y"].values
hand_distances = []
for i in range(len(hand_holds)):
for j in range(i + 1, len(hand_holds)):
dx = hand_xs[i] - hand_xs[j]
dy = hand_ys[i] - hand_ys[j]
hand_distances.append(np.sqrt(dx**2 + dy**2))
features["max_hand_reach"] = max(hand_distances)
features["min_hand_reach"] = min(hand_distances)
features["mean_hand_reach"] = np.mean(hand_distances)
features["std_hand_reach"] = np.std(hand_distances)
features["hand_spread_x"] = hand_xs.max() - hand_xs.min()
features["hand_spread_y"] = hand_ys.max() - hand_ys.min()
else:
features["max_hand_reach"] = 0
features["min_hand_reach"] = 0
features["mean_hand_reach"] = 0
features["std_hand_reach"] = 0
features["hand_spread_x"] = 0
features["hand_spread_y"] = 0
if len(foot_holds) >= 2:
foot_xs = foot_holds["x"].values
foot_ys = foot_holds["y"].values
foot_distances = []
for i in range(len(foot_holds)):
for j in range(i + 1, len(foot_holds)):
dx = foot_xs[i] - foot_xs[j]
dy = foot_ys[i] - foot_ys[j]
foot_distances.append(np.sqrt(dx**2 + dy**2))
features["max_foot_spread"] = max(foot_distances)
features["mean_foot_spread"] = np.mean(foot_distances)
features["foot_spread_x"] = foot_xs.max() - foot_xs.min()
features["foot_spread_y"] = foot_ys.max() - foot_ys.min()
else:
features["max_foot_spread"] = 0
features["mean_foot_spread"] = 0
features["foot_spread_x"] = 0
features["foot_spread_y"] = 0
if len(hand_holds) > 0 and len(foot_holds) > 0:
h2f_distances = []
for _, h in hand_holds.iterrows():
for _, f in foot_holds.iterrows():
dx = h["x"] - f["x"]
dy = h["y"] - f["y"]
h2f_distances.append(np.sqrt(dx**2 + dy**2))
features["max_hand_to_foot"] = max(h2f_distances)
features["min_hand_to_foot"] = min(h2f_distances)
features["mean_hand_to_foot"] = np.mean(h2f_distances)
features["std_hand_to_foot"] = np.std(h2f_distances)
else:
features["max_hand_to_foot"] = 0
features["min_hand_to_foot"] = 0
features["mean_hand_to_foot"] = 0
features["std_hand_to_foot"] = 0
difficulties = df_holds["difficulty"].dropna().values
if len(difficulties) > 0:
features["mean_hold_difficulty"] = np.mean(difficulties)
features["max_hold_difficulty"] = np.max(difficulties)
features["min_hold_difficulty"] = np.min(difficulties)
features["std_hold_difficulty"] = np.std(difficulties)
features["median_hold_difficulty"] = np.median(difficulties)
features["difficulty_range"] = features["max_hold_difficulty"] - features["min_hold_difficulty"]
else:
features["mean_hold_difficulty"] = np.nan
features["max_hold_difficulty"] = np.nan
features["min_hold_difficulty"] = np.nan
features["std_hold_difficulty"] = np.nan
features["median_hold_difficulty"] = np.nan
features["difficulty_range"] = np.nan
hand_diffs = hand_holds["difficulty"].dropna().values if len(hand_holds) > 0 else np.array([])
if len(hand_diffs) > 0:
features["mean_hand_difficulty"] = np.mean(hand_diffs)
features["max_hand_difficulty"] = np.max(hand_diffs)
features["std_hand_difficulty"] = np.std(hand_diffs)
else:
features["mean_hand_difficulty"] = np.nan
features["max_hand_difficulty"] = np.nan
features["std_hand_difficulty"] = np.nan
foot_diffs = foot_holds["difficulty"].dropna().values if len(foot_holds) > 0 else np.array([])
if len(foot_diffs) > 0:
features["mean_foot_difficulty"] = np.mean(foot_diffs)
features["max_foot_difficulty"] = np.max(foot_diffs)
features["std_foot_difficulty"] = np.std(foot_diffs)
else:
features["mean_foot_difficulty"] = np.nan
features["max_foot_difficulty"] = np.nan
features["std_foot_difficulty"] = np.nan
start_diffs = start_holds["difficulty"].dropna().values if len(start_holds) > 0 else np.array([])
finish_diffs = finish_holds["difficulty"].dropna().values if len(finish_holds) > 0 else np.array([])
features["start_difficulty"] = np.mean(start_diffs) if len(start_diffs) > 0 else np.nan
features["finish_difficulty"] = np.mean(finish_diffs) if len(finish_diffs) > 0 else np.nan
features["hand_foot_ratio"] = features["hand_holds"] / max(features["foot_holds"], 1)
features["movement_density"] = features["total_holds"] / max(features["height_gained"], 1)
features["hold_com_x"] = np.average(xs)
features["hold_com_y"] = np.average(ys)
# Height-weighted mean hold difficulty (higher holds weigh more).
# NOTE: `difficulties` comes from dropna(), so ys[:len(difficulties)] may not align
# hold-for-hold when some difficulties are missing; kept as-is so the feature matches
# the definition the models were trained on.
if len(difficulties) > 0 and len(ys) >= len(difficulties):
weights = (ys[:len(difficulties)] - ys.min()) / max(ys.max() - ys.min(), 1) + 0.5
features["weighted_difficulty"] = np.average(difficulties, weights=weights)
else:
features["weighted_difficulty"] = features["mean_hold_difficulty"]
if len(df_holds) >= 3:
try:
points = np.column_stack([xs, ys])
hull = ConvexHull(points)
features["convex_hull_area"] = hull.volume
features["convex_hull_perimeter"] = hull.area
features["hull_area_to_bbox_ratio"] = features["convex_hull_area"] / max(features["bbox_area"], 1)
except Exception:
features["convex_hull_area"] = np.nan
features["convex_hull_perimeter"] = np.nan
features["hull_area_to_bbox_ratio"] = np.nan
else:
features["convex_hull_area"] = 0
features["convex_hull_perimeter"] = 0
features["hull_area_to_bbox_ratio"] = 0
if len(df_holds) >= 2:
points = np.column_stack([xs, ys])
# pdist returns ALL pairwise distances, so despite the "nn" naming the mean/max/std
# below are over every pair; only the min coincides with a true nearest-neighbour value.
distances = pdist(points)
features["min_nn_distance"] = np.min(distances)
features["mean_nn_distance"] = np.mean(distances)
features["max_nn_distance"] = np.max(distances)
features["std_nn_distance"] = np.std(distances)
else:
features["min_nn_distance"] = 0
features["mean_nn_distance"] = 0
features["max_nn_distance"] = 0
features["std_nn_distance"] = 0
if len(df_holds) >= 3:
points = np.column_stack([xs, ys])
dist_matrix = squareform(pdist(points))
threshold = 12.0
neighbors_count = (dist_matrix < threshold).sum(axis=1) - 1
features["mean_neighbors_12in"] = np.mean(neighbors_count)
features["max_neighbors_12in"] = np.max(neighbors_count)
avg_neighbors = np.mean(neighbors_count)
max_possible = len(df_holds) - 1
features["clustering_ratio"] = avg_neighbors / max_possible if max_possible > 0 else 0
else:
features["mean_neighbors_12in"] = 0
features["max_neighbors_12in"] = 0
features["clustering_ratio"] = 0
if len(df_holds) >= 2:
sorted_indices = np.argsort(ys)
sorted_points = np.column_stack([xs[sorted_indices], ys[sorted_indices]])
path_length = 0
for i in range(len(sorted_points) - 1):
dx = sorted_points[i + 1, 0] - sorted_points[i, 0]
dy = sorted_points[i + 1, 1] - sorted_points[i, 1]
path_length += np.sqrt(dx**2 + dy**2)
features["path_length_vertical"] = path_length
features["path_efficiency"] = features["height_gained"] / max(path_length, 1)
else:
features["path_length_vertical"] = 0
features["path_efficiency"] = 0
if pd.notna(features["finish_difficulty"]) and pd.notna(features["start_difficulty"]):
features["difficulty_gradient"] = features["finish_difficulty"] - features["start_difficulty"]
else:
features["difficulty_gradient"] = np.nan
if len(difficulties) > 0:
y_min_val, y_max_val = ys.min(), ys.max()
y_range = y_max_val - y_min_val
if y_range > 0:
lower_mask = ys <= (y_min_val + y_range / 3)
middle_mask = (ys > y_min_val + y_range / 3) & (ys <= y_min_val + 2 * y_range / 3)
upper_mask = ys > (y_min_val + 2 * y_range / 3)
df_with_diff = df_holds.copy()
df_with_diff["lower"] = lower_mask
df_with_diff["middle"] = middle_mask
df_with_diff["upper"] = upper_mask
lower_diffs = df_with_diff[df_with_diff["lower"] & df_with_diff["difficulty"].notna()]["difficulty"]
middle_diffs = df_with_diff[df_with_diff["middle"] & df_with_diff["difficulty"].notna()]["difficulty"]
upper_diffs = df_with_diff[df_with_diff["upper"] & df_with_diff["difficulty"].notna()]["difficulty"]
features["lower_region_difficulty"] = lower_diffs.mean() if len(lower_diffs) > 0 else np.nan
features["middle_region_difficulty"] = middle_diffs.mean() if len(middle_diffs) > 0 else np.nan
features["upper_region_difficulty"] = upper_diffs.mean() if len(upper_diffs) > 0 else np.nan
if pd.notna(features["lower_region_difficulty"]) and pd.notna(features["upper_region_difficulty"]):
features["difficulty_progression"] = features["upper_region_difficulty"] - features["lower_region_difficulty"]
else:
features["difficulty_progression"] = np.nan
else:
features["lower_region_difficulty"] = features["mean_hold_difficulty"]
features["middle_region_difficulty"] = features["mean_hold_difficulty"]
features["upper_region_difficulty"] = features["mean_hold_difficulty"]
features["difficulty_progression"] = 0
else:
features["lower_region_difficulty"] = np.nan
features["middle_region_difficulty"] = np.nan
features["upper_region_difficulty"] = np.nan
features["difficulty_progression"] = np.nan
if len(hand_holds) >= 2 and len(hand_diffs) >= 2:
hand_sorted = hand_holds.sort_values("y")
hand_diff_sorted = hand_sorted["difficulty"].dropna().values
if len(hand_diff_sorted) >= 2:
difficulty_jumps = np.abs(np.diff(hand_diff_sorted))
features["max_difficulty_jump"] = np.max(difficulty_jumps) if len(difficulty_jumps) > 0 else 0
features["mean_difficulty_jump"] = np.mean(difficulty_jumps) if len(difficulty_jumps) > 0 else 0
else:
features["max_difficulty_jump"] = 0
features["mean_difficulty_jump"] = 0
else:
features["max_difficulty_jump"] = 0
features["mean_difficulty_jump"] = 0
if len(hand_holds) >= 2 and len(hand_diffs) >= 2:
hand_sorted = hand_holds.sort_values("y")
xs_sorted = hand_sorted["x"].values
ys_sorted = hand_sorted["y"].values
diffs_sorted = hand_sorted["difficulty"].fillna(np.mean(hand_diffs)).values
weighted_reach = []
for i in range(len(hand_sorted) - 1):
dx = xs_sorted[i + 1] - xs_sorted[i]
dy = ys_sorted[i + 1] - ys_sorted[i]
dist = np.sqrt(dx**2 + dy**2)
avg_diff = (diffs_sorted[i] + diffs_sorted[i + 1]) / 2
weighted_reach.append(dist * avg_diff)
features["difficulty_weighted_reach"] = np.mean(weighted_reach) if weighted_reach else 0
features["max_weighted_reach"] = np.max(weighted_reach) if weighted_reach else 0
else:
features["difficulty_weighted_reach"] = 0
features["max_weighted_reach"] = 0
features["mean_x_normalized"] = (features["mean_x"] - x_min) / board_width
features["mean_y_normalized"] = (features["mean_y"] - y_min) / board_height
features["std_x_normalized"] = features["std_x"] / board_width
features["std_y_normalized"] = features["std_y"] / board_height
if pd.notna(features["start_height"]):
features["start_height_normalized"] = (features["start_height"] - y_min) / board_height
else:
features["start_height_normalized"] = np.nan
if pd.notna(features["finish_height"]):
features["finish_height_normalized"] = (features["finish_height"] - y_min) / board_height
else:
features["finish_height_normalized"] = np.nan
typical_start_y = y_min + board_height * 0.15
typical_finish_y = y_min + board_height * 0.85
if pd.notna(features["start_height"]):
features["start_offset_from_typical"] = abs(features["start_height"] - typical_start_y)
else:
features["start_offset_from_typical"] = np.nan
if pd.notna(features["finish_height"]):
features["finish_offset_from_typical"] = abs(features["finish_height"] - typical_finish_y)
else:
features["finish_offset_from_typical"] = np.nan
if len(start_holds) > 0:
start_y = start_holds["y"].mean()
features["mean_y_relative_to_start"] = features["mean_y"] - start_y
features["max_y_relative_to_start"] = features["max_y"] - start_y
else:
features["mean_y_relative_to_start"] = np.nan
features["max_y_relative_to_start"] = np.nan
features["spread_x_normalized"] = features["range_x"] / board_width
features["spread_y_normalized"] = features["range_y"] / board_height
features["bbox_coverage_x"] = features["range_x"] / board_width
features["bbox_coverage_y"] = features["range_y"] / board_height
y_quartiles = np.percentile(ys, [25, 50, 75])
features["y_q25"] = y_quartiles[0]
features["y_q50"] = y_quartiles[1]
features["y_q75"] = y_quartiles[2]
features["y_iqr"] = y_quartiles[2] - y_quartiles[0]
features["holds_bottom_quartile"] = (ys < y_quartiles[0]).sum()
features["holds_top_quartile"] = (ys >= y_quartiles[2]).sum()
return features
# ============================================================
# Model input preparation
# ============================================================
def prepare_feature_vector(features: dict) -> pd.DataFrame:
row = {}
for col in FEATURE_NAMES:
value = features.get(col, 0.0)
row[col] = 0.0 if pd.isna(value) else value
return pd.DataFrame([row], columns=FEATURE_NAMES)
# ============================================================
# Prediction helpers
# ============================================================
def format_prediction(pred: float):
rounded = int(round(pred))
rounded = max(min(rounded, MAX_GRADE), MIN_GRADE)
return {
"predicted_numeric": float(pred),
"predicted_display_difficulty": rounded,
"predicted_boulder_grade": grade_map[rounded],
}
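For intuition, the rounding-and-clamping step in `format_prediction` can be exercised in isolation. This is a self-contained sketch that restates the logic using only the endpoints of `grade_map` (10 → '4a/V0', 33 → '8c+/V16'); the input values are illustrative:

```python
# Endpoints of grade_map, restated here so the sketch runs standalone.
LO, HI = 10, 33

def clamp_round(pred: float) -> int:
    # Round the raw regression output to the nearest grade index,
    # then clamp into the range covered by grade_map.
    return max(min(int(round(pred)), HI), LO)

print(clamp_round(22.4))  # -> 22
print(clamp_round(40.0))  # -> 33 (clamped to the hardest mapped grade)
print(clamp_round(3.2))   # -> 10 (clamped to the easiest mapped grade)
```

The clamp means any out-of-range regression output still maps to a displayable grade rather than a `KeyError` in `grade_map`.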
def predict_with_model(model, X: pd.DataFrame, model_name: str):
model_name = normalize_model_name(model_name)
info = MODEL_REGISTRY[model_name]
if info["kind"] == "sklearn":
X_input = scaler.transform(X) if info["needs_scaling"] else X
pred = model.predict(X_input)[0]
return float(pred)
if info["kind"] == "torch_checkpoint":
if not TORCH_AVAILABLE:
raise ImportError("PyTorch is not installed.")
X_input = scaler.transform(X) if info["needs_scaling"] else X
X_tensor = torch.tensor(np.asarray(X_input), dtype=torch.float32)
with torch.no_grad():
out = model(X_tensor)
if isinstance(out, tuple):
out = out[0]
pred = np.asarray(out).reshape(-1)[0]
return float(pred)
raise ValueError(f"Unsupported model kind: {info['kind']}")
# ============================================================
# Public API
# ============================================================
def predict(
angle,
frames,
is_nomatch=0,
description="",
model_name=DEFAULT_MODEL,
return_numeric=False,
debug=False,
):
model_name = normalize_model_name(model_name)
model = load_model(model_name)
features = extract_features_from_raw(
angle=angle,
frames=frames,
is_nomatch=is_nomatch,
description=description,
)
X = prepare_feature_vector(features)
if debug:
print("\nNonzero / non-null feature values:")
for col, val in X.iloc[0].items():
if pd.notna(val) and val != 0:
print(f"{col}: {val}")
pred = predict_with_model(model, X, model_name=model_name)
if return_numeric:
return float(pred)
result = format_prediction(pred)
result["model"] = model_name
return result
def predict_csv(
input_csv,
output_csv=None,
model_name=DEFAULT_MODEL,
angle_col="angle",
frames_col="frames",
is_nomatch_col="is_nomatch",
description_col="description",
):
"""
Batch prediction over a CSV file.
Required columns:
- angle
- frames
Optional columns:
- is_nomatch
- description
"""
model_name = normalize_model_name(model_name)
df = pd.read_csv(input_csv)
if angle_col not in df.columns:
raise ValueError(f"Missing required column: '{angle_col}'")
if frames_col not in df.columns:
raise ValueError(f"Missing required column: '{frames_col}'")
results = []
for _, row in df.iterrows():
angle = row[angle_col]
frames = row[frames_col]
is_nomatch = row[is_nomatch_col] if is_nomatch_col in df.columns and pd.notna(row[is_nomatch_col]) else 0
description = row[description_col] if description_col in df.columns and pd.notna(row[description_col]) else ""
pred = predict(
angle=angle,
frames=frames,
is_nomatch=is_nomatch,
description=description,
model_name=model_name,
return_numeric=False,
debug=False,
)
results.append(pred)
pred_df = pd.DataFrame(results)
out = pd.concat([df.reset_index(drop=True), pred_df.reset_index(drop=True)], axis=1)
if output_csv is not None:
out.to_csv(output_csv, index=False)
return out
def evaluate_predictions(df, true_col="display_difficulty", pred_col="predicted_numeric"):
"""
Simple evaluation summary for labeled batch predictions.
"""
if true_col not in df.columns:
raise ValueError(f"Missing true target column: '{true_col}'")
if pred_col not in df.columns:
raise ValueError(f"Missing prediction column: '{pred_col}'")
y_true = df[true_col].astype(float)
y_pred = df[pred_col].astype(float)
mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
within_1 = np.mean(np.abs(y_true - y_pred) <= 1)
within_2 = np.mean(np.abs(y_true - y_pred) <= 2)
return {
"mae": float(mae),
"rmse": float(rmse),
"within_1": float(within_1),
"within_2": float(within_2),
}
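The metrics returned by `evaluate_predictions` are easy to sanity-check by hand on a toy frame. The computation below inlines the same formulas; the four labeled climbs and their predictions are hypothetical:

```python
import numpy as np
import pandas as pd

# Four labeled climbs with hypothetical predictions.
df_demo = pd.DataFrame({
    "display_difficulty": [20.0, 22.0, 18.0, 25.0],
    "predicted_numeric":  [21.0, 22.0, 20.5, 24.0],
})
err = np.abs(df_demo["display_difficulty"] - df_demo["predicted_numeric"])  # [1, 0, 2.5, 1]
mae = err.mean()              # (1 + 0 + 2.5 + 1) / 4 = 1.125
rmse = np.sqrt((err ** 2).mean())
within_1 = (err <= 1).mean()  # 3 of 4 predictions within one grade -> 0.75
print(mae, within_1)
```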
# ============================================================
# CLI
# ============================================================
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser()
# Single prediction mode
parser.add_argument("--angle", type=int)
parser.add_argument("--frames", type=str)
parser.add_argument("--is_nomatch", type=int, default=0)
parser.add_argument("--description", type=str, default="")
# Batch mode
parser.add_argument("--input_csv", type=str)
parser.add_argument("--output_csv", type=str)
parser.add_argument(
"--model",
type=str,
default=DEFAULT_MODEL,
choices=list(MODEL_REGISTRY.keys()) + ["nn"],
help="Which trained model to use",
)
parser.add_argument("--numeric", action="store_true")
parser.add_argument("--debug", action="store_true")
parser.add_argument("--evaluate", action="store_true")
args = parser.parse_args()
if args.input_csv:
df_out = predict_csv(
input_csv=args.input_csv,
output_csv=args.output_csv,
model_name=args.model,
)
print(df_out.head())
if args.evaluate:
try:
metrics = evaluate_predictions(df_out)
print("\nEvaluation:")
for k, v in metrics.items():
print(f"{k}: {v:.4f}")
except Exception as e:
print(f"\nCould not evaluate predictions: {e}")
else:
if args.angle is None or args.frames is None:
raise ValueError("For single prediction, you must provide --angle and --frames")
pred = predict(
angle=args.angle,
frames=args.frames,
is_nomatch=args.is_nomatch,
description=args.description,
model_name=args.model,
return_numeric=args.numeric,
debug=args.debug,
)
print(pred)


@@ -0,0 +1,32 @@
import sqlite3
# Update path to your database file
db_path = "../data/tb2.db"
connection = sqlite3.connect(db_path)
cursor = connection.cursor()
# Get all table names
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;")
tables = [row[0] for row in cursor.fetchall()]
# Count rows for each table
results = []
for table in tables:
try:
cursor.execute(f"SELECT COUNT(*) FROM [{table}]")
count = cursor.fetchone()[0]
results.append((table, count))
except Exception as e:
results.append((table, f"Error: {e}"))
# Sort by row count descending
results.sort(key=lambda x: x[1] if isinstance(x[1], int) else -1, reverse=True)
# Print results
print(f"{'table_name':<30} | {'rows':>10}")
print("-" * 45)
for table, count in results:
print(f"{table:<30} | {count:>10}")
connection.close()


@@ -2,11 +2,11 @@
* TB2 Data exploration
*
* We will set out to understand the database structure,
-* and to understand how this data actually produces climbs on a Tension Board.
+* and to understand how this data actually models climbs on a Tension Board.
*
*
* This data was downloaded via boardlib (https://github.com/lemeryfertitta/BoardLib) on 2026-03-14.
-* It is clear from the `kits` table that it was updated on 2026-01-22 (well, most of it).
+* It is clear from the `shared_syncs` table that it was updated on 2026-01-22 (well, most of it).
*/
--------------------------------------------------------------------------------
@@ -117,7 +117,7 @@ uuid |layout_id|setter_id|setter_username|name
0027cc6eea485099809f5336a0452564| 9| 56399|memphisben |Pre Game |No matching.| 1| 8| 40| 40| 128| | 1| 0|p22r1p49r1p74r3p76r4p78r2p80r2 | 0| 1|2021-02-13 01:52:54.000000| 1|
002e2db25b124ff5719afdb2c6732b2c| 9| 33924|jfisch040 |Yoooooooooo | | 9| 16| 48| 4| 152| | 1| 0|p1r3p14r2p67r1p73r2p80r2p279r4 | 0| 1|2021-02-13 01:52:57.000000| 0|
-* The frams column is what actually determines the holds on the climb, and what role they are.
+* The frames column is what actually determines the holds on the climb, and what role they are.
* There are some climb characteristics (name, description, whether or not matching is allowed, setter info, edges, whether it is listed).
* The UUID is how we link the specific climb to the other tables.
* What is hsm?
@@ -133,7 +133,7 @@ climb_uuid |ascensionist_count|display_difficulty|quality_a
0020974d7ee7f1b6d78b44a70f3fa27b| 1| 24.0| 3.0|
0024b68ebc5cbbcfbe653ec4ed224271| 1| 23.0| 3.0|
*
-* climb_uuid, ascentionist_count, display difficulty, and quality_average.
+* climb_uuid, ascensionist_count, display_difficulty, and quality_average.
*/
SELECT * FROM climb_stats;
@@ -835,7 +835,7 @@ id |product_id|name |x |y |mirrored_hole_id|mirror_group|
* So we understand HOW the board works pretty well now. Let's summarize.
* - There are about 128k climbs, across 3 layouts -- the TB1, TB2 (Mirror) and TB2 (Spray).
* - There are about 147k statistics for climbs. This includes multiple angles for each climb.
-* - Some key features are the frames, the angle, and the layout_id (the latter determins the board, the former the actual climb on the board)
+* - Some key features are the frames, the angle, and the layout_id (the latter determines the board, the former the actual climb on the board)
* - Hold positions are decoded via mapping placements to (x,y) coordinates (from the holes tables)
* - There are four hold types: start, middle, finish, foot. 498 holds on the TB2.
* - There are different hold sets (e.g., Wood/Plastic on TB2).
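The decoding pipeline summarized above (frames string → placements → hole coordinates, with role ids for start/middle/finish/foot) can be sketched end to end. This is a minimal illustration against a toy in-memory schema; the table and column names here are simplified assumptions for demonstration, not the exact tb2.db schema:

```python
import re
import sqlite3

# Toy stand-in for the placements -> holes mapping described above
# (schema simplified for illustration; real tb2.db columns differ).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holes (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE placements (id INTEGER PRIMARY KEY, hole_id INTEGER);
    INSERT INTO holes VALUES (1, 4.0, 8.0), (2, 60.0, 140.0);
    INSERT INTO placements VALUES (304, 1), (552, 2);
""")

# Role ids as used in the frames encoding.
ROLE_NAMES = {5: "start", 6: "middle", 7: "finish", 8: "foot"}

def decode_climb(frames: str):
    """Turn a frames string like 'p304r5p552r7' into [(x, y, role_name), ...]."""
    holds = []
    for p, r in re.findall(r"p(\d+)r(\d+)", frames):
        row = conn.execute(
            "SELECT h.x, h.y FROM placements p JOIN holes h ON h.id = p.hole_id "
            "WHERE p.id = ?", (int(p),)
        ).fetchone()
        if row is not None:
            holds.append((row[0], row[1], ROLE_NAMES.get(int(r), "middle")))
    return holds

print(decode_climb("p304r5p552r7"))
# -> [(4.0, 8.0, 'start'), (60.0, 140.0, 'finish')]
```

This mirrors how the prediction script's `parse_frames` plus the placements CSV recover hold positions, just expressed as a single SQL join.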