Tension Board 2 Mirror: Grade Prediction with Deep Learning

This notebook tests whether a neural network can improve on the classical models from notebook 05. The feature set is the same; only the learning algorithm changes.

Neural Network Approach

We implement a straightforward feedforward network in PyTorch:

  1. Architecture
    A multi-layer perceptron whose hidden layers (256, 128 and 64 units by default) each apply a linear transform, batch normalization, ReLU and dropout. The input dimension matches the number of engineered features; the output is a single difficulty score.

  2. Training
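    Adam optimizer with learning rate scheduling (`ReduceLROnPlateau` halves the rate when validation loss plateaus) and early stopping (training stops after 25 epochs without improvement). A held-out validation set, separate from the test set, is used to detect overfitting and to select the best epoch.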

  3. Regularization
    Dropout (rate 0.2) and L2 weight decay (1e-5) help prevent the network from memorizing the training data.

Consistency Requirements

For fair comparison with notebook 05, we use the same random seed and the same test set. Predictions from both the Random Forest and Neural Network are saved so that an ensemble approach could be tested in future work.

Output

The final products are the trained neural network weights, test set predictions, and a comparison of neural network performance against Random Forest on identical test data.

Notebook Structure

  1. Setup and Imports
  2. Train/Test Split
  3. Setting up the Neural Network
  4. Training Configuration
  5. Training Loop
  6. Test Set Evaluation
  7. Visualization and Error Analysis
  8. Hyperparameter Tuning
  9. Feature Importance
  10. Save Model
  11. Compare with RF
  12. Conclusion

Setup and Imports

In [ ]:
"""
==================================
Setup and Imports
==================================
"""

# Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import matplotlib.patches as mpatches

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

from scipy.spatial import ConvexHull
from scipy.spatial.distance import pdist, squareform

import sqlite3

import re
import os
from collections import defaultdict

import ast

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torch.optim.lr_scheduler import ReduceLROnPlateau

import warnings
warnings.filterwarnings('ignore')

from PIL import Image

# Set some display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

# Set style
palette=['steelblue', 'coral', 'seagreen']  #(for multi-bar graphs)

# Set board image for some visual analysis
board_img = Image.open('../images/tb2_board_12x12_composite.png')

# Connect to the database
DB_PATH="../data/tb2.db"
conn = sqlite3.connect(DB_PATH)

# Set random state
RANDOM_STATE=3

np.random.seed(RANDOM_STATE)
torch.manual_seed(RANDOM_STATE)
if torch.cuda.is_available():
    torch.cuda.manual_seed(RANDOM_STATE)

# Check for GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
In [ ]:
"""
==================================
Query our data from the DB
==================================

This time we restrict the query to `layout_id=10`, the TB2 Mirror layout.
We also restrict to wall angles of at most 50°: in the grade-vs-angle distribution from notebook 01, the data become irregular above 50°
(probably because steep angles are mostly logged by climbers strong enough to climb them). This filter is encoded directly in the query.
"""

# Query climbs data
climbs_query = """
SELECT
    c.uuid,
    c.name AS climb_name,
    c.setter_username,
    c.layout_id AS layout_id,
    c.description,
    c.is_nomatch,
    c.is_listed,
    l.name AS layout_name,
    p.name AS board_name,
    c.frames,
    cs.angle,
    cs.display_difficulty,
    dg.boulder_name AS boulder_grade,
    cs.ascensionist_count,
    cs.quality_average,
    cs.fa_at
    
FROM climbs c
JOIN layouts l ON c.layout_id = l.id
JOIN products p ON l.product_id = p.id
JOIN climb_stats cs ON c.uuid = cs.climb_uuid
JOIN difficulty_grades dg ON ROUND(cs.display_difficulty) = dg.difficulty
WHERE cs.display_difficulty IS NOT NULL AND c.is_listed=1 AND c.layout_id=10 AND cs.angle <= 50
"""

# Query information about placements (and their mirrors)
placements_query = """
SELECT
    p.id AS placement_id,
    h.x,
    h.y,
    p.default_placement_role_id AS default_role_id,
    p.set_id AS set_id,
    s.name AS set_name,
    p_mirror.id AS mirror_placement_id
FROM placements p
JOIN holes h ON p.hole_id = h.id
JOIN sets s ON p.set_id = s.id
LEFT JOIN holes h_mirror ON h.mirrored_hole_id = h_mirror.id
LEFT JOIN placements p_mirror ON p_mirror.hole_id = h_mirror.id AND p_mirror.layout_id = p.layout_id
WHERE p.layout_id = 10
"""

# Load it into a DataFrame
df_climbs = pd.read_sql_query(climbs_query, conn)
df_placements = pd.read_sql_query(placements_query, conn)

df_hold_difficulty = pd.read_csv('../data/03_hold_difficulty/hold_difficulty_scores.csv', index_col='placement_id')
df_features = pd.read_csv('../data/04_climb_features/climb_features.csv', index_col='climb_uuid')
In [ ]:
# Separate features and target
X = df_features.drop(columns=['display_difficulty'])
y = df_features['display_difficulty']

print(f"\nFeatures shape: {X.shape}")
print(f"Target range: {y.min():.1f} to {y.max():.1f}")
print(f"Target mean: {y.mean():.2f}")
print(f"Target std: {y.std():.2f}")

# Check for any remaining missing values
missing = X.isna().sum().sum()
print(f"\nMissing values in features: {missing}")

if missing > 0:
    print("Filling remaining missing values with column means...")
    X = X.fillna(X.mean())

Train/Test Split

In [ ]:
"""
========================
Train/Test split
========================
"""

# First split: 80% train+val, 20% test
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE
)

# Second split: 80% train, 20% validation from the remaining 80%
# (i.e. 64% / 16% / 20% of the full dataset for train / val / test)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.2, random_state=RANDOM_STATE
)

print(f"Training set:   {len(X_train)} samples")
print(f"Validation set: {len(X_val)} samples")
print(f"Test set:       {len(X_test)} samples")

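# Make sure every output directory used in this notebook exists
for out_dir in ['../data/06_deep_learning', '../models', '../images/06_deep_learning']:
    os.makedirs(out_dir, exist_ok=True)

# Save test indices for ensemble consistency
test_indices = X_test.index.tolist()
np.save('../data/06_deep_learning/test_indices.npy', test_indices)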
print("\nTest indices saved for ensemble consistency")
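The two nested splits leave roughly 64% of climbs for training, 16% for validation and 20% for testing. As a quick sanity check on the printed set sizes, here is a minimal sketch that reproduces them, assuming scikit-learn's behaviour for a float `test_size` (the test count is rounded up and the remainder goes to the training side). `expected_split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math


def expected_split_sizes(n_samples, test_size=0.2, val_size=0.2):
    """Approximate (train, val, test) sizes from two nested train_test_split calls.

    Assumes a float test_size is converted with ceil(test_size * n) and the
    remainder is assigned to the training side.
    """
    n_test = math.ceil(test_size * n_samples)
    n_temp = n_samples - n_test            # pool later split into train + validation
    n_val = math.ceil(val_size * n_temp)
    n_train = n_temp - n_val
    return n_train, n_val, n_test


# Hypothetical dataset of 10,000 climbs
print(expected_split_sizes(10_000))
```

Comparing this against `len(X_train)`, `len(X_val)` and `len(X_test)` is a cheap way to confirm the split behaves as intended.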

Setting up the Neural Network

In [ ]:
"""
========================
Feature Scaling
========================
"""

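# Fit the scaler on the training set only, then apply the same transform to
# validation and test data so no information leaks from the held-out sets
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

print("Features scaled")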
print(f"Train mean: {X_train_scaled.mean():.4f}, std: {X_train_scaled.std():.4f}")
In [ ]:
"""
========================
Convert to PyTorch Tensors
========================
"""

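# Convert to float32 tensors (targets reshaped to a column to match the model output)
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.to_numpy(), dtype=torch.float32).reshape(-1, 1)

X_val_tensor = torch.tensor(X_val_scaled, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val.to_numpy(), dtype=torch.float32).reshape(-1, 1)

X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.to_numpy(), dtype=torch.float32).reshape(-1, 1)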

# Create DataLoaders
batch_size = 64

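train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
# drop_last avoids a final batch of size 1, which BatchNorm1d cannot normalize in training mode
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)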

val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"Batch size: {batch_size}")
print(f"Training batches: {len(train_loader)}")
print(f"Validation batches: {len(val_loader)}")
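`nn.BatchNorm1d` needs more than one sample per feature to compute batch statistics in training mode, so a shuffled training loader whose last batch holds exactly one row raises an error. The sketch below (helper names are hypothetical) shows how to spot that case from the dataset size alone; passing `drop_last=True` to the training `DataLoader` is one common way to avoid it.

```python
def last_batch_size(n_samples, batch_size):
    """Size of the final batch when n_samples rows are cut into batches."""
    remainder = n_samples % batch_size
    return batch_size if remainder == 0 else remainder


def batchnorm_risk(n_samples, batch_size):
    """True if the last training batch would contain a single sample,
    which BatchNorm1d cannot normalize in training mode."""
    return last_batch_size(n_samples, batch_size) == 1


# Hypothetical training-set sizes with batch_size=64
for n in (6400, 6401, 6450):
    print(n, last_batch_size(n, 64), batchnorm_risk(n, 64))
```

Because the training loader reshuffles every epoch, dropping the incomplete last batch still lets every climb contribute to training over several epochs.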
In [ ]:
"""
========================
Define Neural Network Architecture
========================
"""

class ClimbGradePredictor(nn.Module):
    """
    Neural network for climb grade prediction.
    
    Architecture:
    - Input layer (one unit per engineered feature)
    - Hidden layers, each Linear -> BatchNorm1d -> ReLU -> Dropout
    - Output layer (single difficulty value)
    """
    
    def __init__(self, input_dim, hidden_layers=(256, 128, 64), dropout_rate=0.2):
        super().__init__()
        
        layers = []
        prev_dim = input_dim
        
        for hidden_dim in hidden_layers:
            layers.append(nn.Linear(prev_dim, hidden_dim))
            layers.append(nn.BatchNorm1d(hidden_dim))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            prev_dim = hidden_dim
        
        # Output layer
        layers.append(nn.Linear(prev_dim, 1))
        
        self.network = nn.Sequential(*layers)
    
    def forward(self, x):
        return self.network(x)


# Create model
input_dim = X_train.shape[1]
hidden_layers = [256, 128, 64]
dropout_rate = 0.2

model = ClimbGradePredictor(input_dim, hidden_layers, dropout_rate).to(device)

print(f"Model Architecture:\n{model}")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
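The printed parameter count can be checked by hand. Each hidden block contributes a `Linear` layer (`in * out` weights plus `out` biases) and a `BatchNorm1d` layer (`2 * out` learnable scale and shift values; its running mean and variance are buffers, so `model.parameters()` does not count them). A minimal sketch, assuming the default architecture above:

```python
def mlp_param_count(input_dim, hidden_layers=(256, 128, 64)):
    """Learnable parameters of a Linear/BatchNorm1d/ReLU/Dropout MLP with one output."""
    total = 0
    prev = input_dim
    for h in hidden_layers:
        total += prev * h + h   # Linear weights + biases
        total += 2 * h          # BatchNorm1d affine weight and bias
        prev = h                # ReLU and Dropout have no parameters
    total += prev * 1 + 1       # final Linear layer to a single score
    return total


# Hypothetical input dimension of 10 features
print(mlp_param_count(10))
```

With `input_dim` set to the real number of features, the result should match `total_params`.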

Training Configuration

In [ ]:
"""
========================
Define training functions
========================
"""

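# Map rounded display_difficulty values to V-grades. Several fine-grained
# values share one V-grade, so the bands have unequal widths.
grade_to_v = {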
    10: 0, 11: 0, 12: 0,
    13: 1, 14: 1,
    15: 2,
    16: 3, 17: 3,
    18: 4, 19: 4,
    20: 5, 21: 5,
    22: 6,
    23: 7,
    24: 8, 25: 8,
    26: 9,
    27: 10,
    28: 11,
    29: 12,
    30: 13,
    31: 14,
    32: 15,
    33: 16,
}

def to_grouped_v(x):
    """Round a difficulty score, clamp it to the mapped range, and return its V-grade."""
    rounded = int(round(x))
    rounded = max(min(rounded, max(grade_to_v)), min(grade_to_v))
    return grade_to_v[rounded]

def grouped_v_metrics(y_true, y_pred):
    """Percentage of predictions whose V-grade matches, or is within 1 or 2 of, the true V-grade."""
    true_v = np.array([to_grouped_v(x) for x in y_true])
    pred_v = np.array([to_grouped_v(x) for x in y_pred])

    return {
        'exact_grouped_v': np.mean(true_v == pred_v) * 100,
        'within_1_vgrade': np.mean(np.abs(true_v - pred_v) <= 1) * 100,
        'within_2_vgrades': np.mean(np.abs(true_v - pred_v) <= 2) * 100
    }

def train_epoch(model, train_loader, criterion, optimizer, device):
    """Train for one epoch."""
    model.train()
    total_loss = 0

    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        optimizer.zero_grad()
        predictions = model(X_batch)
        loss = criterion(predictions, y_batch)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    return total_loss / len(train_loader)


def evaluate(model, data_loader, criterion, device):
    """Evaluate model on a dataset; return (mean loss, predictions, actuals)."""
    model.eval()
    total_loss = 0.0
    n_samples = 0
    predictions_list = []
    actuals_list = []

    with torch.no_grad():
        for X_batch, y_batch in data_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)

            predictions = model(X_batch)
            loss = criterion(predictions, y_batch)

            # Weight by batch size so a smaller final batch does not skew the mean
            total_loss += loss.item() * X_batch.size(0)
            n_samples += X_batch.size(0)
            predictions_list.append(predictions.cpu().numpy())
            actuals_list.append(y_batch.cpu().numpy())

    avg_loss = total_loss / n_samples
    all_predictions = np.vstack(predictions_list).flatten()
    all_actuals = np.vstack(actuals_list).flatten()

    return avg_loss, all_predictions, all_actuals


def compute_metrics(y_true, y_pred):
    """Compute evaluation metrics."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    # These tolerances are on the raw display_difficulty scale, not on V-grades
    within_1 = np.mean(np.abs(y_true - y_pred) <= 1) * 100
    within_2 = np.mean(np.abs(y_true - y_pred) <= 2) * 100
    v_metrics = grouped_v_metrics(y_true, y_pred)

    return {
        'mae': mae,
        'rmse': rmse,
        'r2': r2,
        'within_1': within_1,
        'within_2': within_2,
        'exact_grouped_v': v_metrics['exact_grouped_v'],
        'within_1_vgrade': v_metrics['within_1_vgrade'],
        'within_2_vgrades': v_metrics['within_2_vgrades']
    }
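Two properties of `to_grouped_v` are worth seeing concretely: the V-grade bands are not all the same width on the `display_difficulty` scale, and Python's built-in `round()` rounds exact halves to the nearest even integer. The sketch below copies the notebook's mapping and shows both effects; the input values are illustrative, not taken from the dataset.

```python
grade_to_v = {
    10: 0, 11: 0, 12: 0, 13: 1, 14: 1, 15: 2, 16: 3, 17: 3,
    18: 4, 19: 4, 20: 5, 21: 5, 22: 6, 23: 7, 24: 8, 25: 8,
    26: 9, 27: 10, 28: 11, 29: 12, 30: 13, 31: 14, 32: 15, 33: 16,
}


def to_grouped_v(x):
    rounded = int(round(x))  # round-half-to-even: round(16.5) == 16, round(17.5) == 18
    rounded = max(min(rounded, max(grade_to_v)), min(grade_to_v))  # clamp to the mapped range
    return grade_to_v[rounded]


# Exact halves fall into different bands depending on parity
print(to_grouped_v(16.5), to_grouped_v(17.5))  # V3 vs V4

# A 1.2-point score error can become a 2 V-grade error...
print(to_grouped_v(21.4), to_grouped_v(22.6))  # V5 vs V7

# ...while a 2.8-point error can stay inside a single band
print(to_grouped_v(9.6), to_grouped_v(12.4))   # both V0
```

This is why the fine-grained "within ±1" metric and the "within ±1 V-grade" metric can move in different directions.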
In [ ]:
"""
========================
Training configuration
========================
"""

# Loss function
criterion = nn.MSELoss()

# Optimizer
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)

# Learning rate scheduler
scheduler = ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=10
)

# Training settings
num_epochs = 200
early_stopping_patience = 25

print(f"Learning rate: {learning_rate}")
print(f"Max epochs: {num_epochs}")
print(f"Early stopping patience: {early_stopping_patience}")
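The scheduler patience (10) is shorter than the early-stopping patience (25), so on a flat validation curve the learning rate is halved at most twice before training stops. A simplified simulation of `ReduceLROnPlateau` makes this visible. It assumes any strictly lower loss counts as an improvement; PyTorch's default relative threshold of 1e-4 is ignored here (it can make real reductions happen a little earlier), as are cooldown and `min_lr`.

```python
def simulate_plateau_lrs(val_losses, lr=1e-3, factor=0.5, patience=10):
    """Learning rate in effect after each scheduler.step(val_loss) call (simplified)."""
    best = float("inf")
    num_bad_epochs = 0
    lrs = []
    for loss in val_losses:
        if loss < best:              # simplification: no relative threshold
            best = loss
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        if num_bad_epochs > patience:  # reduce once patience is exceeded, then reset
            lr *= factor
            num_bad_epochs = 0
        lrs.append(lr)
    return lrs


# One improving epoch followed by 25 epochs without improvement:
# exactly the point where early stopping (patience=25) triggers
lrs = simulate_plateau_lrs([1.0] * 26)
print(lrs[10], lrs[11], lrs[22], lrs[-1])
```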

Training Loop

In [ ]:
"""
========================
Training
========================
"""

# Training history
history = {
    'train_loss': [],
    'val_loss': [],
    'val_mae': [],
    'val_r2': []
}

best_val_loss = float('inf')
best_epoch = 0
epochs_no_improve = 0

print("Starting training...\n")

for epoch in range(num_epochs):
    # Train
    train_loss = train_epoch(model, train_loader, criterion, optimizer, device)
    
    # Validate
    val_loss, val_preds, val_actuals = evaluate(model, val_loader, criterion, device)
    val_metrics = compute_metrics(val_actuals, val_preds)
    
    # Update scheduler
    scheduler.step(val_loss)
    
    # Record history
    history['train_loss'].append(train_loss)
    history['val_loss'].append(val_loss)
    history['val_mae'].append(val_metrics['mae'])
    history['val_r2'].append(val_metrics['r2'])
    
    # Print progress
    if (epoch + 1) % 10 == 0 or epoch == 0:
        print(f"Epoch {epoch+1:3d}/{num_epochs} | "
              f"Train Loss: {train_loss:.4f} | "
              f"Val Loss: {val_loss:.4f} | "
              f"Val MAE: {val_metrics['mae']:.3f} | "
              f"Val R²: {val_metrics['r2']:.3f}")
    
    # Early stopping
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_epoch = epoch + 1
        epochs_no_improve = 0
        
        # Save best model
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'val_loss': val_loss,
        }, '../models/neural_network_best.pth')
    else:
        epochs_no_improve += 1
        
    if epochs_no_improve >= early_stopping_patience:
        print(f"\nEarly stopping at epoch {epoch + 1}")
        print(f"Best validation loss at epoch {best_epoch}: {best_val_loss:.4f}")
        break

print(f"\nTraining completed!")
print(f"Best epoch: {best_epoch}")
print(f"Best validation loss: {best_val_loss:.4f}")
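The early-stopping bookkeeping in the loop can be isolated into a small helper, which makes it easier to test and to reuse in the hyperparameter search. The sketch below is a hypothetical `EarlyStopper` class, not part of the notebook; it follows the same rule (stop after `patience` consecutive epochs without a strictly lower validation loss) and adds an optional `min_delta` so that tiny improvements can be ignored.

```python
class EarlyStopper:
    """Track the best validation loss and signal when to stop."""

    def __init__(self, patience=25, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = 0
        self.epochs_no_improve = 0

    def step(self, epoch, val_loss):
        """Record one (1-based) epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_no_improve = 0
        else:
            self.epochs_no_improve += 1
        return self.epochs_no_improve >= self.patience


# Toy validation curve: improves for 3 epochs, then plateaus
losses = [1.0, 0.8, 0.7] + [0.75] * 30
stopper = EarlyStopper(patience=25)
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(epoch, loss):
        break
print(epoch, stopper.best_epoch, stopper.best_loss)
```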
In [ ]:
"""
========================
Plot Training History
========================
"""

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Loss
ax = axes[0]
ax.plot(history['train_loss'], label='Train Loss', linewidth=2)
ax.plot(history['val_loss'], label='Val Loss', linewidth=2)
ax.axvline(x=best_epoch-1, color='r', linestyle='--', label=f'Best Epoch ({best_epoch})')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('Loss (MSE)', fontsize=12)
ax.set_title('Training & Validation Loss', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

# MAE
ax = axes[1]
ax.plot(history['val_mae'], color='#e74c3c', linewidth=2)
ax.axvline(x=best_epoch-1, color='r', linestyle='--', label='Best Epoch')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('MAE', fontsize=12)
ax.set_title('Validation MAE', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

# R²
ax = axes[2]
ax.plot(history['val_r2'], color='#2ecc71', linewidth=2)
ax.axvline(x=best_epoch-1, color='r', linestyle='--', label='Best Epoch')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('R²', fontsize=12)
ax.set_title('Validation R²', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../images/06_deep_learning/neural_network_training.png', dpi=150, bbox_inches='tight')
plt.show()

Test Set Evaluation

In [ ]:
"""
========================
Load best model and evaluate on test set
========================
"""

# Load best model
checkpoint = torch.load('../models/neural_network_best.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])

print(f"Loaded best model from epoch {checkpoint['epoch']+1}")

# Evaluate on test set
test_loss, test_preds, test_actuals = evaluate(model, test_loader, criterion, device)
test_metrics = compute_metrics(test_actuals, test_preds)

print("\n" + "=" * 50)
print("NEURAL NETWORK - TEST SET EVALUATION")
print("=" * 50)
print(f"\nMAE:  {test_metrics['mae']:.3f}")
print(f"RMSE: {test_metrics['rmse']:.3f}")
print(f"R²:   {test_metrics['r2']:.3f}")
print(f"\nAccuracy within ±1 grade: {test_metrics['within_1']:.1f}%")
print(f"Accuracy within ±2 grades: {test_metrics['within_2']:.1f}%")
print(f"\nExact grouped V-grade accuracy: {test_metrics['exact_grouped_v']:.1f}%")
print(f"Accuracy within ±1 V-grade: {test_metrics['within_1_vgrade']:.1f}%")
print(f"Accuracy within ±2 V-grades: {test_metrics['within_2_vgrades']:.1f}%")

Visualization and Error Analysis

In [ ]:
"""
========================
Visualize predictions
========================
"""

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Predicted vs Actual
ax = axes[0]
ax.scatter(test_actuals, test_preds, alpha=0.4, s=20)
min_val = min(test_actuals.min(), test_preds.min())
max_val = max(test_actuals.max(), test_preds.max())
ax.plot([min_val, max_val], [min_val, max_val], 'r--', lw=2, label='Perfect prediction')
ax.set_xlabel('Actual Difficulty', fontsize=12)
ax.set_ylabel('Predicted Difficulty', fontsize=12)
ax.set_title('Neural Network: Predicted vs Actual', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

# Residuals
ax = axes[1]
residuals = test_actuals - test_preds
ax.scatter(test_preds, residuals, alpha=0.4, s=20)
ax.axhline(y=0, color='r', linestyle='--', lw=2)
ax.set_xlabel('Predicted Difficulty', fontsize=12)
ax.set_ylabel('Residual (Actual - Predicted)', fontsize=12)
ax.set_title('Neural Network: Residuals', fontsize=14)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../images/06_deep_learning/neural_network_predictions.png', dpi=150, bbox_inches='tight')
plt.show()

# Error distribution
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(residuals, bins=50, edgecolor='black', alpha=0.7)
ax.axvline(x=0, color='r', linestyle='--', lw=2)
ax.set_xlabel('Prediction Error', fontsize=12)
ax.set_ylabel('Count', fontsize=12)
ax.set_title('Neural Network: Error Distribution', fontsize=14)

plt.tight_layout()
plt.savefig('../images/06_deep_learning/neural_network_errors.png', dpi=150, bbox_inches='tight')
plt.show()
In [ ]:
"""
========================
Error analysis by grade
========================
"""

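df_analysis = pd.DataFrame({
    'actual': test_actuals,
    'predicted': test_preds,
    'error': test_actuals - test_preds,
    'abs_error': np.abs(test_actuals - test_preds)
})

# display_difficulty is a continuous average, so bin actual values to the
# nearest integer grade before grouping (otherwise most groups hold one climb)
df_analysis['grade'] = df_analysis['actual'].round().astype(int)

grade_analysis = df_analysis.groupby('grade').agg(
    count=('actual', 'count'),
    mae=('abs_error', 'mean'),
    bias=('error', 'mean'),
    within_1=('error', lambda x: (np.abs(x) <= 1).mean() * 100)
).round(3)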

print("### Error Analysis by Grade\n")
display(grade_analysis)

# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

ax = axes[0]
ax.bar(grade_analysis.index, grade_analysis['count'], color='#3498db', alpha=0.8)
ax.set_xlabel('Grade')
ax.set_ylabel('Count')
ax.set_title('Test Set Distribution by Grade')

ax = axes[1]
ax.bar(grade_analysis.index, grade_analysis['mae'], color='#e74c3c', alpha=0.8)
ax.set_xlabel('Grade')
ax.set_ylabel('MAE')
ax.set_title('MAE by Grade')

ax = axes[2]
colors = ['#2ecc71' if b >= 0 else '#e74c3c' for b in grade_analysis['bias']]
ax.bar(grade_analysis.index, grade_analysis['bias'], color=colors, alpha=0.8)
ax.set_xlabel('Grade')
ax.set_ylabel('Bias')
ax.set_title('Prediction Bias by Grade')
ax.axhline(y=0, color='black', linestyle='--', lw=1)

plt.tight_layout()
plt.savefig('../images/06_deep_learning/neural_network_by_grade.png', dpi=150, bbox_inches='tight')
plt.show()
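Because `display_difficulty` is an average over ascents, per-grade summaries only make sense after binning actual values to integer grades. A NumPy-only sketch of that binning follows; `mae_by_rounded_grade` is a hypothetical helper, and the short arrays are made-up stand-ins for `test_actuals` and `test_preds`.

```python
import numpy as np


def mae_by_rounded_grade(actuals, preds):
    """Return {grade: (count, MAE, bias)} using actual values rounded to integers."""
    actuals = np.asarray(actuals, dtype=float)
    preds = np.asarray(preds, dtype=float)
    grades = np.rint(actuals).astype(int)   # np.rint also rounds exact halves to even
    errors = actuals - preds                # positive bias means under-prediction
    summary = {}
    for g in np.unique(grades):
        mask = grades == g
        summary[int(g)] = (
            int(mask.sum()),
            float(np.abs(errors[mask]).mean()),
            float(errors[mask].mean()),
        )
    return summary


actuals = [18.2, 17.9, 18.4, 24.6, 25.1]
preds = [18.0, 19.0, 18.4, 23.0, 23.5]
print(mae_by_rounded_grade(actuals, preds))
```

In this toy example the higher grade shows a consistent positive bias, the kind of pattern the bias panel above is meant to reveal (the model under-predicting hard climbs).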

Hyperparameter Tuning

In [ ]:
"""
========================
Hyperparameter tuning - try different architectures
========================
"""

import copy

def train_and_evaluate_model(hidden_layers, dropout_rate, learning_rate, verbose=False):
    """Train a model with given hyperparameters and return validation metrics
    for its best epoch (lowest validation loss)."""

    # Re-seed so every configuration starts from a comparable random state
    torch.manual_seed(RANDOM_STATE)

    # Create model
    model = ClimbGradePredictor(input_dim, hidden_layers, dropout_rate).to(device)

    # Optimizer and scheduler (same settings as the main run)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
    scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)

    best_val_loss = float('inf')
    best_state = copy.deepcopy(model.state_dict())
    epochs_no_improve = 0
    patience = 20

    for epoch in range(150):  # Max epochs
        train_epoch(model, train_loader, criterion, optimizer, device)
        val_loss, _, _ = evaluate(model, val_loader, criterion, device)
        scheduler.step(val_loss)

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1

        if epochs_no_improve >= patience:
            break

    # Restore the best weights before computing the final validation metrics
    model.load_state_dict(best_state)
    _, val_preds, val_actuals = evaluate(model, val_loader, criterion, device)
    val_metrics = compute_metrics(val_actuals, val_preds)

    if verbose:
        print(f"Layers: {hidden_layers}, Dropout: {dropout_rate}, LR: {learning_rate}")
        print(f"  Val MAE: {val_metrics['mae']:.3f}, Val R²: {val_metrics['r2']:.3f}")

    return val_metrics, model


# Test different architectures
architectures = [
    {'hidden_layers': [128, 64], 'dropout_rate': 0.2, 'learning_rate': 0.001},
    {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.2, 'learning_rate': 0.001},
    {'hidden_layers': [512, 256, 128], 'dropout_rate': 0.2, 'learning_rate': 0.001},
    {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.3, 'learning_rate': 0.001},
    {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.1, 'learning_rate': 0.001},
    {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.2, 'learning_rate': 0.0005},
]

print("### Hyperparameter Search\n")

arch_results = []
for arch in architectures:
    metrics, _ = train_and_evaluate_model(**arch, verbose=True)
    arch_results.append({
        **arch,
        'val_mae': metrics['mae'],
        'val_r2': metrics['r2']
    })
    print()

arch_df = pd.DataFrame(arch_results).sort_values('val_mae')
print("\n### Architecture Comparison (sorted by Val MAE)\n")
display(arch_df)
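The six configurations above form a one-factor-at-a-time design: each one changes a single hyperparameter away from the baseline (`[256, 128, 64]`, dropout 0.2, learning rate 0.001). The sketch below generates such a list programmatically; `one_at_a_time` is a hypothetical helper. Note that this design does not explore interactions between hyperparameters; a full grid over the same options would need 3 × 3 × 2 = 18 runs.

```python
def one_at_a_time(baseline, options):
    """Return the baseline config plus configs that change exactly one key."""
    configs = [dict(baseline)]
    for key, values in options.items():
        for value in values:
            if value != baseline[key]:
                config = dict(baseline)   # shallow copy; values are not mutated
                config[key] = value
                configs.append(config)
    return configs


baseline = {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.2, 'learning_rate': 0.001}
options = {
    'hidden_layers': [[128, 64], [512, 256, 128]],
    'dropout_rate': [0.1, 0.3],
    'learning_rate': [0.0005],
}
configs = one_at_a_time(baseline, options)
print(len(configs))
```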

Feature Importance

In [ ]:
"""
========================
Feature importance via permutation
========================
"""

from sklearn.inspection import permutation_importance
from sklearn.base import BaseEstimator, RegressorMixin

print("Computing feature importance via permutation...\n")

# Create sklearn-compatible wrapper
class TorchWrapper(BaseEstimator, RegressorMixin):
    """Sklearn-compatible wrapper for PyTorch model."""
    
    def __init__(self, model, device):
        self.model = model
        self.device = device
    
    def fit(self, X, y):
        # Already fitted, just return self for sklearn compatibility
        return self
    
    def predict(self, X):
        self.model.eval()
        with torch.no_grad():
            X_tensor = torch.FloatTensor(X).to(self.device)
            predictions = self.model(X_tensor).cpu().numpy().flatten()
        return predictions



# Load best model
model.load_state_dict(checkpoint['model_state_dict'])
wrapped_model = TorchWrapper(model, device)

# Compute permutation importance (on a sample for speed)
sample_size = min(1000, len(X_test))
X_test_sample = X_test_scaled[:sample_size]
y_test_sample = y_test.iloc[:sample_size]  # positional slice, aligned with the NumPy rows

result = permutation_importance(
    wrapped_model, X_test_sample, y_test_sample,
    n_repeats=10,
    random_state=RANDOM_STATE,
    scoring='neg_mean_absolute_error'
)

# Get feature importance
importance_df = pd.DataFrame({
    'feature': X.columns,
    'importance': result.importances_mean,
    'std': result.importances_std
}).sort_values('importance', ascending=False)

print("### Top 20 Most Important Features (Permutation)\n")
display(importance_df.head(20))

# Plot
fig, ax = plt.subplots(figsize=(10, 8))

top_features = importance_df.head(20)
ax.barh(range(len(top_features)), top_features['importance'], color='#3498db', alpha=0.8)
ax.set_yticks(range(len(top_features)))
ax.set_yticklabels(top_features['feature'])
ax.set_xlabel('Importance (increase in MAE when permuted)', fontsize=12)
ax.set_title('Neural Network: Top 20 Features (Permutation Importance)', fontsize=14)
ax.invert_yaxis()

plt.tight_layout()
plt.savefig('../images/06_deep_learning/neural_network_feature_importance.png', dpi=150, bbox_inches='tight')
plt.show()
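With `scoring='neg_mean_absolute_error'`, each importance is the baseline score minus the score after shuffling a feature, which equals the increase in MAE. The sketch below computes the same quantity by hand on a toy model, to make the definition concrete; it is not scikit-learn's implementation (random number handling differs), and `permutation_importance_mae` is a hypothetical name.

```python
import numpy as np


def permutation_importance_mae(predict, X, y, n_repeats=10, seed=3):
    """Mean increase in MAE when each column of X is shuffled independently."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    baseline_mae = np.mean(np.abs(y - predict(X)))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and y
            increases.append(np.mean(np.abs(y - predict(X_perm))) - baseline_mae)
        importances[j] = np.mean(increases)
    return importances


def predict_col0(X):
    """Toy model that only uses column 0; column 1 is ignored."""
    return 3.0 * X[:, 0]


rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = 3.0 * X_toy[:, 0]
print(permutation_importance_mae(predict_col0, X_toy, y_toy))
```

A feature the model ignores gets an importance of exactly zero, while a feature it relies on gets a positive value.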

Save Model

In [ ]:
"""
========================
Save final model and predictions
========================
"""

# Save the model
torch.save({
    'model_state_dict': model.state_dict(),
    'input_dim': input_dim,
    'hidden_layers': hidden_layers,
    'dropout_rate': dropout_rate,
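    # Stored as plain lists so the checkpoint also loads with torch.load(weights_only=True)
    'scaler_mean': scaler.mean_.tolist(),
    'scaler_scale': scaler.scale_.tolist(),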
    'feature_names': X.columns.tolist(),
}, '../models/neural_network_final.pth')

# Save predictions for ensemble
np.save('../data/06_deep_learning/nn_test_predictions.npy', test_preds)
np.save('../data/06_deep_learning/nn_test_actuals.npy', test_actuals)

# Save test features
pd.DataFrame(X_test_scaled, columns=X.columns, index=X_test.index).to_csv(
    '../data/06_deep_learning/nn_test_features.csv'
)

print("Saved:")
print("  - ../models/neural_network_final.pth")
print("  - ../data/06_deep_learning/nn_test_predictions.npy")
print("  - ../data/06_deep_learning/nn_test_actuals.npy")
print("  - ../data/06_deep_learning/nn_test_features.csv")
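Storing the scaler's `mean_` and `scale_` in the checkpoint means new climbs can be standardized without pickling the `StandardScaler` itself. A minimal sketch of that inference-time step, assuming the raw features are arranged in the saved `feature_names` order; the helper name and toy arrays are hypothetical. `StandardScaler` uses the population standard deviation (ddof=0), and its saved `scale_` already replaces zero-variance features with 1, so the division is safe.

```python
import numpy as np


def scale_with_saved_stats(X_raw, scaler_mean, scaler_scale):
    """Reproduce StandardScaler.transform from its saved mean_ and scale_."""
    X_raw = np.asarray(X_raw, dtype=float)
    return (X_raw - np.asarray(scaler_mean, dtype=float)) / np.asarray(scaler_scale, dtype=float)


# Stand-in for the saved checkpoint dictionary (values are made up)
X_train_toy = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
checkpoint_like = {
    'scaler_mean': X_train_toy.mean(axis=0).tolist(),
    'scaler_scale': X_train_toy.std(axis=0).tolist(),  # population std, as StandardScaler uses
}

X_new = np.array([[3.0, 20.0]])
print(scale_with_saved_stats(X_new, checkpoint_like['scaler_mean'], checkpoint_like['scaler_scale']))
```

`np.asarray` accepts both lists and arrays, so the helper works whether the checkpoint stores the statistics as lists or as NumPy arrays.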

Comparison with RF

In [ ]:
"""
========================
Comparison with random forest
========================
"""

# Load Random Forest predictions if available
try:
    rf_preds = np.load('../data/06_deep_learning/rf_test_predictions.npy')
    rf_actuals = np.load('../data/06_deep_learning/rf_test_actuals.npy')

    # Compare
    rf_metrics = compute_metrics(rf_actuals, rf_preds)
    nn_metrics = compute_metrics(test_actuals, test_preds)

    comparison = pd.DataFrame({
        'Metric': [
            'MAE', 'RMSE', 'R²',
            'Within ±1', 'Within ±2',
            'Exact grouped V',
            'Within ±1 V', 'Within ±2 V'
        ],
        'Random Forest': [
            rf_metrics['mae'], rf_metrics['rmse'], rf_metrics['r2'], 
            rf_metrics['within_1'], rf_metrics['within_2'],
            rf_metrics['exact_grouped_v'],
            rf_metrics['within_1_vgrade'], rf_metrics['within_2_vgrades']
        ],
        'Neural Network': [
            nn_metrics['mae'], nn_metrics['rmse'], nn_metrics['r2'],
            nn_metrics['within_1'], nn_metrics['within_2'],
            nn_metrics['exact_grouped_v'],
            nn_metrics['within_1_vgrade'], nn_metrics['within_2_vgrades']
        ]
    })

    print("### Model Comparison\n")
    display(comparison)

    # Plot comparison
    fig, axes = plt.subplots(1, 4, figsize=(18, 5))

    x = [0, 1]

    # MAE
    ax = axes[0]
    ax.bar(x, [rf_metrics['mae'], nn_metrics['mae']], color=['#3498db', '#e74c3c'])
    ax.set_xticks(x)
    ax.set_xticklabels(['Random Forest', 'Neural Network'])
    ax.set_ylabel('MAE')
    ax.set_title('Mean Absolute Error')

    # R²
    ax = axes[1]
    ax.bar(x, [rf_metrics['r2'], nn_metrics['r2']], color=['#3498db', '#e74c3c'])
    ax.set_xticks(x)
    ax.set_xticklabels(['Random Forest', 'Neural Network'])
    ax.set_ylabel('R²')
    ax.set_title('R² Score')

    # Within ±1 fine-grained difficulty
    ax = axes[2]
    ax.bar(x, [rf_metrics['within_1'], nn_metrics['within_1']], color=['#3498db', '#e74c3c'])
    ax.set_xticks(x)
    ax.set_xticklabels(['Random Forest', 'Neural Network'])
    ax.set_ylabel('Percent')
    ax.set_title('Within ±1 Difficulty')

    # Within ±1 grouped V-grade
    ax = axes[3]
    ax.bar(x, [rf_metrics['within_1_vgrade'], nn_metrics['within_1_vgrade']], color=['#3498db', '#e74c3c'])
    ax.set_xticks(x)
    ax.set_xticklabels(['Random Forest', 'Neural Network'])
    ax.set_ylabel('Percent')
    ax.set_title('Within ±1 V-grade')

    plt.tight_layout()
    plt.savefig('../images/06_deep_learning/rf_vs_nn_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()

except FileNotFoundError as e:
    print(f"Could not compare with Random Forest: {e}")
    print("Run Notebook 05 first to generate RF prediction files.")
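The ensemble mentioned as future work could start as a simple weighted average of the two models' test predictions. The sketch below assumes both prediction arrays cover the same climbs in the same order (the saved test indices make this checkable); the function names and toy values are hypothetical. To keep the test set unbiased, the weight would need to be chosen on validation predictions rather than on the test set.

```python
import numpy as np


def blend_predictions(rf_preds, nn_preds, weight_nn=0.5):
    """Weighted average of two prediction arrays aligned on the same rows."""
    rf_preds = np.asarray(rf_preds, dtype=float)
    nn_preds = np.asarray(nn_preds, dtype=float)
    if rf_preds.shape != nn_preds.shape:
        raise ValueError("Prediction arrays must cover the same test rows")
    return (1.0 - weight_nn) * rf_preds + weight_nn * nn_preds


def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Made-up values where the two models err in opposite directions
y_true = np.array([18.0, 20.0, 22.0])
rf_pred = np.array([19.0, 21.0, 21.0])
nn_pred = np.array([17.0, 19.5, 22.5])
for w in (0.0, 0.5, 1.0):
    print(w, mae(y_true, blend_predictions(rf_pred, nn_pred, w)))
```

Averaging helps most when the models' errors are weakly correlated, as in this toy case; if the two models make similar mistakes, the blend will be close to the better single model.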

Conclusion

In [ ]:
"""
========================
Final Summary
========================
"""

summary = f"""
### Neural Network Model Summary

**Architecture:**
- Input: {input_dim} features
- Hidden layers: {hidden_layers}
- Dropout rate: {dropout_rate}
- Total parameters: {total_params:,}

**Training:**
- Optimizer: Adam (lr={learning_rate})
- Early stopping: {early_stopping_patience} epochs patience
- Best epoch: {best_epoch}

**Test Set Performance:**
- MAE: {test_metrics['mae']:.3f}
- RMSE: {test_metrics['rmse']:.3f}
- R²: {test_metrics['r2']:.3f}
- Accuracy within ±1 grade: {test_metrics['within_1']:.1f}%
- Accuracy within ±2 grades: {test_metrics['within_2']:.1f}%
- Exact grouped V-grade accuracy: {test_metrics['exact_grouped_v']:.1f}%
- Accuracy within ±1 V-grade: {test_metrics['within_1_vgrade']:.1f}%
- Accuracy within ±2 V-grades: {test_metrics['within_2_vgrades']:.1f}%

**Key Findings:**
1. The neural network is competitive, but not clearly stronger than the best tree-based baseline.
2. Fine-grained score prediction remains harder than grouped grade prediction.
3. The grouped V-grade metrics show that the model captures broader difficulty bands more reliably than exact score labels.
4. This makes the neural network useful as a comparison model, and potentially valuable in an ensemble.

**Portfolio Interpretation:**
This deep learning notebook extends the classical modelling pipeline by testing whether a neural architecture can improve prediction quality on engineered climbing features.
The main result is not that deep learning wins outright, but that it provides a meaningful benchmark and helps clarify where model complexity does and does not add value.
"""

print(summary)

# Save summary
with open('../data/06_deep_learning/neural_network_summary.txt', 'w') as f:
    f.write(summary)

print("\nSummary saved to ../data/06_deep_learning/neural_network_summary.txt")