
{
"cells": [
{
"cell_type": "markdown",
"id": "f5269664",
"metadata": {},
"source": [
"# Tension Board 2 Mirror: Grade Prediction with Deep Learning\n",
"\n",
"This notebook tests whether a neural network can improve on the classical models from notebook 05. The feature set is the same; only the learning algorithm changes.\n",
"\n",
"## Neural Network Approach\n",
"\n",
"We implement a straightforward feedforward network in PyTorch:\n",
"\n",
"1. **Architecture** \n",
" A multi-layer perceptron with batch normalization and dropout. The input dimension matches the number of engineered features; the output is a single difficulty score.\n",
"\n",
"2. **Training** \n",
" Adam optimizer with learning rate scheduling and early stopping. The validation set monitors for overfitting.\n",
"\n",
"3. **Regularization** \n",
" Dropout and weight decay help prevent the network from memorizing training data.\n",
"\n",
"## Consistency Requirements\n",
"\n",
"For fair comparison with notebook 05, we use the same random seed and the same test set. Predictions from both the Random Forest and Neural Network are saved so that an ensemble approach could be tested in future work.\n",
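"\n",
"As a sketch of that future ensemble (with hypothetical arrays standing in for the saved prediction files), averaging is the simplest combination:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Stand-ins for nn_test_predictions.npy / rf_test_predictions.npy\n",
"nn_preds = np.array([20.0, 23.5, 18.0])\n",
"rf_preds = np.array([21.0, 22.5, 19.0])\n",
"\n",
"# Unweighted average of the two models' predictions\n",
"ensemble_preds = (nn_preds + rf_preds) / 2  # -> [20.5, 23.0, 18.5]\n",
"```\n",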
"\n",
"## Output\n",
"\n",
"The final products are the trained neural network weights, test set predictions, and a comparison of neural network performance against Random Forest on identical test data.\n",
"\n",
"## Notebook Structure\n",
"\n",
"1. [Setup and Imports](#setup-and-imports)\n",
"2. [Train/Test Split](#traintest-split)\n",
"3. [Setting up the Neural Network](#setting-up-the-neural-network)\n",
"4. [Training Configuration](#training-configuration)\n",
"5. [Training Loop](#training-loop)\n",
"6. [Test Set Evaluation](#test-set-evaluation)\n",
"7. [Visualization and Error Analysis](#visualization-and-error-analysis)\n",
"8. [Hyperparameter Tuning](#hyperparameter-tuning)\n",
"9. [Feature Importance](#feature-importance)\n",
"10. [Save Model](#save-model)\n",
"11. [Comparison with RF](#comparison-with-rf)\n",
"12. [Conclusion](#conclusion)"
]
},
{
"cell_type": "markdown",
"id": "2344030f",
"metadata": {},
"source": [
"# Setup and Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99f64922",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Setup and Imports\n",
"==================================\n",
"\"\"\"\n",
"\n",
"# Imports\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import numpy as np\n",
"import matplotlib.patches as mpatches\n",
"\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.model_selection import cross_val_score\n",
"\n",
"from scipy.spatial import ConvexHull\n",
"from scipy.spatial.distance import pdist, squareform\n",
"\n",
"import sqlite3\n",
"\n",
"import re\n",
"import os\n",
"from collections import defaultdict\n",
"\n",
"import ast\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"from torch.utils.data import DataLoader, TensorDataset\n",
"from torch.optim.lr_scheduler import ReduceLROnPlateau\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"from PIL import Image\n",
"\n",
"# Set some display options\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.max_rows', 100)\n",
"\n",
"# Color palette for multi-bar graphs\n",
"palette = ['steelblue', 'coral', 'seagreen']\n",
"\n",
"# Set board image for some visual analysis\n",
"board_img = Image.open('../images/tb2_board_12x12_composite.png')\n",
"\n",
"# Connect to the database\n",
"DB_PATH=\"../data/tb2.db\"\n",
"conn = sqlite3.connect(DB_PATH)\n",
"\n",
"# Set random state\n",
"RANDOM_STATE=3\n",
"\n",
"np.random.seed(RANDOM_STATE)\n",
"torch.manual_seed(RANDOM_STATE)\n",
"if torch.cuda.is_available():\n",
" torch.cuda.manual_seed(RANDOM_STATE)\n",
"\n",
"# Check for GPU\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"print(f\"Using device: {device}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a9e2443",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"==================================\n",
"Query our data from the DB\n",
"==================================\n",
"\n",
"This time we restrict to where `layout_id=10` for the TB2 Mirror.\n",
"We also restrict to angles of at most 50 degrees: the grade vs angle distribution in notebook 01 starts to look unreliable past 50\n",
"(probably a bias towards climbers who can actually climb that steep). We encode both restrictions directly into the query.\n",
"\"\"\"\n",
"\n",
"# Query climbs data\n",
"climbs_query = \"\"\"\n",
"SELECT\n",
" c.uuid,\n",
" c.name AS climb_name,\n",
" c.setter_username,\n",
" c.layout_id AS layout_id,\n",
" c.description,\n",
" c.is_nomatch,\n",
" c.is_listed,\n",
" l.name AS layout_name,\n",
" p.name AS board_name,\n",
" c.frames,\n",
" cs.angle,\n",
" cs.display_difficulty,\n",
" dg.boulder_name AS boulder_grade,\n",
" cs.ascensionist_count,\n",
" cs.quality_average,\n",
" cs.fa_at\n",
" \n",
"FROM climbs c\n",
"JOIN layouts l ON c.layout_id = l.id\n",
"JOIN products p ON l.product_id = p.id\n",
"JOIN climb_stats cs ON c.uuid = cs.climb_uuid\n",
"JOIN difficulty_grades dg ON ROUND(cs.display_difficulty) = dg.difficulty\n",
"WHERE cs.display_difficulty IS NOT NULL AND c.is_listed=1 AND c.layout_id=10 AND cs.angle <= 50\n",
"\"\"\"\n",
"\n",
"# Query information about placements (and their mirrors)\n",
"placements_query = \"\"\"\n",
"SELECT\n",
" p.id AS placement_id,\n",
" h.x,\n",
" h.y,\n",
" p.default_placement_role_id AS default_role_id,\n",
" p.set_id AS set_id,\n",
" s.name AS set_name,\n",
" p_mirror.id AS mirror_placement_id\n",
"FROM placements p\n",
"JOIN holes h ON p.hole_id = h.id\n",
"JOIN sets s ON p.set_id = s.id\n",
"LEFT JOIN holes h_mirror ON h.mirrored_hole_id = h_mirror.id\n",
"LEFT JOIN placements p_mirror ON p_mirror.hole_id = h_mirror.id AND p_mirror.layout_id = p.layout_id\n",
"WHERE p.layout_id = 10\n",
"\"\"\"\n",
"\n",
"# Load it into a DataFrame\n",
"df_climbs = pd.read_sql_query(climbs_query, conn)\n",
"df_placements = pd.read_sql_query(placements_query, conn)\n",
"\n",
"df_hold_difficulty = pd.read_csv('../data/03_hold_difficulty/hold_difficulty_scores.csv', index_col='placement_id')\n",
"df_features = pd.read_csv('../data/04_climb_features/climb_features.csv', index_col='climb_uuid')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1531f8b",
"metadata": {},
"outputs": [],
"source": [
"# Separate features and target\n",
"X = df_features.drop(columns=['display_difficulty'])\n",
"y = df_features['display_difficulty']\n",
"\n",
"print(f\"\\nFeatures shape: {X.shape}\")\n",
"print(f\"Target range: {y.min():.1f} to {y.max():.1f}\")\n",
"print(f\"Target mean: {y.mean():.2f}\")\n",
"print(f\"Target std: {y.std():.2f}\")\n",
"\n",
"# Check for any remaining missing values\n",
"missing = X.isna().sum().sum()\n",
"print(f\"\\nMissing values in features: {missing}\")\n",
"\n",
"if missing > 0:\n",
" print(\"Filling remaining missing values with column means...\")\n",
" X = X.fillna(X.mean())"
]
},
{
"cell_type": "markdown",
"id": "d4abfd9b",
"metadata": {},
"source": [
"# Train/Test Split"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63a2acf9",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Train/Test split\n",
"========================\n",
"\"\"\"\n",
"\n",
"# First split: 80% train+val, 20% test\n",
"X_temp, X_test, y_temp, y_test = train_test_split(\n",
" X, y, test_size=0.2, random_state=RANDOM_STATE\n",
")\n",
"\n",
"# Second split: 80% train, 20% validation from the remaining\n",
"X_train, X_val, y_train, y_val = train_test_split(\n",
" X_temp, y_temp, test_size=0.2, random_state=RANDOM_STATE\n",
")\n",
"\n",
"print(f\"Training set: {len(X_train)} samples\")\n",
"print(f\"Validation set: {len(X_val)} samples\")\n",
"print(f\"Test set: {len(X_test)} samples\")\n",
"\n",
"# Save test indices for ensemble consistency\n",
"os.makedirs('../data/06_deep_learning', exist_ok=True)  # make sure the output directory exists\n",
"test_indices = X_test.index.tolist()\n",
"np.save('../data/06_deep_learning/test_indices.npy', test_indices)\n",
"print(\"\\nTest indices saved for ensemble consistency\")"
]
},
{
"cell_type": "markdown",
"id": "90fa8ae5",
"metadata": {},
"source": [
"# Setting up the Neural Network"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a43314c",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Feature Scaling\n",
"========================\n",
"\"\"\"\n",
"\n",
"scaler = StandardScaler()\n",
"\n",
"X_train_scaled = scaler.fit_transform(X_train)\n",
"X_val_scaled = scaler.transform(X_val)\n",
"X_test_scaled = scaler.transform(X_test)\n",
"\n",
"print(f\"Features scaled\")\n",
"print(f\"Train mean: {X_train_scaled.mean():.4f}, std: {X_train_scaled.std():.4f}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50fbb82c",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Convert to PyTorch Tensors\n",
"========================\n",
"\"\"\"\n",
"\n",
"# Convert to tensors (.values turns each pandas Series into a NumPy array first,\n",
"# since torch can't reliably build tensors from Series objects)\n",
"X_train_tensor = torch.FloatTensor(X_train_scaled)\n",
"y_train_tensor = torch.FloatTensor(y_train.values).reshape(-1, 1)\n",
"\n",
"X_val_tensor = torch.FloatTensor(X_val_scaled)\n",
"y_val_tensor = torch.FloatTensor(y_val.values).reshape(-1, 1)\n",
"\n",
"X_test_tensor = torch.FloatTensor(X_test_scaled)\n",
"y_test_tensor = torch.FloatTensor(y_test.values).reshape(-1, 1)\n",
"\n",
"# Create DataLoaders\n",
"batch_size = 64\n",
"\n",
"train_dataset = TensorDataset(X_train_tensor, y_train_tensor)\n",
"train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n",
"\n",
"val_dataset = TensorDataset(X_val_tensor, y_val_tensor)\n",
"val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)\n",
"\n",
"test_dataset = TensorDataset(X_test_tensor, y_test_tensor)\n",
"test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)\n",
"\n",
"print(f\"Batch size: {batch_size}\")\n",
"print(f\"Training batches: {len(train_loader)}\")\n",
"print(f\"Validation batches: {len(val_loader)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62c2db48",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Define Neural Network Architecture\n",
"========================\n",
"\"\"\"\n",
"\n",
"class ClimbGradePredictor(nn.Module):\n",
" \"\"\"\n",
" Neural network for climb grade prediction.\n",
" \n",
" Architecture:\n",
" - Input layer\n",
" - Multiple hidden layers with BatchNorm and Dropout\n",
" - Output layer (single value)\n",
" \"\"\"\n",
" \n",
" def __init__(self, input_dim, hidden_layers=[256, 128, 64], dropout_rate=0.2):\n",
" super(ClimbGradePredictor, self).__init__()\n",
" \n",
" layers = []\n",
" prev_dim = input_dim\n",
" \n",
" for hidden_dim in hidden_layers:\n",
" layers.append(nn.Linear(prev_dim, hidden_dim))\n",
" layers.append(nn.BatchNorm1d(hidden_dim))\n",
" layers.append(nn.ReLU())\n",
" layers.append(nn.Dropout(dropout_rate))\n",
" prev_dim = hidden_dim\n",
" \n",
" # Output layer\n",
" layers.append(nn.Linear(prev_dim, 1))\n",
" \n",
" self.network = nn.Sequential(*layers)\n",
" \n",
" def forward(self, x):\n",
" return self.network(x)\n",
"\n",
"\n",
"# Create model\n",
"input_dim = X_train.shape[1]\n",
"hidden_layers = [256, 128, 64]\n",
"dropout_rate = 0.2\n",
"\n",
"model = ClimbGradePredictor(input_dim, hidden_layers, dropout_rate).to(device)\n",
"\n",
"print(f\"Model Architecture:\\n{model}\")\n",
"\n",
"# Count parameters\n",
"total_params = sum(p.numel() for p in model.parameters())\n",
"trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
"print(f\"\\nTotal parameters: {total_params:,}\")\n",
"print(f\"Trainable parameters: {trainable_params:,}\")"
]
},
{
"cell_type": "markdown",
"id": "ded8d846",
"metadata": {},
"source": [
"# Training Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "665deadb",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Define training functions\n",
"========================\n",
"\"\"\"\n",
"\n",
"grade_to_v = {\n",
" 10: 0, 11: 0, 12: 0,\n",
" 13: 1, 14: 1,\n",
" 15: 2,\n",
" 16: 3, 17: 3,\n",
" 18: 4, 19: 4,\n",
" 20: 5, 21: 5,\n",
" 22: 6,\n",
" 23: 7,\n",
" 24: 8, 25: 8,\n",
" 26: 9,\n",
" 27: 10,\n",
" 28: 11,\n",
" 29: 12,\n",
" 30: 13,\n",
" 31: 14,\n",
" 32: 15,\n",
" 33: 16,\n",
"}\n",
"\n",
"def to_grouped_v(x):\n",
" rounded = int(round(x))\n",
" rounded = max(min(rounded, max(grade_to_v)), min(grade_to_v))\n",
" return grade_to_v[rounded]\n",
"\n",
"def grouped_v_metrics(y_true, y_pred):\n",
" true_v = np.array([to_grouped_v(x) for x in y_true])\n",
" pred_v = np.array([to_grouped_v(x) for x in y_pred])\n",
"\n",
" return {\n",
" 'exact_grouped_v': np.mean(true_v == pred_v) * 100,\n",
" 'within_1_vgrade': np.mean(np.abs(true_v - pred_v) <= 1) * 100,\n",
" 'within_2_vgrades': np.mean(np.abs(true_v - pred_v) <= 2) * 100\n",
" }\n",
"\n",
"def train_epoch(model, train_loader, criterion, optimizer, device):\n",
" \"\"\"Train for one epoch.\"\"\"\n",
" model.train()\n",
" total_loss = 0\n",
"\n",
" for X_batch, y_batch in train_loader:\n",
" X_batch, y_batch = X_batch.to(device), y_batch.to(device)\n",
"\n",
" optimizer.zero_grad()\n",
" predictions = model(X_batch)\n",
" loss = criterion(predictions, y_batch)\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" total_loss += loss.item()\n",
"\n",
" return total_loss / len(train_loader)\n",
"\n",
"\n",
"def evaluate(model, data_loader, criterion, device):\n",
" \"\"\"Evaluate model on a dataset.\"\"\"\n",
" model.eval()\n",
" total_loss = 0\n",
" predictions_list = []\n",
" actuals_list = []\n",
"\n",
" with torch.no_grad():\n",
" for X_batch, y_batch in data_loader:\n",
" X_batch, y_batch = X_batch.to(device), y_batch.to(device)\n",
"\n",
" predictions = model(X_batch)\n",
" loss = criterion(predictions, y_batch)\n",
"\n",
" total_loss += loss.item()\n",
" predictions_list.append(predictions.cpu().numpy())\n",
" actuals_list.append(y_batch.cpu().numpy())\n",
"\n",
" avg_loss = total_loss / len(data_loader)\n",
" all_predictions = np.vstack(predictions_list).flatten()\n",
" all_actuals = np.vstack(actuals_list).flatten()\n",
"\n",
" return avg_loss, all_predictions, all_actuals\n",
"\n",
"\n",
"def compute_metrics(y_true, y_pred):\n",
" \"\"\"Compute evaluation metrics.\"\"\"\n",
" mae = mean_absolute_error(y_true, y_pred)\n",
" rmse = np.sqrt(mean_squared_error(y_true, y_pred))\n",
" r2 = r2_score(y_true, y_pred)\n",
" within_1 = np.mean(np.abs(y_true - y_pred) <= 1) * 100\n",
" within_2 = np.mean(np.abs(y_true - y_pred) <= 2) * 100\n",
" v_metrics = grouped_v_metrics(y_true, y_pred)\n",
"\n",
" return {\n",
" 'mae': mae,\n",
" 'rmse': rmse,\n",
" 'r2': r2,\n",
" 'within_1': within_1,\n",
" 'within_2': within_2,\n",
" 'exact_grouped_v': v_metrics['exact_grouped_v'],\n",
" 'within_1_vgrade': v_metrics['within_1_vgrade'],\n",
" 'within_2_vgrades': v_metrics['within_2_vgrades']\n",
" }\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d9c040e",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Training configuration\n",
"========================\n",
"\"\"\"\n",
"\n",
"# Loss function\n",
"criterion = nn.MSELoss()\n",
"\n",
"# Optimizer\n",
"learning_rate = 0.001\n",
"optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)\n",
"\n",
"# Learning rate scheduler\n",
"scheduler = ReduceLROnPlateau(\n",
" optimizer, mode='min', factor=0.5, patience=10\n",
")\n",
"\n",
"# Training settings\n",
"num_epochs = 200\n",
"early_stopping_patience = 25\n",
"\n",
"print(f\"Learning rate: {learning_rate}\")\n",
"print(f\"Max epochs: {num_epochs}\")\n",
"print(f\"Early stopping patience: {early_stopping_patience}\")"
]
},
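{
"cell_type": "markdown",
"id": "3c1d9e2a",
"metadata": {},
"source": [
"A quick aside on the scheduler (a self-contained sketch, not part of the pipeline): `ReduceLROnPlateau` with `factor=0.5` halves the learning rate once the monitored loss has failed to improve for more than `patience` consecutive epochs.\n",
"\n",
"```python\n",
"import torch\n",
"import torch.optim as optim\n",
"from torch.optim.lr_scheduler import ReduceLROnPlateau\n",
"\n",
"# A toy parameter and optimizer, just to drive the scheduler\n",
"param = torch.nn.Parameter(torch.zeros(1))\n",
"opt = optim.Adam([param], lr=0.001)\n",
"sched = ReduceLROnPlateau(opt, mode='min', factor=0.5, patience=2)\n",
"\n",
"# Feed a flat (never-improving) validation loss: the first step sets the best\n",
"# value, the next two are 'bad' epochs within patience, and the fourth triggers\n",
"# one halving of the learning rate\n",
"for _ in range(5):\n",
"    sched.step(1.0)\n",
"\n",
"print(opt.param_groups[0]['lr'])  # 0.0005\n",
"```"
]
},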
{
"cell_type": "markdown",
"id": "35d4bd8b",
"metadata": {},
"source": [
"# Training Loop"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "476b158d",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Training\n",
"========================\n",
"\"\"\"\n",
"\n",
"# Training history\n",
"history = {\n",
" 'train_loss': [],\n",
" 'val_loss': [],\n",
" 'val_mae': [],\n",
" 'val_r2': []\n",
"}\n",
"\n",
"best_val_loss = float('inf')\n",
"best_epoch = 0\n",
"epochs_no_improve = 0\n",
"\n",
"print(\"Starting training...\\n\")\n",
"\n",
"for epoch in range(num_epochs):\n",
" # Train\n",
" train_loss = train_epoch(model, train_loader, criterion, optimizer, device)\n",
" \n",
" # Validate\n",
" val_loss, val_preds, val_actuals = evaluate(model, val_loader, criterion, device)\n",
" val_metrics = compute_metrics(val_actuals, val_preds)\n",
" \n",
" # Update scheduler\n",
" scheduler.step(val_loss)\n",
" \n",
" # Record history\n",
" history['train_loss'].append(train_loss)\n",
" history['val_loss'].append(val_loss)\n",
" history['val_mae'].append(val_metrics['mae'])\n",
" history['val_r2'].append(val_metrics['r2'])\n",
" \n",
" # Print progress\n",
" if (epoch + 1) % 10 == 0 or epoch == 0:\n",
" print(f\"Epoch {epoch+1:3d}/{num_epochs} | \"\n",
" f\"Train Loss: {train_loss:.4f} | \"\n",
" f\"Val Loss: {val_loss:.4f} | \"\n",
" f\"Val MAE: {val_metrics['mae']:.3f} | \"\n",
" f\"Val R²: {val_metrics['r2']:.3f}\")\n",
" \n",
" # Early stopping\n",
" if val_loss < best_val_loss:\n",
" best_val_loss = val_loss\n",
" best_epoch = epoch + 1\n",
" epochs_no_improve = 0\n",
" \n",
" # Save best model\n",
" torch.save({\n",
" 'epoch': epoch,\n",
" 'model_state_dict': model.state_dict(),\n",
" 'optimizer_state_dict': optimizer.state_dict(),\n",
" 'val_loss': val_loss,\n",
" }, '../models/neural_network_best.pth')\n",
" else:\n",
" epochs_no_improve += 1\n",
" \n",
" if epochs_no_improve >= early_stopping_patience:\n",
" print(f\"\\nEarly stopping at epoch {epoch + 1}\")\n",
" print(f\"Best validation loss at epoch {best_epoch}: {best_val_loss:.4f}\")\n",
" break\n",
"\n",
"print(f\"\\nTraining completed!\")\n",
"print(f\"Best epoch: {best_epoch}\")\n",
"print(f\"Best validation loss: {best_val_loss:.4f}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2ef949f2",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Plot Training History\n",
"========================\n",
"\"\"\"\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
"\n",
"# Loss\n",
"ax = axes[0]\n",
"ax.plot(history['train_loss'], label='Train Loss', linewidth=2)\n",
"ax.plot(history['val_loss'], label='Val Loss', linewidth=2)\n",
"ax.axvline(x=best_epoch-1, color='r', linestyle='--', label=f'Best Epoch ({best_epoch})')\n",
"ax.set_xlabel('Epoch', fontsize=12)\n",
"ax.set_ylabel('Loss (MSE)', fontsize=12)\n",
"ax.set_title('Training & Validation Loss', fontsize=14)\n",
"ax.legend()\n",
"ax.grid(True, alpha=0.3)\n",
"\n",
"# MAE\n",
"ax = axes[1]\n",
"ax.plot(history['val_mae'], color='#e74c3c', linewidth=2)\n",
"ax.axvline(x=best_epoch-1, color='r', linestyle='--', label=f'Best Epoch')\n",
"ax.set_xlabel('Epoch', fontsize=12)\n",
"ax.set_ylabel('MAE', fontsize=12)\n",
"ax.set_title('Validation MAE', fontsize=14)\n",
"ax.legend()\n",
"ax.grid(True, alpha=0.3)\n",
"\n",
"# R²\n",
"ax = axes[2]\n",
"ax.plot(history['val_r2'], color='#2ecc71', linewidth=2)\n",
"ax.axvline(x=best_epoch-1, color='r', linestyle='--', label=f'Best Epoch')\n",
"ax.set_xlabel('Epoch', fontsize=12)\n",
"ax.set_ylabel('R²', fontsize=12)\n",
"ax.set_title('Validation R²', fontsize=14)\n",
"ax.legend()\n",
"ax.grid(True, alpha=0.3)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/06_deep_learning/neural_network_training.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "589ad448",
"metadata": {},
"source": [
"# Test Set Evaluation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9abc3a72",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Load best model and evaluate on test set\n",
"========================\n",
"\"\"\"\n",
"\n",
"# Load best model\n",
"checkpoint = torch.load('../models/neural_network_best.pth')\n",
"model.load_state_dict(checkpoint['model_state_dict'])\n",
"\n",
"print(f\"Loaded best model from epoch {checkpoint['epoch']+1}\")\n",
"\n",
"# Evaluate on test set\n",
"test_loss, test_preds, test_actuals = evaluate(model, test_loader, criterion, device)\n",
"test_metrics = compute_metrics(test_actuals, test_preds)\n",
"\n",
"print(\"\\n\" + \"=\" * 50)\n",
"print(\"NEURAL NETWORK - TEST SET EVALUATION\")\n",
"print(\"=\" * 50)\n",
"print(f\"\\nMAE: {test_metrics['mae']:.3f}\")\n",
"print(f\"RMSE: {test_metrics['rmse']:.3f}\")\n",
"print(f\"R²: {test_metrics['r2']:.3f}\")\n",
"print(f\"\\nAccuracy within ±1 grade: {test_metrics['within_1']:.1f}%\")\n",
"print(f\"Accuracy within ±2 grades: {test_metrics['within_2']:.1f}%\")\n",
"print(f\"\\nExact grouped V-grade accuracy: {test_metrics['exact_grouped_v']:.1f}%\")\n",
"print(f\"Accuracy within ±1 V-grade: {test_metrics['within_1_vgrade']:.1f}%\")\n",
"print(f\"Accuracy within ±2 V-grades: {test_metrics['within_2_vgrades']:.1f}%\")\n"
]
},
{
"cell_type": "markdown",
"id": "e6442e20",
"metadata": {},
"source": [
"# Visualization and Error Analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5d639b4",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Visualize predictions\n",
"========================\n",
"\"\"\"\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
"\n",
"# Predicted vs Actual\n",
"ax = axes[0]\n",
"ax.scatter(test_actuals, test_preds, alpha=0.4, s=20)\n",
"min_val = min(test_actuals.min(), test_preds.min())\n",
"max_val = max(test_actuals.max(), test_preds.max())\n",
"ax.plot([min_val, max_val], [min_val, max_val], 'r--', lw=2, label='Perfect prediction')\n",
"ax.set_xlabel('Actual Difficulty', fontsize=12)\n",
"ax.set_ylabel('Predicted Difficulty', fontsize=12)\n",
"ax.set_title('Neural Network: Predicted vs Actual', fontsize=14)\n",
"ax.legend()\n",
"ax.grid(True, alpha=0.3)\n",
"\n",
"# Residuals\n",
"ax = axes[1]\n",
"residuals = test_actuals - test_preds\n",
"ax.scatter(test_preds, residuals, alpha=0.4, s=20)\n",
"ax.axhline(y=0, color='r', linestyle='--', lw=2)\n",
"ax.set_xlabel('Predicted Difficulty', fontsize=12)\n",
"ax.set_ylabel('Residual (Actual - Predicted)', fontsize=12)\n",
"ax.set_title('Neural Network: Residuals', fontsize=14)\n",
"ax.grid(True, alpha=0.3)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/06_deep_learning/neural_network_predictions.png', dpi=150, bbox_inches='tight')\n",
"plt.show()\n",
"\n",
"# Error distribution\n",
"fig, ax = plt.subplots(figsize=(10, 5))\n",
"ax.hist(residuals, bins=50, edgecolor='black', alpha=0.7)\n",
"ax.axvline(x=0, color='r', linestyle='--', lw=2)\n",
"ax.set_xlabel('Prediction Error', fontsize=12)\n",
"ax.set_ylabel('Count', fontsize=12)\n",
"ax.set_title('Neural Network: Error Distribution', fontsize=14)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/06_deep_learning/neural_network_errors.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffc027fc",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Error analysis by grade\n",
"========================\n",
"\"\"\"\n",
"\n",
"df_analysis = pd.DataFrame({\n",
" 'actual': test_actuals,\n",
" 'predicted': test_preds,\n",
" 'error': test_actuals - test_preds,\n",
" 'abs_error': np.abs(test_actuals - test_preds)\n",
"})\n",
"\n",
"grade_analysis = df_analysis.groupby('actual').agg(\n",
" count=('actual', 'count'),\n",
" mae=('abs_error', 'mean'),\n",
" bias=('error', 'mean'),\n",
" within_1=('error', lambda x: (np.abs(x) <= 1).mean() * 100)\n",
").round(3)\n",
"\n",
"print(\"### Error Analysis by Grade\\n\")\n",
"display(grade_analysis)\n",
"\n",
"# Plot\n",
"fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
"\n",
"ax = axes[0]\n",
"ax.bar(grade_analysis.index, grade_analysis['count'], color='#3498db', alpha=0.8)\n",
"ax.set_xlabel('Grade')\n",
"ax.set_ylabel('Count')\n",
"ax.set_title('Test Set Distribution by Grade')\n",
"\n",
"ax = axes[1]\n",
"ax.bar(grade_analysis.index, grade_analysis['mae'], color='#e74c3c', alpha=0.8)\n",
"ax.set_xlabel('Grade')\n",
"ax.set_ylabel('MAE')\n",
"ax.set_title('MAE by Grade')\n",
"\n",
"ax = axes[2]\n",
"colors = ['#2ecc71' if b >= 0 else '#e74c3c' for b in grade_analysis['bias']]\n",
"ax.bar(grade_analysis.index, grade_analysis['bias'], color=colors, alpha=0.8)\n",
"ax.set_xlabel('Grade')\n",
"ax.set_ylabel('Bias')\n",
"ax.set_title('Prediction Bias by Grade')\n",
"ax.axhline(y=0, color='black', linestyle='--', lw=1)\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/06_deep_learning/neural_network_by_grade.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "6bf02729",
"metadata": {},
"source": [
"# Hyperparameter Tuning"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81ff678e",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Hyperparameter tuning - try different architectures\n",
"========================\n",
"\"\"\"\n",
"\n",
"def train_and_evaluate_model(hidden_layers, dropout_rate, learning_rate, verbose=False):\n",
" \"\"\"Train a model with given hyperparameters and return validation metrics.\"\"\"\n",
" \n",
" # Create model\n",
" model = ClimbGradePredictor(input_dim, hidden_layers, dropout_rate).to(device)\n",
" \n",
" # Optimizer\n",
" optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)\n",
" scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)\n",
" \n",
" best_val_loss = float('inf')\n",
" epochs_no_improve = 0\n",
" patience = 20\n",
" \n",
" for epoch in range(150): # Max epochs\n",
" train_loss = train_epoch(model, train_loader, criterion, optimizer, device)\n",
" val_loss, val_preds, val_actuals = evaluate(model, val_loader, criterion, device)\n",
" scheduler.step(val_loss)\n",
" \n",
" if val_loss < best_val_loss:\n",
" best_val_loss = val_loss\n",
" epochs_no_improve = 0\n",
" else:\n",
" epochs_no_improve += 1\n",
" \n",
" if epochs_no_improve >= patience:\n",
" break\n",
" \n",
" # Final validation metrics\n",
" _, val_preds, val_actuals = evaluate(model, val_loader, criterion, device)\n",
" val_metrics = compute_metrics(val_actuals, val_preds)\n",
" \n",
" if verbose:\n",
" print(f\"Layers: {hidden_layers}, Dropout: {dropout_rate}, LR: {learning_rate}\")\n",
" print(f\" Val MAE: {val_metrics['mae']:.3f}, Val R²: {val_metrics['r2']:.3f}\")\n",
" \n",
" return val_metrics, model\n",
"\n",
"\n",
"# Test different architectures\n",
"architectures = [\n",
" {'hidden_layers': [128, 64], 'dropout_rate': 0.2, 'learning_rate': 0.001},\n",
" {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.2, 'learning_rate': 0.001},\n",
" {'hidden_layers': [512, 256, 128], 'dropout_rate': 0.2, 'learning_rate': 0.001},\n",
" {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.3, 'learning_rate': 0.001},\n",
" {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.1, 'learning_rate': 0.001},\n",
" {'hidden_layers': [256, 128, 64], 'dropout_rate': 0.2, 'learning_rate': 0.0005},\n",
"]\n",
"\n",
"print(\"### Hyperparameter Search\\n\")\n",
"\n",
"arch_results = []\n",
"for arch in architectures:\n",
" metrics, _ = train_and_evaluate_model(**arch, verbose=True)\n",
" arch_results.append({\n",
" **arch,\n",
" 'val_mae': metrics['mae'],\n",
" 'val_r2': metrics['r2']\n",
" })\n",
" print()\n",
"\n",
"arch_df = pd.DataFrame(arch_results).sort_values('val_mae')\n",
"print(\"\\n### Architecture Comparison (sorted by Val MAE)\\n\")\n",
"display(arch_df)"
]
},
{
"cell_type": "markdown",
"id": "b0b6119b",
"metadata": {},
"source": [
"# Feature Importance"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3b07fe1",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Feature importance via permutation\n",
"========================\n",
"\"\"\"\n",
"\n",
"from sklearn.inspection import permutation_importance\n",
"from sklearn.base import BaseEstimator, RegressorMixin\n",
"\n",
"print(\"Computing feature importance via permutation...\\n\")\n",
"\n",
"# Create sklearn-compatible wrapper\n",
"class TorchWrapper(BaseEstimator, RegressorMixin):\n",
" \"\"\"Sklearn-compatible wrapper for PyTorch model.\"\"\"\n",
" \n",
" def __init__(self, model, device):\n",
" self.model = model\n",
" self.device = device\n",
" \n",
" def fit(self, X, y):\n",
" # Already fitted, just return self for sklearn compatibility\n",
" return self\n",
" \n",
" def predict(self, X):\n",
" self.model.eval()\n",
" with torch.no_grad():\n",
" X_tensor = torch.FloatTensor(X).to(self.device)\n",
" predictions = self.model(X_tensor).cpu().numpy().flatten()\n",
" return predictions\n",
"\n",
"\n",
"\n",
"# Load best model\n",
"model.load_state_dict(checkpoint['model_state_dict'])\n",
"wrapped_model = TorchWrapper(model, device)\n",
"\n",
"# Compute permutation importance (on a sample for speed)\n",
"sample_size = min(1000, len(X_test))\n",
"X_test_sample = X_test_scaled[:sample_size]\n",
"y_test_sample = y_test[:sample_size]\n",
"\n",
"result = permutation_importance(\n",
" wrapped_model, X_test_sample, y_test_sample,\n",
" n_repeats=10,\n",
" random_state=RANDOM_STATE,\n",
" scoring='neg_mean_absolute_error'\n",
")\n",
"\n",
"# Get feature importance\n",
"importance_df = pd.DataFrame({\n",
" 'feature': X.columns,\n",
" 'importance': result.importances_mean,\n",
" 'std': result.importances_std\n",
"}).sort_values('importance', ascending=False)\n",
"\n",
"print(\"### Top 20 Most Important Features (Permutation)\\n\")\n",
"display(importance_df.head(20))\n",
"\n",
"# Plot\n",
"fig, ax = plt.subplots(figsize=(10, 8))\n",
"\n",
"top_features = importance_df.head(20)\n",
"ax.barh(range(len(top_features)), top_features['importance'], color='#3498db', alpha=0.8)\n",
"ax.set_yticks(range(len(top_features)))\n",
"ax.set_yticklabels(top_features['feature'])\n",
"ax.set_xlabel('Importance (decrease in MAE)', fontsize=12)\n",
"ax.set_title('Neural Network: Top 20 Features (Permutation Importance)', fontsize=14)\n",
"ax.invert_yaxis()\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/06_deep_learning/neural_network_feature_importance.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9f32c595",
"metadata": {},
"source": [
"# Save Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5132952",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Save final model and predictions\n",
"========================\n",
"\"\"\"\n",
"\n",
"# Save the model\n",
"torch.save({\n",
" 'model_state_dict': model.state_dict(),\n",
" 'input_dim': input_dim,\n",
" 'hidden_layers': hidden_layers,\n",
" 'dropout_rate': dropout_rate,\n",
" 'scaler_mean': scaler.mean_,\n",
" 'scaler_scale': scaler.scale_,\n",
" 'feature_names': X.columns.tolist(),\n",
"}, '../models/neural_network_final.pth')\n",
"\n",
"# Save predictions for ensemble\n",
"np.save('../data/06_deep_learning/nn_test_predictions.npy', test_preds)\n",
"np.save('../data/06_deep_learning/nn_test_actuals.npy', test_actuals)\n",
"\n",
"# Save test features\n",
"pd.DataFrame(X_test_scaled, columns=X.columns, index=X_test.index).to_csv(\n",
" '../data/06_deep_learning/nn_test_features.csv'\n",
")\n",
"\n",
"print(\"Saved:\")\n",
"print(\" - ../models/neural_network_final.pth\")\n",
"print(\" - ../data/06_deep_learning/nn_test_predictions.npy\")\n",
"print(\" - ../data/06_deep_learning/nn_test_actuals.npy\")\n",
"print(\" - ../data/06_deep_learning/nn_test_features.csv\")"
]
},
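{
"cell_type": "markdown",
"id": "3a7e91c2",
"metadata": {},
"source": [
"For reuse outside this notebook, the checkpoint above can be reloaded without retraining. The sketch below is illustrative only: `GradeMLP` is a placeholder name for whichever model class was defined earlier in this notebook, and the scaler statistics are re-applied manually from the saved arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d204f8e",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative reload sketch (left commented out): 'GradeMLP' is a placeholder\n",
"# for the actual nn.Module subclass defined earlier in this notebook.\n",
"#\n",
"# ckpt = torch.load('../models/neural_network_final.pth')\n",
"# model = GradeMLP(ckpt['input_dim'], ckpt['hidden_layers'], ckpt['dropout_rate'])\n",
"# model.load_state_dict(ckpt['model_state_dict'])\n",
"# model.eval()\n",
"#\n",
"# # Standardize new features with the saved training-set statistics,\n",
"# # in the saved column order\n",
"# X_new = X_new[ckpt['feature_names']]\n",
"# X_new_scaled = (X_new.values - ckpt['scaler_mean']) / ckpt['scaler_scale']\n",
"# with torch.no_grad():\n",
"#     preds = model(torch.tensor(X_new_scaled, dtype=torch.float32)).squeeze().numpy()"
]
},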
{
"cell_type": "markdown",
"id": "ced3f05d",
"metadata": {},
"source": [
"# Comparison with Random Forest"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7938200",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Comparison with random forest\n",
"========================\n",
"\"\"\"\n",
"\n",
"# Load Random Forest predictions if available\n",
"try:\n",
" rf_preds = np.load('../data/06_deep_learning/rf_test_predictions.npy')\n",
" rf_actuals = np.load('../data/06_deep_learning/rf_test_actuals.npy')\n",
"\n",
" # Compare\n",
" rf_metrics = compute_metrics(rf_actuals, rf_preds)\n",
" nn_metrics = compute_metrics(test_actuals, test_preds)\n",
"\n",
" comparison = pd.DataFrame({\n",
" 'Metric': [\n",
" 'MAE', 'RMSE', 'R²',\n",
" 'Within ±1 difficulty', 'Within ±2 difficulty',\n",
" 'Exact grouped V-grade',\n",
" 'Within ±1 V-grade', 'Within ±2 V-grades'\n",
" ],\n",
" 'Random Forest': [\n",
" rf_metrics['mae'], rf_metrics['rmse'], rf_metrics['r2'], \n",
" rf_metrics['within_1'], rf_metrics['within_2'],\n",
" rf_metrics['exact_grouped_v'],\n",
" rf_metrics['within_1_vgrade'], rf_metrics['within_2_vgrades']\n",
" ],\n",
" 'Neural Network': [\n",
" nn_metrics['mae'], nn_metrics['rmse'], nn_metrics['r2'],\n",
" nn_metrics['within_1'], nn_metrics['within_2'],\n",
" nn_metrics['exact_grouped_v'],\n",
" nn_metrics['within_1_vgrade'], nn_metrics['within_2_vgrades']\n",
" ]\n",
" })\n",
"\n",
" print(\"### Model Comparison\\n\")\n",
" display(comparison)\n",
"\n",
" # Plot comparison\n",
" fig, axes = plt.subplots(1, 4, figsize=(18, 5))\n",
"\n",
" x = [0, 1]\n",
"\n",
" # MAE\n",
" ax = axes[0]\n",
" ax.bar(x, [rf_metrics['mae'], nn_metrics['mae']], color=['#3498db', '#e74c3c'])\n",
" ax.set_xticks(x)\n",
" ax.set_xticklabels(['Random Forest', 'Neural Network'])\n",
" ax.set_ylabel('MAE')\n",
" ax.set_title('Mean Absolute Error')\n",
"\n",
" # R²\n",
" ax = axes[1]\n",
" ax.bar(x, [rf_metrics['r2'], nn_metrics['r2']], color=['#3498db', '#e74c3c'])\n",
" ax.set_xticks(x)\n",
" ax.set_xticklabels(['Random Forest', 'Neural Network'])\n",
" ax.set_ylabel('R²')\n",
" ax.set_title('R² Score')\n",
"\n",
" # Within ±1 fine-grained difficulty\n",
" ax = axes[2]\n",
" ax.bar(x, [rf_metrics['within_1'], nn_metrics['within_1']], color=['#3498db', '#e74c3c'])\n",
" ax.set_xticks(x)\n",
" ax.set_xticklabels(['Random Forest', 'Neural Network'])\n",
" ax.set_ylabel('Percent')\n",
" ax.set_title('Within ±1 Difficulty')\n",
"\n",
" # Within ±1 grouped V-grade\n",
" ax = axes[3]\n",
" ax.bar(x, [rf_metrics['within_1_vgrade'], nn_metrics['within_1_vgrade']], color=['#3498db', '#e74c3c'])\n",
" ax.set_xticks(x)\n",
" ax.set_xticklabels(['Random Forest', 'Neural Network'])\n",
" ax.set_ylabel('Percent')\n",
" ax.set_title('Within ±1 V-grade')\n",
"\n",
" plt.tight_layout()\n",
" plt.savefig('../images/06_deep_learning/rf_vs_nn_comparison.png', dpi=150, bbox_inches='tight')\n",
" plt.show()\n",
"\n",
"except FileNotFoundError as e:\n",
" print(f\"Could not compare with Random Forest: {e}\")\n",
" print(\"Run Notebook 05 first to generate RF prediction files.\")\n"
]
},
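{
"cell_type": "markdown",
"id": "1c9b6e40",
"metadata": {},
"source": [
"Since both models' test predictions are saved to disk, the ensemble idea mentioned earlier is cheap to prototype. The cell below is a minimal, self-contained sketch: the small arrays are made-up stand-ins for the saved `.npy` files, so it demonstrates only the mechanics of blending, not a real result."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84f2a6d1",
"metadata": {},
"outputs": [],
"source": [
"# Minimal ensemble sketch: average the two models' predictions.\n",
"# The arrays here are made-up stand-ins for the saved .npy files;\n",
"# swap in np.load(...) of the real prediction arrays to test this properly.\n",
"toy_rf = np.array([5.4, 6.2, 7.3, 4.7])\n",
"toy_nn = np.array([4.7, 6.9, 6.8, 5.4])\n",
"toy_actual = np.array([5.0, 6.5, 7.0, 5.0])\n",
"\n",
"toy_blend = 0.5 * toy_rf + 0.5 * toy_nn\n",
"for name, preds in [('RF', toy_rf), ('NN', toy_nn), ('Blend', toy_blend)]:\n",
"    print(f'{name:6s} MAE: {np.mean(np.abs(preds - toy_actual)):.3f}')"
]
},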
{
"cell_type": "markdown",
"id": "556be142",
"metadata": {},
"source": [
"# Conclusion"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24ff0a3a",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"========================\n",
"Final Summary\n",
"========================\n",
"\"\"\"\n",
"\n",
"summary = f\"\"\"\n",
"### Neural Network Model Summary\n",
"\n",
"**Architecture:**\n",
"- Input: {input_dim} features\n",
"- Hidden layers: {hidden_layers}\n",
"- Dropout rate: {dropout_rate}\n",
"- Total parameters: {total_params:,}\n",
"\n",
"**Training:**\n",
"- Optimizer: Adam (lr={learning_rate})\n",
"- Early stopping: {early_stopping_patience} epochs patience\n",
"- Best epoch: {best_epoch}\n",
"\n",
"**Test Set Performance:**\n",
"- MAE: {test_metrics['mae']:.3f}\n",
"- RMSE: {test_metrics['rmse']:.3f}\n",
"- R²: {test_metrics['r2']:.3f}\n",
"- Accuracy within ±1 grade: {test_metrics['within_1']:.1f}%\n",
"- Accuracy within ±2 grades: {test_metrics['within_2']:.1f}%\n",
"- Exact grouped V-grade accuracy: {test_metrics['exact_grouped_v']:.1f}%\n",
"- Accuracy within ±1 V-grade: {test_metrics['within_1_vgrade']:.1f}%\n",
"- Accuracy within ±2 V-grades: {test_metrics['within_2_vgrades']:.1f}%\n",
"\n",
"**Key Findings:**\n",
"1. The neural network is competitive, but not clearly stronger than the best tree-based baseline.\n",
"2. Fine-grained score prediction remains harder than grouped grade prediction.\n",
"3. The grouped V-grade metrics show that the model captures broader difficulty bands more reliably than exact score labels.\n",
"4. This makes the neural network useful as a comparison model and potentially valuable in an ensemble.\n",
"\n",
"**Portfolio Interpretation:**\n",
"This deep learning notebook extends the classical modelling pipeline by testing whether a neural architecture can improve prediction quality on engineered climbing features.\n",
"The main result is not that deep learning wins outright, but that it provides a meaningful benchmark and helps clarify where model complexity does and does not add value.\n",
"\"\"\"\n",
"\n",
"print(summary)\n",
"\n",
"# Save summary\n",
"with open('../data/06_deep_learning/neural_network_summary.txt', 'w') as f:\n",
" f.write(summary)\n",
"\n",
"print(\"\\nSummary saved to ../data/06_deep_learning/neural_network_summary.txt\")\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}