diff --git a/README.md b/README.md
index 1887b35..69e79d8 100644
--- a/README.md
+++ b/README.md
@@ -90,13 +90,13 @@ The route generator is a small **GPT-style model** — the same general idea as
 
 and it predicts what hold token should come next, then the next, then the next, until it produces an `<EOS>` token. The result is a novel sequence of holds that the model thinks is a plausible V6 on the Kilter at 40°.
 
-**~91% of generated routes pass basic structural checks** (has a start hold, has a finish hold, holds exist on the right board, no duplicates).
+**~89% of generated routes pass basic structural checks** (has a start hold, has a finish hold, holds exist on the right board, no duplicates).
 
 ---
 
 ## Quantitative results
 
-These numbers are from the full training run documented by this repository.
+These numbers are from the full training run documented by this repository. The notebooks in `notebooks/` are self-contained walkthroughs of the pipeline stages. The reported run was executed on Kaggle; notebooks 01-04 took roughly 8 hours in total (8h 1m 59s) on a GPU T4 x2 accelerator.
 
 In practice: the grade model is usually within one V-grade, and the generator usually makes structurally valid routes, but exact grade control is still imperfect.
 
@@ -112,27 +112,27 @@ Shared vocabulary: **4,438 tokens** (6 special + 2 board + 12 angle + 16 grade +
 
 ### Grade prediction accuracy
 
-The model has ~1.17M parameters. Early stopping selected epoch 8 (validation MAE ≈ 1.480).
+The model has ~1.17M parameters. Early stopping selected epoch 11 (validation MAE ≈ 1.488).
 
 | Metric | Overall | TB2 | Kilter |
 |---|---:|---:|---:|
-| Exact V-grade | 36.0% | 37.3% | 35.8% |
-| Within ±1 V-grade | 79.3% | 80.0% | 79.2% |
-| Within ±2 V-grades | 94.8% | 95.5% | 94.7% |
-| R² | 0.768 | 0.800 | 0.763 |
+| Exact V-grade | 35.8% | 35.8% | 35.8% |
+| Within ±1 V-grade | 79.2% | 79.4% | 79.1% |
+| Within ±2 V-grades | 94.9% | 95.5% | 94.8% |
+| R² | 0.763 | 0.793 | 0.758 |
 
 ### Route generation
 
-The generator has ~1.41M parameters. Best validation perplexity: 24.2.
+The generator has ~1.41M parameters. Best validation perplexity: 24.3.
 
 | Metric | TB2 | Kilter |
 |---|---:|---:|
 | Routes evaluated | 200 | 200 |
-| Structurally valid | 89.0% | 94.0% |
-| Exact requested grade (critic) | 29.5% | 27.0% |
-| Within ±1 V-grade (critic) | 68.5% | 73.0% |
-| Within ±2 V-grades (critic) | 90.5% | 93.5% |
-| Mean novelty (Jaccard distance) | 0.656 | 0.634 |
+| Structurally valid | 91.5% | 86.0% |
+| Exact requested grade (critic) | 34.5% | 37.0% |
+| Within ±1 V-grade (critic) | 73.0% | 79.5% |
+| Within ±2 V-grades (critic) | 91.0% | 96.5% |
+| Mean novelty (Jaccard distance) | 0.656 | 0.643 |
 
 ---
 
diff --git a/notebooks/01_unified_route_tokenization.ipynb b/notebooks/01_unified_route_tokenization.ipynb
index f3f9a3a..01fe5b2 100644
--- a/notebooks/01_unified_route_tokenization.ipynb
+++ b/notebooks/01_unified_route_tokenization.ipynb
@@ -48,39 +48,190 @@
     "- Kilter Board Original\n",
     "\n",
     "The board-specific details are stored in `configs/tb2.json` and `configs/kilter.json`.\n",
-    "The shared tokenization code lives in `src/climbingboardgpt/`."
+    "This version defines the tokenization helpers inline as the notebook needs them.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "6ee2907f",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:45:23.269153Z",
+     "iopub.status.busy": "2026-06-07T15:45:23.268660Z",
+     "iopub.status.idle": "2026-06-07T15:45:25.138003Z",
+     "shell.execute_reply": "2026-06-07T15:45:25.137054Z"
+    }
+   },
    "outputs": [],
    "source": [
-    "from pathlib import Path\n",
-    "import sys\n",
-    "import json\n",
-    "import pandas as pd\n",
+    "from __future__ import annotations\n",
+    "\n",
+    "import ast\n",
+    "import json\n",
+    "import random\n",
+    "import re\n",
+    "import sqlite3\n",
+    "from dataclasses import dataclass\n",
+    "from pathlib import Path\n",
+    "from typing import Any, Iterable\n",
+    "\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn.model_selection import train_test_split\n",
     "\n",
-    "# Set up the project root so we can import our custom package\n",
     "ROOT = Path.cwd().resolve()\n",
     "if ROOT.name == \"notebooks\":\n",
-    "    ROOT = ROOT.parent\n",
-    "sys.path.insert(0, str(ROOT / \"src\"))\n",
+    "    ROOT = ROOT.parent"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fec9ef3b",
+   "metadata": {},
+   "source": [
+    "### Board configuration helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4c110801",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:45:25.142164Z",
+     "iopub.status.busy": "2026-06-07T15:45:25.141718Z",
+     "iopub.status.idle": "2026-06-07T15:45:25.156480Z",
+     "shell.execute_reply": "2026-06-07T15:45:25.155743Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Find the project root and load board configuration JSON files.\n",
+    "def find_project_root(start: str | Path | None = None) -> Path:\n",
+    "    \"\"\"Walk upward until the repository root markers are found.\n",
     "\n",
-    "# Import our custom modules\n",
-    "from climbingboardgpt.config import load_board_configs\n",
-    "from climbingboardgpt.data import load_multi_board_data\n",
-    "from climbingboardgpt.tokenization import (\n",
-    "    build_route_records,\n",
-    "    build_token_metadata,\n",
-    "    build_vocab,\n",
-    "    encode,\n",
-    "    make_placement_lookup,\n",
-    "    vocab_payload,\n",
-    ")\n",
-    "from climbingboardgpt.utils import assign_group_splits, write_json, json_safe"
+    "    The root is the nearest directory containing ``pyproject.toml`` and ``configs``.\n",
+    "    If no directory has both markers, the resolved starting directory is returned\n",
+    "    so callers still have a deterministic base path.\n",
+    "    \"\"\"\n",
+    "    current = Path(start).resolve() if start is not None else Path.cwd().resolve()\n",
+    "    for candidate in [current, *current.parents]:\n",
+    "        if (candidate / \"pyproject.toml\").exists() and (candidate / \"configs\").exists():\n",
+    "            return candidate\n",
+    "    return current\n",
+    "\n",
+    "@dataclass(frozen=True)\n",
+    "class BoardConfig:\n",
+    "    \"\"\"Configuration for a single climbing board.\n",
+    "    \n",
+    "    This dataclass stores all board-specific settings needed for\n",
+    "    data loading, tokenization, and model training.\n",
+    "    \n",
+    "    Attributes:\n",
+    "        board_key: Short identifier (e.g., \"tb2\", \"kilter\")\n",
+    "        display_name: Human-readable name (e.g., \"Tension Board 2 Mirror\")\n",
+    "        token_prefix: Namespace for hold tokens (e.g., \"TB2\", \"KILTER\")\n",
+    "        db_path: Path to the SQLite database\n",
+    "        layout_id: Which layout in the database to use\n",
+    "        max_angle: Filter out routes steeper than this (None = no filter)\n",
+    "        min_fa_date: Filter out routes first ascended before this date\n",
+    "        placement_y_max: Filter out placements above this Y coordinate\n",
+    "        include_mirror_placement_id: Whether to include mirror info (TB2 only)\n",
+    "        role_definitions: Maps semantic role names to numeric IDs\n",
+    "        boardlib_database_command: Command to download the database\n",
+    "        boardlib_images_command: Command to download board images\n",
+    "        notes: Additional notes about the configuration\n",
+    "    \"\"\"\n",
+    "    board_key: str\n",
+    "    display_name: str\n",
+    "    token_prefix: str\n",
+    "    db_path: Path\n",
+    "    layout_id: int\n",
+    "    max_angle: float | None\n",
+    "    min_fa_date: str | None\n",
+    "    placement_y_max: float | None\n",
+    "    include_mirror_placement_id: bool\n",
+    "    role_definitions: dict[str, int]\n",
+    "    boardlib_database_command: str | None = None\n",
+    "    boardlib_images_command: str | None = None\n",
+    "    notes: tuple[str, ...] = ()\n",
+    "\n",
+    "    @property\n",
+    "    def role_id_to_name(self) -> dict[int, str]:\n",
+    "        \"\"\"Reverse mapping from numeric role IDs to semantic role names.\n",
+    "        \n",
+    "        Example: {5: 'start', 6: 'middle', 7: 'finish', 8: 'foot'} for TB2\n",
+    "        \"\"\"\n",
+    "        return {int(role_id): name for name, role_id in self.role_definitions.items()}\n",
+    "\n",
+    "    @property\n",
+    "    def board_token(self) -> str:\n",
+    "        \"\"\"The special token representing this board.\n",
+    "        \n",
+    "        Example: \"<BOARD_TB2>\" or \"<BOARD_KILTER>\"\n",
+    "        \"\"\"\n",
+    "        return f\"<BOARD_{self.token_prefix}>\"\n",
+    "\n",
+    "    def resolve_db_path(self, project_root: Path | None = None) -> Path:\n",
+    "        \"\"\"Resolve the database path relative to the project root.\n",
+    "        \n",
+    "        If db_path is absolute, return it as-is.\n",
+    "        Otherwise, resolve it relative to the project root.\n",
+    "        \"\"\"\n",
+    "        project_root = project_root or find_project_root()\n",
+    "        return self.db_path if self.db_path.is_absolute() else project_root / self.db_path\n",
+    "\n",
+    "def load_board_config(board_key: str, config_dir: str | Path | None = None) -> BoardConfig:\n",
+    "    \"\"\"Load a single board configuration from a JSON file.\n",
+    "    \n",
+    "    Args:\n",
+    "        board_key: Board identifier (e.g., \"tb2\", \"kilter\")\n",
+    "        config_dir: Directory containing config JSON files\n",
+    "        \n",
+    "    Returns:\n",
+    "        BoardConfig dataclass with all board settings\n",
+    "        \n",
+    "    Raises:\n",
+    "        FileNotFoundError: If the config file doesn't exist\n",
+    "    \"\"\"\n",
+    "    project_root = find_project_root()\n",
+    "    config_dir = Path(config_dir) if config_dir is not None else project_root / \"configs\"\n",
+    "    path = config_dir / f\"{board_key}.json\"\n",
+    "    if not path.exists():\n",
+    "        available = sorted(p.stem for p in config_dir.glob(\"*.json\"))\n",
+    "        raise FileNotFoundError(\n",
+    "            f\"Unknown board config '{board_key}'. Available: {available}\"\n",
+    "        )\n",
+    "\n",
+    "    payload = json.loads(path.read_text(encoding=\"utf-8\"))\n",
+    "    return BoardConfig(\n",
+    "        board_key=str(payload[\"board_key\"]),\n",
+    "        display_name=str(payload[\"display_name\"]),\n",
+    "        token_prefix=str(payload[\"token_prefix\"]),\n",
+    "        db_path=Path(payload[\"db_path\"]),\n",
+    "        layout_id=int(payload[\"layout_id\"]),\n",
+    "        max_angle=None if payload.get(\"max_angle\") is None else float(payload[\"max_angle\"]),\n",
+    "        min_fa_date=payload.get(\"min_fa_date\"),\n",
+    "        placement_y_max=None if payload.get(\"placement_y_max\") is None else float(payload[\"placement_y_max\"]),\n",
+    "        include_mirror_placement_id=bool(payload.get(\"include_mirror_placement_id\", False)),\n",
+    "        role_definitions={str(k): int(v) for k, v in payload[\"role_definitions\"].items()},\n",
+    "        boardlib_database_command=payload.get(\"boardlib_database_command\"),\n",
+    "        boardlib_images_command=payload.get(\"boardlib_images_command\"),\n",
+    "        notes=tuple(payload.get(\"notes\", [])),\n",
+    "    )\n",
+    "\n",
+    "def load_board_configs(board_keys: list[str] | tuple[str, ...]) -> list[BoardConfig]:\n",
+    "    \"\"\"Load multiple board configurations.\n",
+    "    \n",
+    "    Args:\n",
+    "        board_keys: List of board identifiers\n",
+    "        \n",
+    "    Returns:\n",
+    "        List of BoardConfig dataclasses\n",
+    "    \"\"\"\n",
+    "    return [load_board_config(board_key) for board_key in board_keys]"
    ]
   },
   {
@@ -100,20 +251,248 @@
     "- **`token_prefix`**: The namespace prefix for hold tokens (\"TB2\" vs \"KILTER\")\n",
     "- **`include_mirror_placement_id`**: Whether to include mirror information (TB2 has symmetric left/right holds)\n",
     "\n",
-    "This configuration-driven approach means we can add new boards by creating a new JSON config file, without changing any code."
+    "This configuration-driven approach means we can add new boards by creating a new JSON config file, without changing any code.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "4f04dcea",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:45:25.159465Z",
+     "iopub.status.busy": "2026-06-07T15:45:25.159209Z",
+     "iopub.status.idle": "2026-06-07T15:45:25.166377Z",
+     "shell.execute_reply": "2026-06-07T15:45:25.165663Z"
+    }
+   },
    "outputs": [],
    "source": [
     "configs = load_board_configs([\"tb2\", \"kilter\"])\n",
     "configs"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "25242855",
+   "metadata": {},
+   "source": [
+    "### Database loading helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a076d997",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:45:25.170098Z",
+     "iopub.status.busy": "2026-06-07T15:45:25.169626Z",
+     "iopub.status.idle": "2026-06-07T15:45:25.182438Z",
+     "shell.execute_reply": "2026-06-07T15:45:25.181766Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Query each BoardLib SQLite database and attach board identity columns.\n",
+    "def build_climbs_query(config: BoardConfig) -> tuple[str, list]:\n",
+    "    \"\"\"Build a SQL query for climbs data with board-specific filters.\n",
+    "    \n",
+    "    The query joins climbs, layouts, products, climb_stats, and difficulty_grades\n",
+    "    tables, applying filters for:\n",
+    "    - layout_id: Which board layout to use\n",
+    "    - max_angle: Exclude routes steeper than this\n",
+    "    - min_fa_date: Exclude routes first ascended before this date\n",
+    "    - display_difficulty IS NOT NULL: Only routes with difficulty ratings\n",
+    "    - is_listed = 1: Only publicly listed routes\n",
+    "    \n",
+    "    Args:\n",
+    "        config: Board configuration\n",
+    "        \n",
+    "    Returns:\n",
+    "        Tuple of (SQL query string, list of query parameters)\n",
+    "    \"\"\"\n",
+    "    conditions = [\n",
+    "        \"cs.display_difficulty IS NOT NULL\",\n",
+    "        \"c.is_listed = 1\",\n",
+    "        \"c.layout_id = ?\",\n",
+    "    ]\n",
+    "    params: list = [config.layout_id]\n",
+    "\n",
+    "    if config.max_angle is not None:\n",
+    "        conditions.append(\"cs.angle <= ?\")\n",
+    "        params.append(config.max_angle)\n",
+    "\n",
+    "    if config.min_fa_date is not None:\n",
+    "        conditions.append(\"cs.fa_at > ?\")\n",
+    "        params.append(config.min_fa_date)\n",
+    "\n",
+    "    query = f\"\"\"\n",
+    "    SELECT\n",
+    "        c.uuid,\n",
+    "        c.name AS climb_name,\n",
+    "        c.setter_username,\n",
+    "        c.layout_id AS layout_id,\n",
+    "        c.description,\n",
+    "        c.is_nomatch,\n",
+    "        c.is_listed,\n",
+    "        l.name AS layout_name,\n",
+    "        p.name AS board_name,\n",
+    "        c.frames,\n",
+    "        cs.angle,\n",
+    "        cs.display_difficulty,\n",
+    "        dg.boulder_name AS boulder_grade,\n",
+    "        cs.ascensionist_count,\n",
+    "        cs.quality_average,\n",
+    "        cs.fa_at\n",
+    "    FROM climbs c\n",
+    "    JOIN layouts l ON c.layout_id = l.id\n",
+    "    JOIN products p ON l.product_id = p.id\n",
+    "    JOIN climb_stats cs ON c.uuid = cs.climb_uuid\n",
+    "    JOIN difficulty_grades dg ON ROUND(cs.display_difficulty) = dg.difficulty\n",
+    "    WHERE {' AND '.join(conditions)}\n",
+    "    \"\"\"\n",
+    "    return query, params\n",
+    "\n",
+    "def build_placements_query(config: BoardConfig) -> tuple[str, list]:\n",
+    "    \"\"\"Build a SQL query for placement data with board-specific filters.\n",
+    "    \n",
+    "    The query retrieves hold positions, default roles, material types,\n",
+    "    and (optionally) mirror placement IDs for symmetric holds.\n",
+    "    \n",
+    "    Args:\n",
+    "        config: Board configuration\n",
+    "        \n",
+    "    Returns:\n",
+    "        Tuple of (SQL query string, list of query parameters)\n",
+    "    \"\"\"\n",
+    "    params: list = [config.layout_id]\n",
+    "    y_condition = \"\"\n",
+    "    if config.placement_y_max is not None:\n",
+    "        y_condition = \" AND h.y <= ?\"\n",
+    "        params.append(config.placement_y_max)\n",
+    "\n",
+    "    if config.include_mirror_placement_id:\n",
+    "        # Boards with mirrored holds (TB2): also select the mirror placement ID\n",
+    "        query = f\"\"\"\n",
+    "        SELECT\n",
+    "            p.id AS placement_id,\n",
+    "            h.x,\n",
+    "            h.y,\n",
+    "            p.default_placement_role_id AS default_role_id,\n",
+    "            p.set_id AS set_id,\n",
+    "            s.name AS set_name,\n",
+    "            p_mirror.id AS mirror_placement_id\n",
+    "        FROM placements p\n",
+    "        JOIN holes h ON p.hole_id = h.id\n",
+    "        JOIN sets s ON p.set_id = s.id\n",
+    "        LEFT JOIN holes h_mirror ON h.mirrored_hole_id = h_mirror.id\n",
+    "        LEFT JOIN placements p_mirror\n",
+    "            ON p_mirror.hole_id = h_mirror.id\n",
+    "           AND p_mirror.layout_id = p.layout_id\n",
+    "        WHERE p.layout_id = ?{y_condition}\n",
+    "        \"\"\"\n",
+    "    else:\n",
+    "        # Boards without mirror data (Kilter): select NULL as mirror_placement_id\n",
+    "        query = f\"\"\"\n",
+    "        SELECT\n",
+    "            p.id AS placement_id,\n",
+    "            h.x,\n",
+    "            h.y,\n",
+    "            p.default_placement_role_id AS default_role_id,\n",
+    "            p.set_id AS set_id,\n",
+    "            s.name AS set_name,\n",
+    "            NULL AS mirror_placement_id\n",
+    "        FROM placements p\n",
+    "        JOIN holes h ON p.hole_id = h.id\n",
+    "        JOIN sets s ON p.set_id = s.id\n",
+    "        WHERE p.layout_id = ?{y_condition}\n",
+    "        \"\"\"\n",
+    "    return query, params\n",
+    "\n",
+    "def load_board_data(\n",
+    "    config: BoardConfig,\n",
+    "    project_root: str | Path | None = None,\n",
+    "    max_climbs: int | None = None,\n",
+    ") -> tuple[pd.DataFrame, pd.DataFrame]:\n",
+    "    \"\"\"Load climbs and placements data for a single board.\n",
+    "    \n",
+    "    Args:\n",
+    "        config: Board configuration\n",
+    "        project_root: Path to project root (for resolving db_path)\n",
+    "        max_climbs: Optional row limit for fast smoke-test loads.\n",
+    "        \n",
+    "    Returns:\n",
+    "        Tuple of (climbs DataFrame, placements DataFrame)\n",
+    "    \"\"\"\n",
+    "    project_root = Path(project_root) if project_root is not None else find_project_root()\n",
+    "    db_path = config.resolve_db_path(project_root)\n",
+    "    if not db_path.exists():\n",
+    "        raise FileNotFoundError(\n",
+    "            f\"Could not find database for board '{config.board_key}': {db_path}\"\n",
+    "        )\n",
+    "\n",
+    "    climbs_query, climbs_params = build_climbs_query(config)\n",
+    "    placements_query, placements_params = build_placements_query(config)\n",
+    "    if max_climbs is not None:\n",
+    "        if max_climbs < 1:\n",
+    "            raise ValueError(\"max_climbs must be at least 1.\")\n",
+    "        climbs_query = f\"{climbs_query}\\nORDER BY c.uuid, cs.angle\\nLIMIT ?\"\n",
+    "        climbs_params = [*climbs_params, int(max_climbs)]\n",
+    "\n",
+    "    with sqlite3.connect(db_path) as conn:\n",
+    "        df_climbs = pd.read_sql_query(climbs_query, conn, params=climbs_params)\n",
+    "        df_placements = pd.read_sql_query(placements_query, conn, params=placements_params)\n",
+    "\n",
+    "    # Add board identifiers for multi-board processing\n",
+    "    df_climbs[\"board_key\"] = config.board_key\n",
+    "    df_climbs[\"board_token_prefix\"] = config.token_prefix\n",
+    "    df_climbs[\"board_display_name\"] = config.display_name\n",
+    "\n",
+    "    df_placements[\"board_key\"] = config.board_key\n",
+    "    df_placements[\"board_token_prefix\"] = config.token_prefix\n",
+    "    df_placements[\"board_display_name\"] = config.display_name\n",
+    "\n",
+    "    return df_climbs, df_placements\n",
+    "\n",
+    "def load_multi_board_data(\n",
+    "    configs: list[BoardConfig],\n",
+    "    project_root: str | Path | None = None,\n",
+    "    max_climbs_per_board: int | None = None,\n",
+    ") -> tuple[pd.DataFrame, pd.DataFrame]:\n",
+    "    \"\"\"Load and concatenate data from multiple boards.\n",
+    "    \n",
+    "    This function loads data from each board's database and concatenates\n",
+    "    them into unified DataFrames. Board identifiers are preserved in\n",
+    "    the board_key column.\n",
+    "    \n",
+    "    Args:\n",
+    "        configs: List of board configurations\n",
+    "        project_root: Path to project root\n",
+    "        max_climbs_per_board: Optional row limit per board for smoke tests.\n",
+    "        \n",
+    "    Returns:\n",
+    "        Tuple of (combined climbs DataFrame, combined placements DataFrame)\n",
+    "    \"\"\"\n",
+    "    climb_frames = []\n",
+    "    placement_frames = []\n",
+    "\n",
+    "    for config in configs:\n",
+    "        climbs, placements = load_board_data(\n",
+    "            config,\n",
+    "            project_root=project_root,\n",
+    "            max_climbs=max_climbs_per_board,\n",
+    "        )\n",
+    "        climb_frames.append(climbs)\n",
+    "        placement_frames.append(placements)\n",
+    "\n",
+    "    return (\n",
+    "        pd.concat(climb_frames, ignore_index=True),\n",
+    "        pd.concat(placement_frames, ignore_index=True),\n",
+    "    )"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "2a5c9a9b",
@@ -142,14 +521,22 @@
     "- `x`, `y`: Physical coordinates on the board (in inches)\n",
     "- `default_role_id`: What role this hold typically plays (hand vs foot)\n",
     "- `set_name`: Material type (\"Wood\" or \"Plastic\")\n",
-    "- `mirror_placement_id`: For TB2, the ID of the symmetric hold on the other side"
+    "- `mirror_placement_id`: For TB2, the ID of the symmetric hold on the other side\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "53c1951a",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:45:25.185319Z",
+     "iopub.status.busy": "2026-06-07T15:45:25.184989Z",
+     "iopub.status.idle": "2026-06-07T15:45:29.117312Z",
+     "shell.execute_reply": "2026-06-07T15:45:29.116566Z"
+    }
+   },
    "outputs": [],
    "source": [
     "df_climbs, df_placements = load_multi_board_data(configs, project_root=ROOT)\n",
@@ -160,6 +547,267 @@
     "print(df_climbs.groupby(\"board_key\").size())"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "f6a93063",
+   "metadata": {},
+   "source": [
+    "### Grade and route-tokenization helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7597dfc3",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:45:29.120837Z",
+     "iopub.status.busy": "2026-06-07T15:45:29.120438Z",
+     "iopub.status.idle": "2026-06-07T15:45:29.144382Z",
+     "shell.execute_reply": "2026-06-07T15:45:29.143567Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Map BoardLib display difficulties into grouped V-grade tokens.\n",
+    "GRADE_TO_V = {\n",
+    "    10: 0, 11: 0, 12: 0,\n",
+    "    13: 1, 14: 1,\n",
+    "    15: 2,\n",
+    "    16: 3, 17: 3,\n",
+    "    18: 4, 19: 4,\n",
+    "    20: 5, 21: 5,\n",
+    "    22: 6,\n",
+    "    23: 7,\n",
+    "    24: 8, 25: 8,\n",
+    "    26: 9,\n",
+    "    27: 10,\n",
+    "    28: 11,\n",
+    "    29: 12,\n",
+    "    30: 13,\n",
+    "    31: 14,\n",
+    "    32: 15,\n",
+    "    33: 16,\n",
+    "}\n",
+    "\n",
+    "def to_grouped_v(display_difficulty: float) -> int:\n",
+    "    \"\"\"Map a continuous display difficulty to the nearest grouped V grade.\"\"\"\n",
+    "    rounded = int(round(float(display_difficulty)))\n",
+    "    rounded = max(min(rounded, max(GRADE_TO_V)), min(GRADE_TO_V))\n",
+    "    return GRADE_TO_V[rounded]\n",
+    "\n",
+    "def grade_token(display_difficulty: float) -> str:\n",
+    "    \"\"\"Return the grade-conditioning token for a display difficulty value.\"\"\"\n",
+    "    return f\"<GRADE_V{to_grouped_v(display_difficulty)}>\"\n",
+    "\n",
+    "# Parse frames, canonicalize holds, and build route-level token sequences.\n",
+    "SPECIAL_TOKENS = [\n",
+    "    \"<PAD>\",\n",
+    "    \"<UNK>\",\n",
+    "    \"<BOS>\",\n",
+    "    \"<EOS>\",\n",
+    "    \"<CLS>\",\n",
+    "    \"<MASK>\",\n",
+    "]\n",
+    "\n",
+    "ANGLE_TOKEN_PATTERN = re.compile(r\"^<ANGLE_(-?\\d+)>$\")\n",
+    "\n",
+    "GRADE_TOKEN_PATTERN = re.compile(r\"^<GRADE_V(\\d+)>$\")\n",
+    "\n",
+    "BOARD_TOKEN_PATTERN = re.compile(r\"^<BOARD_([A-Z0-9_]+)>$\")\n",
+    "\n",
+    "HOLD_TOKEN_PATTERN = re.compile(r\"^<([A-Z0-9_]+)_p(\\d+)_(start|middle|finish|foot|unknown)>$\")\n",
+    "\n",
+    "ROLE_SORT_ORDER = {\n",
+    "    \"start\": 0,\n",
+    "    \"middle\": 1,\n",
+    "    \"foot\": 2,\n",
+    "    \"finish\": 3,\n",
+    "    \"unknown\": 9,\n",
+    "}\n",
+    "\n",
+    "def parse_frames(frames_str: str | None) -> list[tuple[int, int]]:\n",
+    "    \"\"\"Parse a frames string into ``(placement_id, role_id)`` pairs.\n",
+    "\n",
+    "    Frames strings are compact concatenations such as ``p344r5p369r6``. Invalid\n",
+    "    or missing input returns an empty list so callers can skip unusable climbs\n",
+    "    without special-case exception handling.\n",
+    "    \"\"\"\n",
+    "    if not isinstance(frames_str, str):\n",
+    "        return []\n",
+    "    matches = re.findall(r\"p(\\d+)r(\\d+)\", frames_str)\n",
+    "    return [(int(placement_id), int(role_id)) for placement_id, role_id in matches]\n",
+    "\n",
+    "def make_placement_lookup(df_placements: pd.DataFrame) -> dict[tuple[str, int], dict]:\n",
+    "    \"\"\"Build a coordinate/metadata lookup keyed by ``(board_key, placement_id)``.\"\"\"\n",
+    "    rows = {}\n",
+    "    for _, row in df_placements.iterrows():\n",
+    "        key = (str(row[\"board_key\"]), int(row[\"placement_id\"]))\n",
+    "        rows[key] = row.to_dict()\n",
+    "    return rows\n",
+    "\n",
+    "def role_name(role_id: int, config: BoardConfig) -> str:\n",
+    "    \"\"\"Map a board-specific numeric role ID to a shared semantic role name.\"\"\"\n",
+    "    return config.role_id_to_name.get(int(role_id), \"unknown\")\n",
+    "\n",
+    "def placement_xy(\n",
+    "    board_key: str,\n",
+    "    placement_id: int,\n",
+    "    placement_lookup: dict[tuple[str, int], dict],\n",
+    ") -> tuple[float, float]:\n",
+    "    \"\"\"Return raw board coordinates for a placement, or NaNs if unknown.\"\"\"\n",
+    "    row = placement_lookup.get((str(board_key), int(placement_id)))\n",
+    "    if row is None:\n",
+    "        return (float(\"nan\"), float(\"nan\"))\n",
+    "    return (float(row[\"x\"]), float(row[\"y\"]))\n",
+    "\n",
+    "def canonicalize_holds(\n",
+    "    holds: Iterable[tuple[int, int]],\n",
+    "    config: BoardConfig,\n",
+    "    placement_lookup: dict[tuple[str, int], dict],\n",
+    ") -> list[tuple[int, int]]:\n",
+    "    \"\"\"Sort holds into the canonical route order used by all model inputs.\n",
+    "\n",
+    "    Frames preserve setter/storage order, which is not always stable across\n",
+    "    routes or boards. Canonical ordering groups holds by role (start, middle,\n",
+    "    foot, finish) and scans each group bottom to top (ascending y, then x),\n",
+    "    giving the models a more consistent sequence grammar.\n",
+    "    \"\"\"\n",
+    "    def key(pair: tuple[int, int]):\n",
+    "        \"\"\"Sort by semantic role, then board position, then placement ID.\"\"\"\n",
+    "        placement_id, role_id = pair\n",
+    "        x, y = placement_xy(config.board_key, placement_id, placement_lookup)\n",
+    "        name = role_name(role_id, config)\n",
+    "        return (\n",
+    "            ROLE_SORT_ORDER.get(name, 9),\n",
+    "            y if not np.isnan(y) else 9999.0,\n",
+    "            x if not np.isnan(x) else 9999.0,\n",
+    "            placement_id,\n",
+    "        )\n",
+    "\n",
+    "    return sorted(holds, key=key)\n",
+    "\n",
+    "def board_token(config: BoardConfig) -> str:\n",
+    "    \"\"\"Return the special conditioning token for a board config.\"\"\"\n",
+    "    return config.board_token\n",
+    "\n",
+    "def angle_token(angle: float) -> str:\n",
+    "    \"\"\"Round a wall angle into the shared angle-token format.\"\"\"\n",
+    "    return f\"<ANGLE_{int(round(float(angle)))}>\"\n",
+    "\n",
+    "def hold_token(\n",
+    "    placement_id: int,\n",
+    "    role_id: int,\n",
+    "    config: BoardConfig,\n",
+    ") -> str:\n",
+    "    \"\"\"Return a board-namespaced hold token for a placement and role.\"\"\"\n",
+    "    semantic_role = role_name(role_id, config)\n",
+    "    return f\"<{config.token_prefix}_p{int(placement_id)}_{semantic_role}>\"\n",
+    "\n",
+    "def tokenize_route(\n",
+    "    row,\n",
+    "    config: BoardConfig,\n",
+    "    placement_lookup: dict[tuple[str, int], dict],\n",
+    "    include_grade: bool = True,\n",
+    "    canonical: bool = True,\n",
+    ") -> list[str]:\n",
+    "    \"\"\"Tokenize one climb row into the sequence consumed by the models.\n",
+    "\n",
+    "    ``include_grade=True`` is used for GPT-style generation, where the target\n",
+    "    grade is a conditioning token. ``include_grade=False`` is used for grade\n",
+    "    prediction so the model cannot read the answer from its input.\n",
+    "    \"\"\"\n",
+    "    holds = parse_frames(row[\"frames\"])\n",
+    "    if canonical:\n",
+    "        holds = canonicalize_holds(holds, config, placement_lookup)\n",
+    "\n",
+    "    tokens = [\n",
+    "        \"<BOS>\",\n",
+    "        board_token(config),\n",
+    "        angle_token(row[\"angle\"]),\n",
+    "    ]\n",
+    "    if include_grade:\n",
+    "        tokens.append(grade_token(row[\"display_difficulty\"]))\n",
+    "\n",
+    "    tokens.extend(hold_token(placement_id, role_id, config) for placement_id, role_id in holds)\n",
+    "    tokens.append(\"<EOS>\")\n",
+    "    return tokens\n",
+    "\n",
+    "def build_route_records(\n",
+    "    df_climbs: pd.DataFrame,\n",
+    "    configs_by_key: dict[str, BoardConfig],\n",
+    "    placement_lookup: dict[tuple[str, int], dict],\n",
+    ") -> pd.DataFrame:\n",
+    "    \"\"\"Create one training/evaluation record per climb-angle row.\n",
+    "\n",
+    "    The returned frame keeps both human-readable route metadata and model-ready\n",
+    "    token sequences, which lets downstream scripts save compact CSV summaries\n",
+    "    while still retaining the richer JSONL training artifacts.\n",
+    "    \"\"\"\n",
+    "    records: list[dict] = []\n",
+    "\n",
+    "    for _, row in df_climbs.iterrows():\n",
+    "        board_key = str(row[\"board_key\"])\n",
+    "        config = configs_by_key[board_key]\n",
+    "        holds = canonicalize_holds(parse_frames(row[\"frames\"]), config, placement_lookup)\n",
+    "        if not holds:\n",
+    "            continue\n",
+    "\n",
+    "        hold_tokens = [hold_token(p, r, config) for p, r in holds]\n",
+    "        semantic_roles = [role_name(r, config) for _, r in holds]\n",
+    "\n",
+    "        tokens_with_grade = tokenize_route(\n",
+    "            row,\n",
+    "            config=config,\n",
+    "            placement_lookup=placement_lookup,\n",
+    "            include_grade=True,\n",
+    "            canonical=True,\n",
+    "        )\n",
+    "        tokens_no_grade = tokenize_route(\n",
+    "            row,\n",
+    "            config=config,\n",
+    "            placement_lookup=placement_lookup,\n",
+    "            include_grade=False,\n",
+    "            canonical=True,\n",
+    "        )\n",
+    "\n",
+    "        records.append(\n",
+    "            {\n",
+    "                \"uuid\": row[\"uuid\"],\n",
+    "                \"board_key\": board_key,\n",
+    "                \"board_display_name\": row[\"board_display_name\"],\n",
+    "                \"board_token_prefix\": row[\"board_token_prefix\"],\n",
+    "                \"board_token\": board_token(config),\n",
+    "                \"climb_name\": row[\"climb_name\"],\n",
+    "                \"setter_username\": row.get(\"setter_username\"),\n",
+    "                \"layout_id\": int(row[\"layout_id\"]),\n",
+    "                \"layout_name\": row.get(\"layout_name\"),\n",
+    "                \"board_name\": row.get(\"board_name\"),\n",
+    "                \"frames\": row[\"frames\"],\n",
+    "                \"angle\": float(row[\"angle\"]),\n",
+    "                \"display_difficulty\": float(row[\"display_difficulty\"]),\n",
+    "                \"grouped_v\": int(to_grouped_v(row[\"display_difficulty\"])),\n",
+    "                \"boulder_grade\": row.get(\"boulder_grade\"),\n",
+    "                \"ascensionist_count\": row.get(\"ascensionist_count\"),\n",
+    "                \"quality_average\": row.get(\"quality_average\"),\n",
+    "                \"fa_at\": row.get(\"fa_at\"),\n",
+    "                \"n_holds\": len(holds),\n",
+    "                \"n_start\": semantic_roles.count(\"start\"),\n",
+    "                \"n_middle\": semantic_roles.count(\"middle\"),\n",
+    "                \"n_foot\": semantic_roles.count(\"foot\"),\n",
+    "                \"n_finish\": semantic_roles.count(\"finish\"),\n",
+    "                \"holds\": holds,\n",
+    "                \"hold_tokens\": hold_tokens,\n",
+    "                \"tokens_with_grade\": tokens_with_grade,\n",
+    "                \"tokens_no_grade\": tokens_no_grade,\n",
+    "                \"sequence_with_grade\": \" \".join(tokens_with_grade),\n",
+    "                \"sequence_no_grade\": \" \".join(tokens_no_grade),\n",
+    "            }\n",
+    "        )\n",
+    "\n",
+    "    return pd.DataFrame(records)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "6198c24e",
@@ -182,14 +830,22 @@
     "   - `sequence_with_grade`: `<BOS> <BOARD_TB2> <ANGLE_40> <GRADE_V6> <TB2_p344_start> ... <EOS>`\n",
     "   - `sequence_no_grade`: `<BOS> <BOARD_TB2> <ANGLE_40> <TB2_p344_start> ... <EOS>` (grade removed)\n",
     "\n",
-    "The grade-included version is used for the GPT generator (which predicts the next token, including grade). The grade-excluded version is used for the grade predictor (which receives the route without knowing the grade and must predict it)."
+    "The grade-included version is used for the GPT generator, which treats the grade token as conditioning context when predicting the hold tokens that follow. The grade-excluded version is used for the grade predictor, which receives the route without the grade and must predict it.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "20bed1da",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:45:29.147179Z",
+     "iopub.status.busy": "2026-06-07T15:45:29.146784Z",
+     "iopub.status.idle": "2026-06-07T15:47:22.798391Z",
+     "shell.execute_reply": "2026-06-07T15:47:22.797739Z"
+    }
+   },
    "outputs": [],
    "source": [
     "configs_by_key = {config.board_key: config for config in configs}\n",
@@ -213,14 +869,22 @@
    "source": [
     "## Example tokenized routes\n",
     "\n",
-    "Let's look at what the tokenized routes actually look like. This is the \"text\" that our transformer models will read."
+    "Let's look at what the tokenized routes actually look like. This is the \"text\" that our transformer models will read.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "f5b7391b",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:22.801970Z",
+     "iopub.status.busy": "2026-06-07T15:47:22.801298Z",
+     "iopub.status.idle": "2026-06-07T15:47:22.833306Z",
+     "shell.execute_reply": "2026-06-07T15:47:22.832513Z"
+    }
+   },
    "outputs": [],
    "source": [
     "for _, row in df_routes.groupby(\"board_key\").head(2).iterrows():\n",
@@ -231,6 +895,54 @@
     "    print()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8f46bb08",
+   "metadata": {},
+   "source": [
+    "### Vocabulary helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d52041b7",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:22.836570Z",
+     "iopub.status.busy": "2026-06-07T15:47:22.836109Z",
+     "iopub.status.idle": "2026-06-07T15:47:22.843523Z",
+     "shell.execute_reply": "2026-06-07T15:47:22.842719Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Build the shared vocabulary and encode/decode token strings.\n",
+    "def build_vocab(df_routes: pd.DataFrame) -> tuple[list[str], dict[str, int], dict[int, str]]:\n",
+    "    \"\"\"Build the shared token vocabulary from grade-conditioned sequences.\"\"\"\n",
+    "    all_tokens: list[str] = []\n",
+    "    for tokens in df_routes[\"tokens_with_grade\"]:\n",
+    "        all_tokens.extend(tokens)\n",
+    "\n",
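+    "    # Special tokens come first; sorting the rest keeps token IDs deterministic.\n",
+    "    vocab_tokens = list(SPECIAL_TOKENS)\n",
+    "    special = set(SPECIAL_TOKENS)\n",
+    "    vocab_tokens.extend(token for token in sorted(set(all_tokens)) if token not in special)\n",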
+    "\n",
+    "    stoi = {token: idx for idx, token in enumerate(vocab_tokens)}\n",
+    "    itos = {idx: token for token, idx in stoi.items()}\n",
+    "    return vocab_tokens, stoi, itos\n",
+    "\n",
+    "def encode(tokens: Iterable[str], stoi: dict[str, int]) -> list[int]:\n",
+    "    \"\"\"Convert tokens to integer IDs, using ``<UNK>`` for unseen tokens.\"\"\"\n",
+    "    unk_id = stoi[\"<UNK>\"]\n",
+    "    return [stoi.get(token, unk_id) for token in tokens]\n",
+    "\n",
+    "def decode(ids: Iterable[int], itos: dict[int, str]) -> list[str]:\n",
+    "    \"\"\"Convert integer IDs back to token strings.\"\"\"\n",
+    "    return [itos.get(int(idx), \"<UNK>\") for idx in ids]"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "0393d191",
@@ -250,20 +962,28 @@
     "\n",
     "### Why board-namespaced hold tokens?\n",
     "\n",
-    "Placement ID 344 on TB2 refers to a completely different physical hold than placement ID 344 on Kilter. By prefixing with the board name (`TB2_p344_start` vs `KILTER_p344_start`), we ensure the model treats these as distinct tokens.\n",
+    "Placement IDs are only meaningful within a single board: placement ID 344 on TB2 is unrelated to placement ID 344 on Kilter (in fact, Kilter has no placement 344). By prefixing with the board name (`<TB2_p344_start>` vs `<KILTER_p344_start>`), we ensure the model treats holds from different boards as distinct tokens.\n",
     "\n",
     "This is analogous to how multilingual LLMs use language-specific subword tokens — the same byte sequence can mean different things in different languages.\n",
     "\n",
     "### String-to-integer mapping\n",
     "\n",
-    "Transformers operate on integer indices, not strings. The `stoi` (string-to-integer) and `itos` (integer-to-string) dictionaries provide this mapping, similar to how HuggingFace tokenizers work."
+    "Transformers operate on integer indices, not strings. The `stoi` (string-to-integer) and `itos` (integer-to-string) dictionaries provide this mapping, similar to how Hugging Face tokenizers work.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "1ba5b78d",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:22.846657Z",
+     "iopub.status.busy": "2026-06-07T15:47:22.846214Z",
+     "iopub.status.idle": "2026-06-07T15:47:23.506016Z",
+     "shell.execute_reply": "2026-06-07T15:47:23.505338Z"
+    }
+   },
    "outputs": [],
    "source": [
     "vocab_tokens, stoi, itos = build_vocab(df_routes)\n",
@@ -286,6 +1006,115 @@
     "print(f\"Hold tokens: {len(hold_tokens)}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8408e4ab",
+   "metadata": {},
+   "source": [
+    "### Split helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "366d46b5",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:23.509090Z",
+     "iopub.status.busy": "2026-06-07T15:47:23.508606Z",
+     "iopub.status.idle": "2026-06-07T15:47:23.518536Z",
+     "shell.execute_reply": "2026-06-07T15:47:23.517995Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Assign train/validation/test splits at the logical-climb group level.\n",
+    "def safe_train_test_split(\n",
+    "    df: pd.DataFrame,\n",
+    "    test_size: float,\n",
+    "    random_state: int,\n",
+    "    stratify_col: str | None = None,\n",
+    "):\n",
+    "    \"\"\"Split a DataFrame with optional stratification and graceful fallback.\n",
+    "\n",
+    "    scikit-learn raises when a requested stratum is too small. The tokenization\n",
+    "    pipeline prefers stratified splits when possible, but falls back to an\n",
+    "    unstratified split rather than failing on tiny smoke-test subsets.\n",
+    "    \"\"\"\n",
+    "    stratify = None\n",
+    "    if stratify_col is not None and stratify_col in df.columns:\n",
+    "        counts = df[stratify_col].value_counts()\n",
+    "        if len(counts) > 1 and counts.min() >= 2:\n",
+    "            stratify = df[stratify_col]\n",
+    "\n",
+    "    try:\n",
+    "        return train_test_split(\n",
+    "            df,\n",
+    "            test_size=test_size,\n",
+    "            random_state=random_state,\n",
+    "            stratify=stratify,\n",
+    "        )\n",
+    "    except ValueError:\n",
+    "        return train_test_split(\n",
+    "            df,\n",
+    "            test_size=test_size,\n",
+    "            random_state=random_state,\n",
+    "            stratify=None,\n",
+    "        )\n",
+    "\n",
+    "def assign_group_splits(\n",
+    "    df: pd.DataFrame,\n",
+    "    group_cols: list[str],\n",
+    "    test_size: float,\n",
+    "    val_size_within_temp: float,\n",
+    "    random_state: int,\n",
+    "    stratify_col: str | None = None,\n",
+    ") -> pd.Series:\n",
+    "    \"\"\"Assign train/val/test splits at group level.\n",
+    "\n",
+    "    This prevents multiple rows for the same logical climb, for example the\n",
+    "    same UUID at several angles, from being distributed across different\n",
+    "    splits. The returned Series is indexed like ``df`` and contains\n",
+    "    ``train``, ``val``, or ``test``.\n",
+    "    \"\"\"\n",
+    "    group_df = df[group_cols + ([stratify_col] if stratify_col else [])].copy()\n",
+    "    group_df = group_df.drop_duplicates(group_cols).reset_index(drop=True)\n",
+    "\n",
+    "    train_groups, temp_groups = safe_train_test_split(\n",
+    "        group_df,\n",
+    "        test_size=test_size,\n",
+    "        random_state=random_state,\n",
+    "        stratify_col=stratify_col,\n",
+    "    )\n",
+    "    test_groups, val_groups = safe_train_test_split(\n",
+    "        temp_groups,\n",
+    "        test_size=val_size_within_temp,\n",
+    "        random_state=random_state,\n",
+    "        stratify_col=stratify_col,\n",
+    "    )\n",
+    "\n",
+    "    def key_frame(frame: pd.DataFrame) -> set[tuple]:\n",
+    "        \"\"\"Return stringified group keys so pandas dtypes cannot affect joins.\"\"\"\n",
+    "        return set(map(tuple, frame[group_cols].astype(str).values.tolist()))\n",
+    "\n",
+    "    train_keys = key_frame(train_groups)\n",
+    "    val_keys = key_frame(val_groups)\n",
+    "    test_keys = key_frame(test_groups)\n",
+    "\n",
+    "    def split_for_row(row) -> str:\n",
+    "        \"\"\"Map one original row back to its group-level split assignment.\"\"\"\n",
+    "        key = tuple(str(row[col]) for col in group_cols)\n",
+    "        if key in train_keys:\n",
+    "            return \"train\"\n",
+    "        if key in val_keys:\n",
+    "            return \"val\"\n",
+    "        if key in test_keys:\n",
+    "            return \"test\"\n",
+    "        raise KeyError(f\"Could not assign split for group key {key}\")\n",
+    "\n",
+    "    return df.apply(split_for_row, axis=1)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "93176cff",
@@ -302,14 +1131,22 @@
     "\n",
     "Without stratification, we might end up with all V14 climbs in the test set and none in training, which would make evaluation meaningless.\n",
     "\n",
-    "This is the same principle as stratified splitting in NLP, where you ensure all languages or domains are represented in each split."
+    "This is the same principle as stratified splitting in NLP, where you ensure all languages or domains are represented in each split.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "ff18298e",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:23.521543Z",
+     "iopub.status.busy": "2026-06-07T15:47:23.521153Z",
+     "iopub.status.idle": "2026-06-07T15:47:31.054015Z",
+     "shell.execute_reply": "2026-06-07T15:47:31.053128Z"
+    }
+   },
    "outputs": [],
    "source": [
     "df_routes[\"ids_with_grade\"] = df_routes[\"tokens_with_grade\"].apply(lambda tokens: encode(tokens, stoi))\n",
@@ -327,6 +1164,152 @@
     "df_routes.groupby([\"board_key\", \"split\"]).size().unstack(fill_value=0)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ae789b11",
+   "metadata": {},
+   "source": [
+    "### Token metadata helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "43e9dd8b",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:31.057561Z",
+     "iopub.status.busy": "2026-06-07T15:47:31.056935Z",
+     "iopub.status.idle": "2026-06-07T15:47:31.072487Z",
+     "shell.execute_reply": "2026-06-07T15:47:31.071641Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Attach board, role, placement, and coordinate metadata to each token.\n",
+    "def build_token_metadata(\n",
+    "    vocab_tokens: list[str],\n",
+    "    stoi: dict[str, int],\n",
+    "    df_placements: pd.DataFrame,\n",
+    "    placement_lookup: dict[tuple[str, int], dict],\n",
+    "    configs_by_prefix: dict[str, BoardConfig],\n",
+    ") -> pd.DataFrame:\n",
+    "    \"\"\"Build per-token metadata used for coordinate features and plotting.\n",
+    "\n",
+    "    Hold tokens receive raw coordinates, normalized coordinates in ``[-1, 1]``,\n",
+    "    role labels, and board identity. Non-hold tokens keep neutral coordinate\n",
+    "    features so the grade predictor can safely index every token ID.\n",
+    "    \"\"\"\n",
+    "    bounds = {}\n",
+    "    for board_key, frame in df_placements.groupby(\"board_key\"):\n",
+    "        xs = frame[\"x\"].astype(float)\n",
+    "        ys = frame[\"y\"].astype(float)\n",
+    "        bounds[str(board_key)] = {\n",
+    "            \"x_min\": float(xs.min()),\n",
+    "            \"x_max\": float(xs.max()),\n",
+    "            \"y_min\": float(ys.min()),\n",
+    "            \"y_max\": float(ys.max()),\n",
+    "        }\n",
+    "\n",
+    "    def normalize(value: float, lo: float, hi: float) -> float:\n",
+    "        \"\"\"Scale one coordinate into ``[-1, 1]`` with safe missing-value handling.\"\"\"\n",
+    "        if pd.isna(value) or hi == lo:\n",
+    "            return 0.0\n",
+    "        return 2 * ((float(value) - lo) / (hi - lo)) - 1\n",
+    "\n",
+    "    rows: list[dict] = []\n",
+    "\n",
+    "    for token in vocab_tokens:\n",
+    "        meta = {\n",
+    "            \"token\": token,\n",
+    "            \"token_id\": stoi[token],\n",
+    "            \"kind\": \"special\",\n",
+    "            \"board_key\": None,\n",
+    "            \"board_token_prefix\": None,\n",
+    "            \"placement_id\": np.nan,\n",
+    "            \"role\": None,\n",
+    "            \"x\": np.nan,\n",
+    "            \"y\": np.nan,\n",
+    "            \"x_norm\": 0.0,\n",
+    "            \"y_norm\": 0.0,\n",
+    "            \"is_hold\": 0,\n",
+    "            \"angle\": np.nan,\n",
+    "            \"grouped_v\": np.nan,\n",
+    "        }\n",
+    "\n",
+    "        hold_match = HOLD_TOKEN_PATTERN.match(token)\n",
+    "        if hold_match:\n",
+    "            prefix = hold_match.group(1)\n",
+    "            placement_id = int(hold_match.group(2))\n",
+    "            role = hold_match.group(3)\n",
+    "            config = configs_by_prefix[prefix]\n",
+    "            board_key = config.board_key\n",
+    "            row = placement_lookup.get((board_key, placement_id), {})\n",
+    "            x = float(row.get(\"x\", np.nan))\n",
+    "            y = float(row.get(\"y\", np.nan))\n",
+    "            board_bounds = bounds.get(board_key, {\"x_min\": 0, \"x_max\": 1, \"y_min\": 0, \"y_max\": 1})\n",
+    "\n",
+    "            meta.update(\n",
+    "                {\n",
+    "                    \"kind\": \"hold\",\n",
+    "                    \"board_key\": board_key,\n",
+    "                    \"board_token_prefix\": prefix,\n",
+    "                    \"placement_id\": placement_id,\n",
+    "                    \"role\": role,\n",
+    "                    \"x\": x,\n",
+    "                    \"y\": y,\n",
+    "                    \"x_norm\": normalize(x, board_bounds[\"x_min\"], board_bounds[\"x_max\"]),\n",
+    "                    \"y_norm\": normalize(y, board_bounds[\"y_min\"], board_bounds[\"y_max\"]),\n",
+    "                    \"is_hold\": 1,\n",
+    "                }\n",
+    "            )\n",
+    "\n",
+    "        angle_match = ANGLE_TOKEN_PATTERN.match(token)\n",
+    "        if angle_match:\n",
+    "            meta.update({\"kind\": \"angle\", \"angle\": int(angle_match.group(1))})\n",
+    "\n",
+    "        grade_match = GRADE_TOKEN_PATTERN.match(token)\n",
+    "        if grade_match:\n",
+    "            meta.update({\"kind\": \"grade\", \"grouped_v\": int(grade_match.group(1))})\n",
+    "\n",
+    "        board_match = BOARD_TOKEN_PATTERN.match(token)\n",
+    "        if board_match:\n",
+    "            prefix = board_match.group(1)\n",
+    "            config = configs_by_prefix.get(prefix)\n",
+    "            meta.update(\n",
+    "                {\n",
+    "                    \"kind\": \"board\",\n",
+    "                    \"board_key\": None if config is None else config.board_key,\n",
+    "                    \"board_token_prefix\": prefix,\n",
+    "                }\n",
+    "            )\n",
+    "\n",
+    "        rows.append(meta)\n",
+    "\n",
+    "    return pd.DataFrame(rows)\n",
+    "\n",
+    "def vocab_payload(\n",
+    "    stoi: dict[str, int],\n",
+    "    itos: dict[int, str],\n",
+    "    configs_by_key: dict[str, BoardConfig],\n",
+    ") -> dict:\n",
+    "    \"\"\"Package vocabulary and board metadata for JSON serialization.\"\"\"\n",
+    "    return {\n",
+    "        \"stoi\": stoi,\n",
+    "        \"itos\": {str(k): v for k, v in itos.items()},\n",
+    "        \"special_tokens\": SPECIAL_TOKENS,\n",
+    "        \"boards\": {\n",
+    "            board_key: {\n",
+    "                \"token_prefix\": config.token_prefix,\n",
+    "                \"board_token\": board_token(config),\n",
+    "                \"role_definitions\": config.role_definitions,\n",
+    "            }\n",
+    "            for board_key, config in configs_by_key.items()\n",
+    "        },\n",
+    "        \"grade_to_v\": {str(k): v for k, v in GRADE_TO_V.items()},\n",
+    "    }"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "414dca92",
@@ -343,14 +1326,22 @@
     "- **Semantic role** (`start`, `middle`, `finish`, `foot`): What the hold is used for\n",
     "- **Board identity** (`board_key`): Which board this hold belongs to\n",
     "\n",
-    "The grade predictor uses these coordinate features as additional embeddings alongside the token embeddings. This is similar to how some LLMs inject positional embeddings or segment embeddings — we're giving the model extra structured information about each token."
+    "The grade predictor uses these coordinate features as additional embeddings alongside the token embeddings. This is similar to how some LLMs inject positional embeddings or segment embeddings — we're giving the model extra structured information about each token.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "48c3692e",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:31.075613Z",
+     "iopub.status.busy": "2026-06-07T15:47:31.075057Z",
+     "iopub.status.idle": "2026-06-07T15:47:31.130838Z",
+     "shell.execute_reply": "2026-06-07T15:47:31.129975Z"
+    }
+   },
    "outputs": [],
    "source": [
     "df_token_meta = build_token_metadata(\n",
@@ -368,6 +1359,57 @@
     "df_token_meta[df_token_meta[\"kind\"] == \"hold\"].head()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "79774dc1",
+   "metadata": {},
+   "source": [
+    "### JSON output helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a53bf23d",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:31.134369Z",
+     "iopub.status.busy": "2026-06-07T15:47:31.133692Z",
+     "iopub.status.idle": "2026-06-07T15:47:31.141767Z",
+     "shell.execute_reply": "2026-06-07T15:47:31.140911Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Write JSON artifacts after converting NumPy/pandas values to plain Python values.\n",
+    "def json_safe(obj: Any) -> Any:\n",
+    "    \"\"\"Convert NumPy/pandas values into JSON-serializable Python objects.\"\"\"\n",
+    "    if isinstance(obj, dict):\n",
+    "        return {str(k): json_safe(v) for k, v in obj.items()}\n",
+    "    if isinstance(obj, (list, tuple)):\n",
+    "        return [json_safe(v) for v in obj]\n",
+    "    if isinstance(obj, np.integer):\n",
+    "        return int(obj)\n",
+    "    if isinstance(obj, np.floating):\n",
+    "        if np.isnan(obj):\n",
+    "            return None\n",
+    "        return float(obj)\n",
+    "    if isinstance(obj, np.ndarray):\n",
+    "        return json_safe(obj.tolist())\n",
+    "    try:\n",
+    "        if pd.isna(obj):\n",
+    "            return None\n",
+    "    except Exception:\n",
+    "        pass\n",
+    "    return obj\n",
+    "\n",
+    "def write_json(path: str | Path, payload: Any) -> None:\n",
+    "    \"\"\"Write an object as indented UTF-8 JSON after ``json_safe`` cleanup.\"\"\"\n",
+    "    path = Path(path)\n",
+    "    path.parent.mkdir(parents=True, exist_ok=True)\n",
+    "    path.write_text(json.dumps(json_safe(payload), indent=2), encoding=\"utf-8\")"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "414dca93",
@@ -382,14 +1424,22 @@
     "3. **`token_vocab.json`**: The vocabulary mapping (stoi and itos)\n",
     "4. **`token_metadata.csv`**: Metadata for each token (coordinates, roles, etc.)\n",
     "5. **`placement_metadata.csv`**: Physical placement information\n",
-    "6. **`board_summary.csv`**: Aggregate statistics per board"
+    "6. **`board_summary.csv`**: Aggregate statistics per board\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "50e81878",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:47:31.144897Z",
+     "iopub.status.busy": "2026-06-07T15:47:31.144460Z",
+     "iopub.status.idle": "2026-06-07T15:48:29.973473Z",
+     "shell.execute_reply": "2026-06-07T15:48:29.972779Z"
+    }
+   },
    "outputs": [],
    "source": [
     "OUT = ROOT / \"data\" / \"processed\" / \"tokenized\"\n",
diff --git a/notebooks/02_joint_transformer_grade_prediction.ipynb b/notebooks/02_joint_transformer_grade_prediction.ipynb
index f6255f2..e1709bf 100644
--- a/notebooks/02_joint_transformer_grade_prediction.ipynb
+++ b/notebooks/02_joint_transformer_grade_prediction.ipynb
@@ -30,7 +30,7 @@
     "A climb's difficulty depends on the *relationships between holds*, not just individual holds. Self-attention naturally captures these relationships:\n",
     "\n",
     "- A start hold far from the first middle hold suggests a big opening move\n",
-    "- Two hand holds close together with a foot hold far away suggests a dyno\n",
+    "- Two holds that are very far apart suggest a dyno\n",
     "- The overall spatial distribution determines the \"flow\" of the climb\n",
     "\n",
     "The transformer can learn these spatial relationships through attention, without us having to manually engineer features like \"mean hand reach\" or \"height gained\" (though those features were useful in the classical model).\n",
@@ -47,40 +47,57 @@
     "\n",
     "```text\n",
     "display_difficulty (continuous value, e.g., 20.5)\n",
-    "```"
+    "```\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "3dfd6081",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:37.490884Z",
+     "iopub.status.busy": "2026-06-07T15:48:37.490209Z",
+     "iopub.status.idle": "2026-06-07T15:48:42.972689Z",
+     "shell.execute_reply": "2026-06-07T15:48:42.971662Z"
+    }
+   },
    "outputs": [],
    "source": [
-    "from pathlib import Path\n",
-    "import sys\n",
+    "from __future__ import annotations\n",
+    "\n",
     "import json\n",
+    "import math\n",
+    "from pathlib import Path\n",
+    "from typing import Any\n",
+    "\n",
     "import numpy as np\n",
     "import pandas as pd\n",
     "import torch\n",
     "import torch.nn as nn\n",
-    "from torch.utils.data import DataLoader\n",
+    "import torch.nn.functional as F\n",
+    "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
+    "from torch.utils.data import DataLoader, Dataset\n",
     "\n",
     "ROOT = Path.cwd().resolve()\n",
     "if ROOT.name == \"notebooks\":\n",
     "    ROOT = ROOT.parent\n",
-    "sys.path.insert(0, str(ROOT / \"src\"))\n",
-    "\n",
-    "from climbingboardgpt.datasets import RouteGradeDataset\n",
-    "from climbingboardgpt.metrics import regression_metrics, metrics_by_board\n",
-    "from climbingboardgpt.models import JointRouteTransformerRegressor"
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "8a9e2443",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:42.976137Z",
+     "iopub.status.busy": "2026-06-07T15:48:42.975792Z",
+     "iopub.status.idle": "2026-06-07T15:48:48.768984Z",
+     "shell.execute_reply": "2026-06-07T15:48:48.768115Z"
+    }
+   },
    "outputs": [],
    "source": [
     "TOKENIZED = ROOT / \"data\" / \"processed\" / \"tokenized\"\n",
@@ -95,7 +112,8 @@
     "unk_id = stoi[\"<UNK>\"]\n",
     "\n",
     "print(f\"Vocabulary size: {len(stoi):,}\")\n",
-    "print(f\"Total routes: {len(df_routes):,}\")"
+    "print(f\"Total routes: {len(df_routes):,}\")\n",
+    "\n"
    ]
   },
   {
@@ -114,14 +132,22 @@
     "2. `y_norm`: Normalized vertical position on the board (-1 to 1)\n",
     "3. `is_hold`: 1 if this token represents a hold, 0 otherwise\n",
     "\n",
-    "These features are projected through a linear layer and added to the token embeddings. This is similar to how some vision-language models inject spatial features from images alongside text tokens."
+    "These features are projected through a linear layer and added to the token embeddings. This is similar to how some vision-language models inject spatial features from images alongside text tokens.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "95bb745f",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:48.772384Z",
+     "iopub.status.busy": "2026-06-07T15:48:48.771749Z",
+     "iopub.status.idle": "2026-06-07T15:48:52.916642Z",
+     "shell.execute_reply": "2026-06-07T15:48:52.915616Z"
+    }
+   },
    "outputs": [],
    "source": [
     "def encode(tokens):\n",
@@ -153,7 +179,73 @@
     "\n",
     "print(f\"Max sequence length: {max_len}\")\n",
     "print(f\"Coordinate features shape: {coord_features.shape}\")\n",
-    "print(f\"Vocabulary size: {len(stoi)}\")"
+    "print(f\"Vocabulary size: {len(stoi)}\")\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9033f9e8",
+   "metadata": {},
+   "source": [
+    "### Dataset helper"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c55c1d26",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:52.920221Z",
+     "iopub.status.busy": "2026-06-07T15:48:52.919793Z",
+     "iopub.status.idle": "2026-06-07T15:48:52.927627Z",
+     "shell.execute_reply": "2026-06-07T15:48:52.926737Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Pad route-token sequences for transformer grade prediction.\n",
+    "class RouteGradeDataset(Dataset):\n",
+    "    \"\"\"Dataset for transformer encoder grade prediction.\n",
+    "\n",
+    "    Each item returns a padded token sequence, a boolean attention mask, the\n",
+    "    continuous display-difficulty target, and a small amount of route identity\n",
+    "    metadata used when writing prediction CSVs.\n",
+    "    \"\"\"\n",
+    "\n",
+    "    def __init__(self, df, max_len: int, pad_id: int):\n",
+    "        \"\"\"Store model IDs and labels from a tokenized route DataFrame.\"\"\"\n",
+    "        self.row_ids = df[\"row_id\"].tolist() if \"row_id\" in df.columns else df.index.tolist()\n",
+    "        self.ids = df[\"model_ids\"].tolist()\n",
+    "        self.targets = df[\"display_difficulty\"].astype(float).values\n",
+    "        self.uuids = df[\"uuid\"].tolist()\n",
+    "        self.boards = df[\"board_key\"].astype(str).tolist()\n",
+    "        self.max_len = int(max_len)\n",
+    "        self.pad_id = int(pad_id)\n",
+    "\n",
+    "    def __len__(self) -> int:\n",
+    "        \"\"\"Return the number of route examples.\"\"\"\n",
+    "        return len(self.ids)\n",
+    "\n",
+    "    def __getitem__(self, idx: int):\n",
+    "        \"\"\"Return one padded encoder example and its regression target.\"\"\"\n",
+    "        ids = list(self.ids[idx])[: self.max_len]\n",
+    "        mask = [1] * len(ids)\n",
+    "        if len(ids) < self.max_len:\n",
+    "            pad_n = self.max_len - len(ids)\n",
+    "            ids += [self.pad_id] * pad_n\n",
+    "            mask += [0] * pad_n\n",
+    "\n",
+    "        return {\n",
+    "            \"input_ids\": torch.tensor(ids, dtype=torch.long),\n",
+    "            \"attention_mask\": torch.tensor(mask, dtype=torch.bool),\n",
+    "            \"target\": torch.tensor(self.targets[idx], dtype=torch.float32),\n",
+    "            \"row_id\": int(self.row_ids[idx]),\n",
+    "            \"uuid\": self.uuids[idx],\n",
+    "            \"board_key\": self.boards[idx],\n",
+    "        }\n",
+    "\n"
    ]
   },
   {
@@ -178,14 +270,22 @@
     "- `input_ids`: Integer token IDs, padded to `max_len`\n",
     "- `attention_mask`: 1 for real tokens, 0 for padding\n",
     "- `target`: The difficulty score we want to predict\n",
-    "- `uuid`, `board_key`: Metadata for evaluation"
+    "- `uuid`, `board_key`: Metadata for evaluation\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "2c9e5543",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:52.930809Z",
+     "iopub.status.busy": "2026-06-07T15:48:52.930299Z",
+     "iopub.status.idle": "2026-06-07T15:48:53.612170Z",
+     "shell.execute_reply": "2026-06-07T15:48:53.611156Z"
+    }
+   },
    "outputs": [],
    "source": [
     "train_df = df_routes[df_routes[\"split\"] == \"train\"].reset_index(drop=True)\n",
@@ -202,7 +302,106 @@
     "\n",
     "print(f\"Training samples: {len(train_ds):,}\")\n",
     "print(f\"Validation samples: {len(val_ds):,}\")\n",
-    "print(f\"Test samples: {len(test_ds):,}\")"
+    "print(f\"Test samples: {len(test_ds):,}\")\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "03091a62",
+   "metadata": {},
+   "source": [
+    "### Transformer regressor model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "78612fe7",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:53.616012Z",
+     "iopub.status.busy": "2026-06-07T15:48:53.615396Z",
+     "iopub.status.idle": "2026-06-07T15:48:53.640842Z",
+     "shell.execute_reply": "2026-06-07T15:48:53.639849Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Transformer encoder used as a continuous grade regressor.\n",
+    "class JointRouteTransformerRegressor(nn.Module):\n",
+    "    \"\"\"Transformer encoder for joint TB2/Kilter route difficulty prediction.\n",
+    "\n",
+    "    Inputs are token IDs plus an attention mask. Token embeddings, position\n",
+    "    embeddings, and a learned projection of per-token coordinate features are\n",
+    "    summed before the encoder. The state at the first (``<CLS>``) position is\n",
+    "    then used as the pooled route representation for scalar difficulty regression.\n",
+    "    \"\"\"\n",
+    "\n",
+    "    def __init__(\n",
+    "        self,\n",
+    "        vocab_size: int,\n",
+    "        max_len: int,\n",
+    "        coord_features: torch.Tensor,\n",
+    "        d_model: int = 128,\n",
+    "        nhead: int = 4,\n",
+    "        num_layers: int = 4,\n",
+    "        dim_feedforward: int = 256,\n",
+    "        dropout: float = 0.10,\n",
+    "        pad_id: int = 0,\n",
+    "    ):\n",
+    "        \"\"\"Create the encoder, coordinate projection, and regression head.\"\"\"\n",
+    "        super().__init__()\n",
+    "        self.vocab_size = vocab_size\n",
+    "        self.max_len = max_len\n",
+    "        self.d_model = d_model\n",
+    "        self.pad_id = pad_id\n",
+    "\n",
+    "        self.token_emb = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)\n",
+    "        self.pos_emb = nn.Embedding(max_len, d_model)\n",
+    "\n",
+    "        self.register_buffer(\"coord_features\", coord_features.clone().float())\n",
+    "        self.coord_proj = nn.Linear(coord_features.shape[1], d_model)\n",
+    "\n",
+    "        encoder_layer = nn.TransformerEncoderLayer(\n",
+    "            d_model=d_model,\n",
+    "            nhead=nhead,\n",
+    "            dim_feedforward=dim_feedforward,\n",
+    "            dropout=dropout,\n",
+    "            activation=\"gelu\",\n",
+    "            batch_first=True,\n",
+    "            norm_first=True,\n",
+    "        )\n",
+    "        self.encoder = nn.TransformerEncoder(\n",
+    "            encoder_layer,\n",
+    "            num_layers=num_layers,\n",
+    "            enable_nested_tensor=False,\n",
+    "        )\n",
+    "        self.norm = nn.LayerNorm(d_model)\n",
+    "        self.head = nn.Sequential(\n",
+    "            nn.Linear(d_model, d_model),\n",
+    "            nn.GELU(),\n",
+    "            nn.Dropout(dropout),\n",
+    "            nn.Linear(d_model, 1),\n",
+    "        )\n",
+    "\n",
+    "    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:\n",
+    "        \"\"\"Return one continuous difficulty prediction per input sequence.\"\"\"\n",
+    "        batch_size, seq_len = input_ids.shape\n",
+    "        positions = torch.arange(seq_len, device=input_ids.device).unsqueeze(0).expand(batch_size, seq_len)\n",
+    "\n",
+    "        # Coordinate features are indexed by token ID, so every occurrence of a\n",
+    "        # hold token gets the same physical x/y hint wherever it appears.\n",
+    "        x = self.token_emb(input_ids) + self.pos_emb(positions)\n",
+    "        x = x + self.coord_proj(self.coord_features[input_ids])\n",
+    "\n",
+    "        key_padding_mask = ~attention_mask.bool()\n",
+    "        h = self.encoder(x, src_key_padding_mask=key_padding_mask)\n",
+    "        h = self.norm(h)\n",
+    "\n",
+    "        cls_state = h[:, 0, :]\n",
+    "        return self.head(cls_state).squeeze(-1)\n",
+    "\n"
    ]
   },
   {
@@ -235,14 +434,22 @@
     "- `nhead=4`: Number of attention heads (multi-head attention)\n",
     "- `num_layers=4`: Number of transformer layers\n",
     "- `dim_feedforward=256`: Dimension of the feedforward network inside each layer\n",
-    "- `dropout=0.10`: Dropout probability for regularization"
+    "- `dropout=0.10`: Dropout probability for regularization\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "62c2db48",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:53.644453Z",
+     "iopub.status.busy": "2026-06-07T15:48:53.643654Z",
+     "iopub.status.idle": "2026-06-07T15:48:59.327913Z",
+     "shell.execute_reply": "2026-06-07T15:48:59.326972Z"
+    }
+   },
    "outputs": [],
    "source": [
     "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
@@ -262,7 +469,8 @@
     "optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)\n",
     "\n",
     "print(f\"Device: {device}\")\n",
-    "print(f\"Parameters: {sum(p.numel() for p in model.parameters()):,}\")"
+    "print(f\"Parameters: {sum(p.numel() for p in model.parameters()):,}\")\n",
+    "\n"
    ]
   },
   {
@@ -284,14 +492,22 @@
     "\n",
     "### Early stopping\n",
     "\n",
-    "We stop training if validation loss doesn't improve for `patience` epochs. This prevents overfitting and saves compute."
+    "We stop training if validation loss doesn't improve for `patience` epochs. This prevents overfitting and saves compute.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "665deadb",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:59.331996Z",
+     "iopub.status.busy": "2026-06-07T15:48:59.331485Z",
+     "iopub.status.idle": "2026-06-07T15:48:59.340181Z",
+     "shell.execute_reply": "2026-06-07T15:48:59.339495Z"
+    }
+   },
    "outputs": [],
    "source": [
     "def run_epoch(model, loader, device, optimizer=None):\n",
@@ -341,7 +557,90 @@
     "patience = 12\n",
     "\n",
     "print(f\"Max epochs: {num_epochs}\")\n",
-    "print(f\"Early stopping patience: {patience}\")"
+    "print(f\"Early stopping patience: {patience}\")\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0e5bb77f",
+   "metadata": {},
+   "source": [
+    "### Grade metrics helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aeeb2294",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:59.343447Z",
+     "iopub.status.busy": "2026-06-07T15:48:59.342978Z",
+     "iopub.status.idle": "2026-06-07T15:48:59.353066Z",
+     "shell.execute_reply": "2026-06-07T15:48:59.352152Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Map BoardLib display difficulties to grouped V-grade numbers (V0-V16).\n",
+    "GRADE_TO_V = {\n",
+    "    10: 0, 11: 0, 12: 0,\n",
+    "    13: 1, 14: 1,\n",
+    "    15: 2,\n",
+    "    16: 3, 17: 3,\n",
+    "    18: 4, 19: 4,\n",
+    "    20: 5, 21: 5,\n",
+    "    22: 6,\n",
+    "    23: 7,\n",
+    "    24: 8, 25: 8,\n",
+    "    26: 9,\n",
+    "    27: 10,\n",
+    "    28: 11,\n",
+    "    29: 12,\n",
+    "    30: 13,\n",
+    "    31: 14,\n",
+    "    32: 15,\n",
+    "    33: 16,\n",
+    "}\n",
+    "\n",
+    "def to_grouped_v(display_difficulty: float) -> int:\n",
+    "    \"\"\"Map a continuous display difficulty to the nearest grouped V grade.\"\"\"\n",
+    "    rounded = int(round(float(display_difficulty)))\n",
+    "    rounded = max(min(rounded, max(GRADE_TO_V)), min(GRADE_TO_V))\n",
+    "    return GRADE_TO_V[rounded]\n",
+    "\n",
+    "def grade_token(display_difficulty: float) -> str:\n",
+    "    \"\"\"Return the grade-conditioning token for a display difficulty value.\"\"\"\n",
+    "    return f\"<GRADE_V{to_grouped_v(display_difficulty)}>\"\n",
+    "\n",
+    "# Evaluate difficulty regression and grouped V-grade accuracy.\n",
+    "def regression_metrics(y_true, y_pred) -> dict[str, float]:\n",
+    "    \"\"\"Compute difficulty-scale and grouped-V-grade prediction metrics.\"\"\"\n",
+    "    y_true = np.asarray(y_true)\n",
+    "    y_pred = np.asarray(y_pred)\n",
+    "    true_v = np.asarray([to_grouped_v(x) for x in y_true])\n",
+    "    pred_v = np.asarray([to_grouped_v(x) for x in y_pred])\n",
+    "\n",
+    "    return {\n",
+    "        \"mae\": float(mean_absolute_error(y_true, y_pred)),\n",
+    "        \"rmse\": float(math.sqrt(mean_squared_error(y_true, y_pred))),\n",
+    "        \"r2\": float(r2_score(y_true, y_pred)),\n",
+    "        \"within_1_difficulty\": float(np.mean(np.abs(y_true - y_pred) <= 1) * 100),\n",
+    "        \"within_2_difficulty\": float(np.mean(np.abs(y_true - y_pred) <= 2) * 100),\n",
+    "        \"exact_grouped_v\": float(np.mean(true_v == pred_v) * 100),\n",
+    "        \"within_1_vgrade\": float(np.mean(np.abs(true_v - pred_v) <= 1) * 100),\n",
+    "        \"within_2_vgrades\": float(np.mean(np.abs(true_v - pred_v) <= 2) * 100),\n",
+    "    }\n",
+    "\n",
+    "def metrics_by_board(pred_df: pd.DataFrame) -> pd.DataFrame:\n",
+    "    \"\"\"Compute regression metrics separately for each board in a prediction table.\"\"\"\n",
+    "    rows = []\n",
+    "    for board_key, frame in pred_df.groupby(\"board_key\"):\n",
+    "        metrics = regression_metrics(frame[\"y_true\"].values, frame[\"y_pred\"].values)\n",
+    "        rows.append({\"board_key\": board_key, **metrics})\n",
+    "    return pd.DataFrame(rows)\n",
+    "\n"
    ]
   },
   {
@@ -360,14 +659,22 @@
     "5. **Validate**: Check performance on held-out validation data\n",
     "6. **Early stopping**: Stop if validation loss stops improving\n",
     "\n",
-    "We track both fine-grained metrics (MAE, RMSE) and practical metrics (V-grade accuracy within ±1 grade)."
+    "We track both fine-grained metrics (MAE, RMSE) and practical metrics (V-grade accuracy within ±1 grade).\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "476b158d",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T15:48:59.356313Z",
+     "iopub.status.busy": "2026-06-07T15:48:59.355799Z",
+     "iopub.status.idle": "2026-06-07T19:11:46.644946Z",
+     "shell.execute_reply": "2026-06-07T19:11:46.644060Z"
+    }
+   },
    "outputs": [],
    "source": [
     "history = []\n",
@@ -420,7 +727,8 @@
     "if best_state is not None:\n",
     "    model.load_state_dict(best_state)\n",
     "\n",
-    "print(f\"\\nTraining complete. Best epoch: {best_epoch}, Best val MAE: {best_val_mae:.4f}\")"
+    "print(f\"\\nTraining complete. Best epoch: {best_epoch}, Best val MAE: {best_val_mae:.4f}\")\n",
+    "\n"
    ]
   },
   {
@@ -438,14 +746,22 @@
     "- **Within ±1 difficulty**: Percentage of predictions within 1 point\n",
     "- **Within ±1 V-grade**: Percentage of predictions within 1 V-grade\n",
     "\n",
-    "We also break down performance by board (TB2 vs Kilter) to check for bias."
+    "We also break down performance by board (TB2 vs Kilter) to check for bias.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "9abc3a72",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:11:46.648067Z",
+     "iopub.status.busy": "2026-06-07T19:11:46.647798Z",
+     "iopub.status.idle": "2026-06-07T19:12:05.427217Z",
+     "shell.execute_reply": "2026-06-07T19:12:05.426288Z"
+    }
+   },
    "outputs": [],
    "source": [
     "test_loss, test_pred, test_true, test_uuid, test_board = run_epoch(model, test_loader, device, optimizer=None)\n",
@@ -467,7 +783,60 @@
     "    print(f\"{key:24s}: {value:8.4f}{suffix}\")\n",
     "\n",
     "print(\"\\nBoard-specific test performance:\")\n",
-    "print(board_metrics_df.to_string(index=False))"
+    "print(board_metrics_df.to_string(index=False))\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "01c90e93",
+   "metadata": {},
+   "source": [
+    "### JSON output helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3027d982",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:05.430611Z",
+     "iopub.status.busy": "2026-06-07T19:12:05.430084Z",
+     "iopub.status.idle": "2026-06-07T19:12:05.436838Z",
+     "shell.execute_reply": "2026-06-07T19:12:05.436135Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Write JSON artifacts after converting NumPy/pandas values to plain Python values.\n",
+    "def json_safe(obj: Any) -> Any:\n",
+    "    \"\"\"Convert NumPy/pandas values into JSON-serializable Python objects.\"\"\"\n",
+    "    if isinstance(obj, dict):\n",
+    "        return {str(k): json_safe(v) for k, v in obj.items()}\n",
+    "    if isinstance(obj, (list, tuple)):\n",
+    "        return [json_safe(v) for v in obj]\n",
+    "    if isinstance(obj, np.integer):\n",
+    "        return int(obj)\n",
+    "    if isinstance(obj, np.floating):\n",
+    "        if np.isnan(obj):\n",
+    "            return None\n",
+    "        return float(obj)\n",
+    "    if isinstance(obj, np.ndarray):\n",
+    "        return json_safe(obj.tolist())\n",
+    "    try:\n",
+    "        if pd.isna(obj):\n",
+    "            return None\n",
+    "    except Exception:\n",
+    "        pass\n",
+    "    return obj\n",
+    "\n",
+    "def write_json(path: str | Path, payload: Any) -> None:\n",
+    "    \"\"\"Write an object as indented UTF-8 JSON after ``json_safe`` cleanup.\"\"\"\n",
+    "    path = Path(path)\n",
+    "    path.parent.mkdir(parents=True, exist_ok=True)\n",
+    "    path.write_text(json.dumps(json_safe(payload), indent=2), encoding=\"utf-8\")\n",
+    "\n"
    ]
   },
   {
@@ -477,14 +846,22 @@
    "source": [
     "## Save Model and Artifacts\n",
     "\n",
-    "We save the trained model checkpoint and evaluation metrics for use in notebook 04 (route evaluation) and for future inference."
+    "We save the trained model checkpoint and evaluation metrics for use in notebook 04 (route evaluation) and for future inference.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "save_model",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:05.439746Z",
+     "iopub.status.busy": "2026-06-07T19:12:05.439205Z",
+     "iopub.status.idle": "2026-06-07T19:12:05.604325Z",
+     "shell.execute_reply": "2026-06-07T19:12:05.603607Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Save model checkpoint\n",
@@ -520,13 +897,14 @@
     "pred_df.to_csv(OUT_DIR / \"test_predictions.csv\", index=False)\n",
     "board_metrics_df.to_csv(OUT_DIR / \"board_metrics.csv\", index=False)\n",
     "\n",
-    "from climbingboardgpt.utils import write_json\n",
+    "# write_json is defined in the JSON output helper cell above.\n",
     "write_json(OUT_DIR / \"overall_metrics.json\", overall_metrics)\n",
     "\n",
     "print(f\"Saved model checkpoint to: {model_path}\")\n",
     "print(f\"Saved training history to: {OUT_DIR / 'training_history.csv'}\")\n",
     "print(f\"Saved test predictions to: {OUT_DIR / 'test_predictions.csv'}\")\n",
-    "print(f\"Saved board metrics to: {OUT_DIR / 'board_metrics.csv'}\")"
+    "print(f\"Saved board metrics to: {OUT_DIR / 'board_metrics.csv'}\")\n",
+    "\n"
    ]
   },
   {
@@ -542,7 +920,8 @@
     "\n",
     "3. **Joint training across boards**: By training on both TB2 and Kilter data simultaneously, the model can share statistical strength. The board token (`<BOARD_TB2>` vs `<BOARD_KILTER>`) tells it which \"language\" it's operating in.\n",
     "\n",
-    "4. **The gap between fine-grained and grouped metrics**: Being off by 1 difficulty point often stays within the same V-grade bucket. This is why the ±1 V-grade accuracy is much higher than the ±1 difficulty accuracy."
+    "4. **The gap between fine-grained and grouped metrics**: Being off by 1 difficulty point often stays within the same V-grade bucket. This is why the ±1 V-grade accuracy is much higher than the ±1 difficulty accuracy.\n",
+    "\n"
    ]
   }
  ],
@@ -553,8 +932,16 @@
    "name": "python3"
   },
   "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
    "name": "python",
-   "version": "3.14.4"
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.12"
   }
  },
  "nbformat": 4,
diff --git a/notebooks/03_joint_route_generator.ipynb b/notebooks/03_joint_route_generator.ipynb
index c2fe8b8..f49affc 100644
--- a/notebooks/03_joint_route_generator.ipynb
+++ b/notebooks/03_joint_route_generator.ipynb
@@ -43,40 +43,58 @@
     "- `<ANGLE_40>`: At 40 degrees\n",
     "- `<GRADE_V6>`: At V6 difficulty\n",
     "\n",
-    "This is analogous to how ChatGPT uses a system prompt to condition its responses."
+    "This is analogous to how ChatGPT uses a system prompt to condition its responses.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "b6590822",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:14.101804Z",
+     "iopub.status.busy": "2026-06-07T19:12:14.101439Z",
+     "iopub.status.idle": "2026-06-07T19:12:16.162395Z",
+     "shell.execute_reply": "2026-06-07T19:12:16.161684Z"
+    }
+   },
    "outputs": [],
    "source": [
-    "from pathlib import Path\n",
-    "import sys\n",
+    "from __future__ import annotations\n",
+    "\n",
+    "import ast\n",
     "import json\n",
     "import math\n",
+    "import re\n",
+    "from dataclasses import dataclass\n",
+    "from pathlib import Path\n",
+    "from typing import Iterable\n",
+    "\n",
+    "import numpy as np\n",
     "import pandas as pd\n",
     "import torch\n",
-    "from torch.utils.data import DataLoader\n",
+    "import torch.nn as nn\n",
+    "import torch.nn.functional as F\n",
+    "from torch.utils.data import DataLoader, Dataset\n",
     "\n",
     "ROOT = Path.cwd().resolve()\n",
     "if ROOT.name == \"notebooks\":\n",
-    "    ROOT = ROOT.parent\n",
-    "sys.path.insert(0, str(ROOT / \"src\"))\n",
-    "\n",
-    "from climbingboardgpt.config import load_board_configs\n",
-    "from climbingboardgpt.datasets import RouteGPTDataset\n",
-    "from climbingboardgpt.generation import generate_one\n",
-    "from climbingboardgpt.models import JointRouteGPT"
+    "    ROOT = ROOT.parent"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "f09fdf54",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:16.166017Z",
+     "iopub.status.busy": "2026-06-07T19:12:16.165651Z",
+     "iopub.status.idle": "2026-06-07T19:12:21.900885Z",
+     "shell.execute_reply": "2026-06-07T19:12:21.899985Z"
+    }
+   },
    "outputs": [],
    "source": [
     "TOKENIZED = ROOT / \"data\" / \"processed\" / \"tokenized\"\n",
@@ -92,6 +110,59 @@
     "print(f\"Total routes: {len(df_routes):,}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4fcba532",
+   "metadata": {},
+   "source": [
+    "### Causal dataset helper"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "40021fc1",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:21.904008Z",
+     "iopub.status.busy": "2026-06-07T19:12:21.903750Z",
+     "iopub.status.idle": "2026-06-07T19:12:21.910270Z",
+     "shell.execute_reply": "2026-06-07T19:12:21.909595Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Pad route-token sequences and create shifted input/target pairs for causal modeling.\n",
+    "class RouteGPTDataset(Dataset):\n",
+    "    \"\"\"Dataset for causal next-token route generation.\n",
+    "\n",
+    "    The full sequence is padded once, then split into ``input_ids`` and\n",
+    "    ``target_ids`` shifted by one position for teacher-forced language-model\n",
+    "    training.\n",
+    "    \"\"\"\n",
+    "\n",
+    "    def __init__(self, df, max_len: int, pad_id: int):\n",
+    "        \"\"\"Store GPT token ID sequences from a tokenized route DataFrame.\"\"\"\n",
+    "        self.ids = df[\"gpt_ids\"].tolist()\n",
+    "        self.max_len = int(max_len)\n",
+    "        self.pad_id = int(pad_id)\n",
+    "\n",
+    "    def __len__(self) -> int:\n",
+    "        \"\"\"Return the number of route examples.\"\"\"\n",
+    "        return len(self.ids)\n",
+    "\n",
+    "    def __getitem__(self, idx: int):\n",
+    "        \"\"\"Return one padded causal-language-model training example.\"\"\"\n",
+    "        ids = list(self.ids[idx])[: self.max_len]\n",
+    "        if len(ids) < self.max_len:\n",
+    "            ids += [self.pad_id] * (self.max_len - len(ids))\n",
+    "\n",
+    "        return {\n",
+    "            \"input_ids\": torch.tensor(ids[:-1], dtype=torch.long),\n",
+    "            \"target_ids\": torch.tensor(ids[1:], dtype=torch.long),\n",
+    "        }"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "fe4b0faf",
@@ -114,14 +185,22 @@
     "\n",
     "For the grade predictor (notebook 02), we excluded the grade because the model needed to predict it. But for the generator, we **include** the grade (`<GRADE_V6>`) in the training data so the model learns the relationship between grade and hold selection.\n",
     "\n",
-    "At generation time, we provide the grade as part of the prompt, and the model generates holds that are appropriate for that grade."
+    "At generation time, we provide the grade as part of the prompt, and the model generates holds that are appropriate for that grade.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "7ad61dbd",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:21.913286Z",
+     "iopub.status.busy": "2026-06-07T19:12:21.912788Z",
+     "iopub.status.idle": "2026-06-07T19:12:25.369590Z",
+     "shell.execute_reply": "2026-06-07T19:12:25.368643Z"
+    }
+   },
    "outputs": [],
    "source": [
     "def encode(tokens):\n",
@@ -153,6 +232,132 @@
     "print(f\"Validation samples: {len(val_ds):,}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "552b9f69",
+   "metadata": {},
+   "source": [
+    "### GPT model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4085a314",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:25.373057Z",
+     "iopub.status.busy": "2026-06-07T19:12:25.372812Z",
+     "iopub.status.idle": "2026-06-07T19:12:25.388342Z",
+     "shell.execute_reply": "2026-06-07T19:12:25.387476Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# GPT-style causal transformer for route generation.\n",
+    "class JointRouteGPT(nn.Module):\n",
+    "    \"\"\"Tiny GPT-style causal transformer for board-conditioned route generation.\n",
+    "\n",
+    "    PyTorch's ``TransformerEncoder`` is used with a causal mask, which makes it\n",
+    "    behave like a decoder-only language model for short route sequences.\n",
+    "\n",
+    "    Why use ``TransformerEncoder`` rather than ``TransformerDecoder``?\n",
+    "    -------------------------------------------------------------------\n",
+    "    PyTorch's ``TransformerDecoderLayer`` expects two inputs: a decoder\n",
+    "    sequence and a separate encoder memory for cross-attention. For\n",
+    "    unconditional or prompt-conditioned generation there is no encoder,\n",
+    "    so ``TransformerDecoderLayer`` would need a dummy memory tensor and\n",
+    "    cross-attention weights with nothing useful to attend to. Using\n",
+    "    ``TransformerEncoder`` with a causal mask avoids this mismatch and gives\n",
+    "    the same self-attention-only computation as a decoder-only model.\n",
+    "\n",
+    "    The trade-off is that ``TransformerEncoder`` does not natively prevent\n",
+    "    attention to future positions, so the causal mask must be constructed\n",
+    "    manually (see ``forward``). For the sequence lengths seen here\n",
+    "    (at most ~400 tokens) the overhead of the upper-triangular mask is\n",
+    "    negligible. ``enable_nested_tensor=False`` disables the nested-tensor\n",
+    "    fast path for padded inputs, which is unavailable with ``norm_first=True``.\n",
+    "    \"\"\"\n",
+    "\n",
+    "    def __init__(\n",
+    "        self,\n",
+    "        vocab_size: int,\n",
+    "        block_size: int,\n",
+    "        n_embd: int = 128,\n",
+    "        n_head: int = 4,\n",
+    "        n_layer: int = 4,\n",
+    "        dropout: float = 0.10,\n",
+    "        pad_id: int = 0,\n",
+    "    ):\n",
+    "        \"\"\"Create the token/position embeddings, causal blocks, and LM head.\"\"\"\n",
+    "        super().__init__()\n",
+    "        self.vocab_size = vocab_size\n",
+    "        self.block_size = block_size\n",
+    "        self.pad_id = pad_id\n",
+    "\n",
+    "        self.token_emb = nn.Embedding(vocab_size, n_embd, padding_idx=pad_id)\n",
+    "        self.pos_emb = nn.Embedding(block_size, n_embd)\n",
+    "        self.drop = nn.Dropout(dropout)\n",
+    "\n",
+    "        layer = nn.TransformerEncoderLayer(\n",
+    "            d_model=n_embd,\n",
+    "            nhead=n_head,\n",
+    "            dim_feedforward=4 * n_embd,\n",
+    "            dropout=dropout,\n",
+    "            activation=\"gelu\",\n",
+    "            batch_first=True,\n",
+    "            norm_first=True,\n",
+    "        )\n",
+    "        self.blocks = nn.TransformerEncoder(\n",
+    "            layer,\n",
+    "            num_layers=n_layer,\n",
+    "            enable_nested_tensor=False,\n",
+    "        )\n",
+    "        self.ln_f = nn.LayerNorm(n_embd)\n",
+    "        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)\n",
+    "        self.lm_head.weight = self.token_emb.weight\n",
+    "\n",
+    "    def forward(\n",
+    "        self,\n",
+    "        idx: torch.Tensor,\n",
+    "        targets: torch.Tensor | None = None,\n",
+    "    ) -> tuple[torch.Tensor, torch.Tensor | None]:\n",
+    "        \"\"\"Return next-token logits and, when targets are supplied, CE loss.\"\"\"\n",
+    "        _, seq_len = idx.shape\n",
+    "        if seq_len > self.block_size:\n",
+    "            idx = idx[:, -self.block_size :]\n",
+    "            seq_len = idx.shape[1]\n",
+    "\n",
+    "        positions = torch.arange(seq_len, device=idx.device).unsqueeze(0)\n",
+    "        x = self.drop(self.token_emb(idx) + self.pos_emb(positions))\n",
+    "\n",
+    "        causal_mask = torch.triu(\n",
+    "            torch.ones(seq_len, seq_len, device=idx.device, dtype=torch.bool),\n",
+    "            diagonal=1,\n",
+    "        )\n",
+    "        # Padding masks suppress attention to right-padded context tokens while\n",
+    "        # the causal mask suppresses attention to future positions.\n",
+    "        key_padding_mask = idx.eq(self.pad_id)\n",
+    "\n",
+    "        h = self.blocks(\n",
+    "            x,\n",
+    "            mask=causal_mask,\n",
+    "            src_key_padding_mask=key_padding_mask,\n",
+    "        )\n",
+    "        h = self.ln_f(h)\n",
+    "        logits = self.lm_head(h)\n",
+    "\n",
+    "        loss = None\n",
+    "        if targets is not None:\n",
+    "            loss = F.cross_entropy(\n",
+    "                logits.reshape(-1, logits.size(-1)),\n",
+    "                targets.reshape(-1),\n",
+    "                ignore_index=self.pad_id,\n",
+    "            )\n",
+    "\n",
+    "        return logits, loss"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "66d98641",
@@ -177,21 +382,29 @@
     "- `n_layer=4`: Number of transformer layers (GPT-2 small uses 12)\n",
     "- `dropout=0.10`: Dropout probability\n",
     "\n",
-    "This is intentionally small — we're training on ~40K short sequences, not billions of long documents.\n",
+    "This is intentionally small — we're training on a few hundred thousand short sequences, not billions of long documents.\n",
     "\n",
     "### Weight tying\n",
     "\n",
     "The output projection layer shares weights with the token embedding layer (`self.lm_head.weight = self.token_emb.weight`). This is a common technique that:\n",
     "- Reduces parameter count\n",
     "- Acts as a regularizer\n",
-    "- Is used in GPT-2 and many other language models"
+    "- Is used in GPT-2 and many other language models\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "3eec6f35",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:25.391551Z",
+     "iopub.status.busy": "2026-06-07T19:12:25.391044Z",
+     "iopub.status.idle": "2026-06-07T19:12:27.304182Z",
+     "shell.execute_reply": "2026-06-07T19:12:27.303257Z"
+    }
+   },
    "outputs": [],
    "source": [
     "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
@@ -216,7 +429,14 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "f999cf05",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:27.307681Z",
+     "iopub.status.busy": "2026-06-07T19:12:27.307034Z",
+     "iopub.status.idle": "2026-06-07T19:12:27.314453Z",
+     "shell.execute_reply": "2026-06-07T19:12:27.313491Z"
+    }
+   },
    "outputs": [],
    "source": [
     "def train_epoch():\n",
@@ -275,14 +495,22 @@
     "- A model that picks uniformly from a 1000-token vocab has perplexity = 1000\n",
     "- Good language models on English text achieve perplexity ~15-20\n",
     "\n",
-    "Our vocabulary is ~4000+ tokens, so a perplexity significantly below that indicates the model is learning meaningful patterns."
+    "Our vocabulary is ~4000+ tokens, so a perplexity significantly below that indicates the model is learning meaningful patterns.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "70b38b02",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T19:12:27.317432Z",
+     "iopub.status.busy": "2026-06-07T19:12:27.317008Z",
+     "iopub.status.idle": "2026-06-07T23:43:38.199890Z",
+     "shell.execute_reply": "2026-06-07T23:43:38.198963Z"
+    }
+   },
    "outputs": [],
    "source": [
     "history = []\n",
@@ -335,6 +563,361 @@
     "print(f\"Best validation perplexity: {math.exp(min(best_val_loss, 20)):.1f}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "20096c62",
+   "metadata": {},
+   "source": [
+    "### Board configuration helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8d26d6d4",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:43:38.203421Z",
+     "iopub.status.busy": "2026-06-07T23:43:38.203126Z",
+     "iopub.status.idle": "2026-06-07T23:43:38.217533Z",
+     "shell.execute_reply": "2026-06-07T23:43:38.216653Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Find the project root and load board configuration JSON files.\n",
+    "def find_project_root(start: str | Path | None = None) -> Path:\n",
+    "    \"\"\"Walk upward until the repository root markers are found.\n",
+    "\n",
+    "    The project root is the nearest directory containing both ``pyproject.toml``\n",
+    "    and ``configs``. If no such directory is found, the resolved starting directory\n",
+    "    is returned so callers still have a deterministic base path.\n",
+    "    \"\"\"\n",
+    "    current = Path(start).resolve() if start is not None else Path.cwd().resolve()\n",
+    "    for candidate in [current, *current.parents]:\n",
+    "        if (candidate / \"pyproject.toml\").exists() and (candidate / \"configs\").exists():\n",
+    "            return candidate\n",
+    "    return current\n",
+    "\n",
+    "@dataclass(frozen=True)\n",
+    "class BoardConfig:\n",
+    "    \"\"\"Configuration for a single climbing board.\n",
+    "    \n",
+    "    This dataclass stores all board-specific settings needed for\n",
+    "    data loading, tokenization, and model training.\n",
+    "    \n",
+    "    Attributes:\n",
+    "        board_key: Short identifier (e.g., \"tb2\", \"kilter\")\n",
+    "        display_name: Human-readable name (e.g., \"Tension Board 2 Mirror\")\n",
+    "        token_prefix: Namespace for hold tokens (e.g., \"TB2\", \"KILTER\")\n",
+    "        db_path: Path to the SQLite database\n",
+    "        layout_id: Which layout in the database to use\n",
+    "        max_angle: Filter out routes steeper than this (None = no filter)\n",
+    "        min_fa_date: Filter out routes first ascended before this date\n",
+    "        placement_y_max: Filter out placements above this Y coordinate\n",
+    "        include_mirror_placement_id: Whether to include mirror info (TB2 only)\n",
+    "        role_definitions: Maps semantic role names to numeric IDs\n",
+    "        boardlib_database_command: Command to download the database\n",
+    "        boardlib_images_command: Command to download board images\n",
+    "        notes: Additional notes about the configuration\n",
+    "    \"\"\"\n",
+    "    board_key: str\n",
+    "    display_name: str\n",
+    "    token_prefix: str\n",
+    "    db_path: Path\n",
+    "    layout_id: int\n",
+    "    max_angle: float | None\n",
+    "    min_fa_date: str | None\n",
+    "    placement_y_max: float | None\n",
+    "    include_mirror_placement_id: bool\n",
+    "    role_definitions: dict[str, int]\n",
+    "    boardlib_database_command: str | None = None\n",
+    "    boardlib_images_command: str | None = None\n",
+    "    notes: tuple[str, ...] = ()\n",
+    "\n",
+    "    @property\n",
+    "    def role_id_to_name(self) -> dict[int, str]:\n",
+    "        \"\"\"Reverse mapping from numeric role IDs to semantic role names.\n",
+    "        \n",
+    "        Example: {5: 'start', 6: 'middle', 7: 'finish', 8: 'foot'} for TB2\n",
+    "        \"\"\"\n",
+    "        return {int(role_id): name for name, role_id in self.role_definitions.items()}\n",
+    "\n",
+    "    @property\n",
+    "    def board_token(self) -> str:\n",
+    "        \"\"\"The special token representing this board.\n",
+    "        \n",
+    "        Example: \"<BOARD_TB2>\" or \"<BOARD_KILTER>\"\n",
+    "        \"\"\"\n",
+    "        return f\"<BOARD_{self.token_prefix}>\"\n",
+    "\n",
+    "    def resolve_db_path(self, project_root: Path | None = None) -> Path:\n",
+    "        \"\"\"Resolve the database path relative to the project root.\n",
+    "        \n",
+    "        If db_path is absolute, return it as-is.\n",
+    "        Otherwise, resolve it relative to the project root.\n",
+    "        \"\"\"\n",
+    "        project_root = project_root or find_project_root()\n",
+    "        return self.db_path if self.db_path.is_absolute() else project_root / self.db_path\n",
+    "\n",
+    "def load_board_config(board_key: str, config_dir: str | Path | None = None) -> BoardConfig:\n",
+    "    \"\"\"Load a single board configuration from a JSON file.\n",
+    "    \n",
+    "    Args:\n",
+    "        board_key: Board identifier (e.g., \"tb2\", \"kilter\")\n",
+    "        config_dir: Directory containing config JSON files\n",
+    "        \n",
+    "    Returns:\n",
+    "        BoardConfig dataclass with all board settings\n",
+    "        \n",
+    "    Raises:\n",
+    "        FileNotFoundError: If the config file doesn't exist\n",
+    "    \"\"\"\n",
+    "    project_root = find_project_root()\n",
+    "    config_dir = Path(config_dir) if config_dir is not None else project_root / \"configs\"\n",
+    "    path = config_dir / f\"{board_key}.json\"\n",
+    "    if not path.exists():\n",
+    "        available = sorted(p.stem for p in config_dir.glob(\"*.json\"))\n",
+    "        raise FileNotFoundError(\n",
+    "            f\"Unknown board config '{board_key}'. Available: {available}\"\n",
+    "        )\n",
+    "\n",
+    "    payload = json.loads(path.read_text(encoding=\"utf-8\"))\n",
+    "    return BoardConfig(\n",
+    "        board_key=str(payload[\"board_key\"]),\n",
+    "        display_name=str(payload[\"display_name\"]),\n",
+    "        token_prefix=str(payload[\"token_prefix\"]),\n",
+    "        db_path=Path(payload[\"db_path\"]),\n",
+    "        layout_id=int(payload[\"layout_id\"]),\n",
+    "        max_angle=None if payload.get(\"max_angle\") is None else float(payload[\"max_angle\"]),\n",
+    "        min_fa_date=payload.get(\"min_fa_date\"),\n",
+    "        placement_y_max=None if payload.get(\"placement_y_max\") is None else float(payload[\"placement_y_max\"]),\n",
+    "        include_mirror_placement_id=bool(payload.get(\"include_mirror_placement_id\", False)),\n",
+    "        role_definitions={str(k): int(v) for k, v in payload[\"role_definitions\"].items()},\n",
+    "        boardlib_database_command=payload.get(\"boardlib_database_command\"),\n",
+    "        boardlib_images_command=payload.get(\"boardlib_images_command\"),\n",
+    "        notes=tuple(payload.get(\"notes\", [])),\n",
+    "    )\n",
+    "\n",
+    "def load_board_configs(board_keys: list[str] | tuple[str, ...]) -> list[BoardConfig]:\n",
+    "    \"\"\"Load multiple board configurations.\n",
+    "    \n",
+    "    Args:\n",
+    "        board_keys: List of board identifiers\n",
+    "        \n",
+    "    Returns:\n",
+    "        List of BoardConfig dataclasses\n",
+    "    \"\"\"\n",
+    "    return [load_board_config(board_key) for board_key in board_keys]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "94d352c6",
+   "metadata": {},
+   "source": [
+    "### Generation helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "abdabe8e",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:43:38.220501Z",
+     "iopub.status.busy": "2026-06-07T23:43:38.220197Z",
+     "iopub.status.idle": "2026-06-07T23:43:38.241455Z",
+     "shell.execute_reply": "2026-06-07T23:43:38.240589Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Parse generated hold tokens back into structured hold records.\n",
+    "HOLD_TOKEN_PATTERN = re.compile(r\"^<([A-Z0-9_]+)_p(\\d+)_(start|middle|finish|foot|unknown)>$\")\n",
+    "\n",
+    "def tokens_to_hold_records(tokens: Iterable[str]) -> list[dict[str, object]]:\n",
+    "    \"\"\"Extract hold records from model tokens using the shared hold-token grammar.\"\"\"\n",
+    "    rows: list[dict[str, object]] = []\n",
+    "    for token in tokens:\n",
+    "        match = HOLD_TOKEN_PATTERN.match(str(token))\n",
+    "        if match is None:\n",
+    "            continue\n",
+    "        board_prefix = match.group(1)\n",
+    "        rows.append(\n",
+    "            {\n",
+    "                \"token\": str(token),\n",
+    "                \"board_token_prefix\": board_prefix,\n",
+    "                \"board_prefix\": board_prefix,\n",
+    "                \"placement_id\": int(match.group(2)),\n",
+    "                \"role\": match.group(3),\n",
+    "            }\n",
+    "        )\n",
+    "    return rows\n",
+    "\n",
+    "# Sample routes from the trained GPT model and convert them back to frames strings.\n",
+    "def top_k_filter(logits: torch.Tensor, k: int | None) -> torch.Tensor:\n",
+    "    \"\"\"Mask logits outside the top ``k`` choices for each batch row.\"\"\"\n",
+    "    if k is None or k <= 0 or k >= logits.size(-1):\n",
+    "        return logits\n",
+    "    values, _ = torch.topk(logits, k)\n",
+    "    cutoff = values[:, [-1]]\n",
+    "    return torch.where(logits < cutoff, torch.full_like(logits, -float(\"inf\")), logits)\n",
+    "\n",
+    "@torch.no_grad()\n",
+    "def sample_ids(\n",
+    "    model,\n",
+    "    prompt_ids: list[int],\n",
+    "    device: torch.device,\n",
+    "    max_new_tokens: int = 40,\n",
+    "    temperature: float = 0.9,\n",
+    "    top_k: int | None = 50,\n",
+    "    eos_id: int | None = None,\n",
+    "    forbidden_ids: Iterable[int] | None = None,\n",
+    ") -> list[int]:\n",
+    "    \"\"\"Autoregressively sample token IDs from a trained route generator.\n",
+    "\n",
+    "    The returned list includes the prompt IDs and all sampled IDs up to either\n",
+    "    ``max_new_tokens`` or the first sampled ``eos_id``.\n",
+    "    \"\"\"\n",
+    "    model.eval()\n",
+    "    sequence = torch.tensor([prompt_ids], dtype=torch.long, device=device)\n",
+    "    forbidden_ids = set(forbidden_ids or [])\n",
+    "\n",
+    "    for _ in range(max_new_tokens):\n",
+    "        idx_cond = sequence[:, -model.block_size :]\n",
+    "        logits, _ = model(idx_cond)\n",
+    "        logits = logits[:, -1, :] / max(temperature, 1e-6)\n",
+    "\n",
+    "        # Special tokens like <PAD> and <CLS> are valid vocabulary entries but\n",
+    "        # should never be emitted in the middle of a generated climb.\n",
+    "        for token_id in forbidden_ids:\n",
+    "            logits[:, int(token_id)] = -float(\"inf\")\n",
+    "\n",
+    "        logits = top_k_filter(logits, top_k)\n",
+    "        probs = F.softmax(logits, dim=-1)\n",
+    "        next_id = torch.multinomial(probs, num_samples=1)\n",
+    "        sequence = torch.cat([sequence, next_id], dim=1)\n",
+    "\n",
+    "        if eos_id is not None and int(next_id.item()) == int(eos_id):\n",
+    "            break\n",
+    "\n",
+    "    return sequence[0].detach().cpu().tolist()\n",
+    "\n",
+    "def prompt_tokens(board_prefix: str, angle: int, grouped_v: int) -> list[str]:\n",
+    "    \"\"\"Build the conditioning prefix used before sampling hold tokens.\"\"\"\n",
+    "    return [\n",
+    "        \"<BOS>\",\n",
+    "        f\"<BOARD_{board_prefix}>\",\n",
+    "        f\"<ANGLE_{int(angle)}>\",\n",
+    "        f\"<GRADE_V{int(grouped_v)}>\",\n",
+    "    ]\n",
+    "\n",
+    "def hold_records(tokens: Iterable[str]) -> list[dict[str, object]]:\n",
+    "    \"\"\"Extract hold records from generated tokens.\"\"\"\n",
+    "    return tokens_to_hold_records(tokens)\n",
+    "\n",
+    "def validity_summary(tokens: Iterable[str], requested_board_prefix: str | None = None) -> dict[str, object]:\n",
+    "    \"\"\"Summarize basic structural validity for generated token sequences.\"\"\"\n",
+    "    records = hold_records(tokens)\n",
+    "    placements = [record[\"placement_id\"] for record in records]\n",
+    "    roles = [record[\"role\"] for record in records]\n",
+    "    prefixes = [record[\"board_prefix\"] for record in records]\n",
+    "\n",
+    "    one_board_only = len(set(prefixes)) <= 1\n",
+    "    matches_requested_board = requested_board_prefix is None or all(prefix == requested_board_prefix for prefix in prefixes)\n",
+    "    no_duplicates = len(placements) == len(set(placements))\n",
+    "    has_start = \"start\" in roles\n",
+    "    has_finish = \"finish\" in roles\n",
+    "    enough_holds = len(records) >= 3\n",
+    "\n",
+    "    return {\n",
+    "        \"n_hold_tokens\": len(records),\n",
+    "        \"n_unique_placements\": len(set(placements)),\n",
+    "        \"has_duplicate_placements\": not no_duplicates,\n",
+    "        \"one_board_only\": one_board_only,\n",
+    "        \"matches_requested_board\": matches_requested_board,\n",
+    "        \"has_start\": has_start,\n",
+    "        \"has_middle\": \"middle\" in roles,\n",
+    "        \"has_finish\": has_finish,\n",
+    "        \"n_start\": roles.count(\"start\"),\n",
+    "        \"n_middle\": roles.count(\"middle\"),\n",
+    "        \"n_foot\": roles.count(\"foot\"),\n",
+    "        \"n_finish\": roles.count(\"finish\"),\n",
+    "        \"basic_valid\": bool(one_board_only and matches_requested_board and no_duplicates and has_start and has_finish and enough_holds),\n",
+    "    }\n",
+    "\n",
+    "def generated_tokens_to_frames(tokens: Iterable[str], role_name_to_id: dict[str, int], board_prefix: str | None = None) -> str:\n",
+    "    \"\"\"Convert generated hold tokens back into a frames string.\n",
+    "\n",
+    "    Duplicate placements, unknown roles, and holds from boards other than\n",
+    "    ``board_prefix`` are skipped, matching the forgiving cleanup used by the demo scripts and webapp.\n",
+    "    \"\"\"\n",
+    "    pieces = []\n",
+    "    seen = set()\n",
+    "    for record in hold_records(tokens):\n",
+    "        if board_prefix is not None and str(record[\"board_prefix\"]) != board_prefix:\n",
+    "            continue\n",
+    "        placement_id = int(record[\"placement_id\"])\n",
+    "        role = str(record[\"role\"])\n",
+    "        if placement_id in seen or role not in role_name_to_id:\n",
+    "            continue\n",
+    "        seen.add(placement_id)\n",
+    "        pieces.append(f\"p{placement_id}r{int(role_name_to_id[role])}\")\n",
+    "    return \"\".join(pieces)\n",
+    "\n",
+    "def generate_one(\n",
+    "    model,\n",
+    "    stoi: dict[str, int],\n",
+    "    itos: dict[int, str],\n",
+    "    device: torch.device,\n",
+    "    board_prefix: str,\n",
+    "    angle: int,\n",
+    "    grouped_v: int,\n",
+    "    role_name_to_id: dict[str, int],\n",
+    "    temperature: float = 0.9,\n",
+    "    top_k: int | None = 50,\n",
+    "    max_new_tokens: int = 40,\n",
+    ") -> dict[str, object]:\n",
+    "    \"\"\"Generate one route and return its tokens, frames string, request metadata, and validity flags.\"\"\"\n",
+    "    unk_id = stoi[\"<UNK>\"]\n",
+    "    eos_id = stoi[\"<EOS>\"]\n",
+    "    forbidden_ids = [\n",
+    "        stoi[\"<PAD>\"],\n",
+    "        stoi[\"<UNK>\"],\n",
+    "        stoi[\"<BOS>\"],\n",
+    "        stoi[\"<CLS>\"],\n",
+    "        stoi[\"<MASK>\"],\n",
+    "    ]\n",
+    "\n",
+    "    prompt = prompt_tokens(board_prefix, angle, grouped_v)\n",
+    "    prompt_ids = [stoi.get(token, unk_id) for token in prompt]\n",
+    "    token_ids = sample_ids(\n",
+    "        model=model,\n",
+    "        prompt_ids=prompt_ids,\n",
+    "        device=device,\n",
+    "        max_new_tokens=max_new_tokens,\n",
+    "        temperature=temperature,\n",
+    "        top_k=top_k,\n",
+    "        eos_id=eos_id,\n",
+    "        forbidden_ids=forbidden_ids,\n",
+    "    )\n",
+    "    tokens = [itos.get(int(idx), \"<UNK>\") for idx in token_ids]\n",
+    "    validity = validity_summary(tokens, requested_board_prefix=board_prefix)\n",
+    "\n",
+    "    return {\n",
+    "        \"requested_board_prefix\": board_prefix,\n",
+    "        \"requested_angle\": int(angle),\n",
+    "        \"requested_grouped_v\": int(grouped_v),\n",
+    "        \"temperature\": float(temperature),\n",
+    "        \"top_k\": None if top_k is None else int(top_k),\n",
+    "        \"tokens\": tokens,\n",
+    "        \"sequence\": \" \".join(tokens),\n",
+    "        \"frames\": generated_tokens_to_frames(tokens, role_name_to_id, board_prefix=board_prefix),\n",
+    "        **validity,\n",
+    "    }"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "69926180",
@@ -356,14 +939,22 @@
     "- **Temperature** (default 0.9): Controls randomness. Lower = more deterministic, higher = more random\n",
     "- **Top-k** (default 50): Only consider the k most likely tokens. This prevents the model from generating very unlikely tokens.\n",
     "\n",
-    "These are the same techniques used in language models like GPT-3 to control output diversity."
+    "These are the same techniques used in language models like GPT-3 to control output diversity.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "029eb911",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:43:38.244254Z",
+     "iopub.status.busy": "2026-06-07T23:43:38.244037Z",
+     "iopub.status.idle": "2026-06-07T23:43:38.680983Z",
+     "shell.execute_reply": "2026-06-07T23:43:38.679992Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Generate sample routes for both boards\n",
@@ -400,14 +991,22 @@
    "source": [
     "## Generate More Routes for Evaluation\n",
     "\n",
-    "Notebook 04 needs a larger set of generated routes for meaningful evaluation. Let's generate routes across multiple angles and grades for both boards."
+    "Notebook 04 needs a larger set of generated routes for meaningful evaluation. Let's generate routes across multiple angles and grades for both boards.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "generate_bulk",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:43:38.684476Z",
+     "iopub.status.busy": "2026-06-07T23:43:38.683935Z",
+     "iopub.status.idle": "2026-06-07T23:43:53.779260Z",
+     "shell.execute_reply": "2026-06-07T23:43:53.778391Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Generate routes across multiple angles and grades for evaluation\n",
@@ -454,14 +1053,22 @@
    "source": [
     "## Save Model and Generated Routes\n",
     "\n",
-    "We save the trained model checkpoint and generated routes for use in notebook 04 (evaluation)."
+    "We save the trained model checkpoint and generated routes for use in notebook 04 (evaluation).\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "save_outputs",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:43:53.782874Z",
+     "iopub.status.busy": "2026-06-07T23:43:53.782303Z",
+     "iopub.status.idle": "2026-06-07T23:43:53.831685Z",
+     "shell.execute_reply": "2026-06-07T23:43:53.830785Z"
+    }
+   },
    "outputs": [],
    "source": [
     "import os\n",
@@ -509,8 +1116,16 @@
    "name": "python3"
   },
   "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
    "name": "python",
-   "version": "3.11"
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.12"
   }
  },
  "nbformat": 4,
diff --git a/notebooks/04_generated_route_evaluation.ipynb b/notebooks/04_generated_route_evaluation.ipynb
index 7bcb8b8..8388a0e 100644
--- a/notebooks/04_generated_route_evaluation.ipynb
+++ b/notebooks/04_generated_route_evaluation.ipynb
@@ -19,8 +19,8 @@
     "### Validity checks\n",
     "\n",
     "A \"basic valid\" route must have:\n",
-    "- At least 3 holds (you need at least 2 hands + 1 foot to climb)\n",
-    "- No duplicate placements (you can't use the same hold twice)\n",
+    "- At least 3 holds\n",
+    "- No duplicate placements\n",
     "- At least one start hold and one finish hold\n",
     "- All holds from the same board (no mixing TB2 and Kilter holds)\n",
     "\n",
@@ -35,46 +35,55 @@
     "- Jaccard similarity = |A intersection B| / |A union B|\n",
     "- Novelty distance = 1 - Jaccard similarity\n",
     "\n",
-    "A novelty distance of 1.0 means the generated route shares no holds with any real route. A distance of 0.0 means it's identical to an existing route."
+    "A novelty distance of 1.0 means the generated route shares no holds with any real route. A distance of 0.0 means it's identical to an existing route.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "726b846f",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:44:02.200057Z",
+     "iopub.status.busy": "2026-06-07T23:44:02.199717Z",
+     "iopub.status.idle": "2026-06-07T23:44:04.626359Z",
+     "shell.execute_reply": "2026-06-07T23:44:04.625624Z"
+    }
+   },
    "outputs": [],
    "source": [
+    "from __future__ import annotations\n",
+    "\n",
+    "import ast\n",
+    "import re\n",
     "from pathlib import Path\n",
-    "import sys\n",
+    "from typing import Iterable\n",
+    "\n",
     "import numpy as np\n",
     "import pandas as pd\n",
     "import torch\n",
+    "import torch.nn as nn\n",
+    "import torch.nn.functional as F\n",
+    "from scipy.spatial.distance import pdist\n",
     "\n",
     "ROOT = Path.cwd().resolve()\n",
     "if ROOT.name == \"notebooks\":\n",
-    "    ROOT = ROOT.parent\n",
-    "sys.path.insert(0, str(ROOT / \"src\"))\n",
-    "\n",
-    "from climbingboardgpt.evaluation import (\n",
-    "    build_placement_coords,\n",
-    "    frames_to_holds,\n",
-    "    holds_to_placement_set,\n",
-    "    nearest_real_route_same_board,\n",
-    "    parse_token_list,\n",
-    "    simple_route_features,\n",
-    "    tokens_to_hold_records,\n",
-    "    validity_from_records,\n",
-    ")\n",
-    "from climbingboardgpt.grades import to_grouped_v\n",
-    "from climbingboardgpt.models import JointRouteTransformerRegressor"
+    "    ROOT = ROOT.parent"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "7f8bb61f",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:44:04.629832Z",
+     "iopub.status.busy": "2026-06-07T23:44:04.629390Z",
+     "iopub.status.idle": "2026-06-07T23:44:10.364160Z",
+     "shell.execute_reply": "2026-06-07T23:44:10.363335Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Load generated routes and real routes for comparison\n",
@@ -111,6 +120,107 @@
     "print(f\"Real routes: {len(df_real):,}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6dc0ac67",
+   "metadata": {},
+   "source": [
+    "### Token parsing and validity helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c32f7ced",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:44:10.368243Z",
+     "iopub.status.busy": "2026-06-07T23:44:10.367603Z",
+     "iopub.status.idle": "2026-06-07T23:44:10.380028Z",
+     "shell.execute_reply": "2026-06-07T23:44:10.379312Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Parse generated token strings and compute basic route-validity flags.\n",
+    "HOLD_TOKEN_PATTERN = re.compile(r\"^<([A-Z0-9_]+)_p(\\d+)_(start|middle|finish|foot|unknown)>$\")\n",
+    "\n",
+    "def parse_tokens(value) -> list[str]:\n",
+    "    \"\"\"Parse tokens from a list, repr-style list string, or whitespace sequence.\"\"\"\n",
+    "    if isinstance(value, list):\n",
+    "        return [str(v) for v in value]\n",
+    "    if not isinstance(value, str):\n",
+    "        return []\n",
+    "\n",
+    "    try:\n",
+    "        parsed = ast.literal_eval(value)\n",
+    "        if isinstance(parsed, list):\n",
+    "            return [str(v) for v in parsed]\n",
+    "    except (SyntaxError, ValueError):\n",
+    "        pass\n",
+    "\n",
+    "    return value.split()\n",
+    "\n",
+    "def tokens_to_hold_records(tokens: Iterable[str]) -> list[dict[str, object]]:\n",
+    "    \"\"\"Extract hold records from model tokens using the shared hold-token grammar.\"\"\"\n",
+    "    rows: list[dict[str, object]] = []\n",
+    "    for token in tokens:\n",
+    "        match = HOLD_TOKEN_PATTERN.match(str(token))\n",
+    "        if match is None:\n",
+    "            continue\n",
+    "        board_prefix = match.group(1)\n",
+    "        rows.append(\n",
+    "            {\n",
+    "                \"token\": str(token),\n",
+    "                \"board_token_prefix\": board_prefix,\n",
+    "                \"board_prefix\": board_prefix,\n",
+    "                \"placement_id\": int(match.group(2)),\n",
+    "                \"role\": match.group(3),\n",
+    "            }\n",
+    "        )\n",
+    "    return rows\n",
+    "\n",
+    "def parse_token_list(value) -> list[str]:\n",
+    "    \"\"\"Compatibility wrapper around the shared token parser.\"\"\"\n",
+    "    return parse_tokens(value)\n",
+    "\n",
+    "def validity_from_records(records: list[dict[str, object]], requested_board_prefix: str | None = None) -> dict[str, object]:\n",
+    "    \"\"\"Compute evaluation-specific route-validity flags from hold records.\"\"\"\n",
+    "    placements = [int(record[\"placement_id\"]) for record in records]\n",
+    "    roles = [str(record[\"role\"]) for record in records]\n",
+    "    prefixes = [str(record[\"board_token_prefix\"]) for record in records]\n",
+    "    one_board_only = len(set(prefixes)) <= 1\n",
+    "    matches_requested_board = requested_board_prefix is None or all(prefix == requested_board_prefix for prefix in prefixes)\n",
+    "\n",
+    "    out = {\n",
+    "        \"n_holds_eval\": len(records),\n",
+    "        \"n_unique_placements_eval\": len(set(placements)),\n",
+    "        \"has_duplicate_placements_eval\": len(records) != len(set(placements)),\n",
+    "        \"one_board_only_eval\": one_board_only,\n",
+    "        \"matches_requested_board_eval\": matches_requested_board,\n",
+    "        \"n_start_eval\": roles.count(\"start\"),\n",
+    "        \"n_middle_eval\": roles.count(\"middle\"),\n",
+    "        \"n_foot_eval\": roles.count(\"foot\"),\n",
+    "        \"n_finish_eval\": roles.count(\"finish\"),\n",
+    "        \"has_start_eval\": \"start\" in roles,\n",
+    "        \"has_middle_eval\": \"middle\" in roles,\n",
+    "        \"has_finish_eval\": \"finish\" in roles,\n",
+    "    }\n",
+    "    out[\"basic_valid_eval\"] = (\n",
+    "        one_board_only\n",
+    "        and out[\"n_holds_eval\"] >= 3\n",
+    "        and out[\"n_holds_eval\"] == out[\"n_unique_placements_eval\"]\n",
+    "        and out[\"has_start_eval\"]\n",
+    "        and out[\"has_finish_eval\"]\n",
+    "    )\n",
+    "    out[\"strict_valid_eval\"] = (\n",
+    "        out[\"basic_valid_eval\"]\n",
+    "        and out[\"has_middle_eval\"]\n",
+    "        and out[\"n_holds_eval\"] >= 4\n",
+    "    )\n",
+    "    return out"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "0091bafb",
@@ -118,14 +228,22 @@
    "source": [
     "## Parse generated tokens and check validity\n",
     "\n",
-    "We parse the generated token sequences and check each route for validity."
+    "We parse the generated token sequences and check each route for validity.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "f5c2b25a",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:44:10.383121Z",
+     "iopub.status.busy": "2026-06-07T23:44:10.382759Z",
+     "iopub.status.idle": "2026-06-07T23:44:10.430410Z",
+     "shell.execute_reply": "2026-06-07T23:44:10.429741Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Parse the token strings into structured records\n",
@@ -149,6 +267,89 @@
     "print(validity_summary)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0ff31b72",
+   "metadata": {},
+   "source": [
+    "### Novelty helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10a40f48",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:44:10.434117Z",
+     "iopub.status.busy": "2026-06-07T23:44:10.433720Z",
+     "iopub.status.idle": "2026-06-07T23:44:10.443045Z",
+     "shell.execute_reply": "2026-06-07T23:44:10.442319Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Compare generated hold sets to real routes using Jaccard similarity.\n",
+    "def frames_to_holds(frames: str | None) -> list[tuple[int, int]]:\n",
+    "    \"\"\"Parse a frames string into ``(placement_id, role_id)`` pairs.\"\"\"\n",
+    "    if not isinstance(frames, str):\n",
+    "        return []\n",
+    "    return [(int(p), int(r)) for p, r in re.findall(r\"p(\\d+)r(\\d+)\", frames)]\n",
+    "\n",
+    "def holds_to_placement_set(holds: Iterable[tuple[int, int]]) -> frozenset[int]:\n",
+    "    \"\"\"Drop role IDs and represent a route by its unique placement IDs.\"\"\"\n",
+    "    return frozenset(int(placement_id) for placement_id, _ in holds)\n",
+    "\n",
+    "def jaccard(a: frozenset[int], b: frozenset[int]) -> float:\n",
+    "    \"\"\"Return Jaccard similarity between two placement sets.\"\"\"\n",
+    "    if not a and not b:\n",
+    "        return 1.0\n",
+    "    if not a or not b:\n",
+    "        return 0.0\n",
+    "    return len(a & b) / len(a | b)\n",
+    "\n",
+    "def nearest_real_route_same_board(\n",
+    "    generated_set: frozenset[int],\n",
+    "    generated_board_key: str,\n",
+    "    real_df: pd.DataFrame,\n",
+    ") -> dict[str, object]:\n",
+    "    \"\"\"Find the most similar real route on the same board by Jaccard score.\n",
+    "\n",
+    "    .. note::\n",
+    "\n",
+    "       This function performs an O(n) linear scan over all real routes for\n",
+    "       the matching board, computing a Jaccard similarity for each one. With\n",
+    "       ~256K training examples, evaluating 400 generated routes costs up to\n",
+    "       ~100M Jaccard comparisons. This is acceptable for evaluation scripts\n",
+    "       but would not scale to a real-time or high-throughput setting without\n",
+    "       an approximate nearest-neighbour index.\n",
+    "    \"\"\"\n",
+    "    board_frame = real_df[real_df[\"board_key\"] == generated_board_key]\n",
+    "    if board_frame.empty:\n",
+    "        return {\n",
+    "            \"nearest_real_jaccard\": np.nan,\n",
+    "            \"nearest_real_uuid\": None,\n",
+    "            \"nearest_real_name\": None,\n",
+    "            \"nearest_real_grouped_v\": None,\n",
+    "            \"nearest_real_angle\": None,\n",
+    "            \"novelty_distance\": np.nan,\n",
+    "        }\n",
+    "\n",
+    "    similarities = board_frame[\"hold_set\"].map(lambda hold_set: jaccard(generated_set, hold_set))\n",
+    "    best_idx = similarities.idxmax()\n",
+    "    row = board_frame.loc[best_idx]\n",
+    "\n",
+    "    nearest_real_jaccard = float(similarities.loc[best_idx])\n",
+    "    return {\n",
+    "        \"nearest_real_jaccard\": nearest_real_jaccard,\n",
+    "        \"nearest_real_uuid\": row[\"uuid\"],\n",
+    "        \"nearest_real_name\": row[\"climb_name\"],\n",
+    "        \"nearest_real_grouped_v\": row[\"grouped_v\"],\n",
+    "        \"nearest_real_angle\": row[\"angle\"],\n",
+    "        \"novelty_distance\": 1.0 - nearest_real_jaccard,\n",
+    "    }"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "0cf2170e",
@@ -156,14 +357,22 @@
    "source": [
     "## Novelty against real climbs\n",
     "\n",
-    "For each generated route, we find the most similar real route from the same board (by Jaccard similarity of hold sets). A good generator should produce routes that are novel (low Jaccard similarity to existing routes) while still being valid."
+    "For each generated route, we find the most similar real route from the same board (by Jaccard similarity of hold sets). A good generator should produce routes that are novel (low Jaccard similarity to existing routes) while still being valid.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "e7f34524",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:44:10.446422Z",
+     "iopub.status.busy": "2026-06-07T23:44:10.445998Z",
+     "iopub.status.idle": "2026-06-07T23:46:41.914124Z",
+     "shell.execute_reply": "2026-06-07T23:46:41.913292Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Convert hold sets to frozensets for fast comparison\n",
@@ -201,6 +410,105 @@
     "print(novelty_summary)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ad70ff4c",
+   "metadata": {},
+   "source": [
+    "### Geometry helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "85ddaf53",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:46:41.918125Z",
+     "iopub.status.busy": "2026-06-07T23:46:41.917658Z",
+     "iopub.status.idle": "2026-06-07T23:46:41.929570Z",
+     "shell.execute_reply": "2026-06-07T23:46:41.928790Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Compute simple geometric descriptors from placement coordinates.\n",
+    "def build_placement_coords(df_token_meta: pd.DataFrame) -> dict[tuple[str, int], dict[str, float]]:\n",
+    "    \"\"\"Build a placement-coordinate lookup from token metadata.\"\"\"\n",
+    "    hold_meta = df_token_meta[df_token_meta[\"kind\"] == \"hold\"].dropna(subset=[\"placement_id\"]).copy()\n",
+    "    coords = {}\n",
+    "    for _, row in hold_meta.drop_duplicates([\"board_key\", \"placement_id\"]).iterrows():\n",
+    "        key = (str(row[\"board_key\"]), int(row[\"placement_id\"]))\n",
+    "        coords[key] = {\n",
+    "            \"x\": float(row[\"x\"]),\n",
+    "            \"y\": float(row[\"y\"]),\n",
+    "        }\n",
+    "    return coords\n",
+    "\n",
+    "def simple_route_features(\n",
+    "    board_key: str,\n",
+    "    records: list[dict[str, object]],\n",
+    "    placement_coords: dict[tuple[str, int], dict[str, float]],\n",
+    ") -> dict[str, float]:\n",
+    "    \"\"\"Compute simple geometric route features from hold coordinates.\n",
+    "\n",
+    "    These features are descriptive rather than a full climbing-physics model:\n",
+    "    height/width describe route spread, and hand-reach distances summarize the\n",
+    "    pairwise spacing among start/middle/finish holds.\n",
+    "    \"\"\"\n",
+    "    rows = []\n",
+    "    for record in records:\n",
+    "        key = (str(board_key), int(record[\"placement_id\"]))\n",
+    "        coord = placement_coords.get(key)\n",
+    "        if coord is None:\n",
+    "            continue\n",
+    "        x = float(coord[\"x\"])\n",
+    "        y = float(coord[\"y\"])\n",
+    "        if np.isnan(x) or np.isnan(y):\n",
+    "            continue\n",
+    "        role = str(record[\"role\"])\n",
+    "        rows.append(\n",
+    "            {\n",
+    "                \"x\": x,\n",
+    "                \"y\": y,\n",
+    "                \"role\": role,\n",
+    "                \"is_hand\": role in {\"start\", \"middle\", \"finish\"},\n",
+    "                \"is_foot\": role == \"foot\",\n",
+    "            }\n",
+    "        )\n",
+    "\n",
+    "    if not rows:\n",
+    "        return {\n",
+    "            \"geom_n_holds\": 0.0,\n",
+    "            \"geom_height\": np.nan,\n",
+    "            \"geom_width\": np.nan,\n",
+    "            \"geom_mean_y\": np.nan,\n",
+    "            \"geom_mean_x_abs\": np.nan,\n",
+    "            \"geom_mean_hand_reach\": np.nan,\n",
+    "            \"geom_max_hand_reach\": np.nan,\n",
+    "        }\n",
+    "\n",
+    "    d = pd.DataFrame(rows)\n",
+    "    out = {\n",
+    "        \"geom_n_holds\": float(len(d)),\n",
+    "        \"geom_height\": float(d[\"y\"].max() - d[\"y\"].min()),\n",
+    "        \"geom_width\": float(d[\"x\"].max() - d[\"x\"].min()),\n",
+    "        \"geom_mean_y\": float(d[\"y\"].mean()),\n",
+    "        \"geom_mean_x_abs\": float(d[\"x\"].abs().mean()),\n",
+    "    }\n",
+    "\n",
+    "    hands = d[d[\"is_hand\"]].sort_values([\"y\", \"x\"])\n",
+    "    if len(hands) >= 2:\n",
+    "        distances = pdist(hands[[\"x\", \"y\"]].values)\n",
+    "        out[\"geom_mean_hand_reach\"] = float(distances.mean())\n",
+    "        out[\"geom_max_hand_reach\"] = float(distances.max())\n",
+    "    else:\n",
+    "        out[\"geom_mean_hand_reach\"] = np.nan\n",
+    "        out[\"geom_max_hand_reach\"] = np.nan\n",
+    "\n",
+    "    return out"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "b581705d",
@@ -215,14 +523,22 @@
     "- `geom_width`: Horizontal extent\n",
     "- `geom_mean_hand_reach`: Average distance between hand holds\n",
     "\n",
-    "These features help us understand whether generated routes have reasonable spatial properties."
+    "These features help us understand whether generated routes have reasonable spatial properties.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "d74d4cad",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:46:41.932520Z",
+     "iopub.status.busy": "2026-06-07T23:46:41.932262Z",
+     "iopub.status.idle": "2026-06-07T23:46:42.775565Z",
+     "shell.execute_reply": "2026-06-07T23:46:42.774476Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Build coordinate lookup from token metadata\n",
@@ -252,6 +568,134 @@
     "print(geom_summary)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "44036a1e",
+   "metadata": {},
+   "source": [
+    "### Critic model and grade helpers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9cfff1f4",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:46:42.779195Z",
+     "iopub.status.busy": "2026-06-07T23:46:42.778895Z",
+     "iopub.status.idle": "2026-06-07T23:46:42.791727Z",
+     "shell.execute_reply": "2026-06-07T23:46:42.790706Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Map rounded BoardLib display difficulties to grouped V-grade numbers.\n",
+    "GRADE_TO_V = {\n",
+    "    10: 0, 11: 0, 12: 0,\n",
+    "    13: 1, 14: 1,\n",
+    "    15: 2,\n",
+    "    16: 3, 17: 3,\n",
+    "    18: 4, 19: 4,\n",
+    "    20: 5, 21: 5,\n",
+    "    22: 6,\n",
+    "    23: 7,\n",
+    "    24: 8, 25: 8,\n",
+    "    26: 9,\n",
+    "    27: 10,\n",
+    "    28: 11,\n",
+    "    29: 12,\n",
+    "    30: 13,\n",
+    "    31: 14,\n",
+    "    32: 15,\n",
+    "    33: 16,\n",
+    "}\n",
+    "\n",
+    "def to_grouped_v(display_difficulty: float) -> int:\n",
+    "    \"\"\"Round a display difficulty, clamp it to the table range, and map it to a grouped V grade.\"\"\"\n",
+    "    rounded = int(round(float(display_difficulty)))\n",
+    "    rounded = max(min(rounded, max(GRADE_TO_V)), min(GRADE_TO_V))\n",
+    "    return GRADE_TO_V[rounded]\n",
+    "\n",
+    "def grade_token(display_difficulty: float) -> str:\n",
+    "    \"\"\"Return the grade-conditioning token for a display difficulty value.\"\"\"\n",
+    "    return f\"<GRADE_V{to_grouped_v(display_difficulty)}>\"\n",
+    "\n",
+    "# Transformer encoder used as a continuous grade regressor.\n",
+    "class JointRouteTransformerRegressor(nn.Module):\n",
+    "    \"\"\"Transformer encoder for joint TB2/Kilter route difficulty prediction.\n",
+    "\n",
+    "    Inputs are token IDs plus an attention mask. Token embeddings, position\n",
+    "    embeddings, and a learned projection of coordinate metadata are summed\n",
+    "    before the encoder. The first (``<CLS>``) position is then used as a\n",
+    "    pooled route representation for scalar difficulty regression.\n",
+    "    \"\"\"\n",
+    "\n",
+    "    def __init__(\n",
+    "        self,\n",
+    "        vocab_size: int,\n",
+    "        max_len: int,\n",
+    "        coord_features: torch.Tensor,\n",
+    "        d_model: int = 128,\n",
+    "        nhead: int = 4,\n",
+    "        num_layers: int = 4,\n",
+    "        dim_feedforward: int = 256,\n",
+    "        dropout: float = 0.10,\n",
+    "        pad_id: int = 0,\n",
+    "    ):\n",
+    "        \"\"\"Create the encoder, coordinate projection, and regression head.\"\"\"\n",
+    "        super().__init__()\n",
+    "        self.vocab_size = vocab_size\n",
+    "        self.max_len = max_len\n",
+    "        self.d_model = d_model\n",
+    "        self.pad_id = pad_id\n",
+    "\n",
+    "        self.token_emb = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)\n",
+    "        self.pos_emb = nn.Embedding(max_len, d_model)\n",
+    "\n",
+    "        self.register_buffer(\"coord_features\", coord_features.clone().float())\n",
+    "        self.coord_proj = nn.Linear(coord_features.shape[1], d_model)\n",
+    "\n",
+    "        encoder_layer = nn.TransformerEncoderLayer(\n",
+    "            d_model=d_model,\n",
+    "            nhead=nhead,\n",
+    "            dim_feedforward=dim_feedforward,\n",
+    "            dropout=dropout,\n",
+    "            activation=\"gelu\",\n",
+    "            batch_first=True,\n",
+    "            norm_first=True,\n",
+    "        )\n",
+    "        self.encoder = nn.TransformerEncoder(\n",
+    "            encoder_layer,\n",
+    "            num_layers=num_layers,\n",
+    "            enable_nested_tensor=False,\n",
+    "        )\n",
+    "        self.norm = nn.LayerNorm(d_model)\n",
+    "        self.head = nn.Sequential(\n",
+    "            nn.Linear(d_model, d_model),\n",
+    "            nn.GELU(),\n",
+    "            nn.Dropout(dropout),\n",
+    "            nn.Linear(d_model, 1),\n",
+    "        )\n",
+    "\n",
+    "    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:\n",
+    "        \"\"\"Return one continuous difficulty prediction per input sequence.\"\"\"\n",
+    "        batch_size, seq_len = input_ids.shape\n",
+    "        positions = torch.arange(seq_len, device=input_ids.device).unsqueeze(0).expand(batch_size, seq_len)\n",
+    "\n",
+    "        # Coordinate features are indexed by token ID, so every occurrence of a\n",
+    "        # hold token gets the same physical x/y hint wherever it appears.\n",
+    "        x = self.token_emb(input_ids) + self.pos_emb(positions)\n",
+    "        x = x + self.coord_proj(self.coord_features[input_ids])\n",
+    "\n",
+    "        key_padding_mask = ~attention_mask.bool()\n",
+    "        h = self.encoder(x, src_key_padding_mask=key_padding_mask)\n",
+    "        h = self.norm(h)\n",
+    "\n",
+    "        cls_state = h[:, 0, :]\n",
+    "        return self.head(cls_state).squeeze(-1)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "4455557a",
@@ -259,16 +703,21 @@
    "source": [
     "## Grade consistency (using the trained critic)\n",
     "\n",
-    "If we have a trained grade predictor (from notebook 02), we can use it as a **critic** to check whether generated routes have grades consistent with what was requested.\n",
-    "\n",
-    "This is similar to how GANs use a discriminator to evaluate generated samples, except our critic is a regression model rather than a binary classifier."
+    "If we have a trained grade predictor (from notebook 02), we can use it as a **critic** to check whether generated routes have grades consistent with what was requested.\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "88747d6e",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:46:42.795099Z",
+     "iopub.status.busy": "2026-06-07T23:46:42.794788Z",
+     "iopub.status.idle": "2026-06-07T23:46:43.323348Z",
+     "shell.execute_reply": "2026-06-07T23:46:43.321923Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Try to load the grade critic from notebook 02\n",
@@ -355,7 +804,14 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "critic_eval",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:46:43.327454Z",
+     "iopub.status.busy": "2026-06-07T23:46:43.326834Z",
+     "iopub.status.idle": "2026-06-07T23:46:44.473105Z",
+     "shell.execute_reply": "2026-06-07T23:46:44.472309Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Apply the critic to evaluate grade consistency\n",
@@ -390,14 +846,22 @@
     "- **Basic validity** (required): At least 3 holds, start/finish, no duplicates, one board\n",
     "- **Strict validity** (bonus): Also has middle holds and 4+ holds\n",
     "- **Novelty** (higher is better): Distance from nearest real route\n",
-    "- **Grade consistency** (if critic available): Predicted grade close to requested grade"
+    "- **Grade consistency** (if critic available): Predicted grade close to requested grade\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "88747d6e2",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:46:44.476183Z",
+     "iopub.status.busy": "2026-06-07T23:46:44.475814Z",
+     "iopub.status.idle": "2026-06-07T23:46:44.489525Z",
+     "shell.execute_reply": "2026-06-07T23:46:44.488845Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Rank candidates by composite score\n",
@@ -427,14 +891,22 @@
    "source": [
     "## Save evaluation results\n",
     "\n",
-    "We save the full evaluation DataFrame and the top candidates for further analysis."
+    "We save the full evaluation DataFrame and the top candidates for further analysis.\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "save_results",
-   "metadata": {},
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-06-07T23:46:44.492676Z",
+     "iopub.status.busy": "2026-06-07T23:46:44.492218Z",
+     "iopub.status.idle": "2026-06-07T23:46:44.561651Z",
+     "shell.execute_reply": "2026-06-07T23:46:44.560880Z"
+    }
+   },
    "outputs": [],
    "source": [
     "# Save evaluation results\n",
@@ -485,7 +957,8 @@
     "\n",
     "- Validity checks are structural, not semantic. A route might have valid start/finish holds but still be impossible.\n",
     "- Geometric features are simple. More sophisticated analysis could check reachability and move sequences.\n",
-    "- The critic model was trained on real data, so it may not generalize well to novel route structures."
+    "- The critic model was trained on real data, so it may not generalize well to novel route structures.\n",
+    "\n"
    ]
   }
  ],
@@ -505,7 +978,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.14.4"
+   "version": "3.12.12"
   }
  },
  "nbformat": 4,