2026-05-22 minor update

Pawel
2026-05-22 14:22:54 -04:00
parent 1089793165
commit ecc4723013
6 changed files with 20 additions and 5 deletions

@@ -9,7 +9,7 @@
"\n",
"## What is tokenization and why does it matter?\n",
"\n",
"In natural language processing, **tokenization** is the process of converting raw text into a sequence of discrete symbols (tokens) that a model can process. For example, the sentence \"I love climbing\" might be tokenized as `[\"I\", \" love\", \" climbing\"]` using a subword tokenizer like BPE.\n",
"In natural language processing, **tokenization** is the process of converting raw text into a sequence of discrete symbols (tokens) that a model can process. For example, the sentence \"I climb rocks\" might be tokenized as `[\"I\", \" climb\", \" rocks\"]` using a subword tokenizer like BPE.\n",
"\n",
"For climbing board routes, we face an analogous problem: how do we convert a climb — which is fundamentally a *set of holds at specific positions with specific roles* — into a sequence of tokens that a transformer can learn from?\n",
"\n",