20226-05-22 minor update
@@ -9,7 +9,7 @@
"\n",
"## What is tokenization and why does it matter?\n",
"\n",
-"In natural language processing, **tokenization** is the process of converting raw text into a sequence of discrete symbols (tokens) that a model can process. For example, the sentence \"I love climbing\" might be tokenized as `[\"I\", \" love\", \" climbing\"]` using a subword tokenizer like BPE.\n",
+"In natural language processing, **tokenization** is the process of converting raw text into a sequence of discrete symbols (tokens) that a model can process. For example, the sentence \"I climb rocks\" might be tokenized as `[\"I\", \" climb\", \" rocks\"]` using a subword tokenizer like BPE.\n",
"\n",
"For climbing board routes, we face an analogous problem: how do we convert a climb — which is fundamentally a *set of holds at specific positions with specific roles* — into a sequence of tokens that a transformer can learn from?\n",
"\n",