Updated docs + added TTS backends

2026-06-03 14:01:18 -04:00
parent 226fecbe71
commit c923f90a75
14 changed files with 916 additions and 36 deletions
@@ -18,6 +18,7 @@ The name is a coined Japanese compound from `採` as in gathering/collecting and
 - [Anki](https://apps.ankiweb.net/) with [AnkiConnect](https://github.com/amikey/anki-connect)
 - `ffmpeg`
 - Python dependencies from `requirements.txt`
+- Optional TTS backend tools: `piper`, `espeak-ng`, and `kokoro-onnx`
 - spaCy models for word mining:

 ```shell
@@ -35,6 +36,62 @@ pip install -r requirements.txt
 sudo dnf install ffmpeg
 ```

+### Optional TTS Backends
+
+The default `edge-tts` backend is installed by `requirements.txt`. Install only
+the optional pieces you plan to test:
+
+```shell
+# Python-backed optional engines: piper, kokoro.
+pip install -r requirements-tts.txt
+
+# System package for espeak-ng.
+sudo dnf install espeak-ng
+```
+
+Equivalent commands for Debian/Ubuntu (`apt-get`) and Arch Linux (`pacman`):
+
+```shell
+sudo apt-get install espeak-ng
+sudo pacman -S espeak-ng
+```
+
+Backend notes:
+
+- `edge-tts`: installed by `requirements.txt` (package `edge-tts`); no API key
+  needed, but it calls Microsoft Edge's online TTS service.
+- `gtts`: installed by `requirements.txt`; no API key needed, but it calls
+  Google's online TTS service through `gtts-cli`.
+- `piper`: installed by `pip install piper-tts`; you still need a compatible
+  `.onnx` voice model, usually with its matching `.onnx.json` config file.
+- `espeak-ng`: installed through your OS package manager, not pip.
+- `kokoro`: installed by `pip install kokoro-onnx soundfile`; you still need
+  `kokoro-v1.0.onnx` and `voices-v1.0.bin`, plus any language-specific G2P
+  setup required by your Kokoro release.
+
+Example model downloads for the README smoke tests:
+
+```shell
+mkdir -p ~/.local/share/saiki/models
+
+# Piper Spanish voice model plus matching config.
+wget -O ~/.local/share/saiki/models/es_ES-davefx-medium.onnx \
+  https://huggingface.co/rhasspy/piper-voices/resolve/main/es/es_ES/davefx/medium/es_ES-davefx-medium.onnx
+wget -O ~/.local/share/saiki/models/es_ES-davefx-medium.onnx.json \
+  https://huggingface.co/rhasspy/piper-voices/resolve/main/es/es_ES/davefx/medium/es_ES-davefx-medium.onnx.json
+
+# Kokoro ONNX model plus voices bundle.
+wget -O ~/.local/share/saiki/models/kokoro-v1.0.onnx \
+  https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
+wget -O ~/.local/share/saiki/models/voices-v1.0.bin \
+  https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
+```
+
+Saiki's default `tts_model_dir` is `~/.local/share/saiki/models`. Relative
+model paths such as `es_ES-davefx-medium.onnx` are resolved under that
+directory. You can override it in YAML with `tts_model_dir` or for one command
+with `--tts-model-dir`.
+
 ## Configuration

 Defaults are built in, but you can override them with YAML:
@@ -57,6 +114,7 @@ media_dir: ~/.var/app/net.ankiweb.Anki/data/Anki2/User 1/collection.media
 audio_output_root: ~/Languages/Anki/anki-audio
 word_output_root: ~/Languages/Anki/anki-words
 sentence_dir: ~/Languages/Anki
+tts_model_dir: ~/.local/share/saiki/models
 note_model: Basic
 fields:
  front: Front
@@ -65,8 +123,8 @@ languages:
  jp:
    name: japanese
    transcript_code: ja
-    tts_code: ja
-    tts_tld: com
+    tts_backend: edge-tts
+    tts_voice: ja-JP-NanamiNeural
    tts_tempo: 1.35
    decks: ["日本語"]
    field: Back
@@ -75,8 +133,8 @@ languages:
  es:
    name: spanish
    transcript_code: es
-    tts_code: es
-    tts_tld: es
+    tts_backend: edge-tts
+    tts_voice: es-ES-ElviraNeural
    tts_tempo: 1.25
    decks: ["Español"]
    field: Back
@@ -174,12 +232,59 @@ Generate TTS audio and add sentence cards to Anki.
 ./saiki.py import es
 ./saiki.py import jp ~/Languages/Anki/sentences_jp.txt
 ./saiki.py import es youtube.tsv --tags youtube,manual
+./saiki.py import es --tts-voice es-MX-DaliaNeural
 ```

 The importer accepts plain text sentence files and TSV/CSV files with a
 `sentence` column. `text-to-speech` is always added as a tag. If `--tags` is not
 provided, `AI-generated` is added.

+TTS is configured per language with `tts_backend`. Supported backends are:
+
+- `edge-tts`: default backend using Microsoft Edge neural voices; configure
+  `tts_voice`.
+- `gtts`: online backend using Google TTS through `gtts-cli`; configure
+  `tts_code` and `tts_tld`.
+- `piper`: local/offline neural TTS; configure `tts_model` with a model path.
+  The stock Piper catalog includes Spanish voices, but not Japanese.
+- `espeak-ng`: local/offline lightweight TTS; configure `tts_voice`. Spanish is
+  supported; Japanese support is documented upstream as kana-only, so it is not
+  recommended for normal Japanese sentence cards.
+- `kokoro`: local/offline neural TTS; configure `tts_model`, `tts_voices`,
+  `tts_voice`, and `tts_code`; some Japanese setups also need
+  `tts_vocab_config`. Kokoro lists Japanese and Spanish voices, but upstream
+  notes that non-English quality can be thin.
+
+You can override backend settings for one import:
+
+```shell
+./saiki.py import jp sentences_jp.txt \
+  --tts-backend edge-tts \
+  --tts-voice ja-JP-KeitaNeural
+```
+
+List the voices available for a language or backend:
+
+```shell
+./saiki.py tts-voices jp
+./saiki.py tts-voices es --backend edge-tts
+```
+
+Test a TTS backend without creating Anki cards:
+
+```shell
+./saiki.py tts-test es --out /tmp/saiki_edge_default_es.mp3
+./saiki.py tts-test jp --tts-backend edge-tts --tts-voice ja-JP-NanamiNeural --out /tmp/saiki_edge_jp.mp3
+./saiki.py tts-test es --tts-backend edge-tts --tts-voice es-ES-ElviraNeural --out /tmp/saiki_edge_es.mp3
+./saiki.py tts-test es --tts-backend gtts --tts-code es --tts-tld es --out /tmp/saiki_gtts_es.mp3
+./saiki.py tts-test es --tts-backend piper --tts-model es_ES-davefx-medium.onnx --tts-config es_ES-davefx-medium.onnx.json --out /tmp/saiki_piper_es.mp3
+./saiki.py tts-test es --tts-backend espeak-ng --tts-voice es --out /tmp/saiki_espeak_es.mp3
+./saiki.py tts-test es --tts-backend kokoro --tts-model kokoro-v1.0.onnx --tts-voices voices-v1.0.bin --tts-voice ef_dora --out /tmp/saiki_kokoro_es.mp3
+```
+
+For `kokoro`, put `tts_model`, `tts_voices`, and any needed `tts_vocab_config`
+in your config file rather than typing every path each time.
+
 ### Known/New Words

 Compare any generated word list against an existing known list: