Saiki
Saiki (採記) is a small toolkit for Anki-based language learning workflows:
listening playlists, word mining, YouTube transcript mining, TTS sentence
imports, and known/new word comparison.
The name is a coined Japanese compound of 採 (gathering, collecting) and
記 (remembering, recording). Pronunciation: saiki, roughly
"sigh-key".
./saiki.py --help
Requirements
- Python 3.12 recommended
- Anki with AnkiConnect
- ffmpeg
- Python dependencies from requirements.txt
- Optional extra TTS backend tools: piper, espeak-ng, and kokoro-onnx
- spaCy models for word mining:
python -m spacy download es_core_news_sm
python -m spacy download ja_core_news_lg
Setup example:
python3.12 -m venv ~/.venv/saiki
source ~/.venv/saiki/bin/activate
python3 -m pip install -U pip
pip install -r requirements.txt
sudo dnf install ffmpeg
Optional TTS Backends
The default edge-tts backend is installed by requirements.txt. Install only
the optional pieces you plan to test:
# Python-backed optional engines: piper, kokoro.
pip install -r requirements-tts.txt
# System package for espeak-ng.
sudo dnf install espeak-ng
With other package managers:
sudo apt-get install espeak-ng
sudo pacman -S espeak-ng
Backend notes:
- edge-tts: installed by requirements.txt (or pip install edge-tts); no API key, but it uses Microsoft Edge's online TTS service.
- gtts: installed by requirements.txt; no API key, but it uses Google's online TTS service through gtts-cli.
- piper: installed by pip install piper-tts; you still need a compatible .onnx voice model, usually with its matching .onnx.json config file.
- espeak-ng: installed through your OS package manager, not pip.
- kokoro: installed by pip install kokoro-onnx soundfile; you still need kokoro-v1.0.onnx and voices-v1.0.bin, plus any language-specific G2P setup required by your Kokoro release.
Example model downloads for the README smoke tests:
mkdir -p ~/.local/share/saiki/models
# Piper Spanish voice model plus matching config.
wget -O ~/.local/share/saiki/models/es_ES-davefx-medium.onnx \
https://huggingface.co/rhasspy/piper-voices/resolve/main/es/es_ES/davefx/medium/es_ES-davefx-medium.onnx
wget -O ~/.local/share/saiki/models/es_ES-davefx-medium.onnx.json \
https://huggingface.co/rhasspy/piper-voices/resolve/main/es/es_ES/davefx/medium/es_ES-davefx-medium.onnx.json
# Kokoro ONNX model plus voices bundle.
wget -O ~/.local/share/saiki/models/kokoro-v1.0.onnx \
https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
wget -O ~/.local/share/saiki/models/voices-v1.0.bin \
https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
Saiki's default tts_model_dir is ~/.local/share/saiki/models. Relative
model paths such as es_ES-davefx-medium.onnx are resolved under that
directory. You can override it in YAML with tts_model_dir or for one command
with --tts-model-dir.
Configuration
Defaults are built in, but you can override them with YAML:
~/.config/saiki/config.yaml
Or pass a config explicitly:
./saiki.py --config ./config.yaml words jp
Example:
anki_connect_url: http://localhost:8765
media_dir: ~/.var/app/net.ankiweb.Anki/data/Anki2/User 1/collection.media
audio_output_root: ~/Languages/Anki/anki-audio
word_output_root: ~/Languages/Anki/anki-words
sentence_dir: ~/Languages/Anki
tts_model_dir: ~/.local/share/saiki/models
note_model: Basic
fields:
  front: Front
  back: Back
languages:
  jp:
    name: japanese
    transcript_code: ja
    tts_backend: edge-tts
    tts_voice: ja-JP-NanamiNeural
    tts_tempo: 1.35
    decks: ["日本語"]
    field: Back
    word_model: ja_core_news_lg
    sentence_file: sentences_jp.txt
  es:
    name: spanish
    transcript_code: es
    tts_backend: edge-tts
    tts_voice: es-ES-ElviraNeural
    tts_tempo: 1.25
    decks: ["Español"]
    field: Back
    word_model: es_core_news_sm
    sentence_file: sentences_es.txt
A copyable template is also available at examples/config.yaml.
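The README only says that YAML values override the built-in defaults. It does not say whether a partial `languages.es` entry replaces the whole default entry or is merged into it. The sketch below assumes a recursive per-key merge. `deep_merge` is a hypothetical helper, and the override dict could come from something like PyYAML's `yaml.safe_load`.

```python
import copy
from typing import Any

def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested mappings are merged key by key, other values replaced.

    Hypothetical: illustrates one way YAML overrides could combine with defaults.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Under this assumption, a user file that sets only `languages: {es: {tts_tempo: 1.0}}` keeps the default `tts_backend` and `tts_voice` for Spanish.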
Supported language codes by default:
- jp (Japanese)
- es (Spanish)
CLI
Audio
Extract audio referenced by [sound:...] tags from configured decks and create
an .m3u playlist.
./saiki.py audio jp
./saiki.py audio es --concat
./saiki.py audio jp --media-dir ~/.local/share/Anki2/User\ 1/collection.media --copy-only-new
Outputs go to ~/Languages/Anki/anki-audio/<language>/ by default.
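Anki stores audio references in note fields as `[sound:filename]`. The sketch below shows roughly how such references can be collected and turned into playlist text. It is an illustration only: `sound_files` and `m3u_playlist` are hypothetical names, and whether Saiki writes the optional `#EXTM3U` header is an assumption.

```python
import re

# Anki field syntax for audio: [sound:some file.mp3]
SOUND_TAG = re.compile(r"\[sound:([^\]]+)\]")

def sound_files(field_html: str) -> list[str]:
    """Return every filename referenced by a [sound:...] tag, in field order."""
    return SOUND_TAG.findall(field_html)

def m3u_playlist(paths: list[str]) -> str:
    """Build extended-M3U text: a header line, then one media path per line."""
    return "\n".join(["#EXTM3U", *paths]) + "\n"
```

The filenames are relative to Anki's collection.media directory, which is why the audio command needs a correct `media_dir` (or `--media-dir`).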
Words
Extract frequent words from Anki notes using AnkiConnect and spaCy.
./saiki.py words jp
./saiki.py words es --deck "Español"
./saiki.py words es --query 'deck:"Español" tag:youtube'
./saiki.py words jp --min-freq 3 --out words_jp.txt
./saiki.py words jp --full-field
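Word mining talks to Anki through AnkiConnect, which accepts JSON requests of the form `{"action", "version", "params"}`. A minimal stdlib sketch of that round trip follows. `invoke`, `anki_request`, and `deck_query` are hypothetical helpers. The guess that `--deck` expands to a `deck:"..."` search (while `--query` passes raw Anki search syntax) is also an assumption.

```python
import json
import urllib.request
from typing import Any

ANKI_CONNECT_URL = "http://localhost:8765"

def anki_request(action: str, **params: Any) -> dict[str, Any]:
    """Build an AnkiConnect request body (API version 6)."""
    return {"action": action, "version": 6, "params": params}

def invoke(action: str, url: str = ANKI_CONNECT_URL, **params: Any) -> Any:
    """POST a request to AnkiConnect and return its "result", raising on "error"."""
    body = json.dumps(anki_request(action, **params)).encode("utf-8")
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]

def deck_query(deck: str) -> str:
    """Quote a deck name for Anki search syntax, e.g. deck:"Español"."""
    return f'deck:"{deck}"'
```

With Anki running, `invoke("findNotes", query=deck_query("Español"))` returns note IDs, and `invoke("notesInfo", notes=ids)` returns their fields.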
Output format:
word frequency
Examples:
comer 12
hablar 9
行く (行き) 8
見る (見た) 6
YouTube
Mine vocabulary or sentence rows from YouTube subtitles.
./saiki.py youtube es VIDEO_ID
./saiki.py youtube es VIDEO_ID --top 50
./saiki.py youtube jp VIDEO_ID --mode sentences
./saiki.py youtube es VIDEO_ID --raw --no-stopwords
Export Anki-ready sentence rows:
./saiki.py youtube es VIDEO_ID --mode sentences --out youtube.tsv
Export only rows that appear to contain unknown vocabulary:
./saiki.py youtube es VIDEO_ID \
--mode sentences \
--out youtube_new.tsv \
--known-words ~/Languages/Anki/anki-words/spanish/words_es.txt \
--only-new
Sentence exports contain:
sentence timestamp video_url vocab_guess
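The `--only-new` idea can be approximated on an existing export. This sketch rests on several assumptions, all hypothetical: the TSV has a header row with the columns above, `vocab_guess` lists words separated by commas and/or spaces, and the known-words file uses the `word frequency` format from the Words section. `load_known` and `new_vocab_rows` are illustrative names, not Saiki functions.

```python
import csv
import io
import re

def load_known(words_text: str) -> set[str]:
    """Collect the first token of each non-blank line of a words_*.txt list."""
    return {line.split()[0] for line in words_text.splitlines() if line.strip()}

def new_vocab_rows(tsv_text: str, known: set[str]) -> list[dict[str, str]]:
    """Keep rows whose vocab_guess contains at least one word not in known.

    Assumption: header row present; vocab_guess separated by commas/whitespace.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        guesses = [w for w in re.split(r"[,\s]+", row.get("vocab_guess") or "") if w]
        if any(w not in known for w in guesses):
            rows.append(row)
    return rows
```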
Import
Generate TTS audio and add sentence cards to Anki.
./saiki.py import es
./saiki.py import jp ~/Languages/Anki/sentences_jp.txt
./saiki.py import es youtube.tsv --tags youtube,manual
./saiki.py import es --tts-voice es-MX-DaliaNeural
The importer accepts plain-text sentence files and TSV/CSV files with a
sentence column. The text-to-speech tag is always added; if --tags is not
provided, the AI-generated tag is added as well.
TTS is configured per language with tts_backend. Supported backends are:
- edge-tts: default backend using Microsoft Edge neural voices; configure tts_voice.
- gtts: free backend using gtts-cli; configure tts_code and tts_tld.
- piper: local/offline neural TTS; configure tts_model with a model path. The stock Piper catalog includes Spanish voices, but not Japanese.
- espeak-ng: local/offline lightweight TTS; configure tts_voice. Spanish is supported; Japanese is documented as kana-only and is not recommended for normal Japanese sentence cards.
- kokoro: local/offline neural TTS; configure tts_model, tts_voices, tts_voice, and tts_code; some Japanese setups also need tts_vocab_config. Kokoro lists Japanese and Spanish voices, but upstream notes that non-English quality can be thin.
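For a rough picture of how the CLI-driven backends map onto their settings, the sketch below builds argument lists for the edge-tts, gtts-cli, and espeak-ng command-line tools. It also shows one way to apply `tts_tempo` with ffmpeg's `atempo` filter. All of this is an assumption about approach, not Saiki's implementation, which may call Python APIs instead. Piper and Kokoro are omitted because they need model files rather than a single command.

```python
def tts_command(backend: str, text: str, out: str, *, voice: str = "",
                code: str = "", tld: str = "com") -> list[str]:
    """Build an argv list for a CLI TTS backend (hypothetical mapping)."""
    if backend == "edge-tts":
        return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out]
    if backend == "gtts":
        return ["gtts-cli", "--lang", code, "--tld", tld, "--output", out, text]
    if backend == "espeak-ng":
        # espeak-ng writes WAV; convert with ffmpeg if you need MP3.
        return ["espeak-ng", "-v", voice, "-w", out, text]
    raise ValueError(f"no CLI mapping for backend {backend!r}")

def tempo_command(src: str, dst: str, tempo: float) -> list[str]:
    """Speed audio up (>1) or slow it down (<1) with ffmpeg's atempo filter."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={tempo}", dst]
```

Each list can be run with `subprocess.run(cmd, check=True)`. Passing an argv list avoids shell quoting problems with sentences that contain quotes or apostrophes.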
You can override backend settings for one import:
./saiki.py import jp sentences_jp.txt \
--tts-backend edge-tts \
--tts-voice ja-JP-KeitaNeural
Voice-listing helpers:
./saiki.py tts-voices jp
./saiki.py tts-voices es --backend edge-tts
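edge-tts can list its voices (for example via `edge-tts --list-voices` or the `edge_tts.list_voices()` coroutine). Each record carries fields such as `ShortName` and `Locale`. The sketch below shows how such records might be narrowed to one language. It assumes that Saiki's `jp` maps to the `ja` prefix, matching `transcript_code` in the example config; `voices_for` is a hypothetical helper.

```python
# Assumed mapping from Saiki language keys to BCP 47 language prefixes.
LANGUAGE_PREFIX = {"jp": "ja", "es": "es"}

def voices_for(language: str, voices: list[dict[str, str]]) -> list[str]:
    """Return sorted ShortName values whose Locale language matches the prefix."""
    prefix = LANGUAGE_PREFIX.get(language, language)
    return sorted(v["ShortName"] for v in voices
                  if v.get("Locale", "").split("-")[0] == prefix)
```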
Test a TTS backend without creating Anki cards:
./saiki.py tts-test es --out /tmp/saiki_edge_default_es.mp3
./saiki.py tts-test jp --tts-backend edge-tts --tts-voice ja-JP-NanamiNeural --out /tmp/saiki_edge_jp.mp3
./saiki.py tts-test es --tts-backend edge-tts --tts-voice es-ES-ElviraNeural --out /tmp/saiki_edge_es.mp3
./saiki.py tts-test es --tts-backend gtts --tts-code es --tts-tld es --out /tmp/saiki_gtts_es.mp3
./saiki.py tts-test es --tts-backend piper --tts-model es_ES-davefx-medium.onnx --tts-config es_ES-davefx-medium.onnx.json --out /tmp/saiki_piper_es.mp3
./saiki.py tts-test es --tts-backend espeak-ng --tts-voice es --out /tmp/saiki_espeak_es.mp3
./saiki.py tts-test es --tts-backend kokoro --tts-model kokoro-v1.0.onnx --tts-voices voices-v1.0.bin --tts-voice ef_dora --out /tmp/saiki_kokoro_es.mp3
For kokoro, put tts_model, tts_voices, and any needed tts_vocab_config
in your config file rather than typing every path each time.
Known/New Words
Compare any generated word list against an existing known list:
./saiki.py compare-words transcript_words.txt ~/Languages/Anki/anki-words/spanish/words_es.txt
This prints entries from the first file whose word key does not appear in the second file.
Card Assumptions
The default configuration assumes Basic notes with audio on Front and the
target-language sentence on Back. Word mining reads only the first visible
line by default; use --full-field to process the whole field.
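The "first visible line" rule can be approximated as in the sketch below. It assumes that visual line breaks in the field HTML come from `<br>`, `<div>`, or `<p>` tags and that `[sound:...]` tags should be ignored. `first_visible_line` is a hypothetical helper, not Saiki's code.

```python
import html
import re

SOUND_TAG = re.compile(r"\[sound:[^\]]+\]")
LINE_BREAK = re.compile(r"<br\s*/?>|</?(?:div|p)\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")

def first_visible_line(field_html: str, full_field: bool = False) -> str:
    """Return the first non-empty text line of an Anki field, or all lines."""
    text = SOUND_TAG.sub("", field_html)       # drop audio references
    text = LINE_BREAK.sub("\n", text)          # block/line tags become newlines
    text = html.unescape(ANY_TAG.sub("", text))  # strip remaining tags, decode entities
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if full_field:
        return "\n".join(lines)
    return lines[0] if lines else ""
```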
To Do
- Add support for different Anki note/card types, including configurable field mappings per language and per import workflow.
- Support multiple import profiles, such as sentence cards, vocab cards, audio cards, and cloze cards.
- Let YouTube exports map directly into configurable note fields, not just a fixed sentence column.
- Add richer transcript filtering, such as minimum/maximum sentence length, duplicate removal, and punctuation cleanup.
- Add optional audio slicing from videos when timestamp data is available.
- Improve known/new word matching with better lemmatization for transcript vocabulary.
- Add more language profiles beyond Japanese and Spanish.
- Add a dry-run mode for imports that previews notes before sending anything to AnkiConnect.
- Build a GUI for common workflows like transcript review, sentence selection, import previews, and configuration editing.
- Add integration tests with mocked AnkiConnect responses.
- Add shell completion or a small installed command once packaging becomes useful.
Tests
Pure logic tests use the standard library test runner:
python -m unittest discover -s tests
License
This project is licensed under the MIT License. See LICENSE.
