Better solution out of the box: Whisper

If your GPU allows it, consider using OpenAI's Whisper to convert audio to text.
I used the medium model:

Size    Parameters  English-only model  Multilingual model  Required VRAM  Relative speed
medium  769 M       medium.en           medium              ~5 GB          ~2x

pacman -Q | grep cuda
cuda 11.8.0-1
cuda-tools 11.8.0-1   

Getting the audio

ffmpeg -i v13_1.webm -vn -acodec copy v13_1.ogg    

Running Whisper writes three new files (.vtt, .srt, .txt) to the directory it was called from

whisper --model medium --language de v13_1.ogg
du -h v13_1.ogg*
416M    v13_1.ogg
644K    v13_1.ogg.srt
440K    v13_1.ogg.txt
612K    v13_1.ogg.vtt
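
The same transcription can also be driven from Python instead of the command line. The following is only a minimal sketch, assuming the openai-whisper package is installed; `transcribe` and `save_transcript` are helper names made up for this example, and the `.txt` naming simply imitates the file listing above.

```python
from pathlib import Path


def transcribe(audio_path: str, model_name: str = "medium", language: str = "de") -> list[str]:
    """Return the recognized text, one entry per segment."""
    # Imported lazily so the rest of the script works without whisper installed.
    import whisper

    # load_model picks the GPU when torch can see one, otherwise the CPU.
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    return [segment["text"].strip() for segment in result["segments"]]


def save_transcript(audio_path: str, lines: list[str]) -> Path:
    """Write one segment per line to <audio_path>.txt (hypothetical helper)."""
    out = Path(audio_path + ".txt")
    out.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return out


# Usage (needs the model weights and, realistically, a GPU):
# save_transcript("v13_1.ogg", transcribe("v13_1.ogg"))
```

Keeping the model loaded in one Python process may be worthwhile when several files are transcribed in a row, since each call to the command line tool has to load the model again.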

Now to check the quality of the result

./quality_check.py overlord_audiobooks/vol_13/v13_1.ogg.txt 2>/dev/null
0.8536507196550318:68644:overlord_audiobooks/vol_13/v13_1.ogg.txt

Name problem

Reading through the Whisper result shows the problem:
Whisper gets confused about how to spell fantasy names
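
One way to collect such spellings without reading the whole transcript is fuzzy matching against the expected name. This is a rough sketch using Python's standard difflib; the function name `name_variants` and the 0.7 cutoff are assumptions chosen for illustration, not something tuned on this transcript.

```python
import difflib
import re
from collections import Counter


def name_variants(text: str, name: str, cutoff: float = 0.7) -> Counter:
    """Count capitalized words in `text` that look like spellings of `name`."""
    # German capitalizes every noun, so this is only a coarse pre-filter.
    words = re.findall(r"\w+", text)
    candidates = Counter(word for word in words if word[0].isupper())
    # get_close_matches requires n > 0, hence the fallback for empty input.
    close = difflib.get_close_matches(
        name, list(candidates), n=len(candidates) or 1, cutoff=cutoff
    )
    return Counter({word: candidates[word] for word in close})


# Usage:
# text = open("v13_1.ogg.txt", encoding="utf-8").read()
# print(name_variants(text, "Neia").most_common())
```

A lower cutoff catches more distant spellings but also pulls in unrelated short words, so the result still needs a manual look.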
Some of the variations of Neia (Baraha):