I now see that I used all the right tools in the wrong order. Time for the new order to emerge.
This page analyzes what has been done until now: why it is wrong, and what the better way is.
What was wrong?
The previous order looked like this:
- Download all of the audio
- Generate subtitles with Whisper
- Slice into smaller clips using the timestamps produced by Whisper
- Split vocals from accompaniment
- Split the Whisper-cut vocal clips into individual phrases
- Filter out poor-quality clips
- Compile stats and publish
A lot of the audio pushed through the early steps is silence, NPC voice acting, or background music.
It takes space, time, and electricity, and at the same time it is completely useless.
New order
- Download all of the audio
- Slice everything into 8-minute clips
- Split vocals from accompaniment
- Detect (and split out) phrases inside the vocal clips with VAD (voice activity detection)
- Filter out poor-quality clips
- Generate subtitles with Whisper
- Compile stats and publish
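The slicing step of the new order can be sketched with nothing but Python's standard `wave` module. This is a minimal sketch under assumptions: the downloaded audio is taken to be already converted to uncompressed WAV, and the function name `slice_wav` and the `<stem>_0000.wav` naming scheme are hypothetical, not part of the pipeline as described. It cuts at fixed frame counts, so a phrase that straddles a chunk boundary ends up split across two clips.

```python
import wave
from pathlib import Path

CHUNK_SECONDS = 8 * 60  # 8-minute clips, as in the new order


def slice_wav(src: str, out_dir: str, chunk_seconds: int = CHUNK_SECONDS) -> list[Path]:
    """Split a WAV file into consecutive clips of at most chunk_seconds each.

    Hypothetical helper: assumes uncompressed PCM WAV input.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the input file
            path = out / f"{Path(src).stem}_{index:04d}.wav"
            # wave.open expects a str path (or a file object), not a Path
            with wave.open(str(path), "wb") as writer:
                writer.setparams(params)  # same channels, sample width, rate
                writer.writeframes(frames)  # frame count in the header is corrected on write
            written.append(path)
            index += 1
    return written
```

Because every clip keeps the source's channel count, sample width and sample rate, the vocal/accompaniment separator downstream sees the same format it would have seen on the full file, just in smaller pieces.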
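To illustrate why VAD should come before Whisper, here is a deliberately simple stand-in for the VAD step. It is not the tool used in this pipeline: it is a plain RMS-energy threshold over short frames, whereas trained VAD models (for example Silero VAD) cope far better with leftover noise after separation. The function name `detect_phrases`, the 30 ms frame size, the 0.02 threshold and the 300 ms gap are hypothetical values chosen for illustration, and samples are assumed to be mono floats in [-1, 1].

```python
import math


def detect_phrases(
    samples: list[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.02,
    min_gap_ms: int = 300,
) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS reaches the threshold.

    Energy-threshold stand-in for a real VAD. Voiced frames separated by
    less than min_gap_ms of quiet are merged into one phrase, so short
    pauses between words do not split it.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_gap_frames = max(1, min_gap_ms // frame_ms)

    def to_span(first: int, last: int) -> tuple[float, float]:
        # Convert frame indices to seconds, clamping the end to the signal length.
        end_sample = min((last + 1) * frame_len, len(samples))
        return (first * frame_len / sample_rate, end_sample / sample_rate)

    phrases = []
    start = None        # first voiced frame of the current phrase
    last_voiced = None  # most recent voiced frame of the current phrase
    for i in range(math.ceil(len(samples) / frame_len)):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= max_gap_frames:
            # Quiet long enough: close the current phrase.
            phrases.append(to_span(start, last_voiced))
            start = None
    if start is not None:  # phrase still open at the end of the clip
        phrases.append(to_span(start, last_voiced))
    return phrases
```

The returned (start, end) spans are what would be cut out and handed to Whisper, so Whisper only spends time on audio that actually contains voice instead of silence or music.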