There is background sound/music when I am only interested in her voice
I spent a few long hours researching what to do.
Some links on the topic, if you want to "waste" your time as I did:
- librosa example
- Open Source Tools & Data for Music Source Separation By Ethan Manilow, Prem Seetharaman, and Justin Salamon
Spleeter is a solution
The Bad column was written before I learned to use Spleeter better. I am leaving this table for historical reasons.
| Good | Bad |
| --- | --- |
| splits vocals and accompaniment from an audio file | converts output files to WAV (correction) with sr = 44.1 kHz |
| fast enough for me to use | output naming is weird: it creates a directory named after the original file and puts the files in it: vocals.wav & accompaniment.wav (you can alter that) |
| no configuration required, works out of the box | no configuration possible, and I would like to change how some stuff works |
| it opened my eyes to new problems | when converted to WAV, 25G blew up to over 500G |
| ??? | I am not sure how it happened, but I screwed up the naming and the filenames stopped matching the proper text in metadata.tsv. It is not Spleeter's fault (should not be), but still. |
| flexible command line options to fix most of the bad | on my machine it struggles with audio longer than 8 minutes |
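The size blow-up in the table is easier to see with a little arithmetic. The sketch below computes the size of uncompressed PCM audio for one 8-minute clip. It assumes 16-bit stereo samples at 44.1 kHz, which is an assumption about the output format and not something measured here. `pcmBytes` is a hypothetical helper name.

```go
package main

import "fmt"

// pcmBytes returns the size in bytes of uncompressed PCM audio.
// The small WAV header is ignored.
func pcmBytes(sampleRate, channels, bytesPerSample, seconds int) int {
	return sampleRate * channels * bytesPerSample * seconds
}

func main() {
	// Assumed format: 44.1 kHz, stereo, 16-bit (2 bytes), one 8-minute clip.
	oneStem := pcmBytes(44100, 2, 2, 8*60)
	fmt.Printf("one stem: %.1f MB\n", float64(oneStem)/1e6)
	// Two stems are written per input: vocals and accompaniment.
	fmt.Printf("both stems: %.1f MB\n", float64(2*oneStem)/1e6)
}
```

Under these assumptions every 8-minute input produces roughly 170 MB of WAV output, regardless of how small the compressed input was.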
How I run it
What I said about Spleeter forcing you to have unlimited disk space by converting files to WAV was exaggerated.
You can pass another codec to the -c flag to avoid death by asphyxiation.
Spleeter can take multiple input files, but the command line is capped by the system's `ARG_MAX` limit.
I say give it batches of about 3000 files.
Example of use
    spleeter separate -o output -c ogg -f {filename}_{instrument}.{codec} \
      "8_minute_slices/Let's Play Awakening [Rpg Maker Horror ⧸ Demo] 8 - Ups, eine schiefgelaufene Rettungsaktion? [wX9ugDgY-iM].opus_00:13.000_00:21.000.opus_00.opus" \
      ...
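The batching mentioned above can be sketched in Go roughly like this. It is a minimal sketch, not the code from the repo: `chunk` and `spleeterCmd` are hypothetical helper names, and the batch size of 3000 is just the number suggested above. The command is only built here, not executed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// batchSize is the batch size suggested in the text; tune it to your path lengths.
const batchSize = 3000

// chunk splits files into consecutive slices of at most size elements.
func chunk(files []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(files); start += size {
		end := start + size
		if end > len(files) {
			end = len(files)
		}
		batches = append(batches, files[start:end])
	}
	return batches
}

// spleeterCmd builds (but does not run) one spleeter call for a batch,
// using the same flags as the example above.
func spleeterCmd(batch []string) *exec.Cmd {
	args := []string{"separate", "-o", "output", "-c", "ogg",
		"-f", "{filename}_{instrument}.{codec}"}
	return exec.Command("spleeter", append(args, batch...)...)
}

func main() {
	// Tiny demo with a batch size of 2; use batchSize for real runs.
	files := []string{"a.opus", "b.opus", "c.opus", "d.opus", "e.opus"}
	for i, batch := range chunk(files, 2) {
		cmd := spleeterCmd(batch)
		fmt.Printf("batch %d: %d files, %d args\n", i, len(batch), len(cmd.Args))
	}
}
```

Calling `cmd.Run()` on each command in turn would process the batches one after another. Because `exec.Command` passes arguments directly, file names with spaces or quotes need no shell escaping.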
Demonstration of spleeter
original.ogg
accompaniment.wav
vocals.wav
Everything takes a lot of space! Even file names!
As I understand it, a file system such as ext4 (FAT? NTFS?) reserves space for file names when a directory is created (4 KB) and extends that space if necessary.
The point is: when there are over 400k file names like
Let's Play Awakening [Rpg Maker Horror ⧸ Demo] 8 - Ups, eine schiefgelaufene Rettungsaktion? [wX9ugDgY-iM].opus_00:13.000_00:21.000.opus
in the same directory, it causes simple commands like
    ls | wc -l

to run for a few seconds, or even minutes if we are talking about my poor USB drive.
I am not yet clear on the solution, but most likely I will pick a short hash function and shorten all the names with it. For example:
    printf "Let's Play Awakening [Rpg Maker Horror ⧸ Demo] 8 - Ups, eine schiefgelaufene Rettungsaktion? [wX9ugDgY-iM].opus_00:13.000_00:21.000.opus" | xxhsum | cut -d ' ' -f1
    b9205a9cdf641557
I changed my mind about xxhsum. In golang it is an external library. I am going with md5.
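A minimal sketch of MD5-based renaming with only the Go standard library might look like this. `hashName` and `shortPath` are hypothetical names and not taken from the repo. MD5 is used here purely to shorten names, not for security.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// hashName returns the hex MD5 digest of name (always 32 characters).
func hashName(name string) string {
	sum := md5.Sum([]byte(name))
	return hex.EncodeToString(sum[:])
}

// shortPath replaces the base name of path with its MD5 digest,
// keeping the directory and the final extension.
func shortPath(path string) string {
	dir, base := filepath.Split(path)
	return dir + hashName(base) + filepath.Ext(base)
}

func main() {
	fmt.Println(hashName("abc")) // 900150983cd24fb0d6963f7d28e17f72
	fmt.Println(shortPath("clips/abc.opus"))
}
```

Writing each original name next to its hash into a lookup file at rename time would keep the link back to the text in metadata.tsv.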
    spleeter separate -o output -c ogg -f {filename}_{instrument}.{codec} \
      8_minutes_slices/2f56da703d60103bf9cf5848e1b3415a_01.opus \
      8_minutes_slices/da18f80180b3fac3eadba34ae21502b6_01.opus \
      8_minutes_slices/0a67183fe5a554e8af52b9e243d441e8_02.opus \
      8_minutes_slices/07ad09a301080ca18890f7525d43b56a_01.opus \
      8_minutes_slices/aef5fc3c8ec7bc6134e2a2cc20ceeccd_02.opus \
      ...
How I split all audio into 8-minute clips
The most relevant function looks like this:
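    // cutOnEqualParts cuts filepath into consecutive segments of length
    // segment, copying the streams without re-encoding. The parts are
    // written as outname_00.opus, outname_01.opus, and so on.
    func cutOnEqualParts(filepath, outname, segment string) error {
        err := ffmpeg.Input(filepath).
            Output(outname+"_%02d.opus", ffmpeg.KwArgs{
                "c":                "copy",
                "map":              0,
                "segment_time":     segment,
                "f":                "segment",
                "reset_timestamps": 1,
            }).
            OverWriteOutput().ErrorToStdOut().Run()
        return err
    }

Check the repo for the rest.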
A lot of clips have silence or non-Her-voice sound
This is different from background sound playing at the same time as Her voice. It can be solved by trimming or slicing.
A good example of solving the silence problem (VAD)
Harder cases contain multiple voices, only one of which is Hers (as opposed to when She imitates or makes different voices to add character and immersion). That is more a case for a neural network (or a GMM) that learns Nessi's voice features and simply recognizes Her voice.