There is background sound/music when I am only interested in her voice

I spent a few long hours researching what to do.
Here are some links on the topic, if you want to "waste" your time as I did.

Spleeter is a solution

The Bad column was written before I learned to use Spleeter better. I am leaving this table for historical reasons.

| Good | Bad |
|---|---|
| splits vocals and accompaniment from an audio file | converts output files to wav (correction) with sr=44kHz |
| fast enough for me to use | output naming is weird: it creates a directory named after the original file and puts the files in it: vocals.wav & accompaniment.wav (you can alter that) |
| no configuration required, works out of the box | no configuration possible, I would like to change how some stuff works |
| it opened my eyes to new problems | when converted to wav, 25G blew up to over 500G |
| ??? | I am not sure how it happened, but I screwed up the naming and filenames stopped correlating to the proper text in metadata.tsv. It is not Spleeter's fault (should not be), but still. |
| flexible command line options to fix most of the bad | on my machine it struggles with audio longer than 8 minutes |

How I run it

The claim above that Spleeter forces you to have unlimited disk space by converting files to wav was exaggerated.
You can pass another codec to the -c flag to avoid death by asphyxiation.
Spleeter can take multiple input files, but the operating system limits the total size of the argument list (`ARG_MAX`).
I suggest batches of about 3000 files per call.
Example of use

spleeter separate -o output -c ogg -f {filename}_{instrument}.{codec} \
"8_minute_slices/Let's Play Awakening [Rpg Maker Horror ⧸ Demo] 8 - Ups, eine schiefgelaufene Rettungsaktion? [wX9ugDgY-iM].opus_00:13.000_00:21.000.opus_00.opus"

Demonstration of spleeter




Everything takes a lot of space! Even file names!

As I understand it, ext4 (and maybe FAT or NTFS?) allocates one block (4KB) for a directory's entries when the directory is created and extends that space as necessary. The point is: when there are over 400k file names like
Let's Play Awakening [Rpg Maker Horror ⧸ Demo] 8 - Ups, eine schiefgelaufene Rettungsaktion? [wX9ugDgY-iM].opus_00:13.000_00:21.000.opus
in the same directory, it causes simple commands like

ls | wc -l
to run for a few seconds, or even minutes if we are talking about my poor usb drive.

I am not yet clear on the solution, but most likely I will pick some short hash function and shorten all the names with it. For example:

printf "Let's Play Awakening [Rpg Maker Horror ⧸ Demo] 8 - Ups, eine schiefgelaufene Rettungsaktion? [wX9ugDgY-iM].opus_00:13.000_00:21.000.opus" | xxhsum | cut -d ' ' -f1

I changed my mind about xxhsum: in Go, xxHash needs an external library, while md5 is in the standard library. I am going with md5.
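A minimal sketch of the md5 renaming idea in Go, using only the standard library (`crypto/md5`, `encoding/hex`). The helper name `shortName` and the choice to re-append the `.opus` extension are assumptions made for illustration; the actual code in the repo may differ.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// shortName returns the md5 digest of name as 32 lowercase hex characters.
// (Hypothetical helper, not taken from the repo.)
func shortName(name string) string {
	sum := md5.Sum([]byte(name))
	return hex.EncodeToString(sum[:])
}

func main() {
	long := "Let's Play Awakening [Rpg Maker Horror ⧸ Demo] 8 - Ups, eine schiefgelaufene Rettungsaktion? [wX9ugDgY-iM].opus_00:13.000_00:21.000.opus"
	// Every name, however long, shrinks to 32 characters plus the extension.
	fmt.Println(shortName(long) + ".opus")
}
```

The digest length is fixed, so directory entries stay small no matter how long the original title was. It is probably worth storing the hash-to-original-name mapping somewhere, so the clips can still be matched to their text in metadata.tsv.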

spleeter separate -o output -c ogg -f {filename}_{instrument}.{codec} \
8_minutes_slices/2f56da703d60103bf9cf5848e1b3415a_01.opus \
8_minutes_slices/da18f80180b3fac3eadba34ae21502b6_01.opus \
8_minutes_slices/0a67183fe5a554e8af52b9e243d441e8_02.opus \
8_minutes_slices/07ad09a301080ca18890f7525d43b56a_01.opus \
8_minutes_slices/aef5fc3c8ec7bc6134e2a2cc20ceeccd_02.opus
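Feeding Spleeter many files per call, while staying under the argument-list limit mentioned earlier, could be sketched in Go like this. The helpers `batches` and `spleeterArgs` are hypothetical names for illustration; only the flags already shown in the commands above are used, and the sketch prints the argument lists instead of running Spleeter.

```go
package main

import "fmt"

// batches splits paths into consecutive groups of at most size elements.
// (Hypothetical helper, not taken from the repo.)
func batches(paths []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(paths); start += size {
		end := start + size
		if end > len(paths) {
			end = len(paths)
		}
		out = append(out, paths[start:end])
	}
	return out
}

// spleeterArgs builds the argument list for one `spleeter separate` call,
// with the same flags as the commands shown in the text.
func spleeterArgs(outDir, codec string, inputs []string) []string {
	args := []string{"separate", "-o", outDir, "-c", codec,
		"-f", "{filename}_{instrument}.{codec}"}
	return append(args, inputs...)
}

func main() {
	files := []string{"a_01.opus", "b_01.opus", "c_02.opus", "d_01.opus", "e_02.opus"}
	// Batch size 2 keeps the demo short; the text suggests about 3000.
	for i, b := range batches(files, 2) {
		// To really run it: exec.Command("spleeter", spleeterArgs(...)...).Run()
		fmt.Println(i, spleeterArgs("output", "ogg", b))
	}
}
```

Short hashed names help here too, since the limit is on the total size of the arguments and not just their count.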

How do I split all audio into 8-minute clips

The most relevant function looks like this:

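// Assumes: import ffmpeg "github.com/u2takey/ffmpeg-go"
// cutOnEqualParts splits filepath into segments of the given duration without
// re-encoding. outname is the segment output pattern (the segment muxer
// expects a printf-style index in it, such as %02d).
func cutOnEqualParts(filepath, outname, segment string) error {
	err := ffmpeg.Input(filepath).
		Output(outname, ffmpeg.KwArgs{ // Output/Run lines were missing in the excerpt; reconstructed
			"c":                "copy",
			"map":              0,
			"segment_time":     segment,
			"f":                "segment",
			"reset_timestamps": 1,
		}).
		Run()
	return err
}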
Check the Repo for the rest.

A lot of clips have silence or non-Her-voice sound

This is different from background sound playing at the same time as Her voice. It can be solved by trimming or slicing.
Good example on solving silence problem (vad)
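To make the trimming idea concrete, here is a rough sketch in Go. It is not the linked VAD approach but a much simpler stand-in: a per-frame energy (RMS) threshold. It assumes the audio is already decoded into float64 samples in [-1, 1]; the names `Span` and `nonSilent`, the frame size, and the threshold are all made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Span is a half-open range [Start, End) of sample indices.
type Span struct{ Start, End int }

// nonSilent returns the spans whose frames have RMS >= threshold.
// Adjacent loud frames are merged into one span.
// (Hypothetical helper; a simple energy gate, not a real VAD.)
func nonSilent(samples []float64, frame int, threshold float64) []Span {
	if frame <= 0 {
		return nil
	}
	var spans []Span
	open := false // true while the previous frame was loud
	for start := 0; start < len(samples); start += frame {
		end := start + frame
		if end > len(samples) {
			end = len(samples)
		}
		var sum float64
		for _, s := range samples[start:end] {
			sum += s * s
		}
		rms := math.Sqrt(sum / float64(end-start))
		switch {
		case rms >= threshold && open:
			spans[len(spans)-1].End = end // extend the current span
		case rms >= threshold:
			spans = append(spans, Span{start, end})
			open = true
		default:
			open = false
		}
	}
	return spans
}

func main() {
	// Toy signal: 4 silent samples, 4 loud ones, 4 silent ones.
	samples := []float64{0, 0, 0, 0, 0.5, -0.5, 0.5, -0.5, 0, 0, 0, 0}
	fmt.Println(nonSilent(samples, 4, 0.1)) // prints [{4 8}]
}
```

The resulting spans could then be turned into start/end times for cutting. An energy gate only removes silence; it cannot tell Her voice from other sounds of similar loudness.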

Harder cases contain multiple voices, only one of which is Hers (as opposed to when She imitates or makes different voices to add character and immersion). That is more a case for a neural network (or a GMM) that knows Nessi's voice features and can simply recognize Her voice.