Just write a script and put it as a cronjob

Nothing wrong with replacing content of html tag with sed

Let us start with easiest part. Assume you already have a text. How to replace tag content with it?

#!/bin/sh

phrase=${1:-"test phrase"}
# no multilines
phrase=$(echo $phrase | tr "\n" " ")
sed -i -e "s|\(<p id=phrase>\)\(.*\)\</p>\)|\1${phrase}\3|" index.html

Generating a picture with stable diffusion

No explanation for installation would be here. If needed go and follow github page
To enable api I run it like this

./webui.sh --xformers --api
...
Running on local URL:  http://127.0.0.1:7860

On the page http://localhost:7860/docs you can see swagger docs.
endpoint /sdapi/v1/img2img is only one that needed.
image of documentation
Through experiments acceptable params "denoising_strength": 0.6, "cfg_scale": 23 were found
I would say that denoising_strength defines how much original image is respected
and cfd_scale is some value for prompt importance (1-30);

Image input and output should be a string in base64.
I suggest you copypaste from somewhere negative prompt,
unless there are some exceptional requirements.

imgPath=./nessi_avatar.jpg
sourceIMG=$(base64 -w 0 "$imgPath")
sourceIMG=$"\"$sourceIMG\""

resp=$(curl --data "{\"init_images\": ["$sourceIMG"], \"steps\":\"$steps\", \"prompt\":\"$prompt\", \"sampler_index\":\"$model\", \"sampler_name\":\"$model\", \"negative_prompt\": \"$commonNegative\", \"denoising_strength\": 0.55, \"cfg_scale\": 23}" --header 'Content-Type: application/json' http://127.0.0.1:7860/sdapi/v1/img2img)

outname="nessi_generated.jpg"

echo $resp | jq -r '.images[]' | base64 -d > $outname

Stable diffusion, whisper and vram problem

Stable diffusion allocates vram when started (or after first generated pic)
and holds some of your vram hostage.
Its possible that you wont have enough vram to run stabble diffusion and whisper at the same time I say use --device cpu to extract text with power of your cpu.

whisper --model medium --language de --device cpu todays_audio.opus

Now that I have everything (audio, text, picture) just push it to the server with rsync
And make a cronjob