Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements
Short test video, version 2. The base image was generated with Stable Diffusion, then altered with inpainting to close the eyes, change the smile, and so on. Selected image pairs were fed to the Frame Interpolation for Large Motion (FILM) ML model to interpolate the frames in between; clips generated by FILM were then assembled into a loop of face movements. Sketches of both steps follow.
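The description doesn't name an inpainting tool, so as a minimal sketch, here is how the eye/smile edits could be done with Hugging Face diffusers' StableDiffusionInpaintPipeline; the checkpoint name, prompt, and file names are assumptions:

    # Hypothetical inpainting pass: repaint only the masked region (e.g. the eyes).
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",  # assumed checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    face = Image.open("face.png").convert("RGB")       # SD-generated portrait
    mask = Image.open("eyes_mask.png").convert("RGB")  # white = region to repaint

    result = pipe(
        prompt="portrait of a woman with closed eyes",  # assumed prompt
        image=face,
        mask_image=mask,
    ).images[0]
    result.save("face_eyes_closed.png")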
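FILM is published as a TensorFlow SavedModel on TF Hub; a minimal sketch of interpolating the midpoint frame between two of the edited stills (file names are assumptions):

    # Interpolate the frame halfway between two face images with FILM (TF Hub).
    import numpy as np
    import tensorflow_hub as hub
    from PIL import Image

    model = hub.load("https://tfhub.dev/google/film/1")

    def load(path):
        # float32 RGB in [0, 1], with a leading batch dimension
        img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
        return img[np.newaxis, ...]

    inputs = {
        "x0": load("face_eyes_open.png"),
        "x1": load("face_eyes_closed.png"),
        "time": np.array([[0.5]], dtype=np.float32),  # 0.5 = halfway between the pair
    }
    mid = model(inputs)["image"][0].numpy()           # interpolated frame
    Image.fromarray(np.uint8(np.clip(mid, 0.0, 1.0) * 255)).save("face_mid.png")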
The voice is a synthesized Loretta Young; most samples come from 1930s movies and radio plays. The audio quality of the source recordings is very poor, but the rnnoise ML model did a reasonable job of cleaning them up. The TTS model is VITS, fine-tuned with Coqui TTS for 1,153,000 steps. Sketches of the denoising and synthesis steps follow.
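rnnoise's bundled rnnoise_demo example operates on raw 48 kHz 16-bit mono PCM, so each sample has to be converted on the way in and out; a minimal sketch using ffmpeg for the conversions (file names are assumptions):

    # Denoise one voice sample with rnnoise_demo (raw 48 kHz 16-bit mono PCM in/out).
    import subprocess

    def run(cmd):
        subprocess.run(cmd, check=True)

    # WAV -> raw PCM expected by rnnoise_demo
    run(["ffmpeg", "-y", "-i", "sample_noisy.wav",
         "-f", "s16le", "-ac", "1", "-ar", "48000", "sample_noisy.raw"])
    # Denoise (rnnoise_demo is built in rnnoise's examples/ directory)
    run(["./examples/rnnoise_demo", "sample_noisy.raw", "sample_clean.raw"])
    # raw PCM -> WAV for the TTS training set
    run(["ffmpeg", "-y", "-f", "s16le", "-ac", "1", "-ar", "48000",
         "-i", "sample_clean.raw", "sample_clean.wav"])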
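Coqui TTS can load a fine-tuned VITS checkpoint through its Python API; a minimal synthesis sketch, with the checkpoint and config paths assumed:

    # Synthesize a line with the fine-tuned VITS checkpoint via Coqui TTS.
    from TTS.api import TTS

    tts = TTS(
        model_path="run/checkpoint_1153000.pth",  # assumed checkpoint path
        config_path="run/config.json",            # assumed config path
    )
    tts.tts_to_file(
        text="Futures made of virtual insanity",  # a line of the script
        file_path="line_01.wav",
    )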
Synced the voice to the video using Wav2Lip with the wav2lip_gan model, output as a 512x512 video (a sketch follows the link).
https://github.com/Rudrabha/Wav2Lip
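Wav2Lip is driven by its inference.py script; a minimal invocation, wrapped in Python here, with the face-loop and audio file names assumed (the checkpoint name follows the repo's convention):

    # Lip-sync the face loop to the synthesized speech with Wav2Lip's inference script.
    import subprocess

    subprocess.run([
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # GAN-trained weights
        "--face", "face_loop.mp4",   # looping face-movement video (assumed name)
        "--audio", "speech.wav",     # synthesized voice track (assumed name)
    ], check=True)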
Upscaled the video using Aaron Feng's massively feature-rich Waifu2x-Extension-GUI (a command-line sketch of an equivalent step follows the link).
https://github.com/AaronFeng753/Waifu2x-Extension-GUI
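The GUI wraps command-line engines such as waifu2x-ncnn-vulkan; a rough sketch of an equivalent frame-by-frame upscale outside the GUI, with the frame rate, flags, and file names all assumptions:

    # Rough equivalent of the GUI's video upscale: split frames, upscale, reassemble.
    import os
    import subprocess

    def run(cmd):
        subprocess.run(cmd, check=True)

    os.makedirs("frames", exist_ok=True)
    os.makedirs("upscaled", exist_ok=True)

    run(["ffmpeg", "-y", "-i", "result.mp4", "frames/%06d.png"])  # extract frames
    run(["waifu2x-ncnn-vulkan", "-i", "frames", "-o", "upscaled",
         "-n", "2", "-s", "2"])                                   # denoise 2, 2x scale
    run(["ffmpeg", "-y", "-framerate", "30", "-i", "upscaled/%06d.png",
         "-i", "result.mp4", "-map", "0:v", "-map", "1:a",        # keep original audio
         "-c:v", "libx264", "-pix_fmt", "yuv420p", "upscaled.mp4"])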
Speech is the lyrics to Jamiroquai's Virtual Insanity