Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements

Channel: NanoNomad
Subscribers: 2,810
Published on 2022-10-25 ● Video Link: https://www.youtube.com/watch?v=BnrnJ_6wiYs



Duration: 1:58
315 views


Short test video, version 2. Stable Diffusion generated the base image. Altered the image using inpainting to close the eyes, alter the smile, etc. Selected image pairs and used them with the Frame Interpolation for Large Motion (FILM) ML model to interpolate the images in between. Selected clips generated by FILM and made them into a loop of face movements.
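
The interpolate-then-loop idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `blend_midpoint` is a hypothetical stand-in (a plain average) for a learned interpolator such as FILM, frames are represented as flat lists of floats rather than real images, and the recursive midpoint scheme and the forward-then-backward loop are one plausible way to do it, not necessarily what was done for this video.

```python
from typing import Callable, List

Frame = List[float]  # stand-in for an image: a flat list of pixel values


def blend_midpoint(a: Frame, b: Frame) -> Frame:
    """Hypothetical stand-in for a learned interpolator: a plain average."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def interpolate_recursively(
    a: Frame,
    b: Frame,
    times: int,
    midpoint: Callable[[Frame, Frame], Frame] = blend_midpoint,
) -> List[Frame]:
    """Return 2**times + 1 frames from a to b, including both endpoints."""
    if times == 0:
        return [a, b]
    mid = midpoint(a, b)
    left = interpolate_recursively(a, mid, times - 1, midpoint)
    right = interpolate_recursively(mid, b, times - 1, midpoint)
    return left + right[1:]  # drop the duplicated middle frame


def ping_pong_loop(frames: List[Frame]) -> List[Frame]:
    """Play forward, then backward, without repeating the two end frames."""
    return list(frames) + list(frames[-2:0:-1])


# Two one-pixel "images": eyes open (0.0) and eyes closed (1.0).
clip = interpolate_recursively([0.0], [1.0], times=2)
loop = ping_pong_loop(clip)
```

With a real model, `midpoint` would be replaced by a call that takes two images and returns the predicted in-between image; the rest of the structure stays the same.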

Voice is a synthesized Lauretta Young; most samples are from 1930s movies and radio plays. Audio quality of the voice samples is very poor, but the RNNoise ML model did a reasonable job of cleaning them up. Model is VITS, fine-tuned using Coqui TTS up to 1,153,000 steps.
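
For anyone wanting to try a fine-tuned checkpoint, a minimal sketch of building a synthesis command for the Coqui TTS `tts` command-line tool might look like this. The file names (`best_model.pth`, `config.json`, `speech.wav`) and the sample sentence are hypothetical placeholders, not the files used for this video, and the helper `build_tts_command` is introduced here only for illustration.

```python
import shlex
from typing import List


def build_tts_command(
    text: str, model_path: str, config_path: str, out_path: str
) -> List[str]:
    """Assemble an argument list for the Coqui `tts` CLI (paths are placeholders)."""
    return [
        "tts",
        "--text", text,
        "--model_path", model_path,    # fine-tuned VITS checkpoint (hypothetical name)
        "--config_path", config_path,  # config saved alongside the checkpoint
        "--out_path", out_path,        # where the synthesized WAV is written
    ]


tts_cmd = build_tts_command(
    text="This is a short test sentence.",
    model_path="best_model.pth",
    config_path="config.json",
    out_path="speech.wav",
)
print(shlex.join(tts_cmd))  # shell-safe string for copy and paste
```

The list form can be passed to `subprocess.run(tts_cmd, check=True)` on a machine where Coqui TTS is installed.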

Synced voice to video using Wav2Lip with the wav2lip_gan model as a 512x512 video.
https://github.com/Rudrabha/Wav2Lip
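
As a rough sketch of the lip-sync step, the snippet below assembles a call to the `inference.py` script from the Wav2Lip repository. It assumes the script is run from the repository root and that the `wav2lip_gan` checkpoint was saved as `checkpoints/wav2lip_gan.pth`; the input and output file names are hypothetical, and `build_wav2lip_command` is a helper made up for this example.

```python
import shlex
from typing import List


def build_wav2lip_command(
    face: str,
    audio: str,
    outfile: str,
    checkpoint: str = "checkpoints/wav2lip_gan.pth",  # assumed download location
) -> List[str]:
    """Assemble an argument list for Wav2Lip's inference.py (paths are placeholders)."""
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face,        # looped face video, e.g. the 512x512 clip
        "--audio", audio,      # synthesized speech to sync the lips to
        "--outfile", outfile,  # lip-synced result
    ]


wav2lip_cmd = build_wav2lip_command(
    face="face_loop.mp4",
    audio="speech.wav",
    outfile="results/synced.mp4",
)
print(shlex.join(wav2lip_cmd))
```

Running it would be something like `subprocess.run(wav2lip_cmd, check=True, cwd="Wav2Lip")`, where `Wav2Lip` is wherever the repository was cloned.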

Upscaled video using Aaron Feng's massively feature-rich Waifu2x-Extension-GUI
https://github.com/AaronFeng753/Waifu2x-Extension-GUI

Speech is the lyrics to Jamiroquai's Virtual Insanity




Other Videos By NanoNomad


2023-02-12 Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset
2023-02-04 Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab
2023-01-30 YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation
2023-01-27 Updated | Fine-Tuning YourTTS with Automated STT Datasets on Google Colab for AI Voice Cloning
2023-01-13 Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning
2022-12-22 Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab or Linux
2022-12-09 Dreambooth and Fine Tuning for Stable Diffusion 1.5 and 2 with this Versatile Script
2022-11-30 If Bill Gates could rap? AI Synthesized Voice, AI Upsampled Video | Deltron 3030's Virus
2022-11-14 Training Stable Diffusion Dreambooth on Multiple Subjects for Combined Image Generation
2022-10-31 Locally Train Stable Diffusion with Dreambooth using WSL Ubuntu
2022-10-25 Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements
2022-10-24 Stable Diffusion Image to Video, Synthesized Lauretta Young 1930s voice, Wav2Lip Demo
2022-10-16 Animate Images using AI with Frame Interpolation for Large Motion
2022-10-14 Animated Stable Diffusion Images using Google's FILM Frame Interpolation for Large Motion demo
2022-10-07 Training Textual Inversion for Stable Diffusion | Customizable AI Image Generation
2022-09-26 How to Download All Styles and Objects from the Stable Diffusion Concepts Library | AI Images
2022-09-05 AI Images | Installing Stable Diffusion and the Automatic1111 WebUI using Conda on Windows 10
2022-09-04 AI Image Generation with Stable Diffusion Part 2 | Img2Img Transformations, Masking, Upscaling
2022-09-01 AI Image Generation with Stable Diffusion | Part 1
2022-08-28 Johnny Cash Delivers The Great Dictator Speech | AI Voice Demo VITS Model with Stable Diffusion art
2022-08-23 Duke Nukem covering DJ Shadow feat Run The Jewels' Nobody Speak | AI Voice Synthesis



Tags:
TTS
Stable Diffusion
img2img
AI
machine learning
ai art