Automated Image Captioning with LLMs - Recognize Anything, BLIP-2, and Kosmos-2
Today I'm looking at a few multimodal large language models that can be used for automated image captioning. Rich, detailed captions are useful when training Stable Diffusion models with DreamBooth or LoRAs.
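Many DreamBooth and LoRA training tools read each image's caption from a plain-text file that sits next to the image and shares its file name. The sketch below assumes that convention with a `.txt` extension (trainers such as kohya-ss sd-scripts let you configure the caption extension, so check yours). It also assumes the captions have already been generated by one of the models listed below. `write_caption_sidecars` and the trigger word are hypothetical names for illustration, not part of any of these projects.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Image extensions to look for; adjust to match your dataset (assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def write_caption_sidecars(
    captions: Dict[str, str],
    image_dir: Path,
    ext: str = ".txt",
    trigger: str = "",
) -> int:
    """Write each caption to a text file that shares its image's file stem.

    `captions` maps image file names (e.g. "cat01.png") to caption strings.
    `trigger` is an optional, hypothetical trigger word prepended to every caption.
    Returns the number of caption files written.
    """
    written = 0
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # not an image: skip notes, existing caption files, etc.
        caption = captions.get(image.name)
        if not caption:
            continue  # no caption was generated for this image
        text = f"{trigger}, {caption}" if trigger else caption
        # "cat01.png" -> "cat01.txt" in the same folder
        image.with_suffix(ext).write_text(text.strip() + "\n", encoding="utf-8")
        written += 1
    return written


if __name__ == "__main__":
    # Demo on a throwaway folder with two empty placeholder "images".
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ("cat01.png", "cat02.jpg"):
            (folder / name).touch()
        n = write_caption_sidecars(
            {"cat01.png": "a tabby cat asleep on a windowsill",
             "cat02.jpg": "a tabby cat chasing a red ball"},
            folder,
            trigger="mycat",  # hypothetical trigger word
        )
        print(n, (folder / "cat01.txt").read_text(encoding="utf-8").strip())
```

Keeping the captions in a dictionary keyed by file name means you can run one model over the whole folder first, review or edit the results, and only then write the files.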
Recognize Anything
https://github.com/xinyu1205/recognize-anything
Kosmos-2
https://huggingface.co/microsoft/kosmos-2-patch14-224
BLIP-2 OPT-2.7B 8-bit Quantized Model by Mediocreatmybest
https://huggingface.co/Mediocreatmybest/blip2-opt-2.7b_8bit
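BLIP-2 and Kosmos-2 produce a sentence-style caption, while Recognize Anything produces a list of tags. One way to get a richer caption is to append only the tags the sentence doesn't already mention. This is a minimal sketch of that idea, not a step taken from the video: it assumes the tags arrive as a single `|`-separated string such as `dog | grass | ball`, so set `sep` to match whatever your copy of Recognize Anything actually outputs. `merge_caption` is a hypothetical helper.

```python
import re
from typing import List


def merge_caption(sentence: str, tag_string: str, sep: str = "|", max_tags: int = 15) -> str:
    """Append tags to a sentence caption, skipping tags the sentence already contains.

    `sentence` is a caption from e.g. BLIP-2 or Kosmos-2; `tag_string` is a
    separator-joined tag list such as "dog | grass | ball" (assumed format).
    At most `max_tags` new tags are appended.
    """
    sentence = sentence.strip().rstrip(".")
    lowered = sentence.lower()
    seen = set()
    extra: List[str] = []
    for raw in tag_string.split(sep):
        tag = raw.strip().lower()
        if not tag or tag in seen:
            continue  # skip blanks and duplicate tags
        seen.add(tag)
        # Whole-word match so "car" is not treated as present in "scarf".
        if re.search(rf"\b{re.escape(tag)}\b", lowered):
            continue
        extra.append(tag)
        if len(extra) >= max_tags:
            break
    return ", ".join([sentence] + extra if sentence else extra)


if __name__ == "__main__":
    print(merge_caption("A dog playing with a ball on the grass.",
                        "dog | grass | ball | tennis ball | park"))
    # prints: A dog playing with a ball on the grass, tennis ball, park
```

A word-boundary regex is used instead of a plain substring test so that a tag like `car` isn't dropped just because the sentence contains `scarf`. Capping the number of appended tags keeps captions from growing past what the text encoder will actually read.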
Resources/Links/Notebook Code to Copy-Paste:
http://nanonomad.com/2023/11/19/automate-image-captioning-using-multimodal-llms/
Posted by NanoNomad on 2023-11-19.