Multimodal Query for Images: Text/Image Multimodal Query with Negative Filter and Folder Selection
This is a practical demonstration of how to use o4-mini-high with web search and documentation to vibe-code a proof of concept for multimodal retrieval.
You can retrieve with either text or image input prompts, or even a hybrid search.
As Cohere embeddings do not handle negative prompts well, I also created negative text/image input filters to prevent certain images from being retrieved.
You can also use filters to restrict the search to sub-folders of your choice!
Also works if the image contains text or is a sketch! Try it out!
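The exact scoring used in the notebook is in the repo above; as a rough sketch (names and the 0.5 negative weight are my own assumptions, not the author's), a hybrid query with a negative filter can be done by averaging similarity to the positive text/image embeddings and subtracting similarity to the negative ones:

```python
import numpy as np

def cosine_sim(query, db_vecs):
    # Cosine similarity between one query vector and a matrix of stored vectors
    query = query / np.linalg.norm(query)
    db_vecs = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    return db_vecs @ query

def retrieve(db_vecs, pos_queries, neg_queries=None, neg_weight=0.5, top_k=3):
    """Rank stored image embeddings by mean similarity to the positive
    queries (text and/or image), penalised by similarity to negatives.
    pos_queries / neg_queries are lists of 1-D embedding vectors."""
    score = np.mean([cosine_sim(q, db_vecs) for q in pos_queries], axis=0)
    if neg_queries:
        score -= neg_weight * np.mean(
            [cosine_sim(q, db_vecs) for q in neg_queries], axis=0)
    return np.argsort(score)[::-1][:top_k]
```

Because text and images share one embedding space in a multimodal model, the same function covers text-only, image-only, and hybrid queries.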
Code can be found at:
https://github.com/tanchongmin/john-youtube/tree/main/Multimodal_Index
Dataset used:
https://www.kaggle.com/datasets/shreyapmaher/fruits-dataset-images
Download it and put it into a folder named Fruits in the same directory as this Jupyter Notebook.
File structure:
Current Directory
Fruits (folder)
.env
embeddings.db (automatically generated by the sqlite3 code in this notebook)
Multimodal_Index.ipynb (this notebook)
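The embeddings.db cache avoids re-embedding the same image on every run. The schema and key scheme below are my own illustration (the notebook's actual table layout may differ); the idea is just to key each file's embedding and store the vector as raw float32 bytes:

```python
import sqlite3
import hashlib
import numpy as np

def get_embedding(path, conn, embed_fn):
    """Return the cached embedding for an image path, computing and
    storing it (as raw float32 bytes) on a cache miss.
    embed_fn is whatever function calls the embedding API."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings (key TEXT PRIMARY KEY, vec BLOB)")
    key = hashlib.sha256(path.encode()).hexdigest()
    row = conn.execute(
        "SELECT vec FROM embeddings WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return np.frombuffer(row[0], dtype=np.float32)  # cache hit: no API call
    vec = np.asarray(embed_fn(path), dtype=np.float32)
    conn.execute("INSERT INTO embeddings VALUES (?, ?)", (key, vec.tobytes()))
    conn.commit()
    return vec
```

User query inputs should bypass this store (as covered at 1:25:51) so that one-off queries don't pollute the cache.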
Multimodal Embedding Used:
Cohere Embed v4: https://cohere.com/blog/embed-4
~~~
0:00 Introduction
3:24 First Prompt
9:48 Introducing SQLite Database to Cache Embeddings
16:56 Testing out Cohere Embeddings
24:10 Hybrid Image/Text Query
1:00:22 Sub-folder Filtering
1:07:03 Gradio UI
1:25:51 Not storing user input to cache
1:28:14 Negative Filter
1:30:41 Moment of Truth - Final Testing
~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin