OpenAI CLIP Embeddings: Walkthrough + Insights

Channel: John Tan Chong Min

Subscribers: 5,450

Published on April 9, 2024 1:49:11 PM ● Video Link: https://www.youtube.com/watch?v=l7-JAzVPGJI

Category: Walkthrough

Duration: 1:47:56

Views: 561

If there is one thing that has been impactful ever since its launch, it has to be CLIP Embeddings.

CLIP stands for Contrastive Language–Image Pre-training.

From Stable Diffusion to DALL-E to Robotics Tasks involving Vision and Text, CLIP bridges the gap between image and text using an embedding space common to both of them.

Granted, CLIP does not do everything well. It inherits the limitations of vector embeddings: context may not be captured well.

It also inherits the limitations of the image encoder, such as the loss of positional information with Vision Transformers.

That said, it is pretty useful for generic tasks, and my experiments with it have impressed me with its versatility across various situations.

Web-scale training does produce wonders.
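
As a rough illustration of what a common embedding space for images and text buys you, here is a minimal sketch of zero-shot classification by cosine similarity. It does not load the real CLIP model: the three-dimensional vectors are made-up stand-ins for encoder outputs, the function names are hypothetical, and the temperature of 0.07 is an assumed value used only to sharpen the softmax.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.07):
    """Score one image embedding against several text-prompt embeddings.

    Returns a dict mapping each prompt to a probability. The temperature
    is an assumed value; a lower value gives a sharper distribution.
    """
    labels = list(text_embeddings)
    sims = [cosine_similarity(image_embedding, text_embeddings[label])
            for label in labels]
    probs = softmax([s / temperature for s in sims])
    return dict(zip(labels, probs))


# Toy embeddings (hypothetical, NOT real CLIP outputs). In practice these
# would come from the text encoder and image encoder respectively.
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, text_embeddings)
best_label = max(probs, key=probs.get)
print(best_label)
```

Because both modalities live in the same space, classification reduces to picking the text prompt whose embedding is closest to the image embedding; no task-specific classifier head is trained.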

~~~~

CLIP Paper: https://arxiv.org/abs/2103.00020
CLIP Code: https://github.com/openai/CLIP

Code for my experiments: https://github.com/tanchongmin/TensorFlow-Implementations/tree/main/Paper_Reviews/CLIP/CLIP%20Code
Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/CLIP/CLIP_Embeddings.pdf

~~~~

0:00 Introduction
1:19 CLIP Experiments
28:37 Key Takeaways
34:57 Prediction in latent space is faster learning than in input space
39:53 Dataset
46:32 Final Architecture
48:45 CLIP Training
55:04 CLIP Inference
57:42 Code details
1:01:00 Performance over 27 datasets
1:05:19 Using CLIP for Classification
1:06:50 Prompt Engineering and Ensembling to Improve Classification performance
1:11:03 CLIP is good for datasets with limited samples
1:12:10 CLIP is bad for specialised tasks
1:16:19 Broad Training vs Specific Training
1:24:50 CLIP and Multiple Abstraction Spaces
1:30:51 Discussion

~~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin

Other Videos By John Tan Chong Min

2024-07-12	Michael Hodel: Reverse Engineering the Abstraction and Reasoning Corpus
2024-07-02	TaskGen Conversational Class v2: JARVIS, Psychology Counsellor, Sherlock Holmes Shop Assistant
2024-06-04	CodeAct: Code As Action Space of LLM Agents - Pros and Cons
2024-05-28	TaskGen Conversation with Dynamic Memory - Math Quizbot, Escape Room Solver, Psychology Counsellor
2024-05-21	Integrate ANY Python Function, CodeGen, CrewAI tool, LangChain tool with TaskGen! - v2.3.0
2024-05-11	Empirical - Open Source LLM Evaluation UI
2024-05-07	TaskGen Ask Me Anything #1
2024-04-29	StrictJSON (LLM Output Parser) Ask Me Anything #1
2024-04-22	Tutorial #14: Write latex papers with LLMs such as Llama 3!
2024-04-16	SORA Deep Dive: Predict patches from text, images or video
2024-04-09	OpenAI CLIP Embeddings: Walkthrough + Insights
2024-03-26	TaskGen - LLM Agentic Framework that Does More, Talks Less: Shared Variables, Memory, Global Context
2024-03-18	CRADLE (Part 2): An AI that can play Red Dead Redemption 2. Reflection, Memory, Task-based Planning
2024-03-11	CRADLE (Part 1) - AI that plays Red Dead Redemption 2. Towards General Computer Control and AGI
2024-03-05	TaskGen - A Task-based Agentic Framework using StrictJSON at the core
2024-02-27	SymbolicAI / ExtensityAI Paper Overview (Part 2) - Evaluation Benchmark Discussion!
2024-02-20	SymbolicAI / ExtensityAI Paper Overview (Part 1) - Key Philosophy Behind the Design - Symbols
2024-02-13	Embeddings Walkthrough (Part 2): Context-Dependent Embeddings, Shifting Embedding Space
2024-02-06	Embeddings Walkthrough (Part 1) - Bag of Words to word2vec to Transformer contextual embeddings
2024-01-29	V* - Better than GPT-4V? Iterative Context Refining for Visual Question Answer!
2024-01-23	AutoGen: A Multi-Agent Framework - Overview and Improvements
