I-JEPA: Importance of Predicting in Latent Space
I-JEPA is the first implementation of Yann LeCun's Joint Embedding Predictive Architecture (JEPA). I am a huge fan of LeCun, and many of my own thoughts on AI have been shaped by his views. However, I do not agree with using a Vision Transformer (ViT) as the encoder, as it discards much of the semantic information in the spatial structure of images. Furthermore, it takes a long time to train, as it lacks the inductive biases relevant for images (such as translation invariance).
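To make the core idea of predicting in latent space concrete, here is a minimal sketch of I-JEPA's objective in PyTorch. The plain transformer blocks, dimensions, and masking pattern are stand-ins for the paper's ViT encoders, EMA target encoder, and block-wise masking; only the overall shape of the objective (predicting target representations rather than pixels) is meant to be faithful.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, num_patches = 128, 64  # illustrative sizes only

def make_encoder(layers):
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True),
        num_layers=layers,
    )

context_encoder = make_encoder(2)
target_encoder = copy.deepcopy(context_encoder)  # in I-JEPA this is an EMA copy, not trained by gradients
predictor = make_encoder(1)

patches = torch.randn(1, num_patches, embed_dim)  # stand-in for one image's patch embeddings
context_idx = torch.arange(0, 48)                 # visible (context) patches
target_idx = torch.arange(48, 64)                 # masked (target) patches

with torch.no_grad():
    targets = target_encoder(patches)[:, target_idx]  # targets live in latent space, not pixel space

context = context_encoder(patches[:, context_idx])
mask_tokens = torch.zeros(1, len(target_idx), embed_dim)  # positional information omitted for brevity
preds = predictor(torch.cat([context, mask_tokens], dim=1))[:, -len(target_idx):]

loss = F.mse_loss(preds, targets)  # loss on representations; no pixel reconstruction
```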
While I-JEPA achieves quite amazing downstream task performance, such as on ImageNet Top-1 classification, it could perhaps do even better if the masked objective were applied to a CNN-like architecture instead, perhaps with self-attention layers over the post-filter (convolutional feature map) outputs.
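As a rough illustration of that alternative, here is a hypothetical hybrid encoder: a convolutional stem supplies the translational inductive bias, and self-attention then operates over the resulting feature map (the "post-filter" outputs). The class name, layer sizes, and shapes are all my own illustrative choices, not from the paper.

```python
import torch
import torch.nn as nn

class ConvAttnEncoder(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.stem = nn.Sequential(                    # CNN front end: local filters, weight sharing
            nn.Conv2d(3, dim // 2, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim // 2, dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        self.attn = nn.TransformerEncoder(            # self-attention over the conv feature map
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
            num_layers=2,
        )

    def forward(self, x):
        f = self.stem(x)                              # (B, dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)         # (B, H*W/16, dim) token sequence
        return self.attn(tokens)                      # latent representation per spatial location

encoder = ConvAttnEncoder()
z = encoder(torch.randn(2, 3, 64, 64))                # -> (2, 256, 128)
```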
We could also explore Stable-Diffusion-like conditioning, whereby the predictor module is conditioned on some text input when predicting the latent space. Broad-to-specific conditioning, and using a memory of similar latent spaces, are also worth exploring. Ultimately, I believe a better bet could be a hierarchical architecture that goes from broad to specific, with each layer of abstraction conditioned on the broader layer above it, and finally attention across all the generated layers of abstraction (or latent spaces) used for prediction.
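A speculative sketch of the conditioning idea: a cross-attention predictor (here PyTorch's TransformerDecoder) attends from the context latents to a set of conditioning tokens, loosely in the spirit of Stable Diffusion's text conditioning. All names, shapes, and the source of the conditioning tokens are hypothetical and not part of I-JEPA.

```python
import torch
import torch.nn as nn

dim = 128
conditioned_predictor = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)

context_latents = torch.randn(1, 48, dim)   # latents from the context encoder
conditioning_tokens = torch.randn(1, 8, dim)  # hypothetical text (or broader-level) embeddings

# Self-attention over the context latents, cross-attention to the conditioning tokens
predicted_latents = conditioned_predictor(tgt=context_latents, memory=conditioning_tokens)
```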
That said, I-JEPA is a promising first step, and I am excited to see what comes next.
~~~~~~~~~~~~~~~~~~
Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/I-JEPA.pdf
Reference Materials:
I-JEPA: https://arxiv.org/abs/2301.08243
Vision Transformers: https://arxiv.org/abs/2010.11929
Swin Transformers (Transformers with hierarchy and shifting attention windows): https://arxiv.org/abs/2103.14030
MLP-Mixer (all-MLP architecture for image processing): https://arxiv.org/abs/2105.01601
Conv-Mixer (Patches with Conv layers): https://arxiv.org/abs/2201.09792
Stable Diffusion: https://arxiv.org/abs/2112.10752
~~~~~~~~~~~~~~~~~~
0:00 Introduction
5:54 Transformers: Prediction back in input space
11:12 Prediction in Latent Space
22:25 Stable Diffusion and Latent Space
29:17 Vision Transformer (ViT)
44:57 Swin Transformer
50:12 ViT’s positional encoding may not be good!
51:38 I-JEPA
1:09:26 Discussion on how to improve I-JEPA
~~~~~~~~~~~~~~~~~~
AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain them in a simple, relatable way. I am also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin