I-JEPA: Importance of Predicting in Latent Space

Subscribers: 5,330
Published on: 2023-07-11
Video Link: https://www.youtube.com/watch?v=M98OLk30dBk
Duration: 1:35:37
Views: 393
Likes: 11

I-JEPA is the first implementation of Yann LeCun's Joint Embedding Predictive Architecture (JEPA). I am a huge fan of LeCun, and many of my views on AI have been shaped by his. However, I do not agree with using a Vision Transformer (ViT) as the encoder, as it loses much of the semantic information about the spatial structure of images. Furthermore, it takes a long time to train because it lacks the inductive biases relevant for learning from images (such as translation invariance).
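To make the "predicting in latent space" part of the title concrete, here is a minimal PyTorch-style sketch of the I-JEPA training objective. The names, shapes and helper arguments are illustrative, not the authors' code: the key point is that the predictor is trained to match the target encoder's representations of the masked blocks, rather than reconstructing pixels.

```python
import torch
import torch.nn.functional as F

def ijepa_style_loss(context_encoder, target_encoder, predictor,
                     patches, ctx_idx, tgt_idx):
    """Latent-space prediction: match the target encoder's representations
    of the masked (target) blocks, instead of reconstructing raw pixels."""
    ctx = context_encoder(patches[:, ctx_idx])      # encode only the visible context patches
    with torch.no_grad():                           # targets are not trained by this loss
        # in the paper, the target encoder is an exponential moving average
        # of the context encoder
        tgt = target_encoder(patches[:, tgt_idx])   # latent targets for the masked blocks
    pred = predictor(ctx, tgt_idx)                  # predict target latents from the context
    return F.mse_loss(pred, tgt)                    # L2 loss in latent space, not pixel space
```

Here context_encoder, target_encoder and predictor are placeholders for whatever backbone is used; my argument in the video is that this backbone does not have to be a ViT.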

While I-JEPA achieves impressive downstream performance, such as on ImageNet Top-1 classification, it could perhaps do even better if the masked objective were applied to a CNN-like architecture instead, with self-attention layers over the post-convolution feature maps (a sketch of this idea follows below).
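As a thought experiment (this is not from the I-JEPA paper), a hypothetical encoder along those lines might apply convolutional filters first, then run self-attention over the resulting feature-map tokens:

```python
import torch
import torch.nn as nn

class ConvThenAttention(nn.Module):
    """Hypothetical encoder: convolutional filters first (translation-equivariant
    inductive bias), then self-attention over the resulting feature-map tokens."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, dim, kernel_size=3, stride=2, padding=1), nn.GELU(),
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                        # x: (B, 3, H, W)
        f = self.conv(x)                         # (B, dim, H/4, W/4) feature maps
        tokens = f.flatten(2).transpose(1, 2)    # each spatial location becomes a token
        t = self.norm(tokens)
        out, _ = self.attn(t, t, t)              # global interaction over post-filter outputs
        return tokens + out                      # residual; latent tokens for the predictor
```

The convolutions keep the local filtering and translation-equivariant bias, while the attention layer still lets distant feature-map locations interact.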

We could also explore Stable-Diffusion-like conditioning, where the predictor module is conditioned on some text input when predicting the latent space. Broad-level to specific-level conditioning, and using memory of similar latent spaces, are also worth exploring. Ultimately, I believe a hierarchical architecture could be a better bet: going from broad to specific, with each layer of abstraction conditioned on the broader layer above it, and finally attention across all the generated layers of abstraction (or latent spaces) used for prediction. A rough sketch of the text-conditioning idea is given below.
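Purely as an illustration of that conditioning idea (hypothetical, not something the paper implements), the predictor could attend from masked-position queries to the context latents and then to a text embedding:

```python
import torch
import torch.nn as nn

class ConditionedPredictor(nn.Module):
    """Hypothetical predictor conditioned on a text embedding (Stable-Diffusion-style)
    when predicting the latent tokens of masked blocks."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.ctx_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # attend to context latents
        self.txt_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # attend to text condition
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, mask_queries, context_latents, text_embedding):
        # mask_queries:    (B, M, dim) learned queries, one per masked position
        # context_latents: (B, N, dim) output of the context encoder
        # text_embedding:  (B, T, dim) broad-level condition (e.g. an encoded caption)
        x, _ = self.ctx_attn(mask_queries, context_latents, context_latents)
        x, _ = self.txt_attn(x, text_embedding, text_embedding)
        return x + self.mlp(x)       # predicted latents for the masked blocks
```

A hierarchical version could stack several such predictors, each one conditioned on the latents produced at the coarser level above it.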

That said, I-JEPA is a promising first step, and I am excited to see what comes next.

~~~~~~~~~~~~~~~~~~

Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/I-JEPA.pdf

Reference Materials:
I-JEPA: https://arxiv.org/abs/2301.08243
Vision Transformers: https://arxiv.org/abs/2010.11929
Swin Transformers (Transformers with hierarchy and shifting attention windows): https://arxiv.org/abs/2103.14030
MLP-Mixer (all-MLP architecture for image processing): https://arxiv.org/abs/2105.01601
Conv-Mixer (Patches with Conv layers): https://arxiv.org/abs/2201.09792
Stable Diffusion: https://arxiv.org/abs/2112.10752

~~~~~~~~~~~~~~~~~~

0:00 Introduction
5:54 Transformers: Prediction back in input space
11:12 Prediction in Latent Space
22:25 Stable Diffusion and Latent Space
29:17 Vision Transformer (ViT)
44:57 Swin Transformer
50:12 ViT’s positional encoding may not be good!
51:38 I-JEPA
1:09:26 Discussion on how to improve I-JEPA

~~~~~~~~~~~~~~~~~~~

AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain them in a simple and relatable way. Also an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin




Other Videos By John Tan Chong Min


2023-08-23 LLM as Pattern Machines (Part 2) - Goal Directed Decision Transformers, 10-Year Plan for Intelligence
2023-08-18 Tutorial #9: Evolution Game v2: ChatGPT (Text) and Dall-E (Image) API Integration!
2023-08-17 Tutorial #8: Create a Web Scraper using ChatGPT and Selenium!
2023-08-17 Tutorial #7: Create a Chatbot with Gradio and ChatGPT!
2023-08-15 LLMs as General Pattern Machines: Use Arbitrary Tokens to Pattern Match?
2023-08-08 Tutorial #6: LangChain & StrictJSON Implementation of Knowledge Graph Question Answer with LLMs
2023-08-08 Large Language Models and Knowledge Graphs: Merging Flexibility and Structure
2023-07-31 Tutorial #5: SymbolicAI - Automatic Retrieval Augmented Generation, Multimodal Inputs, User Packages
2023-07-27 How Llama 2 works: Ghost Attention, Quality Supervised Fine-tuning, RLHF for Safety and Helpfulness
2023-07-27 Llama 2 vs ChatGPT
2023-07-11 I-JEPA: Importance of Predicting in Latent Space
2023-07-09 Gen AI Study Group Introductory Tutorial - Transformers, ChatGPT, Prompt Engineering, Projects
2023-07-03 Tutorial #5: Strict JSON LLM Framework - Get LLM to output JSON exactly the way you want it!
2023-07-01 Tutorial #4: SymbolicAI ChatBot In-Depth Demonstration (Tool Use and Iterative Processing)
2023-06-29 How do we learn so fast? Towards a biologically plausible model for one-shot learning.
2023-06-20 LLMs as a system to solve the Abstraction and Reasoning Corpus (ARC) Challenge!
2023-06-16 Tutorial #3: Symbolic AI - Symbols, Operations, Expressions, LLM-based functions!
2023-06-13 No more RL needed! LLMs for high-level planning: Voyager + Ghost In the Minecraft
2023-06-06 Voyager - An LLM-based curriculum generator, actor and critic, with skill reuse in Minecraft!
2023-06-01 Evolution ChatGPT Prompt Game - From Bacteria to.... Jellyfish???
2023-05-30 Prompt Engineering and LLMOps: Tips and Tricks