Hierarchy! The future of AI: How it helps representations and why it is important.
The world is largely hierarchical in structure. Learning to represent this hierarchy can lead to reuse of components, and also lets us solve a difficult problem iteratively - from the broad level down to the specific level. This enables a divide-and-conquer approach, as we split the problem into subgoals and sub-subgoals (see the toy sketch below).
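To make the divide-and-conquer idea concrete, here is a minimal toy sketch in Python (my own illustration, not from any of the referenced papers), where a broad goal is recursively split into subgoals until each piece is primitive enough to act on:

def solve(goal):
    if isinstance(goal, str):                  # primitive goal: act on it directly
        return f"do({goal})"
    return [solve(sub) for sub in goal]        # broad goal: split and recurse

# a dinner goal split into subgoals, one of which has its own sub-subgoals
plan = solve((("buy ingredients", "chop vegetables"), "cook", "serve"))
# -> [['do(buy ingredients)', 'do(chop vegetables)'], 'do(cook)', 'do(serve)']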
Memory can be thought of as being in vector form: an initial reference point plus some movement. I posit that as we go down the hierarchy, the movement would be given by a learnable function conditioned on the current state and the meta-level state, and would capture finer and finer details.
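As a toy illustration of this conjecture (my own sketch; the random linear maps below are stand-ins for whatever the learnable function would be), each level adds a movement computed from the current state and its parent, meta-level state:

import numpy as np

rng = np.random.default_rng(0)
dim, levels = 8, 3
W = [rng.normal(scale=0.1, size=(dim, 2 * dim)) for _ in range(levels)]

def refine(reference):
    state, meta = reference.copy(), reference.copy()
    for level in range(levels):
        movement = W[level] @ np.concatenate([state, meta])  # f(state, meta-state)
        meta, state = state, state + movement                # descend one level
    return state

memory = refine(rng.normal(size=dim))  # coarse reference -> fine-grained memory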
In audio, we have residual vector quantization. In images, we have Feature Pyramid Networks. In reinforcement learning, we have SayCan for hierarchical goal setting, and bottleneck search for discovering subgoals. For Large Language Models, we have context conditioning to go from broad-level plans to more specific plans. The brain has plenty of feedback connections in addition to feedforward ones - perhaps the feedback conditions the bottom layers on the context of the top layers. All these domains share some form of conditioning from the broad levels to the specific levels - perhaps that is how intelligence is generated: through a hierarchy.
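Of these, residual vector quantization is perhaps the easiest to see in code. A minimal sketch (random codebooks for illustration; real systems learn them): each stage quantizes the residual left over by the previous stage, so earlier codebooks capture the broad structure and later ones the fine details.

import numpy as np

rng = np.random.default_rng(0)
dim, codebook_size, stages = 4, 16, 3
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(stages)]

def rvq_encode(x):
    codes, residual = [], x.copy()
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))  # nearest code
        codes.append(idx)
        residual = residual - cb[idx]          # pass leftover detail to next stage
    return codes

def rvq_decode(codes):
    return sum(cb[i] for cb, i in zip(codebooks, codes))  # coarse + finer + finest

x = rng.normal(size=dim)
x_hat = rvq_decode(rvq_encode(x))  # reconstruction improves with more stages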
Transformers may not have explicit feedback connections for conditioning, but I posit that the skip connections could already play such a role: they pass the original input token embeddings through largely unchanged (apart from LayerNorm, which affects all tokens in the same way), letting them be conditioned by context built up through iterative self-attention in the same layer. Having multiple heads helps with multiple ways of conditioning. As such, the hierarchical stacking of layers may actually matter a lot for Transformers.
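Here is a sketch of a pre-LayerNorm Transformer block (assuming PyTorch; simplified, with no dropout or masking) showing how the skip connection carries the token embeddings through, while self-attention only adds a context-dependent update on top:

import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        # x (the token embeddings) is carried forward by the skip connection;
        # attention contributes only an additive, context-dependent update
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]
        x = x + self.mlp(self.norm2(x))
        return x

tokens = torch.randn(1, 10, 64)   # a batch of 10 token embeddings
out = Block()(tokens)             # original embeddings + contextual updates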
We are barely scratching the surface of how to abstract into hierarchy, and I do not know the answer myself. Let us explore the various approaches that have been tried before, and see if we can find an answer together!
~~~~~~~~~~~~~~~~~~~~~~~~
References:
Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/Representation%20Learning.pdf
See Part 1 here: https://www.youtube.com/watch?v=cK5TaIz4-eQ
Predictive Coding Hierarchy in the Brain: https://www.nature.com/articles/s41562-022-01516-2
GLOM: How to represent part-whole hierarchies in a neural network (Hinton): https://arxiv.org/pdf/2102.12627.pdf
A Path Towards Autonomous Machine Intelligence - Hierarchical JEPA (Yann LeCun): https://openreview.net/pdf?id=BZ5a1r-kVsf
Learning, Fast and Slow (my own paper on learning with memory + neural networks): https://arxiv.org/abs/2301.13758
Transformers - Attention is all you need: https://arxiv.org/abs/1706.03762
Jukebox: A Generative Model for Music: https://arxiv.org/abs/2005.00341
Residual Vector Quantization: https://arxiv.org/abs/1509.05195
Feature Pyramid Networks for Object Detection (Visual): https://arxiv.org/abs/1612.03144
Generative Agents: Interactive Simulacra of Human Behavior (Memory/LLM): https://arxiv.org/abs/2304.03442
SayCan (RL/LLM): https://say-can.github.io/
Option Learning (RL): https://arxiv.org/abs/2112.03097
Action Chunking (RL): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9274316/
ARC Challenge: https://lab42.global/arc/
Memorizing Transformers: https://arxiv.org/abs/2203.08913
~~~~~~~~~~~~~~~~~~~~~~~~
0:00 Introduction and Recap
10:15 Hierarchical Representational Space
16:14 Memory Representation and Generalisation
24:00 Hierarchy and why it is important
31:45 Jukebox (Audio) - Hierarchical Conditioning on earlier layers
48:10 Residual Vector Quantization (Audio) - Creating Hierarchical Representations
1:05:38 Feature Pyramid Network (Visual) - Bottom-Up Context Building and Top-Down Context Conditioning
1:22:56 Hierarchical JEPA (RL) - Goals and Subgoals
1:28:10 Hierarchical Planning (LLMs) - Using LLMs to generate broad and specific actions
1:35:11 Hierarchical RL (RL) - Finding chunked actions, finding bottleneck states, finding subgoals
1:50:22 Hierarchical Planning (LLMs) - ARC Challenge
1:53:08 Can a Transformer perform hierarchical generation?
2:03:01 Discussion
~~~~~~~~~~~~~~~~~~~~~~~~~
AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin