Can GPT4 solve the Abstraction and Reasoning Corpus (ARC) Challenge Zero-Shot?
I've been impressed with the ability of next-token prediction to learn complex structures. The Abstraction and Reasoning Corpus (ARC) Challenge was created by François Chollet in 2019, and there is now the ARC2 Challenge by Lab42 ( https://lab42.global/arcathon/ ). ARC tests an agent's ability to learn from very few examples and generalize to a new input. The challenge is difficult because the number of possible answers grows exponentially with grid size (each square can take 10 possible values), and the output grid size is not fixed and must be inferred.
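To put a number on that exponential growth, here is a small Python sketch. It assumes the output grid size is already known, which makes the real problem look easier than it is; the function name is just for illustration.

```python
def count_candidate_grids(height: int, width: int, num_values: int = 10) -> int:
    """Number of distinct grids of a fixed size when each square takes one of num_values values."""
    return num_values ** (height * width)


# Even a tiny 3x3 output grid already has a billion candidate answers.
print(count_candidate_grids(3, 3))  # 1000000000

# A 10x10 grid has 10^100 candidates; print the exponent rather than the full number.
print(len(str(count_candidate_grids(10, 10))) - 1)  # 100
```

Since the output size also has to be inferred, the true search space is the sum of these counts over every size the output could take.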
Since I do not have access to the multimodal version of GPT4, I used the JSON representation of the inputs and outputs and gave GPT4 some background on the ARC problems. I then asked it to generate a broad description to ground the task in a category, followed by a detailed description to get the algorithmic steps needed. (Note that I did not ask it to generate a program, as the program and its description may not match; the description is a better bet.) Next, I asked it to verify its description against the input/output samples. This step is currently not done too well and could be done better with an external code generation and execution tool. Lastly, I asked it to generate the test set's output.
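The prompting steps above can be sketched roughly as follows. This is a minimal illustration, not the exact prompt from the notebook: the function name `build_arc_prompt` and all of the instruction wording are my own assumptions, and the toy task is made up. It only builds the prompt text from a task in the ARC JSON layout ("train" and "test" lists of input/output grids); sending it to GPT4 is left out.

```python
import json


def build_arc_prompt(task: dict) -> str:
    """Assemble a zero-shot prompt from an ARC task given in its JSON form.

    The instruction wording below is hypothetical, not the original prompt.
    """
    lines = [
        "You are given a puzzle from the Abstraction and Reasoning Corpus (ARC).",
        "Each grid is a list of rows, and each square holds an integer from 0 to 9.",
        "",
    ]
    # Show every training pair as plain JSON, since no image input is used.
    for i, pair in enumerate(task["train"], start=1):
        lines.append(f"Example {i} input: {json.dumps(pair['input'])}")
        lines.append(f"Example {i} output: {json.dumps(pair['output'])}")
    lines.append("")
    lines.append(f"Test input: {json.dumps(task['test'][0]['input'])}")
    lines.append("")
    # The four stages: broad description, detailed description, verification, answer.
    lines += [
        "1. Give a broad description of the transformation.",
        "2. Give a detailed, step-by-step description of the transformation.",
        "3. Verify the description against every example input/output pair.",
        "4. Apply the description to the test input and return the output grid as JSON.",
    ]
    return "\n".join(lines)


# Made-up toy task: the output swaps the two columns of the input.
toy_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [2, 0]], "output": [[0, 2], [0, 2]]},
    ],
    "test": [{"input": [[3, 0], [0, 3]]}],
}

prompt = build_arc_prompt(toy_task)
print(prompt)
```

Asking for the descriptions before the answer means the model has to commit to a rule it can be checked against, which is the reason for putting the verification step ahead of the final output.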
Generally, it works pretty well for some small grid problems. Large grid sizes are an issue due to context length limits. I believe that with the right inductive bias, grounded through prompting, and with some tools to help it better visualize the objects in the grid, GPT4 may actually be able to solve most of the ARC challenges. Attention and pattern matching are really quite powerful.
~~~~~~~~~~~~
Latest thoughts on GPT4 on ARC: https://www.youtube.com/watch?v=plVRxP8hQHY
Previous (related) video on zero-shot classification: https://www.youtube.com/watch?v=C0Eug9XpcBo
Jupyter Notebook: https://github.com/tanchongmin/ARC-Challenge/blob/main/arc_challenge.ipynb
ARCathon: https://lab42.global/arcathon/
ARC Playground: https://arc-editor.lab42.global/playground
On The Measure of Intelligence: https://arxiv.org/abs/1911.01547
AlphaCode: https://arxiv.org/abs/2203.07814
~~~~~~~~~~~~
0:00 Background of ARC Challenge
0:55 GPT4 Generation Process on Public Eval Task 157 (66e6c45b.json) [Success]
5:26 Overlay Task: Public Eval Task 158 (66f2d22f.json) [Failed]
9:10 Row and Column Removal Task: Public Eval Task 162 (68b67ca3.json) [Success]
13:16 Background Swap Task: Public Eval Task 170 (6ea4a07e.json) [Failed]
17:58 Systems and Tools-Augmentation for GPT4
21:10 ARC Challenge vs Zero-Shot Classification
~~~~~~~~~~~~
AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain them in a simple and relatable way. I am also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin