JARVIS-1: Multi-modal (Text + Image) Memory + Decision Making with LLMs in MineCraft!

Subscribers: 5,330
Video Link: https://www.youtube.com/watch?v=JUAec-dAt5c



Duration: 1:50:16
678 views


JARVIS-1 is the latest approach to using LLMs to solve the MineCraft environment. It surpasses the performance of Voyager, but is slightly behind Ghost in the MineCraft (GiTM). However, it is the first of its kind to combine images and text in a truly multimodal way for decision making!

It also has a curriculum generator that uses self-instruction, with memory as a guide, and incorporates environmental feedback.

It has mechanisms in place for self-learning, similar to Voyager. I think it could be improved by encoding and retrieving memory more efficiently, executing sub-goals sequentially, and training the controller better.
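To make the idea of a text + image memory concrete, here is a toy sketch of retrieval keyed on both the task and the current observation. It is not the JARVIS-1 implementation: the class names, the `text_weight` parameter, and the hand-made 2-D vectors (standing in for embeddings from a model such as MineCLIP) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class MemoryEntry:
    task_vec: list   # hypothetical text embedding of the task
    scene_vec: list  # hypothetical image embedding of the observation
    plan: list       # sub-goal sequence that worked in that situation


@dataclass
class MultimodalMemory:
    entries: list = field(default_factory=list)

    def add(self, task_vec, scene_vec, plan):
        self.entries.append(MemoryEntry(task_vec, scene_vec, plan))

    def retrieve(self, task_vec, scene_vec, text_weight=0.5):
        """Return the plan whose (task, scene) key best matches the query."""
        def score(entry):
            return (text_weight * cosine(task_vec, entry.task_vec)
                    + (1 - text_weight) * cosine(scene_vec, entry.scene_vec))
        if not self.entries:
            return None
        return max(self.entries, key=score).plan


memory = MultimodalMemory()
# Same task ("get wood"), two different scenes (forest vs. cave).
memory.add([1.0, 0.0], [1.0, 0.0], ["find tree", "chop log"])
memory.add([1.0, 0.0], [0.0, 1.0], ["climb to surface", "find tree", "chop log"])

# Query: the same task, but the observation looks like the cave scene.
best_plan = memory.retrieve([0.9, 0.1], [0.1, 0.9])
print(best_plan)
```

Because both stored entries share the same task vector, the image side of the key decides which plan comes back. That is the point of a multimodal key: the same instruction can call for a different plan depending on what the agent currently sees.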

~~~~~~~~~~~~~~~~~

Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/JARVIS-1.pdf

JARVIS-1 Repo (Code coming soon): https://github.com/CraftJarvis/JARVIS-1
JARVIS-1 Paper: https://arxiv.org/abs/2311.05997

MineCLIP (embedding model): https://arxiv.org/abs/2206.08853

Past videos:
Voyager: https://www.youtube.com/watch?v=Y-pgbjTlYgk
Ghost in the MineCraft: https://www.youtube.com/watch?v=_VXOczXIkks

~~~~~~~~~~~~~~~~~~

0:00 Introduction + Demo
2:31 Overview
3:34 Voyager Recap
6:20 Ghost in the MineCraft
12:11 JARVIS-1
15:19 Learning, Fast and Slow
17:33 Unlocking Entire Technology Tree
18:55 Situation-aware Planning
27:41 JARVIS-1 and Memory
38:33 Observational Space
40:02 Processing Images
46:02 Sub-goal planning
56:32 Storing and retrieving the memory
1:03:20 Generating the memories
1:10:59 Self-check
1:15:00 Result Analysis
1:20:05 Discussion

~~~~~~~~~~~~~~~~~

AI and ML enthusiast. Likes to think about the essence behind breakthroughs in AI and explain it in a simple and relatable way. Also, I am an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin




Other Videos By John Tan Chong Min


2024-02-13 Embeddings Walkthrough (Part 2): Context-Dependent Embeddings, Shifting Embedding Space
2024-02-06 Embeddings Walkthrough (Part 1) - Bag of Words to word2vec to Transformer contextual embeddings
2024-01-29 V* - Better than GPT-4V? Iterative Context Refining for Visual Question Answer!
2024-01-23 AutoGen: A Multi-Agent Framework - Overview and Improvements
2024-01-09 AppAgent: Using GPT-4V to Navigate a Smartphone!
2024-01-08 Tutorial #13: StrictJSON, my first Python Package! - Get LLMs to output into a working JSON!
2023-12-20 "Are you smarter than an LLM?" game speedrun
2023-12-08 Is Gemini better than GPT4? Self-created benchmark - Fact Retrieval/Checking, Coding, Tool Use
2023-12-04 Learning, Fast and Slow: 10 Years Plan - Memory Soup, Hier. Planning, Emotions, Knowledge Sharing
2023-12-01 Tutorial #12: Use ChatGPT and off-the-shelf RAG on Terminal/Command Prompt/Shell - SymbolicAI
2023-11-20 JARVIS-1: Multi-modal (Text + Image) Memory + Decision Making with LLMs in MineCraft!
2023-11-20 Tutorial #11: Virtual Persona from Documents, Multi-Agent Chat, Text-to-Speech to hear your Personas
2023-11-14 A Roadmap for AI: Past, Present and Future (Part 3) - Multi-Agent, Multiple Sampling and Filtering
2023-11-07 Learning, Fast and Slow: My Landmark Idea for fast, adaptable agents (ICDL 2023 Best Paper Finalist)
2023-11-06 A roadmap for AI: Past, Present and Future (Part 2): Fixed vs Flexible, Memory Soup vs Hierarchy
2023-11-03 AI & Education: Education when AI tools are smarter than us - Discussion with Kuang Wen (Part 2)
2023-11-03 AI & Education: RAG Question-Answer, Test Question Generator, Autograder by Kuang Wen! (Part 1)
2023-10-31 A Roadmap for AI: Past, Present and Future (Part 1)
2023-10-28 Tutorial #10: StrictJSON v2 (StrictText): Handle any output - quotation marks or backslash!
2023-10-24 ChatDev: Can LLM Agents really replace a software company?
2023-10-17 LLMs and Robotics: An Overview by Daniel Tan!