Jiafei Duan: Uncovering the 'Right' Representations for Multimodal LLMs for Robotics

Channel: John Tan Chong Min

Subscribers: 5,450

Published on October 8, 2024 7:07:25 AM
Video Link: https://www.youtube.com/watch?v=koxyRNNDpzk

291 views

Speaker Profile:
Jiafei Duan is a third-year PhD student in robotics at the University of Washington’s Paul G. Allen School of Computer Science & Engineering, where he is part of the Robotics and State Estimation Lab, co-advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on robot learning, embodied AI, foundation models, and computer vision. He is currently funded by the National Science Foundation (NSF) Graduate Research Fellowship. Previously, he was with NVIDIA Research and A*STAR Research.
http://www.duanjiafei.com/

Featured Papers:
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation: https://arxiv.org/abs/2410.00371
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics: https://arxiv.org/abs/2406.10721
Octopi: Object Property Reasoning with Large Tactile-Language Models: https://arxiv.org/abs/2405.02794

Abstract:
Recent advancements have shown the potential of multi-modal large language models (MLLMs) and large language models (LLMs) to automate several high-level tasks in robotics, such as task planning, reward function generation, action primitive code generation, and success verification. However, key questions remain: Are existing open-source and proprietary MLLMs/LLMs adequate for robotics, or is there a need for domain-specific models? What constitutes an optimal representation for robotics-focused MLLMs? Moreover, can we develop a unified MLLM fine-tuned specifically for robotics applications? In this talk, I aim to explore and address some of these questions through our recent efforts in instruction-tuning MLLMs for robotics.

~~~

0:00 Introduction
1:11 Background of Foundation Models
8:42 AHA: VLM for Reasoning over Failures
24:17 RoboPoint: VLM for Spatial Affordance Prediction (“Pointing”)
32:44 Octopi: Object Property Reasoning with Tactile-Language Models
40:18 Discussion
1:28:15 Conclusion

~~~

AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin

Other Videos By John Tan Chong Min

2025-06-16	Universal Filter (Part 2): Time, Akashic Records, Individual Mind-based, Body-based memory
2025-06-04	Good Vibes Only with Dylan Chia: Lyria (Music), Veo3 (Video), Gamma (Slides), GitHub Copilot (Code)
2025-03-10	Memory Meets Psychology - Claude Plays Pokemon: How It works, How to improve it
2025-02-24	Vibe Coding: How to use LLM prompts to code effectively!
2025-01-26	PhD Thesis Overview (Part 2): LLMs for ARC-AGI, Task-Based Memory-Infused Learning, Plan for AgentJo
2025-01-20	PhD Thesis Overview (Part 1): Reward is not enough; Towards Goal-Directed, Memory-based Learning
2024-12-04	AgentJo CV Generator: Generate your CV by searching for your profile on the web!
2024-11-11	Can LLMs be used in self-driving? CoMAL: Collaborative Multi-Agent LLM for Mixed Autonomy Traffic
2024-10-28	From TaskGen to AgentJo: Creating My Life Dream of Fast Learning and Adaptable Agents
2024-10-21	Tian Yu X John: Discussing Practical Gen AI Tips for Image Prompting
2024-10-08	Jiafei Duan: Uncovering the 'Right' Representations for Multimodal LLMs for Robotics
2024-09-27	TaskGen Tutorial 6: Conversation Wrapper
2024-09-26	TaskGen Tutorial 5: External Functions & CodeGen
2024-09-24	TaskGen Tutorial 4: Hierarchical Agents
2024-09-23	TaskGen Tutorial 3: Memory
2024-09-19	TaskGen Tutorial 2: Shared Variables and Global Context
2024-09-16	Beyond Strawberry: gpt-o1 - Is LLM alone sufficient for reasoning?
2024-09-11	TaskGen Tutorial 1: Agents and Equipped Functions
2024-09-11	TaskGen Tutorial 0: StrictJSON
2024-09-10	LLM-Modulo: Using Critics and Verifiers to Improve Grounding of a Plan - Explanation + Improvements
2024-09-06	TaskGen: Co-create the best open-sourced LLM Agentic Framework together!
