Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan - Paper Explained)

Video Link: https://www.youtube.com/watch?v=Ru23eWAQ6_E
Duration: 28:47
Subscribers: 284,000
Views: 11,876

#saycan #robots #ai

Large Language Models are excellent at generating plausible plans in response to real-world problems, but without interacting with the environment, they have no ability to estimate which of these plans are feasible or appropriate. SayCan combines the semantic capabilities of language models with a bank of low-level skills, which are available to the agent as individual policies to execute. SayCan automatically finds the best policy to execute by considering a trade-off between the policy's ability to progress towards the goal, given by the language model, and the policy's probability of executing successfully, given by the respective value function. The result is a system that can generate and execute long-horizon action sequences in the real world to fulfil complex tasks.
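
The trade-off described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the skill names, the numbers, and the `select_skill` helper are made up for the example, and it assumes the two scores are combined by simple multiplication.

```python
def select_skill(llm_scores, affordance_scores):
    """Pick the skill with the highest combined score.

    llm_scores:        skill -> how useful the language model thinks the skill
                       is for the instruction (hypothetical values).
    affordance_scores: skill -> how likely the skill is to succeed in the
                       current state (stand-in for a value function).
    """
    # Assumption: the trade-off is modelled as a plain product of both scores.
    combined = {
        skill: llm_scores[skill] * affordance_scores[skill]
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# The language model would like to grab the sponge right away ...
llm_scores = {
    "pick up the sponge": 0.50,
    "find a sponge": 0.30,
    "go to the table": 0.20,
}
# ... but no sponge is in view, so picking one up is unlikely to succeed.
affordance_scores = {
    "pick up the sponge": 0.10,
    "find a sponge": 0.90,
    "go to the table": 0.80,
}

best, combined = select_skill(llm_scores, affordance_scores)
print(best)
```

In this toy setting the language model alone would choose "pick up the sponge", while the combined score favours "find a sponge" because it is both useful and currently feasible.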

Sponsor: Zeta Alpha
https://zeta-alpha.com
Use code YANNIC for 20% off!

OUTLINE:
0:00 - Introduction & Overview
3:20 - Sponsor: Zeta Alpha
5:00 - Using language models for action planning
8:00 - Combining LLMs with learned atomic skills
16:50 - The full SayCan system
20:30 - Experimental setup and data collection
21:25 - Some weaknesses & strengths of the system
27:00 - Experimental results

Paper: https://arxiv.org/abs/2204.01691
Website: https://say-can.github.io/

Abstract:
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/
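
To make the division of labour in the abstract concrete, here is a rough sketch of how a long-horizon plan could be built one skill at a time. Everything in it is a toy assumption rather than the paper's code: `toy_llm_score` stands in for the language model, `toy_affordance` for the skills' value functions, `toy_apply` for actually running a skill on the robot, and the special "done" skill is just one possible way to end the plan.

```python
SKILLS = ["find a sponge", "pick up the sponge", "bring it to the spill", "done"]


def toy_llm_score(instruction, history, skill):
    """Hypothetical language-model score; ignores the state of the world."""
    if "pick up the sponge" not in history:
        table = {"find a sponge": 0.30, "pick up the sponge": 0.50,
                 "bring it to the spill": 0.15, "done": 0.05}
    elif "bring it to the spill" not in history:
        table = {"find a sponge": 0.10, "pick up the sponge": 0.10,
                 "bring it to the spill": 0.60, "done": 0.20}
    else:
        table = {"find a sponge": 0.04, "pick up the sponge": 0.03,
                 "bring it to the spill": 0.03, "done": 0.90}
    return table[skill]


def toy_affordance(state, skill):
    """Hypothetical success probability of a skill in the current state."""
    if skill == "find a sponge":
        return 0.1 if "sponge_visible" in state else 0.9
    if skill == "pick up the sponge":
        feasible = "sponge_visible" in state and "holding_sponge" not in state
        return 0.9 if feasible else 0.05
    if skill == "bring it to the spill":
        return 0.9 if "holding_sponge" in state else 0.05
    return 1.0  # "done" can always be executed


def toy_apply(state, skill):
    """Pretend to execute a skill and return the resulting state."""
    effects = {"find a sponge": "sponge_visible",
               "pick up the sponge": "holding_sponge",
               "bring it to the spill": "at_spill"}
    return state | {effects[skill]}


def plan(instruction, state, max_steps=10):
    """Greedily append the best-scoring skill until "done" wins."""
    history = []
    for _ in range(max_steps):
        scores = {s: toy_llm_score(instruction, history, s) * toy_affordance(state, s)
                  for s in SKILLS}
        best = max(scores, key=scores.get)
        if best == "done":
            break
        history.append(best)
        state = toy_apply(state, best)
    return history


steps = plan("clean up the spill", frozenset())
print(steps)
```

The toy language model keeps asking for "pick up the sponge" first, but the affordance score forces a "find a sponge" step before it, which is the kind of grounding the abstract describes.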

Authors: Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher

2022-07-02 ARC Challenge Live Coding
2022-06-26 Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos (Paper Explained)
2022-06-23 Parti - Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (Paper Explained)
2022-06-15 Did Google's LaMDA chatbot just become sentient?
2022-06-03 GPT-4chan: This is the worst AI ever
2022-06-01 Did I crash the NFT market?
2022-05-13 [ML News] DeepMind's Flamingo Image-Text model | Locked-Image Tuning | Jurassic X & MRKL
2022-05-10 [ML News] Meta's OPT 175B language model | DALL-E Mega is training | TorToiSe TTS fakes my voice
2022-05-05 This A.I. creates infinite NFTs
2022-05-02 Author Interview: SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
2022-04-30 Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan - Paper Explained)
2022-04-26 Author Interview - ACCEL: Evolving Curricula with Regret-Based Environment Design
2022-04-25 ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)
2022-04-22 LAION-5B: 5 billion image-text-pairs dataset (with the authors)
2022-04-21 Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)
2022-04-17 Author Interview - Transformer Memory as a Differentiable Search Index
2022-04-16 Transformer Memory as a Differentiable Search Index (Machine Learning Research Paper Explained)
2022-04-10 [ML News] Google's 540B PaLM Language Model & OpenAI's DALL-E 2 Text-to-Image Revolution
2022-04-06 DALL-E 2 by OpenAI is out! Live Reaction
2022-04-04 The Weird and Wonderful World of AI Art (w/ Author Jack Morris)
2022-04-02 Author Interview - Improving Intrinsic Exploration with Language Abstractions