Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
A Google TechTalk, presented by Meera Hahn, 2024-12-05
ABSTRACT: User prompts for generative AI models are often underspecified or open-ended, which may lead to suboptimal responses. This prompt underspecification problem is particularly evident in text-to-image (T2I) generation, where users commonly struggle to articulate their precise intent. This disconnect between the user's vision and the model's interpretation often forces users to painstakingly and repeatedly refine their prompts. To address this, we propose a design for proactive T2I agents equipped with an interface to actively ask clarification questions when uncertain, and to present their understanding of user intent as an interpretable belief graph that the user can edit. We build simple prototypes of such agents and verify their effectiveness through both human studies and automated evaluation. We observed that at least 90% of human subjects found these agents and their belief graphs helpful for their T2I workflow. Moreover, we introduce a scalable automated evaluation approach based on two agents: one has access to a ground truth image, while the other tries to ask as few questions as possible to align its understanding with that ground truth.
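As a rough illustration of the clarification loop described in the abstract, the following minimal Python sketch shows how an agent might track a belief graph of entity attributes, ask about the least-confident ones, and fold the user's answers back into an enriched prompt. The class names, confidence threshold, and hard-coded beliefs are illustrative assumptions for this sketch, not the system presented in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class BeliefNode:
    """One attribute in the agent's belief graph (hypothetical schema)."""
    entity: str          # e.g. "dog"
    attribute: str       # e.g. "breed"
    value: str | None = None
    confidence: float = 0.0

@dataclass
class ProactiveT2IAgent:
    """Toy agent that asks clarification questions for low-confidence beliefs."""
    threshold: float = 0.7
    beliefs: list[BeliefNode] = field(default_factory=list)

    def init_beliefs(self, prompt: str) -> None:
        # Placeholder: a real agent would extract entities and attributes from
        # the prompt with a language model; here we hard-code one
        # underspecified entity for illustration.
        self.beliefs = [
            BeliefNode("dog", "breed", None, 0.2),
            BeliefNode("dog", "setting", "park", 0.9),
        ]

    def next_question(self) -> BeliefNode | None:
        # Ask about the least-confident belief below the threshold, if any.
        uncertain = [b for b in self.beliefs if b.confidence < self.threshold]
        return min(uncertain, key=lambda b: b.confidence) if uncertain else None

    def incorporate_answer(self, node: BeliefNode, answer: str) -> None:
        # A user's explicit answer resolves that belief with full confidence.
        node.value, node.confidence = answer, 1.0

    def enriched_prompt(self, prompt: str) -> str:
        # Fold the resolved beliefs back into the prompt sent to the T2I model.
        details = ", ".join(f"{b.entity} {b.attribute}: {b.value}"
                            for b in self.beliefs if b.value)
        return f"{prompt} ({details})"

# Example interaction with a simulated user answering one clarification question.
agent = ProactiveT2IAgent()
agent.init_beliefs("a dog in a park")
while (q := agent.next_question()) is not None:
    print(f"Agent asks: what is the {q.attribute} of the {q.entity}?")
    agent.incorporate_answer(q, "golden retriever")   # simulated user reply
print(agent.enriched_prompt("a dog in a park"))
```

The same loop structure can stand in for the two-agent automated evaluation: the answering role is played by a second agent that consults a ground truth image, and the number of questions needed to align the belief graph serves as the evaluation signal.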
Speaker Bio:
Meera Hahn is a Research Scientist at Google DeepMind, working predominantly at the intersection of computer vision and natural language processing. She joined Google in 2022 after completing her PhD at Georgia Tech. Her research interests include embodied AI, text-based navigation and localization, text-to-image and video generation, and general multimodal AI tasks. To learn more about her research, visit her homepage at https://meerahahn.github.io/