Arjo Chakravarty: Indoor Localisation with Visual Language Models (VLMs)

Subscribers:
6,310
Published on ● Video Link: https://www.youtube.com/watch?v=pxKmklYymVI



Duration: 0:00
233 views
10


On a personal weekend project, Arjo (https://www.linkedin.com/in/arjo-chakravarty) prototyped an indoor localization system that combines semantically annotated mall floorplans with vision-language models.

He built an annotation tool to mark corridors and shops on a floorplan and preprocessed each corridor point to record which shops would be visible at various orientations.

Using the Gemini API to detect shop names in user-captured photos, the system matches the detected shops against the precomputed visibility map to infer potential positions.

The proof-of-concept shows that even with imprecise maps and text-based prompting, VLMs can localize users indoors, pointing towards richer AR and sensor-fusion enhancements.

~~~

Blog Post: https://arjo129.github.io/blog/5-7-2025-From-Photos-To-Positions-Prototyping.html

GitHub: https://arjo129.github.io/vps_localization/editor/corridor_annotation.html

~~~

AI and ML enthusiast. Likes to think about the essences behind breakthroughs of AI and explain it in a simple and relatable way. Also, I am an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin




Other Videos By John Tan Chong Min


2025-09-08DINOv3: One backbone, multiple image/video tasks
2025-08-18R-Zero: Self-Evolving Reasoning LLM from Zero Data
2025-08-11Reasoning without Language (Part 2) - Deep Dive into 27 mil parameter Hierarchical Reasoning Model
2025-08-04Reasoning without Language - Deep Dive into 27 mil parameter Hierarchical Reasoning Model
2025-07-28No need for symbolic programs for Math? Natural language approach to IMO
2025-07-21How many instructions can LLMs follow at once?
2025-07-15Arjo Chakravarty: Indoor Localisation with Visual Language Models (VLMs)
2025-07-14MemOS: A Paradigm Shift to Memory as a First Class Citizen for LLMs
2025-07-07Multimodal Query for Images: Text/Image Multimodal Query with Negative Filter and Folder Selection
2025-06-30Universal Filter (Part 4 - Finale): Knowledge/Memory, Reflection, Communication between Individuals
2025-06-23Universal Filter (Part 3): Learning the Filters, Universal Database, Individual Knowledge Base
2025-06-16Universal Filter (Part 2): Time, Akashic Records, Individual Mind-based, Body-based memory
2025-06-04Good Vibes Only with Dylan Chia: Lyria (Music), Veo3 (Video), Gamma (Slides), GitHub Copilot (Code)
2025-03-10Memory Meets Psychology - Claude Plays Pokemon: How It works, How to improve it
2025-02-24Vibe Coding: How to use LLM prompts to code effectively!
2025-01-26PhD Thesis Overview (Part 2): LLMs for ARC-AGI, Task-Based Memory-Infused Learning, Plan for AgentJo
2025-01-20PhD Thesis Overview (Part 1): Reward is not enough; Towards Goal-Directed, Memory-based Learning
2024-12-04AgentJo CV Generator: Generate your CV by searching for your profile on the web!
2024-11-11Can LLMs be used in self-driving? CoMAL: Collaborative Multi-Agent LLM for Mixed Autonomy Traffic
2024-10-28From TaskGen to AgentJo: Creating My Life Dream of Fast Learning and Adaptable Agents
2024-10-21Tian Yu X John: Discussing Practical Gen AI Tips for Image Prompting