On a personal weekend project, Arjo (https://www.linkedin.com/in/arjo-chakravarty) prototyped an indoor localization system that combines semantically annotated mall floorplans with vision-language models.
He built an annotation tool to mark corridors and shops on a floorplan and preprocessed each corridor point to record which shops would be visible at various orientations.
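The visibility preprocessing step might look something like the following sketch. All names and parameters here are assumptions for illustration (the article does not show Arjo's actual code): each corridor point is paired with a set of heading bins, and a shop counts as visible if it falls inside a field-of-view cone within some maximum range. A real implementation would also test for wall occlusion using the floorplan geometry.

```python
import math

def precompute_visibility(corridor_points, shops, fov_deg=90.0,
                          max_range=30.0, n_bins=8):
    """Map (corridor point index, heading bin) -> shops inside the view cone.

    Hypothetical sketch: ignores wall occlusion, which an annotated
    floorplan would let you check with a line-of-sight test.
    """
    vis = {}
    for i, (px, py) in enumerate(corridor_points):
        for b in range(n_bins):
            heading = b * 360.0 / n_bins  # center of this heading bin
            visible = []
            for name, (sx, sy) in shops.items():
                dx, dy = sx - px, sy - py
                if math.hypot(dx, dy) > max_range:
                    continue  # too far away to read the shop sign
                bearing = math.degrees(math.atan2(dy, dx)) % 360.0
                # smallest angular difference between bearing and heading
                diff = abs(bearing - heading)
                diff = min(diff, 360.0 - diff)
                if diff <= fov_deg / 2:
                    visible.append(name)
            vis[(i, b)] = frozenset(visible)
    return vis

# Example: one corridor point at the origin, three shops.
points = [(0.0, 0.0)]
shops = {"A": (10.0, 0.0), "B": (0.0, 10.0), "C": (100.0, 0.0)}
vis = precompute_visibility(points, shops)
# Facing east (bin 0) only shop A is in range and in view;
# facing north (bin 2, heading 90°) only shop B is.
```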
Using the Gemini API to detect shop names in user-captured photos, the system matches the detected shops against the precomputed visibility map to infer potential positions.
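The matching step could be sketched as below. This is an assumed scoring scheme, not the article's actual method: each (point, heading) entry in the precomputed visibility map is scored by Jaccard overlap between its visible-shop set and the shop names the VLM reported, and the best-scoring entries are returned as candidate positions.

```python
def localize(detected_shops, visibility_map):
    """Rank (point, heading) candidates by overlap with detected shop names.

    Hypothetical Jaccard scoring: 1.0 means the VLM-detected shops exactly
    match a precomputed visible set; lower scores mean partial agreement.
    """
    detected = set(detected_shops)
    scored = []
    for key, visible in visibility_map.items():
        union = detected | visible
        if not union:
            continue  # nothing detected and nothing visible: uninformative
        score = len(detected & visible) / len(union)
        scored.append((score, key))
    scored.sort(reverse=True)  # best candidates first
    return scored

# Example: two candidate poses; the VLM saw shops "A" and "B".
vis_map = {
    (0, 0): frozenset({"A"}),
    (0, 1): frozenset({"A", "B"}),
}
ranked = localize(["A", "B"], vis_map)
# The pose whose visible set matches both detections ranks first.
```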
The proof-of-concept shows that even with imprecise maps and text-based prompting, VLMs can localize users indoors, and it points toward extensions such as AR overlays and fusion with other sensors.
AI and ML enthusiast who likes to think about the essence behind AI breakthroughs and explain it in a simple, relatable way. Also an avid game creator.