Monday, June 30, 2025
CUPUM 2025
At CUPUM 2025 in London, our work on using multimodal large language models (MLLMs) for urban information extraction was presented as part of the conference program. The paper asks how street view imagery and MLLMs can be combined to infer building attributes such as height, age, and function.
This project is part of a broader research direction on making urban analytics more accessible. Instead of treating street-level images as data that can only be analyzed through highly specialized computer vision workflows, we explore how foundation models can support structured, interpretable extraction tasks while still requiring careful validation against authoritative data.
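To make the validation step concrete, here is a minimal sketch of how model-extracted attributes could be checked against an authoritative dataset. It is illustrative only and not the evaluation code from the paper: the `BuildingRecord` type, its field names, and the two metrics (mean absolute error for height, exact-match accuracy for function) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class BuildingRecord:
    # Hypothetical record layout; the field names are assumptions for illustration.
    building_id: str
    height_m: float
    function: str


def validate(predicted, reference):
    """Compare model-extracted attributes with authoritative reference records."""
    ref_by_id = {r.building_id: r for r in reference}
    # Only buildings present in both sets can be evaluated.
    matched = [
        (p, ref_by_id[p.building_id])
        for p in predicted
        if p.building_id in ref_by_id
    ]
    if not matched:
        raise ValueError("no overlapping building IDs to validate against")

    # Mean absolute error for the numeric attribute (height in metres).
    height_mae = sum(abs(p.height_m - r.height_m) for p, r in matched) / len(matched)
    # Exact-match accuracy for the categorical attribute (function).
    function_acc = sum(p.function == r.function for p, r in matched) / len(matched)

    return {
        "n_matched": len(matched),
        "height_mae_m": height_mae,
        "function_accuracy": function_acc,
    }
```

Calling `validate(predicted, reference)` with two lists of records returns a small summary dictionary. Buildings missing from the reference set are skipped rather than counted as errors, which is one possible design choice among several.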
The conference version of this work later developed into the book chapter “Evaluating the Feasibility of ChatGPT for Mapping Building Attributes” in Geography According to Foundation Models.
Links: Project page · Publication page · Talks page