Connecting Senses
The Cross-Modal Associations Between Smell and Vision in Understanding Urban Environments
Chen, Q., Poorthuis, A., & Crooks, A. (2026). Connecting Senses: The Cross‐Modal Associations Between Smell and Vision in Understanding Urban Environments. Geographical Analysis, 58(3), e70046.
Smell is a crucial yet understudied sensory dimension in urban environments, bridging tangible elements (e.g., exhaust, flowers) with intangible impacts on emotions, social interactions, and well-being. While geographical and urban research increasingly acknowledges multisensory experiences, much of geospatial analysis still emphasizes the visual dimension. This research advances spatial thinking by examining cross-modal associations between smell and vision in urban environments. Specifically, we utilize advanced image processing techniques to extract visual cues from street view imagery (SVI) (i.e., Mapillary) and apply causal analysis to examine their effects on smell expectations recorded from participants. The results show that visual cues can predict smells in straightforward urban settings (e.g., parks or less densely populated areas). However, in complex urban environments, the predictive power of visual cues diminishes as diverse and overlapping scents obscure specific smells, even in visually distinct areas. These findings underscore the importance of a multisensory approach in urban analytics, enhancing our understanding of the interplay between sensory experiences and informing urban design strategies that integrate multiple senses to create engaging and inclusive environments. This is especially important for individuals with sensory impairments, such as anosmia or visual impairments, who rely on other senses to compensate for their perception of urban environments.
Introduction
Urban environments are full of smells, from the aroma of coffee from a café to that of fatty food from a fast-food restaurant to the exhaust of vehicles, to name but a few. Such smells evoke vivid ‘mental imagery’ (Richardson, 2013) of what we experience in the world around us. These simple but profound olfactory experiences capture spatial and social nuances, seamlessly integrated into the fabric of our daily lives. Such is how humans create a sense of place (Davidson & Milligan, 2004) through smell.
Smell serves critical functions in ensuring our orderly daily life, from stimulating appetite and enforcing social communication to navigating environments and detecting hazards (e.g., Boesveldt & Parma, 2021; Nuhn et al., 2024; Stevenson, 2010). Unlike other senses, smell is also deeply tied to our emotions and memories, connecting directly to the amygdala and hippocampus in our brain, which work synergistically to process emotional responses and memory retrieval (Yang & Wang, 2017; Zald & Pardo, 1997). In this context, different smells, varying in their physical and chemical properties, create dynamic and potent olfactory environments that affect individuals differently (Bratman et al., 2024). For example, the smells of lavender and citrus can induce joy and relaxation, while food aromas can trigger nostalgia or foster a sense of community and belonging, attaching personal narrative to the city and its places (e.g., Herz & Engen, 1996; Porcherot et al., 2010). Conversely, smells from traffic, industry or waste can cause discomfort and potentially lead to health problems such as headaches or respiratory issues, especially for those with conditions like asthma (e.g., Piccardo et al., 2022; Zhou, 2023). Taken together, smells shape how we perceive and engage with the built environment and influence our urban experiences and well-being (Bratman et al., 2024; Xiao et al., 2020).
Despite its significance, research into smell in geographical and urban studies is still in its infancy (Wankhede et al., 2023). Olfactory stimuli only really began to enter the geographic discourse in the 1980s and 1990s, in contrast to the predominant way of representing and categorizing the urban environment from the visual perspective (Dodt et al., 2017). Such a visually dominant approach often overlooks the rich, multisensory experiences that define urban life, specifically that of smell (Henshaw, 2013; Xiao et al., 2022). The concept of ‘smellscapes’, coined by Porteous (1985), has evolved from understanding urban places through smell to emphasize the centrality of human experiences. It now refers to ‘the smell environment perceived and understood by a person through olfactory sensation, influenced by one’s memories and past experiences, in a place specific to its context’ (Xiao et al., 2018). This evolution represents a transformative effort to incorporate olfactory experiences within multisensory research (Spence, 2020), marking a broader shift in geographical analysis that extends beyond traditional visual assessment to embrace the feeling of our urban landscapes (Böhme, 2013; Dowling et al., 2018). This shift aligns with calls for a ‘more-than-representational’ geography (Thrift, 2008) that emphasizes individualized ‘somatic reaction’ in understanding our built environment (Lorimer, 2005).
The long-recognized visual dominance in geographical and urban studies and the emerging shift toward multisensory experiences pose a core question: Can cross-modal association, that is, interactions where sensory perceptions in one modality are triggered not by their corresponding stimuli but through another sensory modality, be used to understand and interpret urban smellscapes? This inquiry does not come out of thin air; it is grounded in the fact that our senses are not isolated channels but are intertwined, collectively shaping our perceptual experiences (Spence, 2020b). For example, Gottfried et al. (2004) provided neuroscientific evidence that pairing objects with specific smells during memory formation enables later visual cues of these objects to activate smell-related brain areas, illustrating the connection between visual and olfactory memories. Similarly, Flavián et al. (2021) demonstrated that introducing congruent scents in virtual reality environments not only enhances the sensory stimulation provided by visual and auditory cues but also amplifies the mental imagery process, influencing user behavior in digital experiences.
Just as smells can trigger visual or auditory imagery, visual and auditory cues can reciprocally activate olfactory memories, giving rise to what is termed ‘olfactory imagery’ (Young, 2020). This phenomenon involves the mental recreation of smells without direct external smell stimuli, fundamentally relying on an individual’s memories and/or experiences. In a direct application to urban smellscapes, Lindborg and Liew (2021) conducted a smell walk where onsite participants described their olfactory experiences, and subsequently, video and audio recordings of the same smell walk were presented to online participants to construct an imagined smellscape. The comparison between the real and imagined smellscapes revealed that visual and/or auditory information could evoke smells, indicating the potential for cross-modal associations between sensory inputs. Although Lindborg and Liew’s (2021) work provided a valuable prototype for studying smellscapes through a cross-modal lens, scaling such a method to a city-wide level presents challenges, as smell walks are often labor-intensive and time-consuming, capture only a coarse spatio-temporal snapshot of the broader smellscape, and are sensitive to group dynamics and environmental variability (Parker et al., 2024).
Recently, the use of a cross-modal lens has attracted attention from researchers investigating urban soundscapes. For example, Zhao et al. (2023) explored the potential of using street view imagery (SVI) to assess soundscapes across two cities. By analyzing visual features from images and correlating them with soundscape indicators derived from perceptual surveys, where participants rated aspects like sound intensity and quality, the study showed that images can be used to predict sound environments. Inspired by these cross-modal studies, this paper extends this body of literature with a novel application to smellscapes. Specifically, we ask: to what extent can visual data from SVI be used as a proxy for capturing large-scale urban smell perceptions without direct smelling? This question serves as a starting point to uncover the potential of cross-modal associations between smell and vision for city-wide smellscape studies, aiming to understand how people perceive smells based on what they see.
To illustrate this potential, we utilized a large-scale dataset that consists of approximately 333K street view images collected from Mapillary (2025) in New York City (NYC) between 2014 and 2019. Crucially for the analysis at hand, such a dataset allows for the extraction of visual features of the urban environment across the city. To identify smells based on the images, we first need to identify the patterns participants use to evoke smell-related memory, experience and/or imagination when constructing smell perceptions from the scenes depicted in the images. To do so, we designed an online image-smell labeling survey, presenting participants with images and asking them to label perceived smells. The collected image-smell labels serve as the ground truth for subsequent large-scale automatic image-smell labeling through advanced image processing. We then applied causal analysis to uncover potential connections between visual cues and smell perception.
Direct Semantic Associations
Figure 2 presents the important features identified for four smell categories: ‘Nature’, ‘Food’, ‘Transportation & Fuel’ and ‘Industry’. These categories are generally more straightforward to deduce from visual cues. To illustrate how these features influence smell prediction on a more granular level, examples are displayed on the right side of Figure 2. Here, features contributing positively to the prediction are marked in orange while those with a negative impact are shown in purple. The values alongside each feature on the y-axis represent thresholds at which the features impact the prediction outcome for a smell either positively or negatively. Figure 3 shows the spatial distribution of images labeled with different smells.
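The way a per-image panel of this kind is read can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in this study: the `Contribution` type, the `split_by_sign` helper, the feature names, thresholds and weights are all hypothetical, chosen to mimic thresholded features that push a smell prediction up (orange) or down (purple).

```python
from typing import NamedTuple


class Contribution(NamedTuple):
    rule: str      # hypothetical thresholded feature, e.g. "vegetation > 0.30"
    weight: float  # signed contribution to the predicted smell category


def split_by_sign(contributions, top_k=5):
    """Return (positive, negative) lists, each ordered by descending magnitude."""
    ranked = sorted(contributions, key=lambda c: abs(c.weight), reverse=True)
    positive = [c for c in ranked if c.weight > 0][:top_k]
    negative = [c for c in ranked if c.weight < 0][:top_k]
    return positive, negative


# Hypothetical values for one image and the 'Nature' smell category.
example = [
    Contribution("vegetation > 0.30", 0.21),
    Contribution("road > 0.25", -0.12),
    Contribution("building <= 0.10", 0.08),
    Contribution("car > 0.05", -0.03),
]

supports, opposes = split_by_sign(example)
for c in supports:
    print(f"supports 'Nature': {c.rule} ({c.weight:+.2f})")
for c in opposes:
    print(f"opposes  'Nature': {c.rule} ({c.weight:+.2f})")
```

In this made-up example, a large share of vegetation supports a ‘Nature’ label while a large share of road opposes it, which is the kind of reading the orange and purple bars are meant to convey.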
These examples demonstrate that visual perceptions connecting to smells, both from human and computer perspectives, tend to use external characterizations or reference the sources of smells derived from visual inputs. This alignment is likely due to the ecological default mode of human perception, whereby sensations are interpreted in terms of their causes, reflecting events and actions in the environment (Herz, 2016). Similarly, we would argue that the deep learning algorithms have adopted this pattern during the training process, recognizing elements within images to deduce dominant smells. As such, an automatic evaluation of smell sources upon their identification becomes feasible. This process is particularly evident in more objective urban spaces where the perception of smells can be more consistently agreed upon, such as open green spaces (e.g., parks), third-place spaces (e.g., restaurants), movement spaces (e.g., highways), and service spaces (e.g., factories). Given the right visual cues, these spaces can evoke consistent smell perceptions related to nature, food, transportation, and industry, respectively.
Indirect Episodic Associations
Unlike smells that have a more straightforward connection with visual markers, categories such as ‘Waste’ and ‘Smoke’ reveal more complexity in deriving smells from visual cues alone. Specifically, the ‘Waste’ category includes smells from solid and municipal wastes, and organic and bodily wastes. Although visual cues such as trash cans, dumpsters and garbage are noted by participants when perceiving this type of smell, especially when these features are prominent in an image, people most often rely on their personal experiences, memories or their impressions of and familiarity with a location to infer waste-related smells, with comments like “The dog on a walk makes me think how many times I’ve walked past dog waste”, “If this is the Hudson River, it may smell of waste due to people dumping waste into water”, “The dirty sides”, “Looks run down” (see Fig. 3e). Such insights illustrate the subjective nature of smell perception influenced by personal context.
This pattern is also observed in the ‘Smoke’ category, which refers to smells from natural burns and combustion (e.g., fumes, fireplace) or substance-related smoke (e.g., tobacco, cigars). Notes such as “Crowded people, some of which bound to be smoking”; “Someone is always smoking a cigarette outside of the theater”; “I lived in the city and these side streets smell straight like smoke”; “Parking lot where a lot of people would smoke” reveal that different people focus on different parts of an image when locating objects related to this type of smell. This variability in perception is also captured in our analysis, where a broad range of features contribute to these smell categories, indicating a diffuse impact across the entire context of the environment (see Fig. 4a & b). It therefore becomes challenging to reach a consensus on specific visual cues that link directly to smell sources; rather, such cues are often interpreted against the backdrop of social norms and people’s own experiences. Our analysis also reveals that the predictive power of visual cues diminished for ‘Waste’ and ‘Smoke’ smells, with only around 1% of images classified under these categories. In this sense, olfactory perception is not only about detecting smells and/or smell sources but also about interpreting them within a context of past experiences and emotional resonance, which are capabilities that current deep learning algorithms lack.
Figure 4. Identified important features for smell categories that are relatively indirect to infer from visual cues. Left: Identified important features; Right: Examples of feature contribution in individual images.