This study aims to integrate research on multimodal congruency with the stereotype-content model to offer a novel explanation of why and when consumers respond favorably to vision-sound congruency in online service settings.
A mixed-methods approach comprised a field study (a 360° panoramic desktop virtual tour of a winery) and a laboratory study (a fully immersive virtual reality (VR) tour of a pub). The explanatory mechanism was tested through conditional process analyses, specifically a custom-made serial mediation model in which the effects of cross-modal congruency were channeled through telepresence and warmth/competence, with familiarity with the service provider included as a moderator. Category knowledge and involvement were included as controls. Study 2 additionally accounted for olfactory and haptic sensory information present in the consumer’s physical location.
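For readers unfamiliar with conditional process analysis, the following is a minimal sketch of a moderated serial mediation model estimated with ordinary least squares. It is not the authors' custom model. It assumes, for illustration only, that telepresence precedes warmth in the chain and that familiarity moderates the first path (congruency to telepresence). The data are synthetic, and all function names, variable names and coefficients are hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=5000, seed=0):
    """Synthetic data for illustration only; not the study's data."""
    rng = np.random.default_rng(seed)
    congruent = rng.integers(0, 2, n).astype(float)  # 0 = incongruent, 1 = congruent
    familiarity = rng.normal(size=n)                 # mean-centered moderator
    telepresence = (0.5 + 0.2 * familiarity) * congruent + rng.normal(size=n)
    warmth = 0.6 * telepresence + rng.normal(size=n)
    intention = 0.4 * warmth + 0.1 * congruent + rng.normal(size=n)
    return congruent, familiarity, telepresence, warmth, intention


def conditional_indirect_effect(x, w, m1, m2, y, w_value):
    """Serial indirect effect x -> m1 -> m2 -> y at a given moderator value."""
    a = ols(m1, x, w, x * w)  # a[1]: x -> m1, a[3]: x-by-w interaction
    d = ols(m2, x, m1)        # d[2]: m1 -> m2, controlling for x
    b = ols(y, x, m1, m2)     # b[3]: m2 -> y, controlling for x and m1
    return (a[1] + a[3] * w_value) * d[2] * b[3]


if __name__ == "__main__":
    x, w, m1, m2, y = simulate()
    # Indirect effect at low, average and high familiarity (in SD units).
    for level in (-1.0, 0.0, 1.0):
        print(level, conditional_indirect_effect(x, w, m1, m2, y, level))
```

In applied work, point estimates like these are usually accompanied by bootstrap confidence intervals, and control variables (here, category knowledge and involvement) would enter each regression as additional predictors.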
Congruency between vision and sound positively influences consumers’ intentions to visit the environment in person, to purchase online and to engage in positive word-of-mouth. These effects are channeled through enhanced feelings of telepresence as well as more favorable perceptions of service provider warmth. Congruency effects are robust to additional sensory input in the offline environment and across levels of involvement and knowledge, but may depend on a consumer’s familiarity with the setting.
The study offers a novel process explanation for how cross-modal congruency in online service settings influences consumer intention. The focus on two sensory modalities (vision and sound) and two service settings (a winery and a pub) limits the generalizability of the findings.
The findings help service providers better understand how perceptions of warmth and competence transmit cross-modal congruency effects, resulting in more favorable consumer responses.
To the best of the authors’ knowledge, this work is among the first to adopt a stereotype-content and multimodal congruency perspective on consumer response to online service settings.
