Video marketing on platforms such as YouTube has gained enormous attention in hospitality and tourism. Drawing on cue utilization theory, this study aims to develop a customized deep learning-based image classification model and to investigate the relationship between visual spatial elements in YouTube videos and electronic word-of-mouth (eWOM).
Participants identified 31 spaces in 15 hotel YouTube videos. A deep learning-based image classification model was developed using 303,716 images, and the trained model was then applied to 5,040 YouTube videos. The frequency and duration of each of the 31 spaces were measured, and panel regression was used to examine the associations between these spaces and eWOM.
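To make the measurement step concrete, the sketch below shows one way per-frame classifier outputs could be turned into the frequency and duration variables, followed by a one-way fixed-effects (within) slope as a stand-in for the panel regression. It rests on assumptions not stated in the study: frames are sampled at a fixed interval (`seconds_per_frame`), a "frequency" is one contiguous run of frames with the same space label, and each hotel channel is treated as the panel entity. The function names are hypothetical, and the study's actual aggregation rules and regression specification may differ.

```python
from collections import defaultdict
from itertools import groupby


def space_frequency_and_duration(frame_labels, seconds_per_frame=1.0):
    """Aggregate per-frame space labels for one video (hypothetical helper).

    Assumptions: frames are sampled at a fixed interval, one appearance
    ("frequency") is a contiguous run of identical labels, and duration is
    the total on-screen time of a label in seconds.
    """
    frequency = defaultdict(int)
    duration = defaultdict(float)
    for label, run in groupby(frame_labels):
        n_frames = sum(1 for _ in run)
        frequency[label] += 1                         # one more appearance
        duration[label] += n_frames * seconds_per_frame
    return dict(frequency), dict(duration)


def within_slope(panel):
    """One-way fixed-effects (within) slope of y on a single regressor x.

    panel maps an entity (assumed here: a hotel channel) to a list of
    (x, y) observations, e.g. x = lobby duration, y = an eWOM measure.
    Demeaning within each entity removes time-invariant entity effects.
    """
    num = den = 0.0
    for obs in panel.values():
        x_mean = sum(x for x, _ in obs) / len(obs)
        y_mean = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_mean) * (y - y_mean)
            den += (x - x_mean) ** 2
    return num / den


if __name__ == "__main__":
    labels = ["lobby", "lobby", "street", "lobby", "bedroom", "bedroom"]
    print(space_frequency_and_duration(labels))
    print(within_slope({"a": [(1, 2), (2, 4), (3, 6)], "b": [(10, 5), (12, 9)]}))
```

Counting contiguous runs rather than individual frames keeps a long lobby shot from inflating the frequency count, while duration still captures how long the space is on screen. A real analysis would use a dedicated panel estimator with multiple regressors and controls rather than this single-variable illustration.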
This study identified a considerable number of statistically significant relationships, both positive and negative, between specific spatial cues and eWOM. Hotel-related intrinsic cues, such as the lobby, and nature-related spatial cues, including mountains and vineyards, enhance eWOM. However, generic urban spatial cues (e.g. streets) and less diagnostic intrinsic cues, such as bedrooms, reduce eWOM.
This study applies cue utilization theory to the analysis of visual elements in YouTube videos, highlighting the diagnostic and non-diagnostic roles of spatial cues in relation to eWOM. Moreover, this research contributes to the hospitality literature by conceptualizing spatial cues in videos and empirically investigating the influence of visual elements on eWOM. Methodologically, it proposes how to apply a deep learning-based image classification model to video analysis.
