Effective hazard recognition on construction sites relies heavily on workers’ visual attention; however, the presence of emerging systems such as unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) may divert this attention away from hazards. Despite the growing integration of robots into construction operations, there remains a lack of empirical evidence on how these systems reallocate workers’ gaze between hazards and robotic elements.
The present study employed a virtual reality (VR) environment integrated with eye-tracking to quantify the attentional trade-offs between hazards and robots. Twenty-eight participants completed hazard identification tasks in two controlled scenarios: one without robots (baseline) and one featuring UAVs and UGVs. Fixation count and duration metrics were analyzed using Wilcoxon signed-rank tests to assess attentional allocation differences.
Results revealed significant changes in visual attention patterns when robots were present. Participants exhibited both shorter fixation durations and fewer fixations on hazards, indicating shallower and less frequent visual engagement with safety-critical elements. Notably, within the robot-present scenario, participants fixated on robots five times longer than on hazards, highlighting a pronounced attentional shift toward robotic elements in the environment.
These findings provide new insight into the cognitive implications of human–robot interaction in construction, demonstrating that robotic presence can unintentionally compromise hazard recognition through strong attentional capture. This work offers critical evidence to guide the design of attention-aware robotic deployment strategies, targeted worker training programs and safety protocols for future human–robot collaborative construction environments.
1. Introduction
The global construction sector accounts for approximately 13% of the world’s gross domestic product (GDP) and employs more than 7% of the global workforce; in the United States alone, it contributes over $950 billion to the national GDP (Bilim, 2025; Xiao et al., 2022). Yet, despite its size and significance, the sector faces persistent challenges, most notably labor shortages, low productivity and alarmingly high safety risks (Associated General Contractors and Autodesk, 2022; Golparvar-Fard et al., 2015). Construction remains one of the most dangerous professions worldwide, accounting for nearly 20% of all workplace fatalities in the United States and a disproportionately high share of fatal accidents in many other regions, including Europe and Asia (Arslan et al., 2018; Martínez-Aires et al., 2024). In 2022, the industry experienced over 1,000 fatal injuries, with the fatality rate rising to 13.0 deaths per 100,000 workers (Bureau of Labor Statistics, 2022). A substantial proportion of these incidents stems not from mechanical failures but from lapses in workers' ability to recognize hazards on site, such as unprotected edges, improperly secured loads, moving equipment and unsafe electrical installations; hazard recognition is an inherently visual and attention-driven process.
Given these challenges, the construction industry has increasingly explored advanced technologies that aim not only to physically protect workers but also to support their cognitive readiness: their mental ability to remain vigilant, recognize hazards and make safe decisions in dynamic environments (Cheng et al., 2022). Among these, robotic systems such as unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) are being introduced to perform hazardous tasks like heavy lifting, inspection and material transport (Bock, 2015; Cheon et al., 2025). While these technologies offer substantial promise, their actual adoption across the industry remains limited, highlighting the need for updated research on the factors constraining broader integration. In practice, several challenges have emerged that may help explain this gap. One concern involves the physical risks introduced by mobile robotic platforms, such as potential collisions or pinch-point injuries. In addition, recent observations point to human–robot interaction (HRI) challenges that may impact safety at a cognitive level. For instance, robots may induce psychological strain, including technostress or diminished trust, particularly when their behavior is perceived as unfamiliar, autonomous or unpredictable (Adami et al., 2023; Albeaino et al., 2023a, b). Most critically, robots may impose attentional costs, functioning as visual distractors that unintentionally draw workers’ gaze away from conventional, static hazards (Mendes et al., 2022). Due to their movement, sound and novelty, robots naturally attract visual attention through exogenous salience (Jørgensen, 2021), which may interfere with the continuous scanning and situational awareness required for effective hazard identification.
Evidence further suggests that such lapses can be especially pronounced among novice or less-experienced workers, who often have less stable hazard recognition patterns than experienced workers (Nafe Assafi et al., 2025; Sabir et al., 2025).
Despite growing interest in HRI within construction environments, critical gaps remain in how workers visually allocate attention between hazards and robots during safety-critical tasks. While several studies have examined the impact of robots on overall hazard recognition performance and have suggested that robotic systems may act as cognitive distractors (Hasanzadeh et al., 2017; Ibrahim et al., 2025; Namian et al., 2018a; Okpala et al., 2023), the underlying visual mechanisms driving these effects are not well understood. Specifically, eye-tracking data has not been systematically used to disentangle how attention is divided between co-located hazards and robotic agents. Disruptions in this allocation, particularly when attention is drawn away from hazards, can lead to delayed or missed hazard detection, thereby increasing the likelihood of safety incidents (Jeelani et al., 2018).
To address these limitations, the present study investigates how the presence of robotic systems alters workers’ visual attention during hazard identification tasks in a virtual setting. While fixation count and fixation duration are widely used in eye-tracking research, no prior study has applied them to quantify the attentional trade-offs between robots and hazards in an immersive, experimentally controlled construction scenario. By capturing objective gaze behavior in virtual construction settings, this study advances foundational understanding of attentional dynamics in human–robot collaboration and offers practical implications for safer robotic integration on construction sites.
The remainder of this article is organized as follows. Section 2 reviews relevant literature on construction robotics, cognitive and safety implications of human–robot interaction, and eye-tracking for visual attention analysis. Section 3 describes the research methodology, including the virtual environment design, experimental procedure and data analysis. Section 4 presents the results, Section 5 discusses theoretical and practical implications, and the final section outlines conclusions, limitations and directions for future research.
2. Literature review
The construction industry is undergoing rapid transformation as robotics and automation reshape how projects are planned, executed, and managed (Rodrigues et al., 2023). Driven by stagnant productivity, high safety risks and labor shortages, robots are increasingly used both on-site and off-site, spanning wearable, remotely operated and fully automated systems (Okpala et al., 2023). Among these, remotely operated robots have garnered considerable attention due to their ability to enhance operational efficiency, improve safety performance and expand data collection capabilities (Gheisari and Esmaeili, 2019; Siratranont et al., 2025). The growing use of these systems is transforming how construction activities are conducted, particularly through their potential to remotely navigate dangerous zones and minimize direct human exposure (Jebelli et al., 2022). As the adoption of these robotic systems accelerates, understanding their cognitive and safety implications for workers in shared human–robot workspaces has become increasingly important (Okpala et al., 2023).
2.1 Cognitive and safety implications of HRI in construction
In construction, HRI encompasses interactions between workers and various robotic platforms within shared workspaces that shape safety, productivity and workflow organization (Rodrigues et al., 2023). Prior HRI literature has examined the behavioral and cognitive implications of robot integration for workers (Chang and Hasanzadeh, 2024; Firth et al., 2022; Zhang et al., 2021). For example, Namian et al. (2021) showed that introducing UAVs can distract workers and diminish overall safety performance, and other studies have modeled how UAV and UGV deployment degrades hazard recognition performance (Cheon et al., 2025; Ibrahim et al., 2025). However, only a few studies have explored how these systems impose additional cognitive load on construction workers (Chen et al., 2023; Hussain et al., 2024b).
According to Kahneman’s (1973) capacity model of attention, attention is a finite pool of cognitive resources from which all information-processing activities must draw (Batson and Wilson, 2023). Wickens et al.’s (2007) multiple resource theory refines this view by differentiating resource pools along the dimensions of modality (e.g. visual vs auditory) and processing stage, predicting that tasks sharing the same channel will incur structural interference and elevated mental workload (Wickens et al., 2007, 2021). Endsley’s situational-awareness (SA) model, in turn, describes how these resource constraints directly undermine SA by diverting attention away from emerging hazards (Endsley, 2017). Taken together, these frameworks suggest that in hazard-prone settings, introducing additional elements such as robots imposes extra visuospatial demands that foster attentional tunneling on robotic motion and delay critical hazard recognition.
Despite growing recognition of these challenges, empirical studies examining how workers’ visual attention shifts between environmental cues and robotic agents under hazardous conditions remain scarce. Existing research tends to treat hazard recognition as an aggregate outcome, without capturing the moment-to-moment attentional processes that underlie safety performance (Esmaeili, 2017; Hasanzadeh et al., 2018a). Such transient shifts of attention, particularly when workers redirect their gaze from static hazards to dynamic robotic agents, have the potential to delay hazard detection and compromise SA in high-risk construction environments (Hussain et al., 2024b; Namian et al., 2018b). These concerns are particularly relevant for novice construction workers, who, despite sometimes achieving comparable or higher detection rates, tend to exhibit less-consistent attentional control and lower confidence in hazard identification than experienced workers (Nafe Assafi et al., 2025). Addressing this gap requires methods that can systematically quantify how visual attention is distributed between co-located hazards and robots in shared workspaces.
2.2 Eye tracking technologies for visual attention analysis
Eye-tracking has emerged as a robust methodological tool in construction safety research, enabling the measurement of workers’ visual attention with high temporal and spatial resolution (Sudhakaran et al., 2025). As a proxy for cognitive engagement, it allows researchers to track how workers acquire and process visual information while performing safety-relevant tasks (Hussain et al., 2024b). Unlike traditional observational methods or post hoc surveys, eye-tracking captures real-time gaze behavior, offering an unobtrusive yet precise means to evaluate SA, hazard recognition and attentional distribution (Carter and Luke, 2020; Hasanzadeh et al., 2017). Fundamental gaze metrics such as fixation count and fixation duration have been consistently validated as indicators of cognitive load and visual prioritization (Rayner, 2009). Fixation refers to the stationary periods during which the eyes absorb visual information; longer durations generally indicate deeper cognitive processing, while frequency reflects attentional salience or the need for repeated scanning (Rayner, 2009). A range of cognitive and attentional processes is linked to fixation duration and count, highlighting the importance of these metrics for understanding how humans process and attend to visual information. These measures are particularly meaningful in the context of construction, where workers must detect diverse and spatially distributed hazards under time and task pressure.
Several studies have leveraged VR-based eye-tracking technology to explore workers' attentional allocation, examining how they distribute attentional resources toward hazardous areas in construction scenarios (Kim et al., 2022, 2023). Additionally, research has probed the relationship between SA and visual attention under hazard conditions (Hasanzadeh et al., 2018a; Hussain et al., 2024b). Similarly, a few studies have used advanced eye-tracking to analyze workers’ visual search patterns in immersive 360-degree construction scenarios (Lee et al., 2024; Ogunseiju et al., 2024). However, as shown in Table 1, existing eye-tracking studies have examined either hazard-focused areas of interest (AOIs) or robot-focused AOIs, typically treating each in isolation within the construction safety context. AOIs are predefined regions of the visual scene used to aggregate and analyze gaze behavior; to date, no study has systematically analyzed attention competition between hazard and robot AOIs within the same experimental framework to understand how robotic presence affects safety-critical hazard detection. A study by Albeaino et al. (2023a) investigated the attentional effects of drone proximity on construction professionals’ performance using VR and real-time eye-tracking data. Although their findings confirmed that drones draw visual attention, their analysis focused solely on aggregate gaze time directed at the UAV and did not assess how this affected hazard monitoring, nor did it differentiate between hazards and robotic elements. Consequently, the extent to which robotic presence compromises hazard recognition remains underexplored. To address this gap, the present study employs VR-based eye-tracking to systematically quantify how the presence of robotic agents alters moment-to-moment visual engagement with environmental hazards, providing the foundational evidence necessary for cognitively compatible human–robot workflows.
Summary of eye-tracking studies in construction and human–robot interaction: areas of interest analysis
| Study | HRI context – eye tracker | AOIs: Hazards | AOIs: Robots | AOIs: Others |
|---|---|---|---|---|
| Hussain et al. (2024b) | N/A – HMD Tracking | ✓ | N/A | ✗ |
| Kim et al. (2023) | N/A – Vive Pro Eye Tracking | ✓ | N/A | ✗ |
| Hasanzadeh et al. (2017) | N/A – EyeLink II | ✓ | N/A | ✗ |
| Gervasi et al. (2025) | Robot Assembly – Tobii Pro Fusion and Glasses 3 | ✗ | ✓ | ✓ |
| Dai et al. (2023) | Robotic Arm – Multi Model Tracking Sensors | ✗ | ✓ | ✓ |
| Albeaino et al. (2023a) | Drones – Vive Pro Eye Tracking | ✗ | ✓ | ✓ |
| Albeaino et al. (2023b) | Drones – Vive Pro Eye Tracking | ✗ | ✓ | ✗ |
| Liang et al. (2024) | Robotic Arm – Eye Tracker Pupil Labs | ✓ | ✗ | ✓ |
| This Study | Misc Robots – Vive Pro Eye Tracking | ✓ | ✓ | ✗ |
3. Research methodology
This study employed a controlled VR user-centered experimental design to investigate the trade-off in visual attention between construction hazards and robots, measured through eye-tracking fixation metrics. The VR environment was developed in two sequential phases: Baseline scenarios (without robots) and HRI scenarios (with robots). Participants experienced both phases in a controlled setting while their visual attention on predefined AOIs was continuously tracked to capture potential cognitive trade-offs. Based on the limited-capacity attention and situation awareness theories, we formulated the following hypotheses regarding how robotic presence would alter workers’ visual attention during hazard identification tasks.
H1. The introduction of robots would reduce visual attention (fixation count and fixation duration) allocated to hazards compared to baseline scenarios without robots.
H2. Within the HRI scenario, a measurable trade-off would emerge, with visual attention divided between hazards and robots and a shift toward robot-focused fixations.
These hypotheses are grounded in limited-capacity models of attention and situation awareness, which posit that introducing salient robotic agents into visually demanding environments reduces the resources available for hazard monitoring and reallocates gaze toward robots (Batson and Wilson, 2023; Endsley, 2017; Wickens et al., 2007). The overall methodological flow of the study, from virtual environment development to experimental implementation, data analysis and implications, is illustrated in Figure 1, which presents the research framework. The following sections detail participant recruitment, eye-tracking data collection and the analytical procedures employed to quantify the data.
3.1 Development of virtual environment
This study developed a virtual environment to simulate construction sites with realistic hazards, equipment and robotic systems, including UAVs and UGVs. The experimental setup consisted of two interconnected phases. Each phase included two scenes, designed to reflect critical hazards commonly responsible for injuries and fatalities on construction sites (Figure 2).
VR-based scenarios illustrating the absence (baseline) and presence (HRI condition) of robotic systems
Baseline Scenarios (without robots)
Scene 1: Simulated work-at-height conditions, highlighting fall hazards, the leading cause of construction fatalities (Khan et al., 2023).
Scene 2: Simulated scenarios emphasizing struck-by hazards during lifting operations with a mobile crane and forklift, consistently ranked as the second leading cause of construction fatalities (CPWR, 2024).
HRI Scenarios (with robots)
Scene 1: Replicated the fall-from-height scenario from the Baseline phase but incorporated UAVs and UGVs, introducing robot-related visual stimuli and potential cognitive distractions.
Scene 2: Replicated the struck-by hazard scenario from the Baseline phase, augmented with UAVs and UGVs to simulate additional complexity and potential distraction in hazard identification.
To ensure a valid comparison, the hazard environment was carefully balanced across both conditions. Each scenario contained the same number of identifiable hazards and robots, with severity and intensity kept as consistent as possible. This design ensured that observed performance differences could be attributed primarily to the presence of robots rather than to variations in the number of hazards or risk level. The VR environment was developed in Unity with C# scripting, importing construction components from the Unity Asset Store and Blender, and animating workers, equipment and sound effects (e.g. drones and machinery) to enhance realism. The system was deployed on a Dell® Precision 5820 Tower workstation equipped with an Intel® Core™ i9-10900X processor (3.70 GHz, 20 logical cores), 128 GB RAM and an NVIDIA® RTX A4500 GPU, paired with an HTC® Vive Pro headset for immersive interaction. Figure 2 illustrates representative scenes of each phase from the developed virtual environment.
AOIs were defined in Unity as three-dimensional bounding regions enclosing each hazard object and each robot, implemented as tight rectangular volumes around hazards (missing guardrails, suspended loads, etc.) and compact volumes around robotic bodies (UGVs, UAVs, robotic arms, etc.). To ensure the validity and realism of the VR environment, including the size and placement of AOIs, five construction safety experts, each with an advanced degree in construction or occupational safety and health, publications related to worker safety or HRI, and at least five years of experience, evaluated the scenes. Their feedback led to adding more hazards (e.g. removed guardrails), adding more realistic robot and equipment sounds and interactions, and using teleportation instead of physical walking to reduce VR sickness and improve comfort. Custom C# scripts in Unity were developed to record fixation count and duration for each AOI, enabling precise tracking of visual attention to hazards and robots and supporting analysis of the trade-off in attention allocation.
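The per-AOI logging logic can be sketched as follows. The study's actual scripts were written in C# inside Unity; this Python sketch only mirrors the accumulation logic, and the class name, AOI labels and the 100 ms fixation threshold (applied during data processing) are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class AOILogger:
    """Accumulates fixation count and total dwell time per AOI."""
    min_fixation_s: float = 0.10                      # 100 ms fixation threshold
    counts: Dict[str, int] = field(default_factory=dict)
    durations: Dict[str, float] = field(default_factory=dict)
    _current: Optional[str] = None
    _dwell: float = 0.0

    def update(self, aoi: Optional[str], dt: float) -> None:
        """Call once per rendered frame with the AOI hit by the gaze ray (or None)."""
        if aoi == self._current:
            self._dwell += dt                         # gaze stayed on the same AOI
            return
        self._flush()                                 # gaze moved: close the previous dwell
        self._current, self._dwell = aoi, dt

    def _flush(self) -> None:
        # Count the finished dwell as one fixation only if it met the threshold.
        if self._current is not None and self._dwell >= self.min_fixation_s:
            self.counts[self._current] = self.counts.get(self._current, 0) + 1
            self.durations[self._current] = self.durations.get(self._current, 0.0) + self._dwell

    def finish(self) -> None:
        """Close any open dwell at the end of a session."""
        self._flush()
        self._current, self._dwell = None, 0.0
```

For example, four consecutive 30 ms frames on a hazard AOI register one fixation of 0.12 s, while a single 50 ms pass over a robot AOI falls below the threshold and is discarded.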
3.2 Experimental setup and data collection
Participants were recruited from the College of Architecture and the College of Engineering at a university. Participants first completed a pre-screening form confirming: (1) no VR discomfort or motion sickness, (2) normal or corrected vision and (3) some construction-related experience. The study protocol was approved by the university's Institutional Review Board (IRB), and all participants provided informed consent prior to participation. Data confidentiality was maintained through anonymization and secure storage of records.
A priori power analysis was conducted using G*Power 3.1.9.7 to determine the required sample size for a within-subjects design comparing two experimental conditions. Assuming a medium effect size (d = 0.5), a significance level of α = 0.05 and a desired statistical power of 0.80, the analysis indicated that at least 23 participants were necessary to detect significant differences (Faul et al., 2007; Kang, 2021; Nenna et al., 2023). In total, 32 participants were recruited; 4 were excluded due to incomplete data, resulting in a final sample of 28 participants, which met the power requirements for evaluating differences in fixation count and fixation duration.
To ensure participants possessed a baseline understanding of construction hazards before engaging in the eye-tracking experiment, a structured pre-intervention protocol was implemented. Participants were first recruited and asked to complete a demographic survey capturing relevant background information. After reviewing the study protocol and signing an informed consent form, they viewed a 20-min OSHA safety training video designed to introduce common construction hazards. Following the video, participants completed a 10-question safety quiz to assess their understanding. Only those who achieved a score of 80% or higher were permitted to proceed to the experimental phase. Participants scoring below this threshold were required to revisit the video before retaking the quiz.
Participants were then introduced to the HTC® Vive Pro head-mounted display (HMD), which integrates a Tobii Pro® eye-tracking system, and trained with a sample VR scenario until they were comfortable with navigation and interaction (Figure 3). Before the experimental phase, participants were allowed to familiarize themselves with the VR system and environment. This period enabled them to practice using the controls, explore the virtual scenes and ask questions until they felt comfortable. Upon completing the familiarization session, participants engaged in a VR-based hazard identification task for up to 15 min in each experimental phase. They were randomly assigned the order of phase (Baseline or HRI) to control for potential order effects. In each phase, participants explored two construction scenes (Fall and Struck-by) and identified safety hazards, while fixation count and fixation duration were continuously recorded using the integrated eye-tracking system.
3.3 Data analysis
Eye-tracking data were processed to extract fixation-related features, which serve as key indicators of information acquisition and visual processing. The primary metrics were fixation count (the number of times participants fixated on an AOI) and fixation duration (the average total time spent fixating within that AOI) (Albeaino et al., 2023b; Martinez-Marquez et al., 2021). Custom Python scripts were developed to automatically identify, extract and compute these metrics for each participant, with AOIs corresponding to safety hazards and robotic agents. A fixation was logged whenever the gaze remained within a given AOI for at least 100 milliseconds, a threshold commonly used in eye-tracking research (Bjørneseth et al., 2012). The resulting data were exported as CSV files for subsequent statistical analysis.

Prior to inferential testing, the data were imported into IBM SPSS Statistics, and several assumptions were systematically checked to ensure the appropriateness of the analytical approach. Since each participant experienced both conditions (Baseline and HRI), the data were inherently dependent, warranting within-subjects comparisons. The Shapiro–Wilk test indicated that fixation durations met the normality assumption, whereas fixation counts did not. To apply a single, consistent analytical approach across both metrics and avoid mixing parametric and nonparametric tests within the same within-subject design, we therefore used Wilcoxon signed-rank tests for all paired comparisons. Both fixation count and duration are ratio-level measures, suitable for parametric or nonparametric tests. Boxplots and standardized residuals were examined to ensure no undue influence from extreme values. Because the analyses focused on a small set of a priori hypotheses rather than exploratory post-hoc testing, we report exact p-values and effect sizes without applying additional family-wise corrections.
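The assumption check and paired-test pipeline above can be reproduced with SciPy in place of SPSS. This is a minimal sketch: the per-participant values below are synthetic and purely illustrative, not study data, and the variable names are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 28                                              # final sample size
baseline_dur = rng.normal(1.8, 0.4, n)              # mean hazard fixation duration (s), Baseline
hri_dur = baseline_dur - rng.normal(0.7, 0.2, n)    # systematically shorter under HRI

# Step 1: Shapiro-Wilk normality check for each condition.
normality_p = {label: stats.shapiro(sample).pvalue
               for label, sample in [("Baseline", baseline_dur), ("HRI", hri_dur)]}

# Step 2: Wilcoxon signed-rank test on the paired samples, the single
# nonparametric test applied to all paired comparisons in the study.
w_stat, p_value = stats.wilcoxon(baseline_dur, hri_dur)
```

Because the same analytical path is used for both metrics, the fixation-count comparison would swap in the count arrays with no other change.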
The choice of fixation count and fixation duration as primary metrics aligns with vigilance and automation research, where changes in gaze allocation and dwell time are widely used to operationalize attentional resource competition and monitoring performance (e.g. Batson and Wilson, 2023; Wickens et al., 2007). By implementing a within-subject design that varies only the presence of robots while holding the hazard environment constant, the study follows established approaches in automation and vigilance experiments that manipulate secondary task salience to quantify its impact on safety-critical monitoring tasks. To examine potential order effects, we conducted 2 (Baseline-first vs HRI-first) × 2 (Scenario: Baseline vs HRI) mixed ANOVAs for fixation duration and fixation count, which showed robust main effects of Scenario in both cases and no reversal of the Baseline–HRI differences across start orders.
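For a balanced 2 (order: between) × 2 (scenario: within) design, the mixed ANOVA's Scenario main effect and Order × Scenario interaction are equivalent to t-tests on per-participant difference scores, which offers a compact way to sketch the order-effect check described above. The data below are synthetic and illustrative only; the study itself ran the mixed ANOVAs in full.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 28
baseline = rng.normal(2.0, 0.3, n)               # hazard fixation metric, Baseline
hri = baseline - 0.8 + rng.normal(0.0, 0.2, n)   # consistent drop under HRI
order = np.repeat([0, 1], n // 2)                # 0 = Baseline-first, 1 = HRI-first

diff = hri - baseline
# Scenario main effect: are the paired differences nonzero overall?
t_scen, p_scen = stats.ttest_1samp(diff, 0.0)
# Order x Scenario interaction: does the Baseline-HRI difference
# depend on which phase participants started with?
t_int, p_int = stats.ttest_ind(diff[order == 0], diff[order == 1])
```

A robust main effect of Scenario with a nonsignificant interaction, as reported in the study, would indicate that the Baseline–HRI difference holds regardless of starting order.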
4. Results
Table 2 presents the demographic characteristics of the sample. Participants were young adults aged 18–34 years, predominantly male (78.6%) and ethnically diverse. Educational attainment was high, with over 85% holding bachelor's or advanced degrees. Most participants had construction-related academic backgrounds (60.7%) or engineering disciplines (28.6%), though field experience was limited, with 57.1% reporting 0–1 year of construction experience. Notably, over 80% had received some form of construction safety training, primarily OSHA certifications.
Demographic information of participants
| Demographics | Options | Frequency | Percent (%) |
|---|---|---|---|
| Age | 18–24 | 13 | 46.43% |
| | 25–34 | 15 | 53.57% |
| Gender | Male | 22 | 78.57% |
| | Female | 6 | 21.43% |
| Race | Latino | 2 | 7.14% |
| | Non-Hispanic | 26 | 92.86% |
| Ethnicity | Asian | 19 | 67.86% |
| | Black or African American | 1 | 3.57% |
| | White | 6 | 21.43% |
| | Middle Eastern | 2 | 7.14% |
| Education | High school | 3 | 10.71% |
| | Some college | 1 | 3.57% |
| | Bachelor's degree | 8 | 28.57% |
| | Graduate or professional degree | 16 | 57.14% |
| Discipline or specialization | Construction related | 17 | 60.71% |
| | Engineering | 8 | 28.57% |
| | Others | 3 | 10.71% |
| Construction experience | 0–1 years | 16 | 57.14% |
| | 1–5 years | 12 | 42.86% |
| Safety training | OSHA10 | 13 | 46.43% |
| | OSHA30 | 4 | 14.29% |
| | Others | 6 | 21.43% |
| | None | 5 | 17.86% |
4.1 Comparison of attentional allocation toward hazards
To assess the effect of UAVs and UGVs on participants’ visual attention, fixation count and fixation duration on hazard AOIs were analyzed under both conditions: Baseline and HRI. A Wilcoxon signed-rank test revealed a statistically significant difference in fixation durations on hazards between the two conditions (Z = −4.08, p < 0.001, r = 0.77), indicating a large effect size. The results confirm that the presence of robots significantly reduced the average fixation duration on hazards, suggesting that participants allocated less sustained visual attention to hazards in the HRI scenario (Figure 4). These findings are consistent with H1, indicating that the introduction of robots reduced the depth of workers’ visual engagement with safety-critical hazards compared to the baseline condition without robots. In practical terms, workers spent less time processing each hazard when robots were present, pointing to more superficial scanning of hazardous areas and a higher likelihood of delayed or missed hazard recognition.
Similarly, a Wilcoxon signed-rank test was conducted to assess differences in fixation count on hazards between the two conditions. The analysis revealed a statistically significant reduction in fixation counts in the HRI scenario (Z = 4.33, p < 0.001, r = 0.82), representing a large effect size (Figure 5). This suggests that robotic presence led to fewer discrete glances at hazards, potentially reflecting divided attention as participants monitored both hazards and robotic movements. The reductions in both fixation duration and fixation count show that robots not only capture workers’ gaze but also diminish how often and how thoroughly workers visually interrogate hazards, which can undermine consistent hazard monitoring on robot-equipped construction sites.
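The effect sizes reported in this section follow the common convention r = |Z| / √N for the Wilcoxon signed-rank test. SciPy returns the signed-rank statistic W rather than Z, so a small helper can recover r; this sketch assumes the standard normal approximation without tie correction.

```python
import math

def wilcoxon_effect_size_r(w_stat: float, n: int) -> float:
    """Return r = |Z| / sqrt(n) for a signed-rank statistic W from n pairs."""
    mean_w = n * (n + 1) / 4.0                                # E[W] under H0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)        # SD[W] under H0
    z = (w_stat - mean_w) / sd_w
    return abs(z) / math.sqrt(n)
```

With N = 28 participants, |Z| = 4.08 yields r = 4.08 / √28 ≈ 0.77, consistent with the value reported above for hazard fixation durations.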
4.2 Attention allocation trade-off between hazards and robots
To quantify the attentional trade-off introduced by robotics, participants' fixations on hazards and robots were compared within the HRI scenario only. A Wilcoxon signed-rank test revealed a statistically significant difference in fixation duration between hazard AOIs and the robot (Z = −4.62, p < 0.001, r = 0.87) (Figure 6). This large effect size indicates that participants spent substantially more time looking at robots than at hazards, suggesting that robots may act as strong visual attractors in safety-critical environments. These findings support H2 by demonstrating an attentional trade-off in the HRI scenario, where robots captured longer fixation durations than hazards and thereby drew limited visual resources away from safety-critical elements.
Figure 6. Average fixation duration on hazards and robots. Source: Authors’ own work
In contrast, the Wilcoxon test on fixation counts showed a statistically significant difference between hazards and robots (Z = 4.62, p < 0.001, r = 0.87) (Figure 7). Participants fixated more frequently on hazards than on robots, yet their attention lingered longer on robots. This imbalance suggests that, in robot-rich environments, workers may preserve some level of situation awareness (SA) of hazards through frequent glances but allocate their deeper cognitive processing to robots instead, which can compromise effective hazard recognition.
Figure 7. Average fixation count on hazards and robots. Source: Authors’ own work
5. Discussion
This study aimed to quantify how human visual attention shifts between construction hazards and robots during a VR-based hazard identification task. The results showed that the presence of robots significantly reduced both the frequency and duration of hazard fixations, indicating an overall decline in attention to safety hazards. Relative to the Baseline condition, mean hazard fixation duration dropped by 39.6% and mean fixation count declined by 41.1% during the HRI condition. This pattern appears to emerge from two underlying processes. First, robots’ dynamic motion served as potent bottom-up cues that automatically seized attentional resources, reducing the capacity for prolonged hazard monitoring. Second, the added cognitive load of concurrently tracking moving robots appears to have induced a more superficial scanning strategy rather than comprehensive hazard sweeps, as suggested by the parallel declines in mean and median metrics.
Within HRI scenarios, participants exhibited a clear redistribution of visual resources: fixation durations on robots were over five times longer than on hazards, while they glanced at hazards more frequently but for much shorter durations. This divergence, sustained robot fixations contrasted with fleeting hazard glances, suggests a check-back behavior: participants briefly scanned hazards to maintain SA but devoted disproportionate processing time to robots. Such fragmented scanning likely reflects divided attentional capacity under dual demands, illustrating how salient robotic stimuli can undermine the depth of hazard recognition even when hazards still attract initial glances.
Our findings align with construction safety research, which shows that hazard recognition performance correlates with fixation behavior as workers scan their environment for multiple hazards (Cheng et al., 2022; Liang et al., 2024). In VR-based hazard recognition studies, eye-tracking metrics have successfully captured hazard detection and cognitive processing in immersive simulations (Luo et al., 2025). Studies on HRI and eye-tracking further elucidate these patterns. For example, Upasani et al. (2024) found that eye-tracking reliably reflected increased cognitive load during physical collaboration with robots, with longer fixations linked to interactions of higher difficulty. Similarly, Liang et al. (2024) observed participants allocating visual attention toward collaborative robots during assembly tasks, consistent with our observation that robot presence draws gaze. Furthermore, emerging research on the interplay between top-down (goal-directed) and bottom-up (stimulus-driven) attention systems in VR hazard recognition (Zhang et al., 2021) emphasizes that salient stimuli can override task-oriented scanning goals (Uddin et al., 2020). In our scenario, the robot provided a bottom-up cue that diverted attention from the top-down task of hazard recognition, evidence of attentional competition in action. Yet, interestingly, Hasanzadeh et al. (2018b) argued that if workers concentrate too exclusively on hazards, they might lose broader SA in a dynamic environment. Our results help bridge these viewpoints: participants appeared to maintain SA by not fixating solely on one element, instead distributing gaze between the robot and hazards. However, this pattern was associated with a trade-off, as each hazard received only brief attention. These findings contribute to the literature by suggesting how a new type of distraction may quantitatively influence established eye-tracking metrics.
5.1 Theoretical contributions
From a theoretical standpoint, this study advances understanding of cognitive safety by revealing how automation influences workers' attention allocation in dynamic, hazard-rich environments. The shift in visual focus toward robots at the expense of hazard observation provides empirical support for limited-capacity attention models, exemplifying the resource competition described by multiple resource theory (Wu et al., 2022). In particular, the results underscore the interplay between bottom-up and top-down attention: the salient movements of robots (a bottom-up stimulus) can capture a disproportionate share of gaze, momentarily overriding the top-down goal of scanning for hazards.
This insight calls for refinement in attention allocation models for dynamic work settings, suggesting they account for novel stimuli like autonomous robots that can re-prioritize workers’ attentional focus. The findings carry particular significance for novice construction workers, who represent a critical demographic in our study and the broader industry. Research demonstrates that novice workers often exhibit different attentional strategies compared to experienced professionals: while they may be more sensitive to hazards and show higher hazard recognition rates than experienced workers (Zhou et al., 2024), they also tend to be less confident in hazard identification (Hussain et al., 2024a). In the context of human–robot interaction, novice workers may be especially vulnerable to attentional capture by robots due to several factors. First, their limited construction experience means they have not yet developed the internalized scanning patterns that experts use to efficiently assess environments. Second, the novelty effect of robotic systems may be more pronounced among novices, who lack prior exposure to automation on construction sites. However, this vulnerability also presents an opportunity. Unlike experienced workers who may have established potentially outdated scanning habits, novices can be trained from the outset to maintain hazard vigilance in the presence of robotic systems.
5.2 Practical implications
The attentional trade-offs identified in this research have important implications for how robots are deployed and operated on real jobsites. Because the findings suggest that mobile robots act as potential distractions, schedulers and planning teams should consider routing strategies that limit intensive robotic activity during high-demand tasks, such as critical lifting operations or work at height. Training programs may likewise help workers practice hazard search patterns in the presence of robots, especially novices, who can receive feedback on hazards missed while their attention was drawn to robotic agents. Based on the findings, attention-aware deployment strategies, including robot paths, speeds, signals and interface elements, could be adjusted to reduce unnecessary visual distraction in congested or high-risk areas and to make hazard-related cues more noticeable than noncritical robot movements. Moreover, construction HRI environments should incorporate measures to augment human SA, for instance through shared-control systems or assistive warning mechanisms, so that the introduction of robots does not unintentionally compromise hazard detection and overall safety on-site. Recent work also shows that workers’ emotional responses to robots can shape human–robot team productivity, underscoring that cognitive and affective reactions to robotic systems jointly influence performance in construction HRI settings (Baek et al., 2024). Integrating these cognitive considerations into construction HRI marks a step toward more holistic frameworks that balance automation benefits with the imperative of maintaining human attention on critical safety hazards.
6. Conclusion
This VR-based study demonstrates that the introduction of robots into a construction scenario produces a clear shift in participants’ visual attention. In the HRI condition, participants showed markedly longer fixation durations on robots compared to hazards, indicating a pronounced shift in attentional engagement. Participants’ gazes were frequently drawn toward the moving robots, evidenced by higher fixation counts and longer fixation durations on the robotic agents. This pattern reveals an attentional trade-off: the robots acted as salient stimuli that captured visual attention, which may reduce vigilance toward safety-critical hazards and could have safety implications on site, although real-world effects remain to be confirmed beyond this controlled VR setting. By quantifying how robots divert human attention, the study provides new insight into the cognitive implications of HRI in construction, underscoring that the benefits of robotic assistance may come with unintended attentional costs. More broadly, the findings highlight that as construction environments become increasingly automated, understanding how humans allocate attention in the presence of robotic agents may be as critical as improving the technical capabilities of the robots themselves.
This study has several limitations that should be acknowledged. First, the participant sample consisted of university students, the majority of whom had 0–1 year of construction experience, which may introduce sampling bias and limit the generalizability of the findings. Although this novice student sample limits generalization to experienced workers, it represents a critical at-risk group because new construction workers experience disproportionately high injury rates (Whisner, 2023). This sample may not reflect the visual search strategies and attentional patterns of experienced construction workers who routinely conduct on-site inspections and work alongside robots. However, focusing on novice participants provides useful baseline insight into early-stage attentional shifts in robot-present environments and offers reference points for comparing more experienced workers in subsequent research. Future studies should recruit participants from multiple organizations, regions and job roles to test the practical applicability of these results. Second, using the same dynamic VR scenes in both the Baseline and HRI phases may have introduced familiarity effects. Participants’ prior exposure to the environment without robots could have allowed them to anticipate hazard locations or scene dynamics during the robot-assisted trials, potentially influencing their fixation patterns. However, counterbalancing the order of hazard presentations and randomizing the order of interaction across participants helped mitigate systematic learning effects, helping to preserve the internal validity of attentional comparisons. Additionally, because each VR session lasted 30 minutes, this study reflects participants’ initial visual responses but does not capture longer-term effects such as habituation or attentional fatigue that may emerge in extended real-world operations (Hussain et al., 2024b; Kim et al., 2023).
Although the VR environment captured key features of a construction site, these findings should be interpreted with caution when applied to real jobsites, where task demands and perceived risks are higher. In such settings, the salience-driven gaze shift toward robots observed here may compete even more strongly with hazard monitoring or, alternatively, be dampened by stronger safety norms and experience. Accordingly, these findings underscore the need to prepare workers and systems for the cognitive demands of increased automation. Future research should examine whether attentional trade-offs persist among experienced professionals and under prolonged, varied conditions, using field-based eye-tracking and behavioral performance measures, to determine their relevance for real-world safety outcomes.
These findings also carry important practical implications. Safety training programs may benefit from incorporating robot-centric training scenarios, and project teams may design robot behaviors and interfaces to reduce unnecessary visual distraction. Such attention-aware approaches could support site managers in coordinating robotic operations more safely, guide training developers in designing targeted hazard-recognition exercises, and inform robotics integration specialists in optimizing robot motion and signaling strategies. As robotic technologies expand across construction environments worldwide, integrating these considerations into deployment and training practices may contribute to measurable improvements in worker safety outcomes.
Ethics statement
This study was reviewed and approved by the Texas A&M University Institutional Review Board (IRB ID: STUDY2023-0062). All participants were informed about the purpose of the study, and informed consent was obtained prior to participation. The research was conducted in accordance with relevant institutional and ethical guidelines. The authors used Perplexity (an AI tool) for grammar checking and language refinement of the manuscript.