Purpose

Indoor environmental quality (IEQ) influences occupants’ satisfaction, health, and performance, and is especially consequential in educational settings where it can affect well-being and cognitive outcomes. This study aims to evaluate whether a multimodal artificial intelligence approach, specifically a Multimodal Transformer (MulT), can estimate current IEQ conditions in real-world educational spaces more effectively than conventional approaches that rely primarily on single-modality physical measurements. The work targets real-time, holistic IEQ estimation that better reflects how multiple environmental cues co-occur in occupied rooms.

Design/methodology/approach

Data were collected in four educational-space scenarios: a faculty conference room, a hybrid laboratory with machinery, and two standard classrooms. Time-lapse RGB images and synchronized sensor measurements (air temperature, relative humidity, CO2, TVOCs, PM1, PM2.5, PM10, and occupancy rate) were recorded at 5-min intervals for 7–8 days per scenario. A MulT architecture was trained to fuse images and sensor streams and estimate IEQ-related variables in a single forward pass. The pipeline, model design, training regimen, and evaluation protocol were specified to support reproducibility.
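The pairing of time-lapse images with sensor readings described above can be illustrated with a small sketch. This is not the authors' pipeline: the function name `pair_by_timestamp`, the nearest-timestamp matching rule, and the 2.5-minute tolerance (half of the 5-min sampling interval) are all assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def pair_by_timestamp(image_times, sensor_times,
                      tolerance=timedelta(minutes=2, seconds=30)):
    """Match each image to its nearest sensor reading in time.

    Hypothetical helper: `sensor_times` must be sorted ascending.
    Returns (image_index, sensor_index) pairs; images with no sensor
    reading within `tolerance` are dropped.
    """
    pairs = []
    for i, t in enumerate(image_times):
        pos = bisect_left(sensor_times, t)
        # Only the readings just before and just after t can be nearest.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(sensor_times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(sensor_times[k] - t))
        if abs(sensor_times[j] - t) <= tolerance:
            pairs.append((i, j))
    return pairs


# Made-up timestamps: sensor readings every 5 minutes, images slightly offset.
start = datetime(2024, 1, 1, 9, 0)
sensor_times = [start + timedelta(minutes=5 * k) for k in range(4)]
image_times = [
    start + timedelta(seconds=20),             # close to the 09:00 reading
    start + timedelta(minutes=4, seconds=50),  # close to the 09:05 reading
    start + timedelta(minutes=30),             # no reading nearby, dropped
]
pairs = pair_by_timestamp(image_times, sensor_times)
print(pairs)
```

Dropping unmatched images instead of interpolating sensor values keeps every sample a genuinely observed image and sensor pair. Whether the study handled gaps this way is not stated in the abstract.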

Findings

Across 4,945 paired image–sensor samples, the proposed MulT model achieved a mean squared error (MSE) of approximately 2.99 and a mean absolute error (MAE) of approximately 0.88 on a held-out test set. Test performance closely matched validation results, indicating that the model generalized well across the measured scenarios. The results show that multimodal fusion can accurately estimate concurrent IEQ factors under real operational conditions, supporting the feasibility of near real-time IEQ assessment in educational environments. The reported workflow and evaluation setup enable direct comparison in future studies and benchmarking across alternative architectures or sensing configurations.
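For readers less familiar with the two reported metrics, the following is a minimal sketch of how MSE and MAE are computed from paired targets and predictions. The values are made up for illustration and are not taken from the study; the abstract does not state the units of the targets or whether they were normalized before scoring, so the sketch assumes neither.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not data from the study).
y_true = [22.0, 45.0, 600.0]
y_pred = [21.0, 47.0, 600.0]

mse = mean_squared_error(y_true, y_pred)   # (1 + 4 + 0) / 3
mae = mean_absolute_error(y_true, y_pred)  # (1 + 2 + 0) / 3
print(f"MSE={mse:.3f}, MAE={mae:.3f}")
```

Because MSE squares each error, a few large misses raise it much more than they raise MAE, which is why the two figures are usually reported together.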

Originality/value

This work contributes a reproducible, real-world demonstration of MulT modeling for concurrent, real-time estimation of IEQ factors in educational settings using synchronized visual and environmental sensing. Unlike conventional single-modality approaches, the method integrates room imagery with physical measurements to capture contextual cues that accompany IEQ variation. The approach is transferable to other indoor environments and can serve as a foundation for operational deployment, including alert and decision-support frameworks when combined with time-series forecasting and explicit performance thresholds. The study provides a structured baseline for multimodal IEQ research and practical monitoring systems.
