Signal processing and machine learning approaches for drinking behavior detection using PPG waveforms

Lin, Chun-Ling; Chao, Shu-Hung; Lin, Chen-Chun; Wang, Chien-Jen; Shen, Ming-Chih

doi:10.1108/ATSIP-10-2025-0101

Adequate hydration is essential for cardiovascular stability, thermoregulation and cognitive performance, yet current assessment methods are invasive or laboratory-dependent, limiting their use for continuous monitoring. This study proposes a noninvasive approach for classifying drinking behavior using photoplethysmography (PPG)-derived waveform features. PPG signals from 155 participants were collected under standardized conditions and grouped by self-reported fluid intake (2–4, 4–6 and >6 cups). Fiducial-point-based timing intervals, amplitude ratios and morphological descriptors were extracted to quantify cardiovascular changes. To address class imbalance, up-sampling, synthetic minority over-sampling technique (SMOTE), generative adversarial network-based feature synthesis with physiological consistency checks and image-based augmentation were applied. Model performance, evaluated using an 80:20 stratified hold-out repeated across ten runs, achieved high accuracy with up-sampling and image-based augmentation (97% and 91%; AUC ≈ 1.00 and 0.99), while SMOTE performed comparably (AUC 0.95). Kruskal–Wallis analysis confirmed significant hydration-related differences in time-domain (e.g. systolic-to-diastolic ratio) and amplitude-domain indices (e.g. pulse amplitude index), with small-to-moderate effect sizes. These results demonstrate that PPG features can capture hydration-related cardiovascular variability and support behavioral classification. The framework offers a foundation for wearable, real-time hydration monitoring and future integration with objective intake measures and multi-site PPG acquisition.

1. Introduction

Adequate hydration is essential for cardiovascular regulation, thermoregulation and cognitive function, with even mild dehydration (1–2% body mass loss) impairing attention and endurance (Rahman et al., 2023; Kumar et al., 2024). Continuous monitoring is particularly important for older adults, children and individuals in hot or demanding environments (Shorten and Khoshgoftaar, 2019; Intakes SCotSEoDR, Electrolytes PoDRIf, Water, 2005; EFSA Panel on Dietetic Products N, Allergies, 2010). However, conventional methods – such as plasma osmolality, urine-specific gravity and bioelectrical impedance – are invasive or lab-dependent, limiting real-world use (Lin et al., 2023; Charlton et al., 2019; Gupta et al., 2022; Chen et al., 2023; Liu et al., 2024).

Recent advances in wearable sensing position photoplethysmography (PPG), widely used in smartwatches, as a noninvasive alternative. Beyond heart rate and oxygen saturation, PPG waveform morphology reflects vascular tone, arterial stiffness and blood volume changes (Hwang et al., 2021; Kiyasseh et al., 2020; Li et al., 2022; Lopez-Paz and Oquab, 2016; Sagi and Rokach, 2018), all influenced by hydration status (Sengur et al., 2018; Herbrich, 2001; Huang et al., 2014). Unlike accelerometers that detect only gross motion (Mucherino et al., 2009), PPG enables simultaneous detection of drinking behavior and hemodynamic responses, providing richer physiological context.

Nevertheless, existing PPG hydration studies primarily focus on biochemical estimation (Şengür and Turhan, 2018) or cardiovascular monitoring in unrelated settings (Awad and Khanna, 2015; Zeebaree et al., 2019; Ting et al., 2011), often under controlled conditions rather than real-world drinking behavior. To address this gap, we propose a behavioral signal processing framework for classifying drinking behavior from PPG waveforms. Our approach integrates fiducial-point timing, amplitude ratios and morphology-based features with multimodal data augmentation techniques – including SMOTE, GAN-based synthesis and image transformations – to mitigate class imbalance while preserving physiological plausibility (Leung, 2007; Rogova, 2008; Kotsovsky and Batyuk, 2022; Safavian and Landgrebe, 1991; Şengür and Karabatak, 2018; McKight and Najab, 2010).

By benchmarking multiple classifiers with automated hyperparameter optimization, this study evaluates the feasibility and challenges of PPG-based hydration monitoring, aiming to lay the groundwork for wearable systems capable of passive drinking detection and personalized health interventions.

2. Related work

Hydration assessment traditionally relies on plasma osmolality and urine-specific gravity – gold standards but too invasive for continuous use (Lin et al., 2023; Charlton et al., 2019; Nose et al., 1988). Bioelectrical impedance analysis is noninvasive (Gupta et al., 2022; Chen et al., 2023) yet limited by equipment calibration and posture control (Liu et al., 2024). These constraints have spurred interest in wearable hydration monitoring.

PPG, widely used in smartwatches, captures vascular changes linked to plasma volume, arterial compliance and peripheral resistance (Hwang et al., 2021; Kiyasseh et al., 2020). Hydration affects these parameters, altering PPG waveform morphology and metrics like pulse wave velocity (PWV) (Hwang et al., 2021; Kiyasseh et al., 2020; Lopez-Paz and Oquab, 2016; Sagi and Rokach, 2018; Sengur et al., 2018; Herbrich, 2001; Huang et al., 2014; Noakes, 2007). While prior studies connected PPG features with dehydration or fluid shifts (Hwang et al., 2021; Şengür and Turhan, 2018; Noakes, 2007; Buda et al., 2018), most relied on controlled laboratory settings and did not address real-time drinking behavior. This gap limits both ecological validity and clinical relevance (Lopez-Paz and Oquab, 2016; Sagi and Rokach, 2018; Awad and Khanna, 2015; Zeebaree et al., 2019; Ting et al., 2011; Blagus and Lusa, 2013; Gopal et al., 2021).

Behavioral data sets also face class imbalance. Methods like synthetic minority over-sampling technique (SMOTE) (Leung, 2007), generative adversarial networks (GANs) (Rogova, 2008; Kotsovsky and Batyuk, 2022) and image-based augmentation (Safavian and Landgrebe, 1991; Şengür and Karabatak, 2018) mitigate this issue. SMOTE interpolates synthetic samples, while GANs capture nonlinear patterns but raise concerns about physiological validity. Recent work suggests hybrid strategies, combining SMOTE with GANs or constraining GAN outputs within physiological ranges (Leung, 2007; Rogova, 2008). Our study adopts these strategies to maintain data diversity and biological plausibility.

Image-based augmentation is also widely used when converting time series into 2D forms. Rahman et al. (2023) reviewed ECG augmentation, highlighting geometric transforms such as rotation and scaling; Kumar et al. (2024) surveyed medical image augmentation, emphasizing generalization; and Shorten and Khoshgoftaar (2019) noted their effectiveness in maintaining physiological interpretability. These studies justify our controlled application of geometric transformations on PPG-derived features.

3. Methodology

All processing steps in this study, including signal filtering, feature processing, data augmentation and model optimization, were carried out in MATLAB R2024b (MathWorks) using the Statistics and Machine Learning Toolbox and the Deep Learning Toolbox.

Each step of the analysis was structured as a separate module to support data integrity and reproducibility. This design allowed intermediate results to be examined independently, such as comparing raw versus filtered signals or augmented versus original data sets.

Several augmentation methods were applied, each with a distinct role. SMOTE generated new samples through linear interpolation in feature space, GAN introduced nonlinear variations while retaining physiological plausibility and image-based augmentation enhanced the spatial representation of feature relationships. Together, these approaches aimed to address class imbalance while reducing the risk of overfitting to synthetic data.

3.1 Data collection

3.1.1 Participants and drinking behavior measurement.

This study included 155 participants (70 men and 85 women) between the ages of 21 and 68. All participants signed written informed consent forms and were not informed of the specific research hypothesis. The study protocol was reviewed and approved by the Institutional Review Board of Chung Shan Medical University Hospital (Approval No. CSMUH21057).

PPG signals were recorded from each participant between 2:00 PM and 3:00 PM while they were in a standardized resting state to minimize physiological variation. After the recordings, participants completed a structured questionnaire to report the total amount of water they had consumed since waking that day. These self-reported values were used as an indicator of fluid intake behavior.

At the start, participants were divided into four groups based on their reported water intake: less than 2 cups, 2–4 cups, 4–6 cups and more than 6 cups. Because the number of participants in the “less than 2 cups” group was too small, this group was excluded from further analysis. The final classification focused on three intake ranges: 2–4 cups (Label 1), 4–6 cups (Label 2) and more than 6 cups (Label 3). These ranges were guided by established hydration recommendations, which suggest daily intakes of 2.0–2.7 liters for women and 2.5–3.7 liters for men. The selected cutoffs were intended to reflect realistic consumption levels by early afternoon and to provide a practical framework for classifying drinking behavior (Intakes SCotSEoDR, Electrolytes PoDRIf, Water, 2005; EFSA Panel on Dietetic Products N, Allergies, 2010).

Because participants were recruited from the community, self-reported water intake was a practical measure. However, we recognize that this method can be affected by recall bias, differences in estimating cup volume and social desirability. Future studies should consider more objective ways to track fluid intake, such as smart bottles, wearable sensors or ecological momentary assessment (EMA) methods, to capture drinking events in real time and improve accuracy. This limitation and its implications for labeling reliability are further discussed in the discussion section.

3.1.2 Photoplethysmography signal acquisition.

PPG signals were collected using a COMGO fingertip arterial pulse recorder (Model: COMGO150TR-ECG; Giant Power Technology Biomedical Corp., Taiwan) (Lin et al., 2023), a commercially available medical-grade monitoring device designed for cardiovascular waveform acquisition. The device uses dry-contact optical sensing with BLE 4.0 transmission and is factory-calibrated to ensure consistent performance across units. All devices used in this study were obtained from the same production batch, operated under identical firmware and followed the manufacturer’s standard quality-control procedures. Serial numbers were not recorded because the units were functionally identical and calibrated prior to shipment.

A standardized acquisition protocol was applied across participants. Each PPG recording lasted 120 s, which provides sufficient duration for stable waveform acquisition and reliable extraction of fiducial points such as systolic peaks, dicrotic notches and diastolic feet. All measurements were performed in a controlled laboratory environment, with participants seated quietly at rest. The fingertip sensor was placed on the index finger and participants were instructed to keep their hand relaxed on a table surface, avoid speaking and minimize finger motion throughout the recording. The sampling rate was set to 200 Hz, offering adequate temporal resolution for pulse morphology analysis.

To minimize short-term physiological variability, participants were asked to avoid strenuous activity prior to measurement. Fingertip acquisition was selected due to its typically high perfusion index and reliable waveform quality. Because optical PPG can be affected by peripheral vasoconstriction, the room temperature was maintained within a controlled range to ensure consistent signal quality across recordings.

3.2 Signal preprocessing and feature extraction

The raw PPG signals were preprocessed to ensure consistent quality across participants. A zero-phase, second-order Butterworth bandpass filter with a passband of 0.5–15 Hz was applied to remove baseline drift and suppress high-frequency artifacts while preserving physiologically meaningful cardiac components. This passband retains the fundamental heart-rate frequency and its primary harmonics – corresponding to approximately 30–900 beats per minute – which are essential for peak detection, fiducial identification and pulse morphology analysis. This filtering step provides an effective balance between noise reduction and waveform preservation for subsequent feature extraction.

To normalize signal amplitude across participants and reduce inter-subject variability due to skin tone, perfusion or sensor positioning, the filtered signals $x_{f} (t)$ were standardized according to:

x_{n} (t) = \frac{x_{f} (t) - μ}{σ},

(1)

where $μ$ and $σ$ denote the mean and standard deviation of the filtered signal, respectively. This normalization procedure facilitates robust feature extraction and ensures comparability across subjects, particularly when combining data from different acquisition sessions.
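As an illustration of Eq. (1), the standardization step can be sketched in a few lines of Python. This is not the study's MATLAB code: the sample values are hypothetical, and the use of the population standard deviation (dividing by n rather than n − 1) is an assumption, since the text does not state which was used.

```python
import math

def zscore(signal):
    """Standardize a filtered PPG segment: x_n(t) = (x_f(t) - mu) / sigma (Eq. 1)."""
    n = len(signal)
    mu = sum(signal) / n
    # Assumption: population standard deviation (divide by n).
    sigma = math.sqrt(sum((v - mu) ** 2 for v in signal) / n)
    if sigma == 0:
        raise ValueError("a constant signal cannot be standardized")
    return [(v - mu) / sigma for v in signal]

# Hypothetical filtered samples in arbitrary units, for illustration only.
x_f = [0.2, 0.5, 1.1, 0.7, 0.3]
x_n = zscore(x_f)
```

After this step the segment has zero mean and unit variance, so amplitude differences caused by perfusion or sensor contact no longer dominate the comparison across participants.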

3.2.1 Beat detection.

Beat segmentation was performed using the open-source PulseAnalyse toolbox by Charlton et al. (2019), which provides validated methods for fiducial detection and artifact rejection. Key fiducial points – specifically pulse onsets ($t_{onset_{i}}$) and systolic peaks ($t_{peak_{i}}$) – were identified using:

t_{onset_{i}} = \arg\min_{t \in [t_{start}, t_{end}]} \frac{dS(t)}{dt}

(2)

t_{peak_{i}} = \arg\max_{t \in [t_{start}, t_{end}]} S(t),

(3)

where $S(t)$ is the PPG signal and $[t_{start}, t_{end}]$ defines the time boundaries for a given pulse. To account for individual variability in heart rate, the analysis window length $m$ was dynamically adjusted according to:

m = [0.005 \times F_{s}, 0, 100 \times F_{s}],

(4)

where $F_{s} = 200 H z$ is the sampling frequency. This adaptive windowing ensured accurate capture of cardiac cycles across a range of resting and elevated heart rates.

3.2.2 Artifact detection and peak refinement.

Amplitude-based thresholds were applied to filter out spurious peaks, retaining only those within a physiologically plausible range:

ThA_{low} = \mathrm{prctile}(Amp_{valid}, 95) \times 0.6

(5)

ThA_{high} = \mathrm{prctile}(Amp_{valid}, 95) \times 1.8,

(6)

where $Amp_{valid}$ is the vector of preliminary peak amplitudes. Peak localization was further refined by re-identifying the maximum within each pulse cycle:

t_{refined\,peak_{i}} = \arg\max_{t \in [t_{onset_{i}}, t_{onset_{i+1}}]} S(t)

(7)

This refinement step minimized errors due to small oscillations or artifacts near the systolic peak, which could otherwise propagate into time-domain indices.
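The thresholding in Eqs. (5)–(6) and the refinement in Eq. (7) can be sketched as follows. This Python illustration is not the authors' implementation: the `percentile` helper uses simple linear interpolation and may differ slightly from MATLAB's `prctile`, and the amplitude and signal values are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100] (assumed convention)."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def amplitude_thresholds(peak_amps):
    """Eqs. (5)-(6): 0.6x and 1.8x the 95th percentile of peak amplitude."""
    ref = percentile(peak_amps, 95)
    return 0.6 * ref, 1.8 * ref

def keep_plausible(peak_amps):
    """Indices of peaks whose amplitude lies inside [ThA_low, ThA_high]."""
    low, high = amplitude_thresholds(peak_amps)
    return [i for i, a in enumerate(peak_amps) if low <= a <= high]

def refine_peak(signal, onset, next_onset):
    """Eq. (7): index of the maximum between two consecutive onsets (inclusive)."""
    segment = signal[onset:next_onset + 1]
    return onset + max(range(len(segment)), key=segment.__getitem__)

# Hypothetical peak amplitudes; the seventh value mimics a spurious low peak.
amps = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 0.3, 1.02]
low, high = amplitude_thresholds(amps)
kept = keep_plausible(amps)

# Hypothetical pulse with a small ripple just before the true maximum.
pulse = [0.0, 1.0, 3.0, 2.9, 3.2, 1.0, 0.0, 0.5]
peak_index = refine_peak(pulse, 0, 6)
```

Because the reference is a high percentile rather than the maximum, a single very large artifact has limited influence on the thresholds when enough valid beats are available.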

3.2.3 Signal quality assessment.

A 10-second sliding window (2-second overlap) was used to evaluate signal quality. Beat-to-beat intervals were calculated as follows:

Δt_{beat} = \mathrm{median}(t_{i+1} - t_{i}),

(8)

where $t_{i}$ and $t_{i + 1}$ represent the onset times of two consecutive beats.

Each normalized pulse waveform $x_{i}$ was compared to a subject-specific template waveform $T$, defined as:

T = \frac{1}{n} \sum_{i=1}^{n} \frac{x_{i}}{\| x_{i} \|}

(9)

Morphological similarity was quantified via cross-correlation:

r_{i} = \frac{T \cdot x_{i}}{\| T \| \, \| x_{i} \|}

(10)

Pulses with $r_{i} > Th_{ppg}$ (default 0.86) were retained, ensuring that low-quality beats – often linked to transient finger movement or sensor misalignment – were excluded prior to feature extraction.
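A minimal Python sketch of the template-matching step in Eqs. (9)–(10) is given below. It assumes that all pulses have already been resampled to a common length, which the text does not specify, and the pulse values are hypothetical.

```python
import math

def norm(v):
    """Euclidean norm of a pulse vector."""
    return math.sqrt(sum(a * a for a in v))

def template(pulses):
    """Eq. (9): mean of the unit-normalized pulses (equal lengths assumed)."""
    units = [[a / norm(p) for a in p] for p in pulses]
    return [sum(col) / len(units) for col in zip(*units)]

def similarity(t, x):
    """Eq. (10): normalized correlation between template t and pulse x."""
    return sum(a * b for a, b in zip(t, x)) / (norm(t) * norm(x))

def retain(pulses, threshold=0.86):
    """Indices of pulses whose similarity to the template exceeds the threshold."""
    t = template(pulses)
    return [i for i, p in enumerate(pulses) if similarity(t, p) > threshold]

# Three hypothetical clean pulses and one distorted pulse.
pulses = [
    [0.0, 1.0, 3.0, 2.0, 1.0, 0.0],
    [0.0, 1.1, 2.9, 2.1, 1.0, 0.0],
    [0.0, 0.9, 3.1, 1.9, 1.1, 0.0],
    [2.0, 0.0, -1.0, 0.0, 2.0, 3.0],
]
good = retain(pulses)
```

In this sketch the template is built from all pulses, including the distorted one; with many clean beats per recording the template is dominated by the typical morphology, so the atypical beat still falls below the threshold.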

3.2.4 Derivative calculations.

First- and second-order derivatives were computed using the Savitzky–Golay filter (Gupta et al., 2022) to highlight rapid transitions in the waveform:

\frac{d S (t)}{d t} = \lim_{Δ t \to 0} \frac{S (t + Δ t) - S (t)}{Δ t}

(11)

\frac{d^{2} S(t)}{dt^{2}} = \lim_{Δt \to 0} \frac{\frac{dS(t + Δt)}{dt} - \frac{dS(t)}{dt}}{Δt}

(12)

These derivatives facilitated the detection of inflection points related to vascular compliance changes.
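For illustration, Savitzky–Golay differentiation can be sketched in Python with the standard 5-point quadratic coefficients. The window length and polynomial order used in the study are not reported, so both are assumptions here, and only interior samples are returned.

```python
def sg_derivatives(signal, fs):
    """First and second derivatives via a 5-point quadratic Savitzky-Golay fit.

    Assumed settings: window = 5 samples, polynomial order = 2.
    Returns derivatives for samples 2 .. len(signal) - 3.
    """
    h = 1.0 / fs                      # sampling interval in seconds
    d1_coef = (-2, -1, 0, 1, 2)       # scaled by 1 / (10 h)
    d2_coef = (2, -1, -2, -1, 2)      # scaled by 1 / (7 h^2)
    d1, d2 = [], []
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        d1.append(sum(c * v for c, v in zip(d1_coef, window)) / (10 * h))
        d2.append(sum(c * v for c, v in zip(d2_coef, window)) / (7 * h * h))
    return d1, d2

# Check on a parabola S(t) = t^2 sampled at 200 Hz: dS/dt = 2t, d2S/dt2 = 2.
fs = 200
t = [i / fs for i in range(10)]
s = [ti ** 2 for ti in t]
d1, d2 = sg_derivatives(s, fs)
```

A quadratic fit reproduces the derivatives of a parabola exactly, which makes it a convenient sanity check before applying the filter to real pulses.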

3.2.5 Fiducial point identification.

Fiducial landmarks – including the systolic peak (p1), dicrotic notch (dic) and diastolic foot (dia) – were identified using slope-based heuristics and derivative analysis:

t_{fiducial} = \arg\min_{t \in [t_{onset}, t_{end}]} \frac{dS(t)}{dt}

(13)

Secondary landmarks (first and second inflection points, maximum slope) were also extracted to provide finer resolution of vascular dynamics.

3.2.6 Pulse wave indices.

A total of 52 pulse-wave features were extracted from each cleaned PPG cycle, covering time-domain, amplitude-domain, slope-derived and morphology-based indices. All features and their mathematical definitions are summarized in supplemental Table 1, grouped according to physiological relevance (e.g. cardiac timing, diastolic filling, vascular stiffness, reflection indices, amplitude ratios).

Table 1.

Summary of Kruskal–Wallis test results and effect sizes (η²) for features significantly differentiating daily water intake groups

Feature	p-value	η² (effect size)
T	0.0487	0.028
IHR	0.0308	0.034
t_dia	0.0169	0.043
t_ratio	0.0009	0.084
A2	0.0107	0.049
IPA	0.0005	0.091

These indices were computed from primary fiducial landmarks – including the systolic peak, dicrotic notch, diastolic wave and baseline intersections – and included commonly used vascular markers such as the stiffness index, reflection index (RI), augmentation pressure, augmentation index (AI) and peripheral amplitude ratios [e.g. A1, A2, index of pulse amplitude (IPA)]. Time-based descriptors such as pulse duration T, systolic and diastolic phase durations (t_sys, t_dia) and phase ratios (e.g. t_ratio, Δt) were also derived to capture hydration-related cardiovascular adjustments.

Hydration level affects plasma volume and vascular resistance, leading to changes in systolic duration, diastolic filling time and relative wave amplitudes. Therefore, additional morphology-derived metrics – such as arterial gain indices (AGI) variants, slope-based measures (e.g. slope_b_c, slope_b_d) and width parameters (width1, width2) – were included to provide a comprehensive characterization of arterial pulse contour. All 52 indices were used as the candidate feature set in later feature-ranking analysis (Section 3.4), where chi-square ranking (fscchi2) selected the top k features (k = 5–30) for model evaluation. No additional features beyond this set were introduced in the classification stage.

Although supported by prior literature (Lin et al., 2023; Chen et al., 2023), these indices may still be influenced by factors such as age, ambient temperature or transient stress. Standardized acquisition protocols were applied to limit such effects; however, future studies should incorporate covariate adjustment or longitudinal normalization to better isolate hydration-specific physiological responses. PWV, a well-established marker of arterial stiffness, was not included because single-site fingertip PPG cannot reliably estimate pulse transit time. Accurate PWV computation requires dual-site PPG or ECG–PPG synchronization, which was beyond the scope of the present study but remains an important direction for future work.

3.3 Data augmentation

In this research, we began by addressing the problem of class imbalance in the data set when classifying drinking behavior from PPG-derived features. Participants were initially divided into four self-reported intake categories (<2 cups, 2–4 cups, 4–6 cups and >6 cups). Because the <2 cups group had too few cases for meaningful analysis, it was excluded. The final data set therefore consisted of three groups: Label 1 (2–4 cups, n = 31), Label 2 (4–6 cups, n = 62) and Label 3 (>6 cups, n = 46). Despite this adjustment, the differences in group size were still substantial, raising the risk of bias in model training and limiting generalizability.

Several strategies were applied to reduce this imbalance. Down-sampling equalized all groups at 31 cases (total = 93) by randomly reducing larger groups, while up-sampling expanded each group to 62 cases (total = 186). In addition, synthetic data approaches were tested, including SMOTE (Liu et al., 2024), GAN-based augmentation (Hwang et al., 2021) and image-based augmentation (Kiyasseh et al., 2020), all designed to generate physiologically consistent samples.

In every experiment, an 80:20 stratified train-test split was used to preserve class proportions while maintaining sufficient training data. Augmented samples were always kept in the same fold as their original cases to avoid data leakage and minimize overfitting. Data set distributions before and after augmentation are shown in Figure 1.
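The splitting logic can be sketched as follows. This Python illustration is an assumption about one reasonable way to implement the protocol: the function names, the rounding rule for the test-set size and the seed are hypothetical, and only the class sizes (31, 62, 46) come from the study.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices 80:20 within each class (rounding is an assumption)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for y, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

def assign_augmented(parents, train_idx):
    """Place each synthetic sample in the fold of the original it was derived from."""
    train_set = set(train_idx)
    return ["train" if p in train_set else "test" for p in parents]

labels = [1] * 31 + [2] * 62 + [3] * 46
train_idx, test_idx = stratified_split(labels)
folds = assign_augmented([train_idx[0], test_idx[0]], train_idx)
```

Repeating the call with ten different seeds would give the ten hold-out runs; tying every synthetic sample to the fold of its parent prevents near-copies of a test case from appearing in the training set.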

Figure 1.

[Figure: composite heatmap panels for the original, GAN-augmented, SMOTE-augmented and image-augmented data sets. Each heatmap plots samples (vertical axis) against features 1 to 52 (horizontal axis); output panels show the class labels across samples.]

Visualization of the PPG-derived feature data set before and after rebalancing

Note(s): Original min–max normalized data, shown for all samples and individually for the three drinking-behavior classes, illustrating the initial imbalance (31, 62, 46 samples). GAN-augmented data, where synthetic feature vectors were generated until each class contained 62 samples (orig | synth counts shown per class). SMOTE-augmented data, which interpolated minority classes to 62 samples per class. Image-based augmentation, where 1-D feature vectors were reshaped into 4 × 13 matrices and perturbed using small geometric transforms to produce 150 samples per class. For all methods, the rightmost bar plots show the resulting class-label distributions. Heatmaps depict 52-dimensional feature vectors across samples, illustrating how each augmentation strategy modifies class balance while preserving global feature structure.


3.3.1 Synthetic minority over-sampling technique.

SMOTE (Liu et al., 2024) was implemented in MATLAB with k = 5 nearest neighbors (NN) determined via Euclidean distance. For each minority class instance $x_{i}$, a synthetic sample $x_{s}$ was generated by interpolating between $x_{i}$ and a randomly selected neighbor $x_{k}$ using a factor $λ \in (0, α]$, where $α = 0.1$:

x_{s} = x_{i} + λ \times (x_{k} - x_{i})

(14)

This method preserved the local structure of the minority class and reduced the risk of overfitting. Borderline-SMOTE was also tested but introduced instability, so standard SMOTE was used. Validation with t-SNE visualization and histogram analysis confirmed that the synthetic samples followed the same distribution as the original data. As a result, SMOTE increased both minority classes to 62 samples each (total = 186), as shown in Figure 1(b). This approach has been widely applied to physiological tabular data sets (Gupta et al., 2022) and is particularly effective for maintaining class-specific feature patterns in small biomedical data sets.
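The interpolation in Eq. (14) can be sketched in Python as below. This is an illustration under stated assumptions rather than the study's MATLAB code: the seed sample for each synthetic vector is drawn at random, and the feature values are random placeholders with the same shape as Label 1 (31 samples × 52 features).

```python
import math
import random

def smote(minority, n_new, k=5, alpha=0.1, seed=0):
    """Generate n_new synthetic samples: x_s = x_i + lam * (x_k - x_i) (Eq. 14)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))          # assumed: random seed sample
        x_i = minority[i]
        # k nearest neighbours by Euclidean distance, excluding x_i itself.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(x_i, x))
        x_k = rng.choice(others[:k])
        lam = alpha * (1.0 - rng.random())        # uniform on (0, alpha]
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_k)])
    return synthetic

# Placeholder data shaped like Label 1 (31 samples, 52 features).
data_rng = random.Random(1)
label1 = [[data_rng.random() for _ in range(52)] for _ in range(31)]
new_samples = smote(label1, n_new=31)
```

With α = 0.1 each synthetic point stays close to its seed sample; the more common choice of λ drawn from the full interval between the two points would spread synthetic samples along the whole connecting segment.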

3.3.2 Generative adversarial network-based augmentation.

A GAN (Hwang et al., 2021) framework was implemented using MATLAB’s dlnetwork architecture to increase data diversity while maintaining physiological plausibility. The generator received a 100-dimensional random noise vector and passed it through two fully connected layers (sizes: 128 and 52) with ReLU activation and a final tanh activation to output synthetic PPG-derived feature vectors. The discriminator processed 52-dimensional inputs through three layers (128 → 64 → 1), using Leaky ReLU (slope = 0.2) in the two hidden layers and a sigmoid output layer to assess sample authenticity.

Adversarial training was conducted using the Adam optimizer (learning rate = 0.0002, β₁ = 0.5, β₂ = 0.999), with binary cross-entropy loss for both networks. An L2 regularization term (λ = 10) was added to the generator’s loss:

L_{G} = \mathrm{BCE}(D(G(z)), 1) + λ \times \| G(z) \|^{2},

(15)

where $G(z)$ denotes the generated sample from noise vector $z$, $D(G(z))$ is the discriminator’s prediction for the generated sample and $\| G(z) \|^{2}$ represents the squared L2 norm of the output.

The discriminator loss was computed as follows:

L_{D} = \mathrm{BCE}(D(x_{real}), 1) + \mathrm{BCE}(D(G(z)), 0)

(16)

where $x_{real}$ denotes real PPG-derived vectors. Stabilization techniques included label smoothing (real labels = 0.9), minibatch discrimination and gradient clipping (norm cap = 1.0). Training proceeded for 1,000 epochs with batch size = 32. Generated outputs were validated via cosine similarity, distributional checks and comparison against original physiological indices (e.g. systolic–diastolic intervals, normalized amplitudes); outliers were removed in post-processing. GAN augmentation expanded all classes to 62 samples (total = 186), as illustrated in Figure 1(c).
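To make the two loss terms concrete, Eqs. (15)–(16) can be evaluated for a single sample as in the Python sketch below. The per-sample formulation, the clamping constant `eps` and the way label smoothing enters the discriminator loss (replacing the real target 1 with 0.9) are assumptions; the study's MATLAB implementation may differ.

```python
import math

def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one prediction p in (0, 1)."""
    p = min(max(p, eps), 1 - eps)     # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def generator_loss(d_fake, g_out, lam=10.0):
    """Eq. (15): adversarial term plus lam times the squared L2 norm of G(z)."""
    return bce(d_fake, 1.0) + lam * sum(v * v for v in g_out)

def discriminator_loss(d_real, d_fake, real_label=0.9):
    """Eq. (16), with the real target smoothed to 0.9 (assumed placement)."""
    return bce(d_real, real_label) + bce(d_fake, 0.0)

# Hypothetical discriminator outputs and a 52-dimensional generator output.
g_out = [0.1] * 52
loss_g = generator_loss(0.5, g_out)
loss_d = discriminator_loss(0.5, 0.5)
```

For this hypothetical output the penalty term alone is 10 × 52 × 0.01 = 5.2, several times the adversarial term, which shows how strongly λ = 10 pulls generated vectors toward small magnitudes.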

GAN-based augmentation is increasingly applied in biomedical signal processing because it can capture nonlinear relationships and generate physiologically realistic data (Kotsovsky and Batyuk, 2022). To ensure reliability, all generated feature vectors were converted back into representative PPG waveforms and visually inspected. Key landmarks – such as systolic peaks, dicrotic notches and diastolic feet – were checked for consistency with real signals, confirming their suitability for downstream analysis.

In this study, we implemented a single, unconditional GAN that was trained on the pooled set of normalized PPG-derived feature vectors from all participants, without using class labels during adversarial training. This design allowed the GAN to learn the global structure of the feature distribution rather than class-specific manifolds. After convergence, the trained generator was used purely as a class-balancing tool. For each drinking-behavior label, we computed the existing number of real samples and generated additional synthetic vectors only for the underrepresented classes until all classes matched the size of the majority class (62 samples). Accordingly, Label 1 (2–4 cups) increased from 31 to 62 samples (31 real + 31 synthetic) and Label 3 (>6 cups) increased from 46 to 62 samples (46 real + 16 synthetic), whereas Label 2 (4–6 cups) remained at 62 real samples without any GAN-generated data. This strategy ensured that GAN-based augmentation did not dominate the majority class and restricted synthetic data generation to minority classes only, thereby preventing imbalance amplification while maintaining physiological plausibility.

To characterize whether GAN-generated samples remained within a physiologically reasonable feature space, we conducted two post-hoc validation steps. First, cosine similarity (Li et al., 2022) was computed between each synthetic vector and all real samples after z-score normalization; vectors with similarity <0.2 were labeled as outliers, resulting in 30 of 31 (96.8%) synthetic samples being retained. Second, marginal feature distributions were compared using the two-sample Kolmogorov–Smirnov test (Lopez-Paz and Oquab, 2016). As expected, most features showed significant distributional differences, reflecting both the small number of synthetic samples and the unconditional nature of the GAN. These checks were used for characterization only and not as filtering steps within the augmentation pipeline.
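Both checks can be sketched in Python as follows. The text does not say how similarity to "all real samples" was aggregated, so taking the best match (maximum similarity) is an assumption, and only the Kolmogorov–Smirnov statistic is computed here, not its p-value. All vectors are hypothetical and assumed to be non-zero.

```python
import bisect
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def flag_outliers(synthetic, real, threshold=0.2):
    """True where the best match to any real sample is below the threshold."""
    return [max(cosine(s, r) for r in real) < threshold for s in synthetic]

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a + b))
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in points
    )

# Hypothetical z-scored vectors in three dimensions.
real = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
synthetic = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
flags = flag_outliers(synthetic, real)
```

The KS statistic would be applied feature by feature, comparing the values of one feature across real samples with the values of the same feature across synthetic samples.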

3.3.3 Image-based augmentation.

In the image-based approach (Kiyasseh et al., 2020), one-dimensional PPG feature vectors were reshaped into two-dimensional matrices (e.g. 4 × 13) to preserve spatial adjacency among physiologically related features. MATLAB’s imageDataAugmenter applied small geometric transformations – rotation (±0.01 radians), scaling (±5%) and translation (±0.01 units) – to mimic real-world variability from sensor placement and subject posture changes. The transformation process is described as follows:

T (x) = S (R (x + δ))

(17)

Here, δ denotes spatial shift, R denotes rotation and S denotes scaling. Cosine similarity analysis and expert review confirmed that these augmentations preserved the interpretability of PPG signals. The approach follows well-established image-based augmentation methods, commonly used in both biosignal and medical imaging – for example, systematic ECG augmentation (Chen et al., 2023) and geometric transforms in general medical imaging (Sagi and Rokach, 2018). Parameter ranges were selected to allow plausible morphological variations while avoiding distortion of physiologically relevant frequency-domain features. Broader discussions of geometric augmentation can be found in recent reviews (Şengür and Karabatak, 2018). Using this method, each class was expanded to 150 samples (total = 450), as shown in Figure 1(d).
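The order of operations in Equation (17), shift first, then rotation, then scaling, can be illustrated on plain 2-D coordinates. The sketch below is only an illustration of that composition together with the 4 × 13 reshaping; it does not reproduce MATLAB's imageDataAugmenter, which resamples the matrix entries with interpolation. The function names are hypothetical.

```python
import math


def reshape(vector, rows=4, cols=13):
    """Reshape a flat feature vector into a rows x cols grid (row-major)."""
    if len(vector) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [vector[r * cols:(r + 1) * cols] for r in range(rows)]


def transform_point(x, y, shift=(0.0, 0.0), angle=0.0, scale=1.0):
    """Apply T(p) = S(R(p + delta)) to a single 2-D coordinate."""
    xs, ys = x + shift[0], y + shift[1]                 # translation by delta
    xr = xs * math.cos(angle) - ys * math.sin(angle)    # rotation R
    yr = xs * math.sin(angle) + ys * math.cos(angle)
    return scale * xr, scale * yr                       # scaling S


grid = reshape(list(range(52)))  # 52 placeholder feature values
# Parameters at the upper end of the ranges quoted in the text.
moved = transform_point(1.0, 0.0, shift=(0.01, 0.0), angle=0.01, scale=1.05)
print(len(grid), len(grid[0]), moved)
```

With parameters this small the coordinate moves by only a few percent, which is consistent with the stated aim of allowing plausible variation without distorting the feature layout.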

3.3.4 Additional resampling strategies.

Two additional resampling strategies were applied to improve balance and reduce model bias. The first strategy was down-sampling, where all groups were reduced to 31 samples – the size of the smallest group – by randomly removing data from the larger groups (total = 93 samples). This approach achieved strict class balance but at the expense of discarding potentially useful information and reducing data set diversity.

The second strategy was a hybrid resampling strategy, which aimed to balance classes while retaining representativeness. Here, all classes were adjusted to 62 samples (total = 186) by up-sampling the minority classes (Labels 1 and 3) and down-sampling the majority class (Label 2). The hybrid method was formulated as follows:

X_{\text{resampled}} = \begin{cases} \text{upsample}(X_{i}), & \text{if } n_{i} < N_{\text{target}} \\ \text{downsample}(X_{i}), & \text{if } n_{i} > N_{\text{target}} \end{cases},

(18)

where $X_{i}$ is the original sample set for class $i$, $n_{i}$ is the sample count and $N_{\text{target}} = 62$. Both strategies were applied independently in separate experiments to examine their influence on classification fairness and model stability. The hybrid approach preserved greater variability in minority classes while maintaining balance.
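The hybrid rule can be sketched as follows. The sketch assumes that up-sampling means random duplication with replacement and that down-sampling means random selection without replacement; the function name `hybrid_resample` and the placeholder sample identifiers are hypothetical.

```python
import random


def hybrid_resample(samples_by_class, n_target=62, seed=0):
    """Bring every class to n_target samples by up- or down-sampling."""
    rng = random.Random(seed)
    balanced = {}
    for label, samples in samples_by_class.items():
        n = len(samples)
        if n < n_target:
            # Up-sample: keep all originals and add random duplicates.
            balanced[label] = list(samples) + rng.choices(samples, k=n_target - n)
        elif n > n_target:
            # Down-sample: draw n_target distinct originals.
            balanced[label] = rng.sample(samples, n_target)
        else:
            balanced[label] = list(samples)
    return balanced


# Placeholder sample IDs with the class sizes reported in the text.
data = {1: list(range(31)), 2: list(range(100, 162)), 3: list(range(200, 246))}
balanced = hybrid_resample(data)
print({label: len(v) for label, v in balanced.items()})
```

Keeping the original samples at the front of each up-sampled class makes it easy to verify afterwards that no real observation was discarded.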

The effects of the augmentation and resampling methods are shown in Figure 1: original data, SMOTE (62 samples per class), GAN (62 samples per class) and image-based augmentation (150 samples per class). All methods reduced differences between classes while keeping meaningful variability, resulting in a well-balanced data set important for fair and generalizable classification of drinking behavior. This framework also lays the groundwork for applying noninvasive, PPG-based hydration monitoring in future wearable and behavior-aware digital health systems.

3.4 Model selection and evaluation

Model selection was carried out using an automated benchmarking strategy rather than manual trial-and-error. MATLAB’s fitcauto function was used to systematically evaluate a range of supervised classifiers, including ensemble methods (Sagi and Rokach, 2018; Sengur et al., 2018), kernel classifiers (Herbrich, 2001; Huang et al., 2014), k-nearest neighbor (k-NN) (Mucherino et al., 2009; Şengür and Turhan, 2018), support vector machine (SVM) classifiers (Awad and Khanna, 2015; Zeebaree et al., 2019), naive Bayes classifiers (Ting et al., 2011; Leung, 2007), neural network classifiers (Rogova, 2008; Kotsovsky and Batyuk, 2022) and decision tree classifiers (Safavian and Landgrebe, 1991; Şengür and Karabatak, 2018). This procedure provided a consistent basis for comparing model performance and avoided bias toward any single algorithm family. Because fitcauto automatically chooses the classifier that best fits the training distribution, the optimal classifier may vary across balancing conditions; only the classifier associated with the overall best-performing configuration is reported in the main manuscript.

Hyperparameter optimization for each candidate model was performed using Bayesian optimization with the Asynchronous Successive Halving Algorithm, specified by OptimizeHyperparameters = “all”. Parallel execution (UseParallel = true) reduced computation time while maintaining consistent search procedures across algorithms.

To mitigate overfitting, the data set was split into 80% training and 20% testing using stratified sampling (cvpartition) and the procedure was repeated ten times with different random seeds. In each run, all 52 pulse-wave indices defined in Section 3.2.6 were ranked using chi-square analysis (fscchi2) and classifiers were evaluated using subsets of the top k features (k = 5–30). This design enabled a systematic examination of whether different algorithms perform best with compact feature sets or broader input representations.
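A repeated stratified hold-out of this kind can be sketched in plain Python. This is an illustration under stated assumptions, not a reproduction of MATLAB's cvpartition: per-class test sizes are obtained by rounding 20% of each class, so the exact counts may differ from those produced in the study. The function name is hypothetical.

```python
import random


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(test_fraction * len(shuffled))  # assumed rounding rule
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


# Original class sizes reported in the text: 31, 62 and 46 samples.
labels = [1] * 31 + [2] * 62 + [3] * 46
# Ten runs, each with a different random seed.
splits = [stratified_holdout(labels, seed=s) for s in range(10)]
print(len(splits[0][0]), len(splits[0][1]))
```

Feature ranking and the sweep over the top k features would then be run inside each of the ten splits, using the training portion only.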

Model performance was assessed using a multi-metric evaluation framework to ensure balanced and fair comparisons across models:

\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \times 100

(18)

\text{TPR} = \frac{\text{True Positives}}{\text{False Negatives} + \text{True Positives}}

(19)

\text{FPR} = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}

(20)

\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

(21)

\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

(22)

F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(23)

Taken together, these metrics ensured that classification was assessed not only for accuracy but also for the balance between sensitivity and specificity. This approach addressed both the correct detection of positive cases and the reduction of false alarms.
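For a three-class problem, the metrics defined above are typically computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below assumes that convention, which the text does not state explicitly, and uses a hypothetical confusion matrix that is not taken from the study.

```python
def accuracy(cm):
    """Accuracy in percent; cm[i][j] counts true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 100 * correct / total


def one_vs_rest_metrics(cm, k):
    """TPR/recall, FPR, precision and F1 for class k against all other classes."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = sum(sum(row) for row in cm) - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tpr": recall, "fpr": fpr, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [[10, 2, 0],
      [1, 11, 1],
      [0, 3, 9]]
metrics_class0 = one_vs_rest_metrics(cm, 0)
print(round(accuracy(cm), 1), metrics_class0)
```

Macro-averaged values, such as the macro-AUC reported later, are obtained by averaging the per-class results with equal weight.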

In the absence of an external validation data set, we conducted repeated stratified evaluations combined with hyperparameter tuning. This approach helped reduce the risk of overfitting and provided a sound basis for comparing model performance across different algorithms. While external validation was not included in the current study, future work will apply this framework to data sets collected under varied recording conditions and devices to better assess its real-world applicability in wearable hydration monitoring systems.

The choice of a repeated 80:20 stratified hold-out rather than k-fold cross-validation (CV) was motivated by the highly imbalanced class distribution: applying k-fold CV would generate folds with very small minority-class subsets and risk synthetic-sample leakage when using SMOTE or GANs. The repeated-holdout strategy ensured stable and leakage-free evaluation.
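The leakage concern can be made concrete with a sketch in which the split is performed first and balancing is applied to the training portion only. Random duplication stands in here for any of the balancing methods; the function name, the rounding rule and the sample identifiers are hypothetical.

```python
import random


def split_then_upsample(samples_by_class, test_fraction=0.2, seed=0):
    """Hold out a stratified test set, then balance only the training set."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_test = round(test_fraction * len(shuffled))
        test[label] = shuffled[:n_test]      # never touched by augmentation
        train[label] = shuffled[n_test:]
    target = max(len(v) for v in train.values())
    for label, samples in train.items():
        # Duplicates are drawn from training samples only.
        train[label] = samples + rng.choices(samples, k=target - len(samples))
    return train, test


# Placeholder IDs with the original class sizes (31, 62 and 46).
data = {c: [f"class{c}_{i}" for i in range(n)]
        for c, n in {1: 31, 2: 62, 3: 46}.items()}
train, test = split_then_upsample(data)
print({c: (len(train[c]), len(test[c])) for c in data})
```

Because every added training sample derives from a training original, no held-out observation, or a synthetic copy of one, can appear on both sides of the split.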

To ensure full transparency in connecting feature ranking to model performance, the exact top-k features selected for each best-performing model configuration are provided in supplemental Table 4.

3.5 Statistical analysis

We assessed whether the extracted PPG features differed significantly across the three drinking behavior categories – Label 1 (2–4 cups), Label 2 (4–6 cups) and Label 3 (>6 cups) – using the Kruskal–Wallis test (McKight and Najab, 2010). This non-parametric alternative to ANOVA was selected because it is robust to the non-normal distributions often observed in physiological data.

Six representative features were examined: pulse duration (T), instantaneous heart rate (IHR), diastolic time (t_dia), systolic-to-diastolic ratio (t_ratio), second peak amplitude (A2) and IPA. Extreme outliers were controlled by applying a quantile-based trimming procedure, which removed the top and bottom 1% of values for each feature:

\text{cleanData} = \text{sortedData}(\text{removeThreshold} + 1 : \text{end} - \text{removeThreshold}),

(21)

where $\text{removeThreshold} = \text{floor}(0.01 \times n)$ and $n$ is the total number of observations. This retained approximately 98% of the observations for each feature while preserving distributional integrity.
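The trimming rule translates directly into Python slicing. The sketch below mirrors the MATLAB-style indexing in the equation (1-based `removeThreshold + 1 : end - removeThreshold`) with a 0-based slice; the function name is hypothetical.

```python
import math


def trim_extremes(values, fraction=0.01):
    """Drop the lowest and highest floor(fraction * n) values after sorting."""
    sorted_data = sorted(values)
    remove_threshold = math.floor(fraction * len(sorted_data))
    return sorted_data[remove_threshold:len(sorted_data) - remove_threshold]


clean = trim_extremes(list(range(300)))  # 300 placeholder observations
print(len(clean), clean[0], clean[-1])
```

Because of the floor operation, the retained share is at least 98% and is higher for small samples: with 139 observations only one value is removed from each tail, and with fewer than 100 observations nothing is trimmed.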

The Kruskal–Wallis test (McKight and Najab, 2010) was used to evaluate whether the k independent drinking-behavior groups originated from the same underlying distribution. It tests the null hypothesis that the population medians are identical across groups, where the alternative indicates that at least one group median differs. The test statistic is defined as follows:

H = \frac{12}{N (N + 1)} \sum_{i = 1}^{k} n_{i} (\bar{R_{i}} - \bar{R})^{2},

(22)

where $n_{i}$ is the number of observations in group $i$, $N = \sum n_{i}$ is the total sample size, $\bar{R_{i}}$ is the mean rank of group $i$ and $\bar{R} = (N + 1) / 2$ is the overall mean rank. Under the null hypothesis, $H$ approximately follows a chi-square distribution with $k - 1$ degrees of freedom. Features with p < 0.05 were subjected to Bonferroni-corrected post-hoc pairwise comparisons and effect sizes were quantified using:

\eta^{2} = \frac{H - k + 1}{N - k},

(23)

which estimates the proportion of variance in ranks attributable to group differences. This analysis identified specific pairwise differences (e.g. Label 1 versus Label 3), offering insight into how drinking behavior may influence cardiovascular waveform characteristics.
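The test statistic and effect size defined above can be computed with a few lines of standard-library Python. The sketch follows the formulas as written, so tied values receive average ranks but no tie correction is applied. The p-value helper is valid only for three groups, where the chi-square distribution has two degrees of freedom and its upper tail equals exp(−H/2). Function names and toy data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of the pooled values; ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, eta squared) using the formulas given in the text."""
    pooled = [v for g in groups for v in g]
    n_total, k = len(pooled), len(groups)
    ranks = average_ranks(pooled)
    grand_mean = (n_total + 1) / 2
    h, start = 0.0, 0
    for g in groups:
        mean_rank = sum(ranks[start:start + len(g)]) / len(g)
        h += len(g) * (mean_rank - grand_mean) ** 2
        start += len(g)
    h *= 12 / (n_total * (n_total + 1))
    eta_sq = (h - k + 1) / (n_total - k)
    return h, eta_sq


def p_value_three_groups(h):
    """Upper-tail chi-square probability for df = 2 (three groups only)."""
    return math.exp(-h / 2)


# Toy data (not study data): three fully separated groups of three values.
h, eta_sq = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), round(eta_sq, 3), round(p_value_three_groups(h), 4))
```

In practice a statistics package would normally be used, since it also handles tie correction and arbitrary numbers of groups.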

PPG-derived features were chosen and evaluated to capture differences between groups while staying consistent with known physiological patterns. We focused on measures such as systolic duration and wave amplitude, which are known to change with fluid intake. This selection process ensured that the input variables represented physiologically meaningful changes and provided a solid statistical basis for later classification modeling and interpretability.

4. Results

4.1 Photoplethysmography signal preprocessing, normalization and peak detection

The preprocessing pipeline improved the quality of the raw PPG signals and prepared them for feature extraction. As shown in Figure 2, baseline drift and high-frequency noise in the original waveforms were corrected using signal inversion and bandpass filtering (0.5–15 Hz). Signal inversion ensured that waveform polarity matched physiological expectations and filtering removed motion artifacts and other unwanted noise. Finally, amplitude normalization was applied to place all signals on the same scale across participants, allowing consistent comparisons and providing a stable foundation for feature extraction.

Figure 2. PPG signal preprocessing and normalization. Three panels plot amplitude against time (s): the raw PPG signal (0–120 s; amplitude around −5.05 × 10⁴, with a gradual increase after 50 s), the filtered PPG signal (0–100 s; approximately −200 to 200, with increased amplitude after 60 s) and the normalized PPG signal (0–100 s; −1 to 1, showing regular pulse waves).

Figure 3 illustrates the performance of the PulseAnalyse algorithm in peak detection, artifact rejection and signal quality assessment. As shown in Figure 3(a), noisy cycles present in the raw gray waveforms were effectively removed, while clean pulse waves retained after preprocessing are highlighted in blue. For most participants, more than 90% of segments were preserved, demonstrating the robustness of the preprocessing pipeline.

Figure 3. Peak detection and fiducial point identification. Both panels plot amplitude against time (s). (a) Identification of high-quality pulse waves (PWs), with the raw signal, the high-quality PWs and the pulse onsets marked. (b) PPG signal (1–10 s) with identified fiducial points: systolic peak, dicrotic notch, diastolic point and onset.

Figure 3(b) presents a representative waveform with four physiologically relevant fiducial points accurately detected: the Systolic Peak (green), the Dicrotic Notch (blue), the Diastolic Foot (black) and the Onset (red). These landmarks capture key cardiovascular events, including maximum arterial expansion, aortic valve closure and the onset of the cardiac cycle. Their reliable identification enabled precise measurement of timing parameters such as systolic duration and pulse transit time.

From these fiducial points, we derived morphological features, including pulse duration, systolic-to-diastolic phase ratios and amplitude-based indices such as the RI and AI. The combination of a high sampling rate (>100 Hz), stringent artifact rejection and visual verification ensured that these features were physiologically consistent and reproducible. This result confirms the stability of fiducial point detection even under conditions with noise or relatively lower resolution, providing a solid basis for subsequent classification of drinking behavior.

4.2 Feature importance and relationship with hydration levels

This study compared PPG-derived pulse wave features across three self-reported daily water intake groups: Label 1 (2–4 cups), Label 2 (4–6 cups) and Label 3 (>6 cups). As shown in Figure 4 and Table 1, the Kruskal–Wallis test identified several features with significant between-group differences, with effect sizes (η²) ranging from small to moderate. Post hoc Bonferroni-corrected comparisons further revealed distinct hemodynamic profiles for each group.

Figure 4. Differences in key indices across drinking behavior groups. Six box plots show the features that differed significantly across water-intake groups 1, 2 and 3 (Kruskal–Wallis test): T (p = 0.0487, η² = 0.028), IHR (p = 0.0308, η² = 0.034), t_dia (p = 0.0169, η² = 0.043), t_ratio (p = 0.0009, η² = 0.084), A2 (p = 0.0107, η² = 0.049) and IPA (p = 0.0005, η² = 0.091).

In the time domain, pulse duration (T, p = 0.0487, η² = 0.028) was shorter in the 2–4 cups group compared with the 4–6 cups group, suggesting that moderate hydration may help the heart complete its cycle more efficiently. IHR (p = 0.0308, η² = 0.034) was higher and more stable in the 4–6 cups group, possibly reflecting better autonomic control. Diastolic time (t_dia, p = 0.0169, η² = 0.043) increased as intake rose, while the systolic-to-diastolic ratio (t_ratio, p = 0.0009, η² = 0.084) showed the clearest separation, rising steadily with greater fluid intake and indicating improved balance between the two phases of the cardiac cycle.

Amplitude-domain features showed comparable patterns. Both the second peak amplitude (A2, p = 0.0107, η² = 0.049) and the IPA (p = 0.0005, η² = 0.091) were higher in well-hydrated groups, pointing to greater vascular flexibility and increased pulse pressure. The findings indicate a clear and physiologically meaningful link between hydration level and cardiovascular function, suggesting that hydration status could be monitored through noninvasive approaches such as wearable sensing.

Figure 5 provides an extended analysis (Rahman et al., 2023; Lin et al., 2023) of two vascular compliance indices, width1 and width2, stratified by gender and age subgroups. Among males younger than 40 years, width1 was significantly lower in the 4–6 cups group compared with the 2–4 cups group (p = 0.0393), indicating enhanced arterial elasticity associated with moderate hydration. Although width2 did not reach statistical significance in any subgroup, the age- and gender-specific patterns observed suggest an underlying physiological effect that merits further investigation.

Figure 5. Kruskal–Wallis results for width1 and width2 across gender and age. Sixteen box plots show each index against hydration levels 1, 2 and 3 for the age groups <40, 40–50, 50–60 and >60 years, separately for males and females. Width1 p-values: 0.0323, 0.1435, 0.5277 and 0.9214 for males; 0.5034, 0.8735, 0.6066 and 0.5564 for females. Width2 p-values: 0.1591, 0.0744, 0.2106 and 0.4765 for males; 0.3251, 0.2831, 0.2694 and 0.3780 for females. η² values are displayed in each panel.

Beyond these subgroup findings, several PPG-derived morphological indices – most notably the systolic-to-diastolic ratio (t_ratio), the IPA and the second peak amplitude (A2) – emerged as promising noninvasive markers of hydration status. Because these measures can be obtained without blood sampling or other invasive procedures, they hold potential for integration into intelligent, real-time health monitoring systems capable of tracking hydration status and cardiovascular dynamics in daily life.

4.3 Classification performance with augmentation strategies

Figure 6 and Supplemental Table 2 compare the performance of five class-balancing strategies – down-sampling, up-sampling, SMOTE, GAN-based augmentation and image-based augmentation – evaluated using an 80:20 stratified hold-out across ten independent runs. In each run, the test set retained the original class proportion, yielding 6–6–6, 12–13–12 and 30–30–30 samples for the three classes under different balancing configurations, as shown in Supplemental Table 3. These test-set counts reflect the natural class distribution under stratified sampling and were not modified by augmentation; augmentation affected only the training set. To reduce the imbalance in the original data set (31, 62 and 46 samples per class), several rebalancing strategies were applied prior to model training. Down-sampling equalized all training classes at 31 cases each (total = 93) by randomly removing excess samples. Up-sampling increased each training class to 62 cases (total = 186).

Figure 6. Comparison of class distribution, feature-wise accuracy, ROC curves and confusion matrices for five sampling/augmentation methods: (a) down-sampling to the minimum sample size, (b) up-sampling to the maximum sample size, (c) SMOTE, (d) GAN and (e) image-based augmentation. The feature-wise accuracy plots (accuracy in % against feature number, 5–30) represent the mean accuracy (± standard error) across ten independent runs under each sampling/augmentation strategy. For clarity, the figure titles display the best-performing configuration, and the corresponding ROC curves and confusion matrices are also taken from this best single run. Class-wise AUC values and confusion-matrix counts shown in the panels: (a) AUC 0.7847, 0.8889 and 0.8318, counts 3, 3, 6, 1 and 5; (b) AUC 1.0, 0.9936 and 0.9982, counts 12, 13 and 11; (c) AUC 0.9712, 0.9067 and 0.9522, counts 12, 12 and 7; (d) AUC 0.8233, 0.75 and 0.8185, counts 10, 9 and 5; (e) AUC 0.9944, 0.9906 and 0.9872, counts 30, 25 and 27.

For synthetic approaches, SMOTE generated interpolated vectors, expanding minority classes to 62 samples each (total = 186); GAN-based augmentation produced synthetic vectors under adversarial training, likewise achieving 62 samples per class; and image-based augmentation reshaped 1-D features into 4 × 13 grids and applied small geometric transformations, expanding each class to 150 samples (total = 450). The resulting class distributions are visualized in Figure 1: (a) original data, (b) SMOTE-expanded data, (c) GAN-expanded data and (d) image-augmented data.

Accuracy curves in Figure 6 report the mean and standard error across the ten runs for each feature subset (top k chi-square–ranked features, k = 5–30). Panel titles identify the best-performing configuration. ROC curves and confusion matrices illustrate the final selected model obtained through hyperparameter search; because these reflect a single model rather than averaged outcomes, error bars are not shown. Supplemental Table 2 provides class-specific AUC values for all best models.

Down-sampling reduced each training class to 31 samples, while the smallest test-set class contained 6 samples, contributing to high variance. This configuration achieved 78% accuracy with 8 features (macro-AUC = 0.83), but the aggressive loss of training data substantially reduced physiological variability and weakened discrimination.

Up-sampling preserved all original samples and expanded each class to 62 training samples, yielding 12–13 test samples per class. This strategy produced the strongest overall results, achieving 97% accuracy and a macro-AUC of 1.00 using the top 8 features.

For the synthetic-data approaches, SMOTE expanded the minority classes to 62 training samples each, producing a stratified test set containing 13, 12 and 12 samples for the three classes. Under this configuration, the model achieved 84% accuracy using 17 features, with a macro-AUC of 0.93. Class-level performance showed high precision for classes 1 and 3 but substantially reduced recall for class 3 (0.58), indicating that SMOTE’s interpolated vectors sometimes blended feature boundaries and weakened class-specific discrimination.

GAN-based augmentation also expanded each training class to 62 samples, yielding a test-set distribution of 12, 12 and 13 samples. This method produced the weakest performance among all balancing strategies, reaching 65% accuracy with 11 features and a macro-AUC of 0.84. In particular, recall for the highest-intake group was markedly poor (class 3 recall = 0.38). Post-hoc physiological validation described in Section 3.3.2 confirmed that several GAN-generated feature vectors deviated from expected PPG morphology – such as inconsistent systolic–diastolic timing or attenuated amplitude structure – thereby explaining the instability and reduced discriminative performance.

In contrast, image-based augmentation, which expanded each class to 150 training samples, resulted in substantially more stable classification behavior. The corresponding stratified test set contained 30 samples per class and the method achieved 91% accuracy using 21 features, with a macro-AUC of 0.99. The small geometric transformations applied to the reshaped 4 × 13 feature matrices – rotation (±0.01 rad), scaling (±5%) and translation (±0.01 units) – preserved the morphological interpretability of the signals while simulating realistic acquisition variability. This contributed to consistently high precision and recall across all three classes and made image-based augmentation the second-best performing method after up-sampling.
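The reported accuracies and class-level recalls can be cross-checked against the confusion-matrix counts given in the Figure 6 description. The sketch below assumes that those counts are the diagonal entries (correct predictions per class, in class order) and pairs them with the test-set sizes stated in the text. Down-sampling is omitted because its counts cannot be mapped to classes unambiguously.

```python
# Correct predictions per class (assumed to be the confusion-matrix diagonal
# from the Figure 6 description) and test-set sizes per class from the text.
reported = {
    "up-sampling": {"correct": [12, 13, 11], "test_sizes": [12, 13, 12]},
    "SMOTE":       {"correct": [12, 12, 7],  "test_sizes": [13, 12, 12]},
    "GAN":         {"correct": [10, 9, 5],   "test_sizes": [12, 12, 13]},
    "image-based": {"correct": [30, 25, 27], "test_sizes": [30, 30, 30]},
}


def summarize(correct, test_sizes):
    """Return overall accuracy (%) and per-class recall."""
    accuracy = 100 * sum(correct) / sum(test_sizes)
    recalls = [c / n for c, n in zip(correct, test_sizes)]
    return accuracy, recalls


summary = {name: summarize(**counts) for name, counts in reported.items()}
for name, (acc, recalls) in summary.items():
    print(f"{name}: accuracy {acc:.1f}%, class-3 recall {recalls[2]:.2f}")
```

Under this reading, the counts reproduce the accuracies quoted in the text after rounding (97%, 84%, 65% and 91%) as well as the class 3 recalls of 0.58 for SMOTE and 0.38 for GAN-based augmentation.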

Across all resampling and augmentation strategies, the optimal classifier configuration varied depending on the data balancing mechanism. Under the up-sampling strategy, which achieved the best overall performance (97% accuracy, macro-AUC = 1.00), MATLAB’s fitcauto selected an error-correcting output codes (ECOC) classifier with linear SVM base learners. For image-based augmentation, which yielded the second-best performance (91% accuracy, macro-AUC = 0.99), the optimal model was an ECOC classifier using a decision-tree ensemble as the base learner.

For the remaining strategies, different ECOC configurations were identified as optimal. Specifically, the best-performing models for SMOTE were ECOC classifiers with linear SVM base learners, whereas for down-sampling and GAN-based augmentation, ECOC classifiers with decision-tree ensemble base learners were selected. These differences reflect how each balancing strategy alters the effective training distribution and, consequently, the classifier architecture favored by automated model selection.

The exact top-k Chi-square–selected feature subsets and the corresponding key hyperparameters for all five strategies – including base learner type, ECOC coding scheme, kernel scale or ensemble size, regularization settings, binary loss function, class priors and the number of trained base learners – are summarized in supplemental Table S4, ensuring full transparency and reproducibility of the reported results.

5. Discussion

5.1 Interpretation of photoplethysmography features in relation to drinking behavior

This section discusses how PPG-derived time- and amplitude-domain features reflect cardiovascular responses to different levels of daily water intake. The findings in Figures 4 and 5 show that higher hydration was generally associated with improved vascular compliance and more balanced hemodynamic function, in line with previous evidence linking hydration to circulatory stability and arterial elasticity (Nose et al., 1988; Noakes, 2007).

In the time domain, both pulse duration (T) and diastolic time (t_dia) were significantly longer in the 4–6 cups group compared with lower intake groups, suggesting that moderate hydration supports more effective ventricular filling and an extended cardiac cycle. The systolic-to-diastolic ratio (t_ratio) showed the clearest separation (p = 0.0009, η² = 0.084), indicating its potential as a sensitive marker of cardiovascular phase balance under different hydration conditions.

Amplitude-domain measures further reinforced these findings. The second systolic peak amplitude (A2) and the IPA were consistently higher in well-hydrated groups (p < 0.011), in line with greater vascular elasticity and increased pulse pressure volume. Interestingly, the A2 and IPA indices not only differentiated hydration groups but also showed greater variability in younger adults, suggesting possible age-related modulation that warrants deeper study.

Subgroup analysis (Figure 5) revealed demographic-specific patterns. For males under 40 years old, width1, a surrogate measure of vascular compliance, was significantly lower in the 4–6 cups group than in the 2–4 cups group (p = 0.0393), pointing to greater arterial elasticity with moderate hydration in younger adults. Although width2 did not reach statistical significance, consistent trends across subgroups suggest underlying physiological differences that deserve further study.

All fiducial points (systolic peak, dicrotic notch, diastolic foot and onset) were derived from signals sampled above 100 Hz and validated by visual inspection, ensuring morphological accuracy. This strengthens confidence in the robustness of the extracted features, even when working with relatively low-resolution PPG signals.

Overall, these results confirm that PPG waveform features provide physiologically interpretable links between hydration behavior and cardiovascular function. Beyond statistical associations, they also point toward practical applications, including noninvasive hydration monitoring through wearable devices, personalized hydration management in clinical settings and performance tracking in sports or occupational health.

5.2 Ground truth considerations

In this study, self-reported daily water intake was used as the behavioral ground truth for class labeling. While practical and commonly adopted in early-stage behavioral sensing research, self-reported intake is subject to several limitations, including recall bias, variability in the perceived size of a “cup,” and differences in participants’ estimation accuracy. These factors introduce noise into the labeling process and may partially contribute to the overlap observed across groups. It is important to note that the goal of the present work is drinking-behavior classification rather than biochemical hydration assessment; thus, self-reported intake provides an appropriate proxy for establishing preliminary behavioral categories. Nevertheless, future studies should incorporate objective intake-monitoring tools – such as smart water bottles, EMA, urinary osmolality or short-term body-weight changes – to strengthen label reliability and improve physiological interpretability.

5.3 Impact of data augmentation on classification

This study compared several data balancing and augmentation methods and confirmed that data augmentation plays a key role in improving classification performance. As shown in Figure 6 and Supplemental Table 2, up-sampling and image-based augmentation produced the most reliable results. Up-sampling reached an accuracy of 97% with a macro-AUC of 1.00, while image-based augmentation achieved 91% accuracy with a macro-AUC of 0.99. Both approaches maintained class balance and preserved the physiological characteristics of PPG-derived features, while also introducing controlled variability that is essential for behavioral classification tasks such as hydration monitoring.

Up-sampling worked well because it retained the full set of original samples, leading to stable model behavior. This agrees with earlier reports that balanced data sets support more reliable health-related classification (Buda et al., 2018). Image-based augmentation also showed strong performance. By applying simple geometric changes such as rotation, scaling and translation, it maintained waveform morphology while simulating realistic variations like changes in sensor placement (Shorten and Khoshgoftaar, 2019). These results highlight its potential value for real-world deployment in wearable monitoring systems.

SMOTE interpolation offered reasonable performance (84% accuracy, macro-AUC = 0.93), as it enriched minority classes while keeping local neighborhood structure. However, we noted that SMOTE may introduce physiologically less realistic interpolations in complex bio-signals, so parameter tuning is important (Blagus and Lusa, 2013). Down-sampling, in contrast, reduced accuracy to 78% (macro-AUC = 0.83), mainly because of the loss of informative data.

GAN-based augmentation showed the weakest performance (65% accuracy, macro-AUC = 0.84), with particularly poor recall in some classes (e.g. 0.38 for Class 3). Closer inspection revealed that many generated signals lacked key morphological features such as systolic peaks and dicrotic notches, limiting their physiological validity and classifier utility.

Our findings indicate that augmentation strategies that preserve waveform morphology and physiological meaning, especially up-sampling and image-based augmentation, are the most effective for classifying drinking behavior.

Beyond improving performance in the present study, these approaches also provide practical direction for designing future preprocessing pipelines in hydration monitoring. In addition, they reflect the broader emphasis in biomedical machine learning on realism and interpretability (Gopal et al., 2021).

Because augmentation procedures in this study were applied only to the training set, traditional k-fold CV would risk synthetic-sample leakage across folds. For this reason, a repeated stratified 80:20 evaluation was used to ensure leakage-free assessment. Future studies with larger data sets may further explore k-fold validation as an alternative strategy.
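The evaluation protocol described above (a stratified 80:20 split repeated over ten runs, with balancing applied to the training indices only) can be sketched as below. This is a minimal illustration under stated assumptions, not the study's code: the helper names `stratified_split` and `upsample_indices`, the toy class sizes and the per-run seeds are hypothetical, and model fitting is omitted.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split sample indices so each class contributes ~test_frac to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_test = max(1, round(test_frac * len(idxs)))
        test_idx += idxs[:n_test]
        train_idx += idxs[n_test:]
    return train_idx, test_idx


def upsample_indices(train_idx, labels, seed=0):
    """Balance classes by resampling training indices only (test set untouched)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced += idxs + [rng.choice(idxs) for _ in range(target - len(idxs))]
    return balanced


# Toy imbalanced labels (made-up class sizes: 20, 10 and 5).
labels = [0] * 20 + [1] * 10 + [2] * 5
runs = []
for run in range(10):
    train_idx, test_idx = stratified_split(labels, seed=run)
    balanced_train = upsample_indices(train_idx, labels, seed=run)
    # A classifier would be fitted on balanced_train and scored on test_idx here.
    runs.append((balanced_train, test_idx))
```

Splitting first and augmenting second means no duplicated or synthetic sample derived from a test row can appear in the training data, which is the leakage the protocol is designed to avoid.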

6. Conclusion and future work

This study confirmed that PPG-derived waveform features can reflect cardiovascular changes linked to daily drinking behavior, offering a physiologically grounded and noninvasive way to assess hydration status. The time-domain indices (T, t_dia, t_ratio) and amplitude-domain metrics (A2, IPA) showed clear differences across hydration groups, with effect sizes indicating meaningful associations. The observed prolongation of diastolic time is consistent with the physiological expectation that hydration increases plasma volume, which in turn reduces vascular resistance. Subgroup analysis further suggested possible demographic influences, such as enhanced arterial elasticity in younger males with moderate fluid intake.

In the area of data processing, this study also evaluated several strategies to address class imbalance in behavioral classification. Approaches that preserved waveform morphology and physiological plausibility, most notably up-sampling and image-based augmentation, consistently achieved the strongest performance, reaching high accuracy with balanced precision–recall profiles. In contrast, GAN-based augmentation showed weaker physiological fidelity, while down-sampling led to considerable information loss, underscoring the importance of augmentation methods that remain consistent with bio-signal characteristics. We also acknowledge that training a single unconditional GAN, rather than class-conditional GANs, may limit the class-specific fidelity of synthetic samples; this represents a methodological limitation and a potential direction for future improvement.

These findings suggest that hydration-aware wearable technologies could benefit directly from PPG-based analysis, since the PPG-derived features are not only statistically discriminative but also physiologically meaningful markers of drinking behavior. Embedding interpretable PPG feature analysis into devices such as smartwatches, fitness bands or mobile health platforms could enable real-time, noninvasive monitoring of fluid status. Such systems would be especially valuable for groups vulnerable to dehydration, including older adults, athletes and individuals exposed to physically demanding or high-temperature environments. Equally important will be the continued optimization of augmentation pipelines for physiological signals to ensure robustness under real-world noise and data limitations. In addition, the use of self-reported daily water intake as the behavioral ground truth introduces inherent uncertainty due to recall variability and differences in cup-size estimation; future work should therefore incorporate objective measurements to strengthen label reliability.

Looking ahead, several directions for future research can be outlined. Validation should be extended to larger and more diverse populations to improve the generalizability of the findings. Integration of additional bio-signals, such as ECG or sweat electrolyte measurements, may enrich contextual understanding and provide complementary physiological markers of hydration. Longitudinal studies are needed to capture individual adaptation patterns over time, which would help refine and personalize hydration models.

The contributions of this study can be summarized across three dimensions. From an academic perspective, this work advances PPG from a conventional cardiovascular monitoring tool to a physiological indicator of hydration status, thereby addressing a critical gap in noninvasive assessment methods. From an engineering perspective, the study provides a systematic comparison of data augmentation strategies, emphasizing the preservation of physiological plausibility and signal integrity. This offers a valuable reference for future researchers dealing with small-sample or imbalanced biomedical data sets. From an application perspective, the findings lay a foundation for the development of wearable hydration-sensing technologies, highlighting their potential to drive progress in smart healthcare, particularly in real-time monitoring and personalized health management.

References

Awad, M. and Khanna, R. (2015), “Support vector machines for classification”, Efficient Learning Machines, Springer, pp. 39-66.

Blagus, R. and Lusa, L. (2013), “SMOTE for high-dimensional class-imbalanced data”, BMC Bioinformatics, Vol. 14, pp. 1-16.

Buda, M., Maki, A. and Mazurowski, M.A. (2018), “A systematic study of the class imbalance problem in convolutional neural networks”, Neural Networks: The Official Journal of the International Neural Network Society, Vol. 106, pp. 249-259.

Charlton, P.H., Mariscal Harana, J., Vennin, S., Li, Y., Chowienczyk, P. and Alastruey, J. (2019), “Modeling arterial pulse waves in healthy aging: a database for in silico evaluation of hemodynamics and pulse wave indexes”, American Journal of Physiology. Heart and Circulatory Physiology, Vol. 317 No. 5, pp. H1062-H1085.

Chen, Y.-X., Tseng, C.-K., Kuo, J.-T., Wang, C.-J., Chao, S.-H., Kau, L.-J., Hwang, Y.S. and Lin, C.L. (2023), “Fatigue estimation using peak features from PPG signals”, Mathematics, Vol. 11 No. 16, p. 3580.

EFSA Panel on Dietetic Products N, Allergies (2010), “Scientific opinion on dietary reference values for water”, EFSA Journal, Vol. 8, p. 1459.

Gopal, B., Han, R., Raghupathi, G., Ng, A., Tison, G. and Rajpurkar, P. (2021), “3KG: contrastive learning of 12-lead electrocardiograms using physiologically-inspired augmentations”, Machine Learning for Health, PMLR, pp. 156-167.

Gupta, S., Singh, A. and Sharma, A. (2022), “Denoising and analysis of PPG acquired from different body sites using Savitzky Golay filter”, TENCON 2022-2022 IEEE Region 10 Conference (TENCON), IEEE, pp. 1-4.

Herbrich, R. (2001), Learning Kernel Classifiers: Theory and Algorithms, MIT Press.

Huang, P.-S., Avron, H., Sainath, T.N., Sindhwani, V. and Ramabhadran, B. (2014), “Kernel methods match deep neural networks on Timit”, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 205-209.

Hwang, D.Y., Taha, B. and Hatzinakos, D. (2021), “PBGAN: learning PPG representations from GAN for time-stable and unique verification system”, IEEE Transactions on Information Forensics and Security, Vol. 16, pp. 5124-5137.

Intakes SCotSEoDR, Electrolytes PoDRIf, Water (2005), Dietary Reference Intakes for Water, Potassium, Sodium, Chloride, and Sulfate, National Academies Press.

Kiyasseh, D., Tadesse, G.A., Thwaites, L., Zhu, T. and Clifton, D. (2020), “PlethAugment: GAN-based PPG augmentation for medical diagnosis in low-resource settings”, IEEE Journal of Biomedical and Health Informatics, Vol. 24 No. 11, pp. 3226-3235.

Kotsovsky, V. and Batyuk, A. (2022), “Feed-forward neural network classifiers with bi-threshold-like activations”, 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), IEEE, pp. 9-12.

Kumar, T., Brennan, R., Mileo, A. and Bendechache, M. (2024), “Image data augmentation approaches: a comprehensive survey and future directions”, IEEE Access, Vol. 12.

Leung, K.M. (2007), Naive Bayesian Classifier, Polytechnic University Department of Computer Science/Finance and Risk Engineering, pp. 123-156.

Li, X., Metsis, V., Wang, H. and Ngu, A.H.H. (2022), “TTS-GAN: a transformer-based time-series generative adversarial network”, International Conference on Artificial Intelligence in Medicine, Springer, pp. 133-143.

Lin, C.-L., Tseng, C-K., Wang, J.-R., Chao, S.-H., Hwang, Y.-S. and Kau, L.-J. (2023), “Development of smart cardiovascular measurement system using feature selection and machine learning models for prediction of sleep deprivation, cold hands and feet, and Shanghuo syndrome”, Measurement, Vol. 221, p. 113441.

Liu, Q., Yang, C., Yang, S., Kwong, C.F., Wang, J. and Zhou, N. (2024), “Photoplethysmography-based non-invasive blood pressure monitoring via ensemble model and imbalanced data set processing”, Physical and Engineering Sciences in Medicine, Vol. 47 No. 4, pp. 1-15.

Lopez-Paz, D. and Oquab, M. (2016), “Revisiting classifier two-sample tests”, arXiv preprint arXiv:1610.06545.

McKight, P.E. and Najab, J. (2010), “Kruskal-Wallis test”, The Corsini Encyclopedia of Psychology, pp. 1-10, doi: https://doi.org/10.1002/9780470479216.corpsy0491.

Mucherino, A., Papajorgji, P.J. and Pardalos, P.M. (2009), “K-nearest neighbor classification”, Data Mining in Agriculture, Springer, pp. 83-106.

Noakes, T.D. (2007), “Does dehydration impair exercise performance?”, Medicine and Science in Sports and Exercise, Vol. 39, pp. 1209-1217.

Nose, H., Mack, G.W., Shi, X. and Nadel, E.R. (1988), “Role of osmolality and plasma volume during rehydration in humans”, Journal of Applied Physiology (Bethesda, Md.: 1985), Vol. 65 No. 1, pp. 325-331.

Rahman, M.M., Rivolta, M.W., Badilini, F. and Sassi, R. (2023), “A systematic survey of data augmentation of ECG signals for AI applications”, Sensors, Vol. 23 No. 11, p. 5237.

Rogova, G. (2008), “Combining the results of several neural network classifiers”, Classic Works of the Dempster-Shafer Theory of Belief Functions, Springer, pp. 683-692.

Safavian, S.R. and Landgrebe, D. (1991), “A survey of decision tree classifier methodology”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 21 No. 3, pp. 660-674.

Sagi, O. and Rokach, L. (2018), “Ensemble learning: a survey”, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 8 No. 4, p. e1249.

Şengür, D. and Karabatak, S. (2018), “Data mining techniques based students achievements analysis”, Turkish Journal of Science and Technology, Vol. 13, pp. 53-59.

Şengür, D. and Turhan, M. (2018), “Prediction of the action identification levels of teachers based on organizational commitment and job satisfaction by using k-nearest neighbors method”, Turkish Journal of Science and Technology, Vol. 13, pp. 61-68.

Sengur, D., Turhan, M. and Karabatak, S. (2018), “Prediction of the school administrators, who attended an action learning course, based on their conflict-handling styles: a data mining approach”, International Online Journal of Educational Sciences, Vol. 10.

Shorten, C. and Khoshgoftaar, T.M. (2019), “A survey on image data augmentation for deep learning”, Journal of Big Data, Vol. 6 No. 1, pp. 1-48.

Ting, S., Ip, W. and Tsang, A.H. (2011), “Is naive Bayes a good classifier for document classification”, International Journal of Software Engineering and Its Applications, Vol. 5, pp. 37-46.

Zeebaree, D.Q., Haron, H., Abdulazeez, A.M. and Zebari, D.A. (2019), “Machine learning and region growing for breast cancer segmentation”, 2019 International Conference on Advanced Science and Engineering (ICOASE), IEEE, pp. 88-93.

Supplementary material

The supplementary material for this article can be found online.

2026

Emerald Publishing Limited

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at Link to the terms of the CC BY 4.0 licence
