Spam detection using sentence-BERT with attention-enhanced GRU and SVM

Boussaad, Leila

doi:10.1108/ACI-02-2025-0035

Purpose

This paper introduces a cutting-edge framework for spam detection that integrates Sentence-BERT embeddings, an attention-augmented Gated Recurrent Unit (GRU) and a Support Vector Machine (SVM) for refined classification.

Design/methodology/approach

Our approach involves generating richly contextualized embeddings via Sentence-BERT. These embeddings are processed by a GRU network enhanced with a self-attention mechanism to effectively capture intricate long-range dependencies and salient features. The final classification is performed by an SVM, leveraging its robust capacity for binary decision-making between spam and non-spam categories. Additionally, data augmentation is employed using the T5 model to generate paraphrased spam instances, enhancing dataset diversity and model robustness. The evaluation framework includes precision, recall, and F-measure metrics for comprehensive validation.

Findings

The proposed model demonstrates exceptional performance, achieving a notable accuracy of 99.88%, with a 95% confidence interval of [99.70%, 99.99%], highlighting its strong effectiveness in spam detection.

Originality/value

The integration of Sentence-BERT, attention-augmented GRU, and SVM presents a novel combination for spam detection, effectively capturing intricate dependencies and enhancing classification accuracy. The use of T5-based data augmentation further fortifies the model's robustness, offering valuable insights into advanced spam detection methodologies.

1. Introduction

Emails have become a fundamental medium for communication, yet spam remains a widespread problem that affects both email and text messaging. Fraudulent actors often use deceptive tactics, such as familiar area codes or impersonation, to extract personal or financial information, a practice now termed “smishing” [1]. Recent data reveal a dramatic increase in spam activity, with reported messages increasing from 1.27 million in September 2021 to 10.89 billion in August 2022, and financial losses reaching USD 10 billion in 2021 [1]. These trends highlight the urgent need for advanced detection strategies to protect digital communications.

Various methods have been explored to combat SMS spam. Traditional approaches, such as blacklisting and heuristic filters, offer basic protection but are often inadequate against sophisticated attacks. Recent efforts emphasize artificial intelligence, using machine learning to analyze and classify messages more effectively. Success in AI-based detection is highly dependent on text representation, feature extraction, and classification algorithms. Previous studies have also demonstrated the benefits of hybrid feature selection [2] and content-based filtering [3] in enhancing spam detection performance.

This article presents a significant advancement in spam detection, utilizing advanced text analysis methods. Our approach leverages Sentence-BERT embeddings, an attention-enhanced Gated Recurrent Unit (GRU) network, and a Support Vector Machine (SVM) for classification. The Sentence-BERT embeddings capture contextual information, and the attention-enhanced GRU processes them to model sequential dependencies in the text. Furthermore, we use the T5 model for data augmentation to improve the balance and robustness of the dataset. Our key contributions are as follows:

Sentence-BERT Embeddings: We generate rich contextual embeddings using Sentence-BERT, capturing nuanced semantic relationships within the text.
Attention-Enhanced GRU: By integrating an attention mechanism into a GRU network, our model better identifies important sequential patterns that distinguish spam from ham.
SVM Classification: A Support Vector Machine is used for the final classification, capitalizing on the expressive features extracted by the GRU–attention architecture.
Data Augmentation with T5: We apply the T5 model to augment the dataset with paraphrased spam messages, enhancing balance and robustness in representing diverse spam expressions.

2. Related works

Numerous studies have addressed phishing and smishing detection using various machine learning and deep learning methods. In Ref. [4], the “SmiDCA” model extracts 39 features from smishing messages, achieving accuracies of 96.40% for English and 90.33% for non-English messages, and maintaining an accuracy of 96.16% after dimensionality reduction. Likewise, Ref. [5] presents a comprehensive model targeting smartphone vulnerabilities, combining an SMS content analyzer (Naive Bayes), a URL filter, a source code analyzer, and an APK download detector, achieving 96.29% accuracy.

Joo et al. [6] integrate a message monitor, a content analyzer, a classifier, and a knowledge base for spam detection. Deep learning approaches in Refs. [7, 8] use CNN and LSTM models, achieving accuracy above 98%. A Hidden Markov Model with word weighting, as proposed in Ref. [9], captures distributional patterns for effective spam filtering. More recent architectures, including Transformers [10] and Bi-GRU-CNN hybrids [11], leverage contextual and sequential modeling to achieve 99% accuracy.

Traditional models still show strong performance. For example, combining Multinomial Naive Bayes with Linear SVC [12] achieves 98.38% accuracy after preprocessing, while a GRU-based classifier [13] reaches 99%. These results highlight the continued effectiveness of both conventional and neural methods.

In parallel, robust feature selection and content-based filtering have gained prominence. Tarek and Abd-El-Hafeez [2] introduce a hybrid feature selection method with automatic threshold detection, while Ref. [14] leverages frequent and associated itemsets for improved classification. Mahmoud and Abd-El-Hafeez [15] extend this approach by optimizing the selection of correlated frequent items.

Content-based filtering has also been applied in security contexts. Studies [3, 16] show its effectiveness in detecting inappropriate multimedia content, with a focus on precision, scalability, and real-time processing, principles that can also benefit spam detection. Other work includes extracting images and URLs for web sanitization [17] and using interactive knowledge graphs for topic extraction [18], improving both semantic transparency and classifier interpretability.

Recent studies reflect a growing emphasis on hybrid spam detection systems combining deep semantic embeddings and classical classifiers. Sharma et al. [19] reviewed deep learning applications in sentiment and spam classification, highlighting the effectiveness of architectures such as GRU-SVM and BERT-SVM. In a related direction, Lopez-Joya et al. [20] analyzed the internal structure of social bots and proposed a Bi-GRU model enhanced with self-attention and interpretability layers. Batiuk and Dosyn [21] combined Sentence-BERT embeddings with XGBoost for textual classification in social networks, reinforcing the robustness of semantic feature extraction. Additionally, Sayeed and Dutta [22] applied BERT and GRU architectures for detecting malicious URLs in brushing scam campaigns, showing the adaptability of these models to cybersecurity threats.

Building on these prior advances, our work proposes a unified spam detection framework that combines recent innovations in contextual embeddings, sequential modeling, and classification strategy. Our proposed framework brings together three complementary components: (1) Sentence-BERT embeddings to capture rich and contextual semantic information, (2) a GRU enhanced with self-attention to better model sequential patterns and highlight important features, and (3) an SVM classifier that replaces softmax for more refined decision boundaries. This combination, further supported by T5-based data augmentation, forms a unified architecture that offers strong performance and improved generalization. To the best of our knowledge, this specific integration has not yet been explored in the context of spam detection.

3. Background

3.1 Sentence-BERT model

Sentence-BERT (SBERT) [23] extends BERT [24] by using siamese and triplet architectures to produce fixed-size, semantically meaningful sentence embeddings, typically of 384 dimensions. Unlike BERT, which outputs token-level embeddings requiring aggregation, SBERT is fine-tuned to generate sentence vectors directly, optimized for tasks like semantic textual similarity (STS), clustering, and retrieval.

With around 110 million parameters in its base version, SBERT maps similar sentences close together in the embedding space, enabling efficient comparison. It combines semantic precision with computational efficiency and has shown strong performance across various NLP tasks [25, 26].

3.2 XLNet model

XLNet [27] integrates autoregressive and autoencoding objectives via Permutation Language Modeling (PLM), which maximizes the expected log-likelihood over all word orderings. This removes the need for [MASK] tokens and allows bidirectional context without data corruption.

Based on a transformer with 340 million parameters and 768-dimensional embeddings, XLNet effectively captures long-range dependencies. It excels in tasks like text classification, translation, and sentiment analysis [28–30], and its permutation-based training enhances contextual understanding.

3.3 Gated Recurrent Unit (GRU)

The Gated Recurrent Unit (GRU) [31] is a recurrent neural network (RNN) variant designed to capture long-range dependencies while mitigating the vanishing gradient problem. Unlike traditional RNNs, GRUs employ gating mechanisms, specifically an update gate and a reset gate, that regulate the flow of information, as illustrated in Figure 1.

Figure 1

The basic structure of a GRU cell, showing the inputs h_{t−1} and x_t, the reset gate r_t, the update gate z_t, the candidate activation, and the output h_t. Source: Adapted from Vasilev (2019) [32]

At each time step t, the reset gate $r_{t}$ determines how much of the previous activation $h_{t-1}$ to consider:

r_{t} = \sigma (W_{r} x_{t} + U_{r} h_{t-1} + b_{r})

(1)

Using $r_{t}$, the candidate activation ${\tilde{h}}_{t}$ is computed as:

{\tilde{h}}_{t} = \tanh (W_{h} x_{t} + r_{t} \odot (U_{h} h_{t-1}) + b_{h})

(2)

The update gate $z_{t}$ then controls how much of ${\tilde{h}}_{t}$ is used versus the previous activation:

z_{t} = \sigma (W_{z} x_{t} + U_{z} h_{t-1} + b_{z})

(3)

Finally, the new activation $h_{t}$ is obtained by blending $h_{t-1}$ and ${\tilde{h}}_{t}$:

h_{t} = (1 - z_{t}) \odot h_{t-1} + z_{t} \odot {\tilde{h}}_{t}

(4)

Here $\sigma$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, and the matrices $W$, $U$ and vectors $b$ are learned parameters.

GRUs offer efficient training due to their simplified structure, achieving performance comparable to LSTMs without separate memory cells. They are widely adopted in applications like natural language processing [33], speech recognition [34], and time series analysis [35].
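As an illustration of Eqs. (1)–(4), the following is a minimal pure-Python sketch of a single GRU update, with the current update gate z_t used in the final blend. The function names, the parameter dictionary `p`, and the all-zero toy parameters are assumptions made for this example only; they are not taken from the implementation evaluated in this paper.

```python
import math


def sigmoid(v):
    """Logistic sigmoid, the sigma in Eqs. (1) and (3)."""
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, p):
    """One GRU update; p is a hypothetical dict holding the W_*, U_*, b_* parameters."""
    # Eq. (1): reset gate r_t
    r_t = [sigmoid(a + b + c) for a, b, c in
           zip(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev), p["b_r"])]
    # Eq. (3): update gate z_t
    z_t = [sigmoid(a + b + c) for a, b, c in
           zip(matvec(p["W_z"], x_t), matvec(p["U_z"], h_prev), p["b_z"])]
    # Eq. (2): candidate activation; the reset gate scales U_h h_{t-1}
    h_cand = [math.tanh(a + r * b + c) for a, r, b, c in
              zip(matvec(p["W_h"], x_t), r_t, matvec(p["U_h"], h_prev), p["b_h"])]
    # Eq. (4): blend the previous state and the candidate with z_t
    return [(1.0 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]


def zero_params(input_size, hidden_size):
    """All-zero parameters, used only to make the toy example easy to check by hand."""
    params = {}
    for gate in ("r", "z", "h"):
        params[f"W_{gate}"] = [[0.0] * input_size for _ in range(hidden_size)]
        params[f"U_{gate}"] = [[0.0] * hidden_size for _ in range(hidden_size)]
        params[f"b_{gate}"] = [0.0] * hidden_size
    return params


# With zero parameters both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
h_t = gru_step([0.3, -0.1, 0.7], [1.0, -2.0], zero_params(3, 2))
```

With trained parameters the gates vary per dimension, so some components of the state are carried over almost unchanged while others are overwritten by the candidate.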

3.4 Attention mechanism

The attention mechanism [36] plays a crucial role in deep learning models, particularly in NLP and machine translation, by enabling selective focus on different segments of an input sequence.

It operates by computing attention scores based on the similarity between Query and Key vectors, which are normalized through a softmax function and used to weight the Value vectors. The weighted sum forms the output, emphasizing the most relevant parts of the input [36].

This mechanism enhances the model's ability to manage long-range dependencies and intricate relationships, significantly improving tasks like sequence-to-sequence translation [37].
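The Query/Key/Value computation described above can be sketched in a few lines of pure Python. The optional scaling by the square root of the key dimension follows the common scaled dot-product formulation and is an assumption here, as are the function names and the toy inputs; setting `scale=False` gives the plain softmax over Query-Key dot products.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, scale=True):
    """Return (outputs, weights) for lists of query, key, and value vectors."""
    factor = 1.0 / math.sqrt(len(K[0])) if scale else 1.0
    weights = []
    for q in Q:
        # Similarity between this query and every key
        scores = [factor * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights.append(softmax(scores))
    # Each output is the attention-weighted sum of the value vectors
    outputs = [[sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
               for w_row in weights]
    return outputs, weights


# Toy check: identical keys give uniform weights, so the output is the mean of V.
out, w = attention(Q=[[0.5, 0.2]], K=[[1.0, 1.0], [1.0, 1.0]], V=[[0.0, 2.0], [4.0, 6.0]])
```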

4. Proposed approach description

The spam detection approach outlined in this study comprises a systematic four-step process, as illustrated in Figure 2.

Figure 2

Different modules involved in the proposed spam detection approach: a text message passes through (1) data augmentation, (2) Sentence-BERT contextualized representations, (3) sequential modeling with GRU and self-attention, and (4) classification (SVM), which outputs the prediction (spam or not spam). Source: Created by the author

Beginning with an initial preprocessing and data augmentation step (Step 1 in Figure 2), text messages are augmented using the T5 model and undergo thorough cleaning to enhance dataset balance and integrity.
In the subsequent stage (Step 2 in Figure 2), contextualized representations or embeddings are generated for the augmented data using Sentence-BERT. This crucial task captures deep semantic nuances and contextual dependencies effectively.
Advancing to Step 3 in Figure 2, the obtained enriched embeddings serve as input for a Gated Recurrent Unit (GRU) network with an attention mechanism. The GRU network processes these embeddings to capture sequential relationships within the text data, further enhancing the model's understanding of dependencies between words.
The final stage (Step 4 in Figure 2) involves the classification of the processed embeddings using a Support Vector Machine (SVM). This step determines whether a message is spam or ham based on the features extracted by the preceding stages.

For a comprehensive understanding of each stage in the proposed approach, detailed descriptions are provided in the subsequent subsections.

4.1 Data augmentation

To mitigate class imbalance, we applied the T5 model [38] to generate paraphrased versions of spam messages. Known for text-to-text transformations, T5 rephrased original texts into semantically equivalent variants. This enriched the spam class, improved balance, and increased the model's robustness by exposing it to diverse spam patterns.

4.2 Sentence-BERT representation

To encode messages, we used Sentence-BERT (SBERT) [23], which generates contextualized embeddings that capture semantic relationships. Specifically, we employed the sentence-transformers/bert-base-nli-mean-tokens model, fine-tuned for semantic similarity and well-suited for spam detection.

Let T be a text message and GenerateSentenceBertEmbeddings the embedding function. The resulting vector is:

Embedding = GenerateSentenceBertEmbeddings (T)

(5)

This yields a 384-dimensional embedding that encodes semantic nuances of T, facilitating accurate classification.

To prepare embeddings for downstream processing, we applied z-score normalization:

NormalizedEmbedding = \frac{Embedding - μ}{σ}

(6)

where μ and σ are the mean and standard deviation of the embedding vector. This standardization ensures zero mean and unit variance, improving model performance.
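Eq. (6) can be illustrated with a short sketch. It assumes, following the description above, that μ and σ are computed over the components of each individual embedding vector, and it uses the population standard deviation; the function name and the guard for constant vectors are assumptions of this example.

```python
import statistics


def zscore(embedding):
    """Standardize one embedding vector to zero mean and unit variance (Eq. 6)."""
    mu = statistics.fmean(embedding)
    sigma = statistics.pstdev(embedding)  # population standard deviation (assumption)
    if sigma == 0.0:
        # A constant vector has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in embedding]
    return [(v - mu) / sigma for v in embedding]


# Toy 4-dimensional vector standing in for a real embedding
normalized = zscore([1.0, 2.0, 3.0, 4.0])
```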

4.3 Sequential modeling with GRU and self-attention mechanism

The embeddings generated by Sentence-BERT and normalized serve as inputs to a Gated Recurrent Unit (GRU) network, which is enhanced with an attention mechanism. The self-attention mechanism is integrated into the GRU to improve the model's focus on relevant parts of the embedding sequence.

Below is a simplified schematic of the self-attention mechanism integrated within the GRU, using the embeddings generated by Sentence-BERT as input sequences:

Algorithm 1. Key component of the proposed architecture

Input: 384-dimensional normalized embeddings generated by Sentence-BERT.
Query, Key, Value vectors: Compute the Query vector Q, Key vector K, and Value vector V from the input embeddings.
Attention scores: Compute the attention scores A as softmax(Q ⋅ K^⊤).
Weighted values: Compute the weighted values WV as A ⋅ V.
GRU processing: Feed the weighted values WV into the GRU for sequential modeling.
Output: Use the final embeddings, enhanced by self-attention, for classification by the SVM.
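In matrix form, the steps of Algorithm 1 can be written as below. This is one possible reading rather than a definitive specification: the learned projection matrices $W_{Q}$, $W_{K}$, $W_{V}$, the arrangement of the input as a sequence $X$ of $n$ vectors, and the use of the last hidden state $h_{n}$ as the feature vector passed to the SVM are assumptions introduced here for illustration.

```latex
\begin{aligned}
Q &= X W_{Q}, \qquad K = X W_{K}, \qquad V = X W_{V} \\
A &= \operatorname{softmax}\left(Q K^{\top}\right) \\
\mathit{WV} &= A V \\
h_{t} &= \operatorname{GRU}\left(\mathit{WV}_{t},\, h_{t-1}\right), \qquad t = 1, \dots, n \\
f &= h_{n}
\end{aligned}
```

Here the softmax is applied row-wise, $\mathit{WV}_{t}$ is the $t$-th row of the weighted values, and the GRU update is the one defined in Eqs. (1)–(4).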

4.4 Final classification with SVM

The final classification is handled by a Support Vector Machine (SVM) rather than a softmax or neural classifier. This choice leverages the SVM's strength in binary classification with high-dimensional embeddings like those from Sentence-BERT. As a margin-based method, SVMs offer strong generalization with minimal tuning and are less prone to overfitting. Decoupling feature extraction (via attention-enhanced GRU) from classification improves modularity and interpretability. SVMs also provide theoretical robustness and resilience to noise, which are key advantages in spam detection, where misclassification can be costly.
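To make the margin-based idea concrete, the sketch below trains a linear soft-margin SVM by full-batch subgradient descent on the primal objective 0.5·||w||² + C·Σ max(0, 1 − y(w·x + b)). It is a didactic stand-in, not the solver used in our experiments: the optimizer, the learning rate, the number of epochs, and the six two-dimensional points (standing in for GRU features) are all assumptions of this example. Only the linear kernel and C = 0.1 mirror Table 1.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, C=0.1, lr=0.01, epochs=200):
    """Full-batch subgradient descent on 0.5*||w||^2 + C * sum of hinge losses."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5*||w||^2
        grad_b = 0.0
        for x_i, y_i in zip(X, y):
            if y_i * (dot(w, x_i) + b) < 1.0:  # inside the margin or misclassified
                grad_w = [g - C * y_i * x for g, x in zip(grad_w, x_i)]
                grad_b -= C * y_i
        w = [w_j - lr * g for w_j, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (spam) or -1 (ham) from the sign of the decision function."""
    return 1 if dot(w, x) + b >= 0.0 else -1


# Hypothetical, linearly separable toy features; labels: +1 = spam, -1 = ham
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

Only the points that violate the margin contribute to the update, which is why the decision boundary ends up determined by the examples closest to it.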

4.5 Hyperparameter tuning

We used GridSearchCV (grid search with cross-validation) for hyperparameter tuning to determine the optimal configuration for our models. Table 1 summarizes the selected hyperparameters and their respective values for each component of the proposed model.

Table 1

Hyperparameters used for the GRU, self-attention, and SVM models

Model component	Hyperparameter	Value
GRU	Number of units	128
	Dropout rate	0.5
	Batch size	64
Self-attention	Attention heads	8
SVM	Kernel	Linear
SVM	C	0.1
Training	Learning rate	0.001
Training	Epochs	10

Source(s): Created by the author

To determine these values, we explored a range of configurations using grid search on the validation splits. For example, we tested GRU units in {64, 128, 256}, dropout rates in {0.2, 0.5}, and SVM C values in {0.01, 0.1, 1.0}. Across these settings, the model's performance remained consistently high, with only minor variations in F1-score and accuracy. This stability suggests that the proposed architecture is robust to reasonable changes in hyperparameter values.
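The example ranges above define a small search space, which can be enumerated as follows. The dictionary keys and the helper `expand_grid` are hypothetical names chosen for this sketch; only the candidate values come from the text, and the selected configuration is the one reported in Table 1.

```python
from itertools import product

# Candidate values quoted in the text; the key names are illustrative only.
grid = {
    "gru_units": [64, 128, 256],
    "dropout": [0.2, 0.5],
    "svm_C": [0.01, 0.1, 1.0],
}


def expand_grid(grid):
    """Return every combination of the candidate values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = expand_grid(grid)  # 3 * 2 * 3 = 18 configurations

# Values reported in Table 1
selected = {"gru_units": 128, "dropout": 0.5, "svm_C": 0.1}
```

Each of the 18 configurations would then be scored on the validation splits, and the best-scoring one retained.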

5. Experimental evaluation

5.1 Dataset description

We used the publicly available SMS Spam Collection Dataset [39], which contains 5,572 English SMS messages (747 spam, 4,825 ham). This benchmark, compiled by Almeida et al. [39], is widely used for spam detection.

An initial cleaning step removed 403 exact duplicates, resulting in 5,169 messages (747 spam, 4,422 ham). Due to class imbalance (13.4% spam), we augmented the spam class using the T5 transformer model for paraphrasing, increasing the dataset to 8,560 messages (3,735 spam, 4,825 ham). A second cleaning removed 347 duplicate or near-duplicate entries introduced during augmentation, yielding a final set of 8,213 unique messages (3,697 spam, 4,516 ham). This balanced distribution (approximately 45% spam) helps reduce bias during training.

All preprocessing was completed before data splitting. We used stratified 5-fold cross-validation to preserve class ratios in each fold. A summary of the dataset statistics before and after augmentation is provided in Table 2.

Table 2

Dataset statistics before and after augmentation and cleaning

Dataset version	Total messages	Spam	Ham
Original (raw)	5,572	747	4,825
After initial cleaning	5,169	747	4,422
After augmentation	8,560	3,735	4,825
Final version (cleaned again)	8,213	3,697	4,516

Source(s): Created by the author
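The counts in Table 2 can be checked with simple arithmetic. The sketch below recomputes the spam share of each dataset version and the number of messages removed by each cleaning step; the variable names are chosen for this example only.

```python
# (total, spam, ham) per dataset version, copied from Table 2
table2 = {
    "Original (raw)": (5572, 747, 4825),
    "After initial cleaning": (5169, 747, 4422),
    "After augmentation": (8560, 3735, 4825),
    "Final version (cleaned again)": (8213, 3697, 4516),
}

# Spam and ham counts should add up to the total in every row.
totals_consistent = all(spam + ham == total for total, spam, ham in table2.values())

# Spam share of each version, in percent, rounded to one decimal place
spam_share = {name: round(100.0 * spam / total, 1)
              for name, (total, spam, _ham) in table2.items()}

# Messages removed by the two cleaning steps
removed_initial = table2["Original (raw)"][0] - table2["After initial cleaning"][0]
removed_second = table2["After augmentation"][0] - table2["Final version (cleaned again)"][0]
```

The spam share rises from 13.4% in the raw data to 45.0% in the final version, which is the balance referred to above.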

5.2 Evaluation metrics

We evaluated model performance using five standard classification metrics: accuracy, precision, recall, F1-score, and the ROC curve with AUC, alongside the confusion matrix for detailed class-level insights.

Accuracy: Proportion of correct predictions.

Accuracy = \frac{TP + TN}{Total Predictions}

(7)

Precision: Proportion of predicted positives that are actually positive.

Precision = \frac{TP}{TP + FP}

(8)

Recall (Sensitivity): Proportion of actual positives correctly identified.

Recall = \frac{TP}{TP + FN}

(9)

F1-Score: Harmonic mean of precision and recall.

F1-Score = \frac{2 \times Precision \times Recall}{Precision + Recall}

(10)

Confusion Matrix: Cross-tabulates predicted vs. actual labels, highlighting classification errors.
ROC Curve & AUC: The ROC curve plots recall vs. false positive rate. The AUC quantifies class separability; higher values indicate better discrimination.
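Eqs. (7)–(10) translate directly into code once the four confusion-matrix counts are known. In the sketch below, the function name and the example counts are hypothetical and are not results from our experiments; spam is treated as the positive class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Eqs. (7)-(10) from confusion-matrix counts (spam = positive class)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 spam caught, 10 ham flagged, 880 ham passed, 20 spam missed
m = classification_metrics(tp=90, fp=10, tn=880, fn=20)
```

In this toy case accuracy is 0.97 while recall is only about 0.82, which shows why accuracy alone can hide missed spam when ham dominates.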

5.3 Experimental methodology description

We evaluated the generalization ability of our spam detection model using 5-fold stratified cross-validation, which splits the dataset into five equal parts while preserving class distribution. In each fold, four parts are used for training and one for testing, so every sample is used for testing exactly once and for training four times. This strategy makes full use of the data, exposes the model to varied data distributions, reduces the dependence of the estimate on any single split, and yields stable performance metrics by averaging results across folds.
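The splitting scheme can be sketched without any library as follows. This is a simplified illustration of stratified k-fold splitting, not the routine used in our experiments; the function name, the seed, and the 100 toy labels (45 spam, 55 ham, roughly mirroring the final class ratio) are assumptions of this example.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs with class ratios preserved."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share of that class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


# Toy labels only; real labels would come from the dataset.
labels = ["spam"] * 45 + ["ham"] * 55
splits = stratified_kfold(labels, k=5, seed=0)
```

Each test fold here contains 9 spam and 11 ham messages, and every index appears in exactly one test fold.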

Beyond the core evaluation, we conducted targeted experiments:

Individual Configuration Evaluation: We tested GRU alone, GRU with attention, and GRU with attention combined with SVM. LightGBM and XGBoost were also assessed as alternative classifiers with GRU + attention. Hyperparameters for each classifier were tuned via grid search.
Embedding Comparison: We compared XLNet and SBERT embeddings across configurations, highlighting SBERT's contribution to performance gains.
Holistic Model Evaluation: All configurations and embeddings were assessed using accuracy, precision, recall, and F1-score, providing a comprehensive performance overview.

This evaluation framework ensures consistent assessment across all configurations and embedding strategies.

6. Experimental results and discussion

The results from our experiments, summarized in Table 3, provide detailed information on accuracy, precision, recall, and F1-score for each configuration and embedding.

Table 3

Classifier performance with SBERT and XLNet embeddings (all values in %)

Embedding	Classifier	Accuracy	Precision	Recall	F1-score
XLNet	C1	99.09	99.73	98.24	98.98
	C2	99.57	100	99.05	99.52
	C3	99.03	99.86	97.97	98.91
	C4	99.09	100	97.97	98.98
	C5	99.76	99.86	99.59	99.73
SBERT	C1	99.57	99.86	99.19	99.53
	C2	99.63	100	99.19	99.59
	C3	99.39	99.86	98.78	99.32
	C4	99.63	100	99.19	99.59
	C5	99.88	100	99.73	99.86

Note(s): C1: GRU; C2: GRU with attention; C3: GRU + Attention + LightGBM; C4: GRU + Attention + XGBoost; C5: GRU + Attention + SVM

Source(s): Created by the author
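As a quick consistency check on Table 3, the F1-score can be recomputed from the reported precision and recall using Eq. (10). The sketch below does this for all ten rows; the variable and function names are chosen for this example. Because the inputs are themselves rounded to two decimals, small deviations from the reported F1 values are expected.

```python
# (embedding, configuration, accuracy, precision, recall, reported F1), from Table 3
TABLE3 = [
    ("XLNet", "C1", 99.09, 99.73, 98.24, 98.98),
    ("XLNet", "C2", 99.57, 100.0, 99.05, 99.52),
    ("XLNet", "C3", 99.03, 99.86, 97.97, 98.91),
    ("XLNet", "C4", 99.09, 100.0, 97.97, 98.98),
    ("XLNet", "C5", 99.76, 99.86, 99.59, 99.73),
    ("SBERT", "C1", 99.57, 99.86, 99.19, 99.53),
    ("SBERT", "C2", 99.63, 100.0, 99.19, 99.59),
    ("SBERT", "C3", 99.39, 99.86, 98.78, 99.32),
    ("SBERT", "C4", 99.63, 100.0, 99.19, 99.59),
    ("SBERT", "C5", 99.88, 100.0, 99.73, 99.86),
]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall, Eq. (10)."""
    return 2.0 * precision * recall / (precision + recall)


# Absolute gap between the recomputed and the reported F1 for each row
deviations = {(emb, cfg): abs(f1_from(p, r) - f1) for emb, cfg, _acc, p, r, f1 in TABLE3}
max_deviation = max(deviations.values())

# Row with the highest accuracy
best = max(TABLE3, key=lambda row: row[2])
```

All recomputed values agree with the reported F1-scores to within 0.01 percentage points, and the highest accuracy belongs to SBERT with configuration C5.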

6.1 Discussion

Based on the results presented in Table 3, several insights can be drawn regarding the performance of classifiers with SBERT and XLNet embeddings:

The GRU model, coupled with XLNet embeddings, achieves an impressive accuracy of 99.09%. Despite this high accuracy, there is a slight trade-off between precision (99.73%) and recall (98.24%). This indicates that while the model is highly accurate in identifying spam, it may miss a few instances. On the other hand, when paired with SBERT embeddings, the GRU model demonstrates a higher accuracy of 99.57%. The precision (99.86%) and recall (99.19%) are well-balanced, suggesting a more effective spam detection model with SBERT.
The GRU with attention model, when using XLNet embeddings, shows excellent performance with an accuracy of 99.57%. The precision is perfect at 100%, and the recall is also high at 99.05%, making it a robust choice for spam detection. Similarly, with SBERT embeddings, the GRU with attention model maintains high performance with an accuracy of 99.63%, a perfect precision of 100%, and a recall of 99.19%.
With XLNet embeddings, GRU with attention and LightGBM attains a high precision of 99.86% but a lower recall of 97.97%. This implies that the messages it flags are almost always spam, while some spam goes undetected (false negatives). With SBERT embeddings, the same configuration performs better, with an accuracy of 99.39%, a precision of 99.86%, and an improved recall of 98.78%.
Likewise, GRU with attention and XGBoost with XLNet embeddings achieves perfect precision (100%) but a lower recall (97.97%), so it may still miss some spam messages. With SBERT embeddings, this configuration reaches an accuracy of 99.63%, with a precision of 100% and a recall of 99.19%.
GRU with attention and SVM with XLNet embeddings achieves the highest accuracy among the XLNet configurations, 99.76%, with closely matched precision and recall (99.86% and 99.59%, respectively).
The proposed method, leveraging SBERT embeddings, outperforms all other experiments, achieving an outstanding accuracy of 99.88% and a harmonious precision-recall balance (100% and 99.73%, respectively). This indicates the superior capability of the proposed method in accurately detecting spam with minimal false negatives.

Figures 3–5 provide a clear overview of the achieved results.

Figure 3

Accuracy comparison: SBERT vs. XLNet embeddings across configurations C1–C5 (scatter plot; the dashed line marks the best accuracy, 99.88%). Source: Created by the author

Figure 4

Precision comparison: SBERT vs. XLNet embeddings across configurations C1–C5 (scatter plot; the dashed line marks the best precision, 100%). Source: Created by the author

Figure 5

Recall comparison: SBERT vs. XLNet embeddings across configurations C1–C5 (scatter plot; the dashed line marks the best recall, 99.73%). Source: Created by the author

From Figures 3–5, several key observations can be made about the performance of different configurations using SBERT and XLNet embeddings:

SBERT embeddings match or outperform XLNet in precision and consistently outperform it in recall and accuracy, confirming their effectiveness for spam detection.
GRU with attention delivers strong, reliable results with both embedding types, demonstrating its versatility in sequential modeling.
Replacing the softmax layer in GRU with attention by an SVM further improves performance for both embedding types (C5 vs. C2), whereas LightGBM and XGBoost do not improve on the softmax baseline in Table 3.

These findings underscore the efficacy of the proposed method, particularly when leveraging SBERT embeddings, showcasing its potential for advanced spam detection applications.

Further, we plotted the Receiver Operating Characteristic (ROC) curve and calculated the Area Under the Curve (AUC), as shown in Figure 6. The AUC value for our model is 0.99.

Figure 6

Line graph of the ROC curve. The horizontal axis is the false positive rate and the vertical axis is the true positive rate, both ranging from 0.0 to 1.0 in steps of 0.2. The solid curve, labeled “ROC curve (area = 0.99),” rises steeply from (0.0, 0.0) to about (0.0, 0.99) and then runs near the top of the plot to (1.0, 1.0). The dashed diagonal from (0.0, 0.0) to (1.0, 1.0) serves as the reference line. All numerical values are approximate.

ROC curve for the spam detection model. Source: Created by the author

The near-perfect AUC of 0.99 indicates strong discriminative ability: across decision thresholds, the model reliably separates spam from non-spam messages.
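To make the AUC figure concrete, the following minimal sketch computes it with the pairwise (Mann-Whitney) formulation: the fraction of spam/ham pairs in which the spam message receives the higher score. The labels and scores are hypothetical illustration values, not the paper's data, and the function name `roc_auc` is our own.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random spam message outscores a random
    ham message, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores (higher means more spam-like); not the paper's data.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [-2.1, -1.4, -0.3, 0.2, -0.5, 0.9, 1.7, 2.4]
print(f"AUC = {roc_auc(labels, scores):.4f}")
```

The pairwise loop is quadratic in the number of messages, which is acceptable for a test set of this size; a rank-based implementation would be preferable for much larger sets.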

Furthermore, to gain deeper insight into the classification results, Figure 7 presents the confusion matrices for all five configurations (C1–C5), for both SBERT and XLNet embeddings.

Figure 7

Ten confusion matrices, arranged in two rows, cover all tested configurations. The top row, panels (A) to (E), uses Sentence-BERT embeddings; the bottom row, panels (F) to (J), uses XLNet embeddings. In each matrix the vertical axis is the actual class and the horizontal axis is the predicted class, with categories Ham and Spam. The cell counts are:

Panel	Embedding	Configuration	Ham predicted as Ham	Ham predicted as Spam	Spam predicted as Ham	Spam predicted as Spam
(A)	Sentence-BERT	C1	901	1	6	734
(B)	Sentence-BERT	C2	902	0	6	734
(C)	Sentence-BERT	C3	901	1	9	731
(D)	Sentence-BERT	C4	902	0	6	734
(E)	Sentence-BERT	C5	902	0	2	738
(F)	XLNet	C1	900	2	13	727
(G)	XLNet	C2	902	0	7	733
(H)	XLNet	C3	901	1	15	725
(I)	XLNet	C4	902	0	15	725
(J)	XLNet	C5	901	1	3	737

Confusion matrices for all tested configurations. C1: GRU; C2: GRU + Attention; C3: GRU + Attention + LightGBM; C4: GRU + Attention + XGBoost; C5: GRU + Attention + SVM. Source: Created by the author

The analysis of these matrices reveals noteworthy patterns. All configurations correctly identified nearly all of the 902 non-spam messages, with at most two false positives. The main source of error was spam misclassified as non-spam (false negatives), which occurred in every configuration. For instance, with XLNet embeddings, the GRU with attention and LightGBM configuration misclassified 15 spam messages as non-spam. Notably, our proposed approach with Sentence-BERT embeddings correctly classified all 902 non-spam messages and missed only two of the 740 spam messages, the fewest errors among all configurations. This highlights the robustness of our method.
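The counts in Figure 7 can be converted directly into the reported metrics. The sketch below does so for the two SVM-based configurations, treating spam as the positive class (an assumption about the labeling convention); the `Confusion` class is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts with spam as the positive class."""
    tn: int  # ham predicted as ham
    fp: int  # ham predicted as spam
    fn: int  # spam predicted as ham
    tp: int  # spam predicted as spam

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        predicted_spam = self.tp + self.fp
        return self.tp / predicted_spam if predicted_spam else 0.0

    def recall(self) -> float:
        actual_spam = self.tp + self.fn
        return self.tp / actual_spam if actual_spam else 0.0


# Counts read from Figure 7, panels (E) and (J).
sbert_c5 = Confusion(tn=902, fp=0, fn=2, tp=738)
xlnet_c5 = Confusion(tn=901, fp=1, fn=3, tp=737)

for name, cm in (("SBERT C5", sbert_c5), ("XLNet C5", xlnet_c5)):
    print(f"{name}: accuracy={cm.accuracy():.4%} "
          f"precision={cm.precision():.4%} recall={cm.recall():.4%}")
```

For the SBERT-based C5 configuration this gives 1,640 correct predictions out of 1,642, which rounds to the 99.88% accuracy reported for the proposed model, and a recall of 738/740, which rounds to 99.73%.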

6.2 Computational efficiency

Beyond classification performance, inference speed is crucial for real-time spam detection. While training demands significant computational resources, deployment primarily depends on how quickly the system processes new messages.

On a machine with an Intel i7 processor, 32 GB RAM, and an NVIDIA GeForce GTX 950M GPU, the proposed method using SBERT embeddings processed each message in about 31 ms, compared to 45 ms for XLNet embeddings. These values fall within the typical range reported for transformer-based models (10–50 ms, depending on hardware and configuration) [40, 41].
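The measurement procedure behind these timings is not detailed here, so the following is only a sketch of one common way to estimate per-message latency: time each prediction with a monotonic clock after a few untimed warm-up calls. `dummy_predict` is a hypothetical stand-in for the full embedding, GRU and SVM pipeline, and the sample messages are invented.

```python
import statistics
import time
from typing import Callable, Sequence


def mean_latency_ms(predict: Callable[[str], int],
                    messages: Sequence[str],
                    warmup: int = 3) -> float:
    """Average wall-clock time per message, in milliseconds."""
    for msg in messages[:warmup]:      # warm-up calls are not timed
        predict(msg)
    timings = []
    for msg in messages:
        start = time.perf_counter()
        predict(msg)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(timings)


def dummy_predict(message: str) -> int:
    """Hypothetical stand-in for the embedding + GRU + SVM pipeline."""
    return int("free" in message.lower())


sample = ["Free entry in a weekly prize draw", "See you at lunch?"] * 50
print(f"{mean_latency_ms(dummy_predict, sample):.4f} ms per message")
```

Timing one message at a time reflects the real-time filtering scenario; batched inference would typically report a lower per-message figure and should be stated separately.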

Overall, the SBERT variant offers the best trade-off between accuracy and speed, making it well-suited for high-throughput, real-time spam detection.

6.3 Comparison with prior work

To position our method within existing research, we compared it with several recent state-of-the-art approaches evaluated on similar spam detection datasets [4, 7–9, 11–13]; these methods are described in the Related Works section. The comparison highlights the improvement in classification accuracy achieved by the proposed framework.

Table 4 presents a summary of recent methods selected from the literature, including deep learning and transformer-based models. The comparative results clearly demonstrate that our model outperforms these approaches, achieving the highest reported accuracy of 99.88%. This performance gain can be attributed to the synergy between contextualized embeddings (SBERT), attention-enhanced sequential modeling (GRU with attention) and a robust classifier (SVM), all reinforced through data augmentation.

Table 4

Accuracy comparison of the proposed approach with previous methods

Authors	Accuracy
Sonowal et al. (2018) [4]	96.4%
Ghourabi et al. (2020) [7]	98.4%
Roy et al. (2020) [8]	99.4%
Xia et al. (2021) [9]	96.9%
Silpa et al. (2023) [12]	98.4%
Wanda (2023) [13]	99.0%
Gupta et al. (2022) [11]	99.7%
Our approach	99.88%

Source(s): Created by the author

6.4 Statistical evaluation of model performance

To rigorously assess the performance and reliability of our spam detection configurations, we employed both the bootstrap method and the Friedman test, offering complementary insights into model stability and the significance of observed differences.

6.4.1 Confidence intervals via bootstrap

We used a non-parametric bootstrap approach to estimate 95% confidence intervals (CIs) for each configuration's accuracy. This involved generating 500 bootstrap samples by resampling the test data with replacement and computing accuracy for each. The resulting distribution was used to derive empirical CIs.
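A minimal sketch of this procedure is given below. It rebuilds per-message outcomes from the C5/SBERT confusion matrix (1,640 correct predictions and 2 errors) and applies a simple percentile rule. The random seed and the exact percentile rule are assumptions, so the bounds will not match the reported interval digit for digit.

```python
import random
from typing import List, Tuple


def bootstrap_accuracy_ci(correct: List[int],
                          n_resamples: int = 500,
                          alpha: float = 0.05,
                          seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for accuracy.

    `correct` holds one entry per test message: 1 if the prediction was
    right, 0 otherwise.
    """
    rng = random.Random(seed)
    n = len(correct)
    accuracies = []
    for _ in range(n_resamples):
        resample = rng.choices(correct, k=n)   # sampling with replacement
        accuracies.append(sum(resample) / n)
    accuracies.sort()
    lower = accuracies[int((alpha / 2) * n_resamples)]
    upper = accuracies[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Per-message outcomes rebuilt from the C5/SBERT confusion matrix:
# 1,640 correct predictions and 2 errors out of 1,642 test messages.
outcomes = [1] * 1640 + [0] * 2
low, high = bootstrap_accuracy_ci(outcomes)
print(f"95% CI: ({low:.4%}, {high:.4%})")
```

Because the test set contains 1,642 messages, each resampled accuracy is a multiple of 1/1,642 (about 0.06 percentage points), which explains why several interval bounds coincide across configurations.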

Table 5 presents the CIs for both embedding strategies:

Table 5

Bootstrap 95% confidence intervals for accuracy, 500 resamples

Configuration	XLNet 95% CI	SBERT 95% CI
C1	(98.6602, 99.5128)	(99.2083, 99.8782)
C2	(99.2083, 99.8782)	(99.3301, 99.8782)
C3	(98.5993, 99.5128)	(99.0256, 99.7564)
C4	(98.6602, 99.5128)	(99.3301, 99.8782)
C5	(99.5128, 99.9391)	(99.6955, 99.9990)

Note(s): C1: GRU; C2: GRU with attention; C3: GRU + Attention + LightGBM; C4: GRU + Attention + XGBoost; C5: GRU + Attention + SVM

Source(s): Created by the author

From a statistical perspective, both bounds of each SBERT interval are at least as high as those of the corresponding XLNet interval, indicating a consistent advantage for SBERT embeddings. The intervals nevertheless overlap, substantially so for C2 and C5, so the bootstrap alone does not establish that the difference between embeddings is significant. The narrow CI for C5 (GRU + Attention + SVM) reflects low variance across resamples, while the broader range for C3 (GRU + Attention + LightGBM) suggests higher sensitivity to data variation.

6.4.2 Friedman test for statistical significance

We applied the Friedman test to assess whether performance differences among configurations (C1–C5) are statistically significant. The test yielded χ² = 19.36 with a p-value of 0.00067 for XLNet, and χ² = 23.80 with a p-value of 0.0069 for SBERT.

In both cases, results are statistically significant (p < 0.05), confirming that model configuration meaningfully affects spam detection performance.
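For readers unfamiliar with the test, the sketch below computes the Friedman statistic from a table of repeated measurements (one row per evaluation run, one column per configuration) and the corresponding chi-square p-value. The accuracy values are hypothetical, the tie correction is omitted, and the closed-form tail probability holds only for an even number of degrees of freedom; none of these choices is claimed to reproduce the exact setup used above.

```python
import math
from typing import List, Sequence


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank values within one block (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic, without tie correction."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Hypothetical accuracies (%): one row per evaluation run, columns C1..C5.
runs = [
    [99.1, 99.4, 99.0, 99.3, 99.8],
    [99.0, 99.5, 99.2, 99.4, 99.9],
    [99.2, 99.3, 98.9, 99.5, 99.7],
    [99.1, 99.6, 99.0, 99.4, 99.8],
    [99.3, 99.5, 99.1, 99.4, 99.9],
    [99.0, 99.4, 99.1, 99.3, 99.8],
]
stat = friedman_statistic(runs)
p_value = chi2_sf_even_df(stat, df=len(runs[0]) - 1)
print(f"chi2 = {stat:.2f}, p = {p_value:.5f}")
```

With five configurations the statistic has 4 degrees of freedom; under that assumption, `chi2_sf_even_df(19.36, 4)` evaluates to about 0.00067, in line with the XLNet p-value reported above.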

The bootstrap analysis indicates that C5 is the most stable configuration, and the Friedman test confirms that performance differences among models are statistically significant. These findings highlight the strong performance of the GRU + Attention + SVM architecture, particularly with SBERT embeddings.

6.5 Limitations

Our approach relies on pretrained models such as SBERT and XLNet, which may be demanding for devices with limited resources. Future work will explore lighter alternatives like DistilBERT or MiniLM, and optimization methods such as quantization and pruning to speed up inference.

The current evaluation is limited to English SMS data. Testing in other languages and domains (e.g., email, chat, social media) will help assess how well the method adapts to different contexts.

We also observed a few missed spam cases and have not yet studied resistance to adversarial or evolving spam patterns. Expanding the evaluation and exploring online or continual learning could further improve adaptability.

7. Conclusion and future work

This study presented a spam detection framework that combines Sentence-BERT embeddings with an attention-enhanced GRU for sequential modeling and an SVM classifier in place of softmax. The approach reached 99.88% accuracy with a 95% CI of [99.70%, 99.99%], demonstrating the effectiveness of combining semantic encoding, temporal modeling, and optimized classification. Inference speed was also competitive, supporting deployment in real-time filtering systems.

Beyond technical performance, content-based spam filtering also raises important ethical and societal questions. These systems can greatly reduce exposure to harmful content such as scams and phishing, but they must be designed to protect legitimate communication, respect user privacy, and be transparent in how they work. This calls for models that are easy to understand and accountable, helping to build trust and ensure fair decisions.

Future work will explore:

Advanced Embeddings: Testing lighter or more adaptable embeddings to improve efficiency and generalization.
Dynamic Ensembles: Developing adaptive ensemble methods to handle evolving spam patterns.
Imbalance Mitigation: Applying sampling or loss-based strategies to address class imbalance in practical deployments.

References

1. Slicktext. 17 spam text statistics & spam text examples for 2023. Accessed 25 December 2024. Available from: https://www.slicktext.com/blog/2022/10/17-spam-text-statisitics-for-2022/

2. Tarek MM, Abd-El-Hafeez T. Developing an efficient method for automatic threshold detection based on hybrid feature selection approach. In: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics. Springer; 2020. p. 45-54.

3. Girgis MR, Mahmoud TM, Abd-El-Hafeez T. A new effective system for filtering pornography videos. Int J Comput Sci Eng. 2010;2(9):123-30.

4. Sonowal G, Kuppusamy KS. Smidca: an anti-smishing model with machine learning approach. The Comput J. 2018;61(8):1143-57. doi: https://doi.org/10.1093/comjnl/bxy039.

5. Mishra S, Soni D. Smishing detector: a security model to detect smishing through SMS content analysis and URL behavior analysis. Future Gener Comput Syst. 2020;108:803-15. doi: https://doi.org/10.1016/j.future.2020.03.021.

6. Woong Joo J, Moon SY, Singh S, Park JH. S-detector: an enhanced security model for detecting smishing attack for mobile computing. Telecommun Syst. 2017;66(1):29-38. doi: https://doi.org/10.1007/s11235-016-0269-9.

7. Ghourabi A, Mahmood MA, Alzubi QM. A hybrid CNN-LSTM model for SMS spam detection in Arabic and English messages. Future Internet. 2020;12(9):156. doi: https://doi.org/10.3390/fi12090156.

8. Roy PK, Singh JP, Banerjee S. Deep learning to filter SMS spam. Future Gener Comput Syst. 2020;102:524-33. doi: https://doi.org/10.1016/j.future.2019.09.001.

9. Xia T, Chen X. A weighted feature enhanced hidden Markov model for spam SMS filtering. Neurocomputing. 2021;444:48-58. doi: https://doi.org/10.1016/j.neucom.2021.02.075.

10. Liu X, Lu H, Nayak A. A spam transformer model for SMS spam detection. IEEE Access. 2021;9:80253-63. doi: https://doi.org/10.1109/access.2021.3081479.

11. Gupta A, Patil J, Soni S, Rajan A. Email spam detection using multi head CNN BiGRU network. In: International Conference on Advanced Network Technologies and Intelligent Computing. Springer; 2022. p. 29-46.

12. Silpa C, Niya Mirza S, Prathyusha S, Latha Reddy PNS, Hrudaya UJ, Vivek M. A meta classifier model for SMS spam detection using MultinomialNB LinearSVC algorithms. In: 2023 International Conference on Networking and Communications (ICNWC). IEEE; 2023. p. 1-6.

13. Wanda P. Gruspam: robust e mail spam detection using gated recurrent unit (GRU) algorithm. Int J Inf Technol. 2023;15(8):4315-22. doi: https://doi.org/10.1007/s41870-023-01516-z.

14. Mahmoud TM, Abd-El-Hafeez T. A new feature selection method based on frequent and associated itemsets for text classification. Res Sq. 2021. Preprint. Available from: https://www.researchsquare.com/

15. Mahmoud TM, Abd-El-Hafeez T. The effect of rebalancing techniques on the classification performance in cyberbullying datasets. Neural Comput Appl. 2023;35:12345-56.

16. Mahmoud TM, Abd-El-Hafeez T, Omar A. A highly efficient content-based approach to filter pornography websites. Int J Comput Appl. 2012;50(3):1-7.

17. Girgis MR, Mahmoud TM, Abd-El-Hafeez T. A system for extracting images and URLs from web pages. Int J Comput Appl. 2013;75(12):25-30.

18. Zhang Z, Mahmoud TM, Abd-El-Hafeez T. Topic extraction and interactive knowledge graphs for learning resources. Sustain. 2022;14(1):226.

19. Anand Sharma N, Ali ABMS, Kabir MA. A review of sentiment analysis: tasks, applications, and deep learning techniques. Int J Data Sci Anal. 2025;19(3):351-88. doi: https://doi.org/10.1007/s41060-024-00594-x.

20. Lopez-Joya S, Diaz-Garcia JA, Ruiz MD, Martin-Bautista MJ. Dissecting a social bot powered by generative AI: anatomy, new trends and challenges. Soc Netw Anal Min. 2025;15(1):7. doi: https://doi.org/10.1007/s13278-025-01410-5.

21. Batiuk T, Dosyn D. Intellectual analysis of textual data in social networks using BERT and XGBoost. Info Syst Netw. 2025;17:44-60. doi: https://doi.org/10.23939/sisn2025.17.044.

22. Shaba Sayeed Md, Kalyan Dutta I. Detecting malicious URLs in brushing scams: a machine learning approach with human-centered cybersecurity. In: 2025 IEEE World AI IoT Congress (AIIoT). IEEE; 2025. p. 0062-9.

23. Reimers N, Gurevych I. Sentence-BERT: sentence embeddings using siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Hong Kong, China: Association for Computational Linguistics; 2019. p. 3973-83.

24. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, 1. Association for Computational Linguistics; 2019. p. 4171-86.

25. Wang T, Shi H, Liu W, Yan X. A joint FrameNet and element focusing Sentence-BERT method of sentence similarity computation. Expert Syst Appl. 2022;200:117084. doi: https://doi.org/10.1016/j.eswa.2022.117084.

26. Peyton K, Unnikrishnan S. A comparison of chatbot platforms with the state-of-the-art Sentence-BERT for answering online student FAQs. Results Eng. 2023;17:100856. doi: https://doi.org/10.1016/j.rineng.2022.100856.

27. Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV. XLNet: generalized autoregressive pretraining for language understanding. Adv Neural Inf Process Syst. 2019;32:5753-63.

28. Muhammad Danyal M, Khan SS, Khan M, Ullah S, Mehmood F, Ali I. Proposing sentiment analysis model based on BERT and XLNet for movie reviews. Multimed Tools Appl. 2024;83(24):1-25. doi: https://doi.org/10.1007/s11042-024-18156-5.

29. Habbat N, Anoun H, Hassouni L. Combination of GRU and CNN deep learning models for sentiment analysis on French customer reviews using XLNet model. IEEE Eng Manag Rev. 2022;51(1):41-51. doi: https://doi.org/10.1109/emr.2022.3208818.

30. Wu N, Hou H, Guo Z, Zheng W. Low resource neural machine translation using XLNet pre-training model. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, September 14–17, 2021. Bratislava, Slovakia: Springer; 2021. p. 503-14. Proceedings, Part V 30.

31. Cho K, van Merrienboer B, Gülçehre Ç, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing. ACL; 2014.

32. Vasilev I. Advanced deep learning with Python: design and implement advanced next-generation AI solutions using TensorFlow and PyTorch. Birmingham, UK: Packt Publishing; 2019.

33. Singh J, Sharma S, Briskilal J. Natural language processing based machine translation for Hindi English using GRU and attention. In: 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC). IEEE; 2022. p. 965-9.

34. Shewalkar A, Nyavanandi D, Ludwig SA. Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRU. J Artif Intell Soft Comput Res. 2019;9(4):235-45. doi: https://doi.org/10.2478/jaiscr-2019-0006.

35. Pirani M, Thakkar P, Jivrani P, Bohara MH, Garg D. A comparative analysis of ARIMA, GRU, LSTM and BiLSTM on financial time series forecasting. In: 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE). IEEE; 2022. p. 1-6.

36. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Adv Neural Inf Process Syst. 2017;30:5998-6008.

37. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: 2015 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA.

38. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ. Exploring the limits of transfer learning with a unified text to text transformer. J Mach Learn Res. 2020;21(140):1-67.

39. Almeida TA, Hidalgo JMG, Yamakami A. Contributions to the study of SMS spam filtering: new collection and results. In: Proceedings of the 11th ACM symposium on Document engineering. Association for Computing Machinery; 2011. p. 259-62.

40. NVIDIA. Real-Time Natural Language Processing with BERT Using NVIDIA TensorRT (Updated), 2020. Accessed 17 April 2025. Available from: https://developer.nvidia.com/blog/real-time-nlp-with-bert-using-tensorrt-updated/

41. NVIDIA. Real-Time Natural Language Understanding with BERT Using TensorRT, 2019. Accessed 17 April 2025. Available from: https://developer.nvidia.com/blog/nlu-with-tensorrt-bert/

2025

Leila Boussaad

Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at Link to the terms of the CC BY 4.0 licence.
