Comparison of the multilingual models on undersampled and oversampled data
| Model | Pearson (Train) | Pearson (Val) | Pearson (Test) | Time (hh:mm:ss) |
|---|---|---|---|---|
| Undersampling data | | | | |
| Multilingual paraphrase-multilingual-MiniLM-L12-v2 | 0.72 | 0.70 | 0.59 | 00:15:04 |
| Distilbert-base-multilingual-cased-v1 | 0.72 | 0.69 | 0.59 | 00:20:18 |
| Oversampling data | | | | |
| Multilingual paraphrase-multilingual-MiniLM-L12-v2 | 0.62 | 0.59 | 0.61 | 01:17:21 |
| Distilbert-base-multilingual-cased-v1 | 0.66 | 0.58 | 0.63 | 01:44:45 |
Note(s): Values are italicized based on the Pearson test score; the highest value is the best result
Source(s): Table courtesy of Dhini and Girsang (2023)