Performance of two-factor and three-factor disentanglement on singing voices across different split sets.

| Data split | Timbre | Variation | Pitch | MSE ↓ | Phoneme estimation [%] (Variation ↑) | Phoneme estimation [%] (Pitch ↓) | Pitch estimation [%] (Variation ↓) | Pitch estimation [%] (Pitch ↑) |
|---|---|---|---|---|---|---|---|---|
| Test | ✓ | | ✓ | 0.873 | — | 64.7 | — | 16.6 |
| Test | | ✓ | ✓ | 0.653 | 62.1 | 46.3 | 13.3 | 44.2 |
| Test | ✓ | ✓ | ✓ | 0.613 | 61.4 | 48.1 | 12.7 | 45.5 |
| Training | ✓ | | ✓ | 0.897 | — | 63.8 | — | 14.9 |
| Training | | ✓ | ✓ | 0.650 | 62.4 | 44.4 | 11.5 | 43.0 |
| Training | ✓ | ✓ | ✓ | 0.605 | 61.5 | 44.2 | 10.7 | 45.2 |
| Validation | ✓ | | ✓ | 0.888 | — | 63.2 | — | 17.8 |
| Validation | | ✓ | ✓ | 0.657 | 60.9 | 44.0 | 12.8 | 45.0 |
| Validation | ✓ | ✓ | ✓ | 0.614 | 60.3 | 46.2 | 11.1 | 47.3 |