Inconsistency and intrinsic difficulty of different diffusion-based generative AI models. Classification error rates obtained with the Dual Vision Transformer (DaViT) model.
| Testing / Training | FLUX.1-schnell | Stable Diffusion 3 | SD XL Turbo | Wuerstchen | Kandinsky 2.2 |
|---|---|---|---|---|---|
| FLUX.1-schnell | 1.43% | 5.09% | 16.08% | 7.09% | 5.28% |
| Stable Diffusion 3 | 6.88% | 2.35% | 11.48% | 5.19% | 5.64% |
| SD XL Turbo | 16.19% | 25.47% | 0.93% | 18.03% | 17.68% |
| Wuerstchen | 12.50% | 14.41% | 10.41% | 0.98% | 13.11% |
| Kandinsky 2.2 | 10.70% | 10.64% | 19.64% | 8.79% | 1.78% |
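
The gap between matched and mismatched generators in the table can be summarized with a little arithmetic. The following is a minimal sketch, not part of the original analysis: it copies the table values into a matrix and computes the mean of the diagonal cells (same generator on both axes) and, for each row, the mean of the off-diagonal cells. The names `errors`, `diagonal_mean`, and `off_diagonal_row_means` are hypothetical, and the code makes no assumption about which axis is training and which is testing.

```python
# Error rates (in percent) copied from the table, row by row.
labels = [
    "FLUX.1-schnell",
    "Stable Diffusion 3",
    "SD XL Turbo",
    "Wuerstchen",
    "Kandinsky 2.2",
]
errors = [
    [1.43, 5.09, 16.08, 7.09, 5.28],
    [6.88, 2.35, 11.48, 5.19, 5.64],
    [16.19, 25.47, 0.93, 18.03, 17.68],
    [12.50, 14.41, 10.41, 0.98, 13.11],
    [10.70, 10.64, 19.64, 8.79, 1.78],
]


def diagonal_mean(matrix):
    """Mean error over cells where row and column generator match."""
    n = len(matrix)
    return sum(matrix[i][i] for i in range(n)) / n


def off_diagonal_row_means(matrix):
    """For each row, mean error over cells where the generators differ."""
    means = []
    for i, row in enumerate(matrix):
        others = [value for j, value in enumerate(row) if j != i]
        means.append(sum(others) / len(others))
    return means


if __name__ == "__main__":
    print(f"matched generators: {diagonal_mean(errors):.2f}%")
    for name, mean in zip(labels, off_diagonal_row_means(errors)):
        print(f"{name}: {mean:.2f}% (mismatched, row mean)")
```

Comparing each row mean with the diagonal mean gives a rough per-row view of how much the error rate grows when the two generators differ.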