Table 7

Inconsistency and intrinsic difficulty of different versions of text embedding for Wuerstchen. Classification error rates obtained with the Dual Vision Transformer (DaViT) model.

Training \ Testing   Wuerstchen   Single prompt   with CLIP tokenizer   with T5xxl   CLIP model with projection
Wuerstchen           0.98%        0.98%           1.14%                 1.35%        1.06%
