Table 7

Inconsistency and intrinsic difficulty of different versions of text embedding for Wuerstchen. Classification error rates obtained with the Dual Vision Transformer (DaViT) model.

Training \ Testing	Wuerstchen	Single prompt	with CLIP tokenizer	with T5xxl	CLIP model with projection
Wuerstchen	0.98%	0.98%	1.14%	1.35%	1.06%
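
As a hedged illustration of the metric in the caption, the sketch below computes a classification error rate as the fraction of misclassified test images. It then expresses each cross-version entry of Table 7 as a gap in percentage points relative to the Wuerstchen-trained, Wuerstchen-tested result. The helper names (error_rate, gaps_vs_baseline) are hypothetical. Reading that gap as a measure of "inconsistency" is an assumption, because the exact definition used by the authors is not given here.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Error rates (%) copied from Table 7: detector trained on Wuerstchen,
# tested on each text-embedding version.
TABLE_7 = {
    "Wuerstchen": 0.98,
    "Single prompt": 0.98,
    "with CLIP tokenizer": 1.14,
    "with T5xxl": 1.35,
    "CLIP model with projection": 1.06,
}


def gaps_vs_baseline(rates, baseline="Wuerstchen"):
    """Hypothetical helper: difference (percentage points) from the baseline error."""
    base = rates[baseline]
    return {name: round(r - base, 2) for name, r in rates.items()}


if __name__ == "__main__":
    print(error_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 1 of 4 wrong -> 0.25
    for name, gap in gaps_vs_baseline(TABLE_7).items():
        print(f"{name}: {gap:+.2f} pp")
```

Under this reading, the T5xxl variant shows the largest gap (+0.37 pp) and the single-prompt variant shows none.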