Table 5. Experiments on Detector Features

Flickr30k (1k testing set)
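| Vis Feat: Global Feat | Vis Feat: Detail Feat | Text Feat | Image-to-text Recall@1 | Image-to-text Recall@5 | Image-to-text Recall@10 | Text-to-image Recall@1 | Text-to-image Recall@5 | Text-to-image Recall@10 |
|---|---|---|---|---|---|---|---|---|
| CLIP | CLIP | CLIP | 85.3 | 91.9 | 93.3 | 72.1 | 90.6 | 92.2 |
| DETR encoder | DETR decoder | CLIP | 18.3 | 35.1 | 41.8 | 19.5 | 25.3 | 45.9 |
| ResNet Backbone | DETR encoder | CLIP | 66.7 | 89.5 | 93.3 | 56.7 | 84.5 | 90.3 |
| ResNet Backbone | DETR decoder | CLIP | 72.4 | 91.6 | 95.1 | 59.5 | 85.7 | 90.5 |
| ResNet Backbone | DETR decoder | RoBERTa | 64.5 | 84.5 | 88.4 | 53.3 | 83.3 | 87.3 |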
