Table 5. Experiments on Detector... | Emerald Publishing

Table 5. Experiments on Detector Features

Dataset: Flickr30k (1k testing set)
Global Feat (visual)	Detail Feat (visual)	Text Feat	Image-to-text Recall@1	Image-to-text Recall@5	Image-to-text Recall@10	Text-to-image Recall@1	Text-to-image Recall@5	Text-to-image Recall@10
CLIP	CLIP	CLIP	85.3	91.9	93.3	72.1	90.6	92.2
DETR encoder	DETR decoder	CLIP	18.3	35.1	41.8	19.5	25.3	45.9
ResNet Backbone	DETR encoder	CLIP	66.7	89.5	93.3	56.7	84.5	90.3
ResNet Backbone	DETR decoder	CLIP	72.4	91.6	95.1	59.5	85.7	90.5
ResNet Backbone	DETR decoder	RoBERTa	64.5	84.5	88.4	53.3	83.3	87.3
