Embedding based learning for collection selection in federated search

Garba, Adamu; Khalid, Shah; Ullah, Irfan; Khusro, Shah; Mumin, Diyawu

doi:10.1108/DTA-01-2019-0005

Research Article | October 28, 2020

Adamu Garba
School of Computer and Communication Engineering, Jiangsu University, Zhenjiang, China
Adamu Garba can be contacted at: yakasai6@yahoo.com

Shah Khalid (ORCID: 0000-0001-5735-5863)
School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology, Islamabad, Pakistan
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China

Irfan Ullah (ORCID: 0000-0003-0693-5467)
Department of Computer Science, University of Peshawar, Peshawar, Pakistan

Shah Khusro (ORCID: 0000-0002-7734-7243)
Department of Computer Science, University of Peshawar, Peshawar, Pakistan

Diyawu Mumin (ORCID: 0000-0003-4941-5197)
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
Computer Science, Tamale Technical University, Tamale, Ghana

Publisher: Emerald Publishing

Received: January 12 2019

Revision Received: June 25 2020

Accepted: September 10 2020

Online ISSN: 2514-9318

Print ISSN: 2514-9288

© 2020 Emerald Publishing Limited. Licensed re-use rights only.

Data Technologies and Applications (2020) 54 (5): 703–717.

https://doi.org/10.1108/DTA-01-2019-0005

Purpose

Search engines face many challenges in crawling the deep web because its databases are proprietary or serve dynamic content. Distributed information retrieval (DIR) tries to solve these problems by providing a unified searchable interface to these databases. Because a DIR system must search across many databases, selecting the specific databases to search for a given user query is challenging. This challenge can be addressed by considering users' past queries, in combination with word embedding techniques, when selecting the collections to search. Combining the two would help the best-performing collection selection method speed up retrieval in DIR solutions.

Design/methodology/approach

The authors propose a collection selection model based on word embedding, using the Word2Vec approach, that learns the similarity between the current query and past queries. They use the cosine and transformed cosine similarity models to compute the similarities among queries. The experiments are conducted on three standard TREC testbeds created for federated search.
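
The idea of scoring collections by the similarity between the current query and past queries can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for trained Word2Vec vectors, averaging word vectors to embed a query is an assumed composition rule, the past-query log and collection names are hypothetical, and summing similarities per collection is an assumed aggregation. Only plain cosine similarity is shown, because the transformed cosine variant is not defined in this abstract.

```python
import math

# Hypothetical toy word vectors; a trained Word2Vec model would supply these.
EMBEDDINGS = {
    "cheap": [0.9, 0.1, 0.0],
    "flights": [0.8, 0.2, 0.1],
    "airfare": [0.85, 0.15, 0.05],
    "protein": [0.0, 0.9, 0.3],
    "folding": [0.1, 0.8, 0.4],
}

# Hypothetical log of past queries and the collection that served each well.
PAST_QUERIES = [
    ("cheap flights", "travel_db"),
    ("protein folding", "biomed_db"),
]


def embed_query(query):
    """Embed a query as the mean of its known word vectors (assumed rule)."""
    vectors = [EMBEDDINGS[w] for w in query.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None  # no known words, so the query cannot be embedded
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_collections(query):
    """Rank collections by summed similarity to past queries (assumed aggregation)."""
    q_vec = embed_query(query)
    if q_vec is None:
        return []
    scores = {}
    for past_query, collection in PAST_QUERIES:
        p_vec = embed_query(past_query)
        if p_vec is None:
            continue
        scores[collection] = scores.get(collection, 0.0) + cosine(q_vec, p_vec)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_collections("airfare"):
        print(f"{name}: {score:.3f}")
```

In this sketch the query "airfare" shares no term with the past query "cheap flights", so a purely lexical matcher would find no overlap, while the embedding similarity still ranks the hypothetical travel collection first.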

Findings

The results show significant improvements over the baseline models.

Originality/value

Although lexical matching models for collection selection that use similarity based on past queries exist, to the best of our knowledge, the proposed work is the first to use word embedding for collection selection by learning from past queries.
