Classification of scientific publications according to library controlled vocabularies: A new concept matching-based approach

Arash Joorabchi and Abdulhussain E. Mahdi

doi:10.1108/LHT-03-2013-0030

Volume 31, Issue 4

18 November 2013

Editors: Jane Greenberg, Eva Mendez Rodriguez and Gema Bueno de la Fuente

Research Article

Arash Joorabchi, Electronic and Computer Engineering Department, University of Limerick, Limerick, Ireland

Abdulhussain E. Mahdi, Electronic and Computer Engineering Department, University of Limerick, Limerick, Ireland

Publisher: Emerald Publishing

Online ISSN: 2054-166X

Print ISSN: 0737-8831

Library Hi Tech (2013) 31 (4): 725–747.

Purpose

– This paper aims to report on the design and development of a new approach for the automatic classification and subject indexing of research documents in scientific digital libraries and repositories (DLR) according to library controlled vocabularies such as the Dewey Decimal Classification (DDC) and Faceted Application of Subject Terminology (FAST).

Design/methodology/approach

– The proposed concept matching-based approach (CMA) detects the key Wikipedia concepts occurring in a document and searches the online public access catalogues (OPACs) of conventional libraries by querying the WorldCat database, retrieving a set of MARC records that share one or more of the detected key concepts. It then measures the semantic similarity of each retrieved MARC record to the document and, using an inference algorithm, assigns to the document the DDC classes and FAST subjects of the MARC records most similar to it.
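The pipeline described above (concept detection, retrieval of records sharing a concept, similarity ranking, label inference) can be illustrated with a small sketch. It is not the authors' implementation: the in-memory `catalogue` list stands in for a WorldCat query, `detect_concepts` replaces Wikipedia concept detection with naive substring matching over a fixed vocabulary, and the cosine similarity over concept sets and the similarity-weighted vote are assumed choices; `Record`, `classify`, `top_k` and the placeholder labels are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Record:
    """Hypothetical stand-in for a MARC record retrieved via WorldCat."""
    concepts: set[str]                              # key concepts of the record
    ddc: str                                        # its DDC class (placeholder)
    fast: list[str] = field(default_factory=list)   # its FAST subjects


def detect_concepts(text: str, vocabulary: set[str]) -> set[str]:
    """Naive stand-in for Wikipedia concept detection: case-insensitive
    substring match against a fixed list of concept labels (assumption)."""
    lowered = text.lower()
    return {c for c in vocabulary if c.lower() in lowered}


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary (set-valued) concept vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def classify(text: str, vocabulary: set[str], catalogue: list[Record],
             top_k: int = 3) -> tuple[str | None, list[str]]:
    doc_concepts = detect_concepts(text, vocabulary)
    # Retrieval step: keep only records sharing at least one detected concept.
    candidates = [r for r in catalogue if r.concepts & doc_concepts]
    # Rank candidates by similarity to the document and keep the top_k.
    ranked = sorted(((cosine(doc_concepts, r.concepts), r) for r in candidates),
                    key=lambda pair: pair[0], reverse=True)[:top_k]
    # Inference step (assumed): similarity-weighted vote over record labels.
    ddc_votes: Counter[str] = Counter()
    fast_votes: Counter[str] = Counter()
    for score, record in ranked:
        ddc_votes[record.ddc] += score
        for subject in record.fast:
            fast_votes[subject] += score
    best_ddc = ddc_votes.most_common(1)[0][0] if ddc_votes else None
    return best_ddc, [s for s, _ in fast_votes.most_common(3)]
```

Weighting each vote by similarity lets a label shared by several moderately similar records outrank a label carried by a single close match; the paper's own similarity measure and inference algorithm may differ.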

Findings

– The authors evaluate the accuracy of the DDC classes and FAST subjects that the proposed method automatically assigns to a set of research documents, using the standard information retrieval measures of precision, recall, and F1. The results show that the proposed approach is more accurate than a similar system currently deployed in a large-scale scientific search engine.
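As an illustration of the evaluation measures, the sketch below scores automatically assigned labels against a gold-standard label set. Precision is the fraction of assigned labels that are correct, recall is the fraction of gold labels that were assigned, and F1 is their harmonic mean, 2PR/(P + R). The abstract does not say how scores are averaged over documents, so the micro-averaging in `micro_prf1` is an assumption, and both function names are hypothetical.

```python
from __future__ import annotations


def prf1(assigned: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 for one document's labels."""
    tp = len(assigned & gold)  # correctly assigned labels
    precision = tp / len(assigned) if assigned else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def micro_prf1(pairs: list[tuple[set[str], set[str]]]) -> tuple[float, float, float]:
    """Micro-averaged scores (assumed scheme): pool true positives and label
    counts over all (assigned, gold) pairs before computing the ratios."""
    tp = sum(len(a & g) for a, g in pairs)
    n_assigned = sum(len(a) for a, _ in pairs)
    n_gold = sum(len(g) for _, g in pairs)
    precision = tp / n_assigned if n_assigned else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, assigning three subjects of which one is among the document's two gold subjects gives a precision of 1/3, a recall of 1/2 and an F1 of 0.4.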

Originality/value

– The proposed approach enables the development of a new type of subject classification system for DLR, and addresses some of the problems similar systems suffer from, such as the problem of imbalanced training data encountered by machine learning-based systems, and the problem of word-sense ambiguity encountered by string matching-based systems.
