Skip to Main Content
Article navigation
Purpose

Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semi‐structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.

Design/methodology/approach

First, the paper reviews the structure of inverted files and gives an overview of the issues of why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RVC1 collections.

Findings

Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on differing sized Reuters RVC1 sub‐collections show that the increase in space costs with increasing the size of a collection is significant, but in terms of run time the increase is linear. Path search results show sub‐millisecond run times, demonstrating minimal overhead for XML search.

Practical implications

Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable.

Originality/value

The paper provides useful information on a method for XML indexing based on the IR system Okapi.

You do not currently have access to this content.
Don't already have an account? Register

Purchased this content as a guest? Enter your email address to restore access.

Please enter valid email address.
Email address must be 94 characters or fewer.
Pay-Per-View Access
$41.00
Rental

or Create an Account

Close Modal
Close Modal