Skip to Main Content
Article navigation
Purpose

Current segmentation systems almost invariably focus on linear segmentation and can only divide text into linear sequences of segments. This suits cohesive text such as news feed but not coherent texts such as documents of a digital library which have hierarchical structures. To overcome the focus on linear segmentation in document segmentation and to realize the purpose of hierarchical segmentation for a digital library’s structured resources, this paper aimed to propose a new multi-granularity hierarchical topic-based segmentation system (MHTSS) to decide section breaks.

Design/methodology/approach

MHTSS adopts up-down segmentation strategy to divide a structured, digital library document into a document segmentation tree. Specifically, it works in a three-stage process, such as document parsing, coarse segmentation based on document access structures and fine-grained segmentation based on lexical cohesion.

Findings

This paper analyzed limitations of document segmentation methods for the structured, digital library resources. Authors found that the combination of document access structures and lexical cohesion techniques should complement each other and allow for a better segmentation of structured, digital library resources. Based on this finding, this paper proposed the MHTSS for the structured, digital library resources. To evaluate it, MHTSS was compared to the TT and C99 algorithms on real-world digital library corpora. Through comparison, it was found that the MHTSS achieves top overall performance.

Practical implications

With MHTSS, digital library users can get their relevant information directly in segments instead of receiving the whole document. This will improve retrieval performance as well as dramatically reduce information overload.

Originality/value

This paper proposed MHTSS for the structured, digital library resources, which combines the document access structures and lexical cohesion techniques to decide section breaks. With this system, end-users can access a document by sections through a document structure tree.

Licensed re-use rights only
You do not currently have access to this content.
Don't already have an account? Register

Purchased this content as a guest? Enter your email address to restore access.

Please enter valid email address.
Email address must be 94 characters or fewer.
Pay-Per-View Access
$41.00
Rental

or Create an Account

Close Modal
Close Modal

Gift article access

As a benefit of your subscription, you can share temporary access to restricted articles.

Each link will stop working after 30 days or 10 uses. You may create up to 10 links in a 30 day period.

Please sign in to your personal account to gift article access.

Register

Gift article access

As a benefit of your subscription, you can share temporary access to restricted articles.

Each link will stop working after 30 days or 10 uses. You may create up to 10 links in a 30 day period.

Gift articles remaining: --

Gift article access

Each link will stop working after 30 days or 10 uses. You may create up to 10 links in a 30 day period.

Gift articles remaining: --

Gift article access

As a benefit of your subscription, you can share temporary access to restricted articles.

Each link will stop working after 30 days or 10 uses.

You have reached the limit of 10 links within a 30 day period.