Automatic classification of Web pages is an effective way to organise the vast amount of information on the Internet and to assist in retrieving relevant information from it. Although many automatic classification systems have been proposed, most of them ignore the conflict between a fixed number of categories and the growing number of Web pages being added to the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single‐path search technique reduces the search complexity from Θ(n) to Θ(log(n)), where n is the number of categories. Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic‐category expansion technique also achieves satisfactory results for adding new categories into the system as required.
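The single‐path search and the dynamic‐category expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the page representation, the similarity measure, or the rule for creating categories, so the term-count profiles, the cosine similarity, the `threshold` parameter, and every name below (`Category`, `classify`, `build_demo_tree`) are assumptions made for illustration only.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Category:
    """A node of the category tree (hypothetical structure)."""
    name: str
    profile: Counter = field(default_factory=Counter)
    children: list[Category] = field(default_factory=list)


def classify(root: Category, page: Counter, threshold: float = 0.2):
    """Follow a single root-to-leaf path, comparing only sibling categories.

    Returns (path of category names, number of similarity comparisons).
    If no child at some level is similar enough, a new child category is
    created there (assumed expansion rule) and the page is placed in it.
    """
    node, path, comparisons = root, [root.name], 0
    while node.children:
        scored = [(cosine(page, child.profile), child) for child in node.children]
        comparisons += len(scored)
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score < threshold:
            new = Category(f"new-{len(node.children) + 1}", Counter(page))
            node.children.append(new)
            path.append(new.name)
            return path, comparisons
        node = best
        path.append(node.name)
    return path, comparisons


def build_demo_tree() -> Category:
    """Tiny hand-made tree: 2 top-level categories, each with 2 children."""
    def cat(name, words, children=()):
        return Category(name, Counter(words.split()), list(children))

    science = cat("science", "physics quantum energy python compiler code", [
        cat("physics", "physics quantum energy"),
        cat("computing", "python compiler code"),
    ])
    sport = cat("sport", "football goal tennis racket match", [
        cat("football", "football goal match"),
        cat("tennis", "tennis racket match"),
    ])
    return Category("root", children=[science, sport])


if __name__ == "__main__":
    tree = build_demo_tree()
    print(classify(tree, Counter("python code compiler code".split())))
    print(classify(tree, Counter("recipe pasta sauce".split())))
```

In this toy tree the first page reaches `computing` after four similarity comparisons, whereas scanning all six categories would take six. With a bounded branching factor and a balanced tree, the number of comparisons grows with the depth of the tree rather than with the total number of categories, which is the intuition behind the Θ(log(n)) figure. The second page matches nothing at the top level, so the sketch creates a new category under the root.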
1 April 2004
Technical Paper
Dynamic and hierarchical classification of Web pages
Ben Choi
Assistant Professor in Computer Science at the College of Engineering and Science, Louisiana Tech University, Ruston, Louisiana, USA
Xiaogang Peng
PhD student in Computational Analysis and Modelling, at the College of Engineering and Science, Louisiana Tech University, Ruston, Louisiana, USA
Publisher: Emerald Publishing
Online ISSN: 1468-4535
Print ISSN: 1468-4527
© Emerald Group Publishing Limited 2004
Online Information Review (2004) 28 (2): 139–147.
Citation
Choi B, Peng X (2004), "Dynamic and hierarchical classification of Web pages". Online Information Review, Vol. 28 No. 2 pp. 139–147, doi: https://doi.org/10.1108/14684520410531673
