Research organisations and individual researchers increasingly share their research findings by publishing lists of their works on the World Wide Web. To facilitate the exchange of ideas, these lists often include links to the papers themselves in Portable Document Format (PDF) or PostScript (PS) format. Such publication Web sites are generally updated regularly to include new works. Monitoring relevant sites manually is tedious, yet commercial search engines and information monitoring systems are ineffective at finding and tracking scholarly publications. This paper analyses the characteristics of publication index pages and describes effective automatic extraction techniques that the authors have developed. The techniques combine lexical and syntactic analyses with heuristics. They have been implemented and tested on more than 14,000 Web pages, achieving consistently high success rates of around 90 percent.
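The abstract does not spell out the extraction rules, but one surface cue it names, links to papers in PDF or PostScript format, is easy to illustrate. The following is a minimal Python sketch assuming a simple file-suffix heuristic. It is not the authors' technique, which also applies lexical and syntactic analysis. The class `PaperLinkParser`, the function `extract_paper_links` and the suffix list `PAPER_SUFFIXES` (including the compressed `.ps.gz` variant) are hypothetical names introduced only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical suffix list: the source mentions PDF and PostScript only;
# ".ps.gz" is an assumed extra for compressed PostScript files.
PAPER_SUFFIXES = (".pdf", ".ps", ".ps.gz")


class PaperLinkParser(HTMLParser):
    """Collect (anchor text, absolute URL) pairs for links that look like papers."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # absolute URL of the paper link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Ignore any query string, then test the path suffix case-insensitively.
            if href and href.lower().split("?")[0].endswith(PAPER_SUFFIXES):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_paper_links(html, base_url):
    """Return paper-like links found in an HTML publication index page."""
    parser = PaperLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = (
        '<ul><li>A.C.M. Fong et al. <a href="papers/oir2002.pdf">Effective techniques</a>, 2002.</li>'
        '<li><a href="/index.html">Home</a></li>'
        '<li><a href="tr01.PS.gz">Tech   report</a></li></ul>'
    )
    for title, url in extract_paper_links(page, "http://example.org/pubs/"):
        print(title, "->", url)
```

A suffix test alone misses papers served without a file extension and does not recover citation details such as authors or venue. A heuristic like this could serve only as one input to the fuller lexical and syntactic analysis the abstract describes.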
1 February 2002
Review Article
Effective techniques for automatic extraction of Web publications
A.C.M. Fong
A.C.M. Fong works at the Institute of Information and Mathematical Sciences of Massey University, Auckland, New Zealand.
S.C. Hui
S.C. Hui is an Associate Professor at the School of Computer Engineering at Nanyang Technological University, Singapore.
H.L. Vu
H.L. Vu is a Research Student at the School of Computer Engineering at Nanyang Technological University, Singapore.
Publisher: Emerald Publishing
Online ISSN: 1468-4535
Print ISSN: 1468-4527
© MCB UP Limited 2002
Online Information Review (2002) 26 (1): 4–18.
Citation
Fong A, Hui S, Vu H (2002), "Effective techniques for automatic extraction of Web publications". Online Information Review, Vol. 26 No. 1 pp. 4–18, doi: https://doi.org/10.1108/14684520210418347
Suggested Reading
Scholarly resources on the Internet
Campus-Wide Information Systems (September, 1995)
Brief communication: UK theses online?
Interlending & Document Supply (December, 1998)
Open Access: Key Strategic, Technical and Economic Aspects
Interlending & Document Supply (November, 2007)
“Through the looking glass: envisioning new library technologies” – an inflection point for social media and the open web? Part 2
Library Hi Tech News (November, 2023)
Publishing on the Internet: A New Medium for a New Millennium
Library Review (February, 1998)
Related Chapters
Sustainability Disclosure of Metal Mould Companies – Content Analysis
Governance and Sustainability
Applying a Health Justice Framework to Examine Health and Social Justice in LIS Course Offerings
Roles and Responsibilities of Libraries in Increasing Consumer Health Literacy and Reducing Health Disparities
Defining School Effectiveness in the Reform for Quality-Oriented Education
The Impact and Transformation of Education Policy in China
