This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest‐match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.
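As a rough illustration of the approach summarised above, the sketch below applies longest-match suffix removal against two separate suffix lists and returns both candidate stems, so the caller never has to decide whether a word is a noun, an adjective or a verb. The suffix lists, the minimum stem length and the names `NOUN_SUFFIXES`, `VERB_SUFFIXES`, `strip_longest` and `stem_both` are assumptions made for this example only; they are not the dictionaries or rules of the paper, and the recoding step is omitted.

```python
# Illustrative suffix lists only: NOT the dictionaries defined in the paper.
NOUN_SUFFIXES = ["ibus", "orum", "arum", "ius", "um", "ae", "am", "as",
                 "is", "os", "us", "a", "e", "i", "o", "u"]
VERB_SUFFIXES = ["amus", "atis", "ant", "mus", "tis", "nt", "at", "as",
                 "o", "t", "s"]


def strip_longest(word, suffixes, min_stem=2):
    """Remove the longest listed suffix that leaves at least min_stem letters.

    min_stem is an assumed guard against overstemming very short words.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no suffix matched: leave the word unchanged


def stem_both(word):
    """Return (noun_stem, verb_stem) without guessing the word's category."""
    w = word.lower()
    return strip_longest(w, NOUN_SUFFIXES), strip_longest(w, VERB_SUFFIXES)


if __name__ == "__main__":
    for form in ("puellarum", "amamus"):
        print(form, stem_both(form))
```

Under these assumed lists, `stem_both("puellarum")` gives `("puell", "puellarum")` and `stem_both("amamus")` gives `("amam", "am")`. Keeping both stems is one way to let a search match on whichever stem is appropriate while retaining enough of the ending to distinguish grammatical forms.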
1 February 1996
Review Article
A STEMMING ALGORITHM FOR LATIN TEXT DATABASES
ROBYN SCHINKE
Humanities Research Institute and Departments of History, University of Sheffield, Western Bank, Sheffield S10 2TN
MARK GREENGRASS
Humanities Research Institute and Departments of History, University of Sheffield, Western Bank, Sheffield S10 2TN
ALEXANDER M. ROBERTSON
Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN
PETER WILLETT
Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN
Publisher: Emerald Publishing
Online ISSN: 1758-7379
Print ISSN: 0022-0418
© MCB UP Limited 1996
Journal of Documentation (1996) 52 (2): 172–187.
Citation
SCHINKE R, GREENGRASS M, ROBERTSON AM, WILLETT P (1996), "A STEMMING ALGORITHM FOR LATIN TEXT DATABASES". Journal of Documentation, Vol. 52 No. 2 pp. 172–187, doi: https://doi.org/10.1108/eb026966
