Purpose

This paper presents a hybrid approach that combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts.

Design/methodology/approach

The method is designed to be domain- and language-independent, with a focus on morphologically rich languages. As a use case, it is applied to extracting multiword terms from Serbian texts in the agricultural engineering domain. Multiword terms are assumed to follow a set of predefined syntactic structures. For each structure, a finite-state transducer was developed that recognizes text sequences with that structure and outputs each sequence in a normalized form, so that different inflectional forms of the same multiword term are counted together. Term candidates were then filtered by frequency and evaluated by two domain experts.
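The recognize, normalize, count and filter steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the finite-state transducers are replaced by a plain lookup, only one syntactic structure (adjective followed by noun) is covered, and the `LEXICON` table, the `extract_candidates` function and the `min_freq` threshold are hypothetical names introduced purely for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption, not a real resource):
# inflected surface form -> (part of speech, nominative singular form).
LEXICON = {
    "poljoprivredna": ("A", "poljoprivredna"),
    "poljoprivredne": ("A", "poljoprivredna"),
    "poljoprivrednu": ("A", "poljoprivredna"),
    "tehnika": ("N", "tehnika"),
    "tehnike": ("N", "tehnika"),
    "tehniku": ("N", "tehnika"),
    "mineralnog": ("A", "mineralno"),
    "đubriva": ("N", "đubrivo"),
}


def extract_candidates(tokens, min_freq=2):
    """Count normalized adjective+noun sequences and keep the frequent ones."""
    counts = Counter()
    # Slide over adjacent token pairs; a stand-in for one transducer
    # that recognizes the structure "adjective noun".
    for first, second in zip(tokens, tokens[1:]):
        adj = LEXICON.get(first.lower())
        noun = LEXICON.get(second.lower())
        if adj and noun and adj[0] == "A" and noun[0] == "N":
            # Normalization: inflected variants map to one nominative form,
            # so they contribute to the same counter entry.
            counts[f"{adj[1]} {noun[1]}"] += 1
    # Frequency filter: drop candidates seen fewer than min_freq times.
    return {term: n for term, n in counts.items() if n >= min_freq}


tokens = [
    "Poljoprivredna", "tehnika", "i", "razvoj", "poljoprivredne", "tehnike",
    "uz", "primenu", "mineralnog", "đubriva",
]
candidates = extract_candidates(tokens)
print(candidates)
```

In this toy input, the nominative and genitive forms of one term are merged into a single candidate, while the term seen only once is removed by the frequency filter. A real system would also need agreement-aware normalization and the remaining syntactic structures, which this sketch leaves out.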

Findings

Using language resources such as electronic dictionaries and grammars, the method extracted 928 multiword terms from the 1,523 candidates recognized in a corpus with 42,260 distinct simple word forms. Of these, 870 were new, that is, not yet contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich that dictionary.

Originality/value

The paper presents a methodology that can contribute significantly to the development of terminology lexicons in different areas. In this use case, important agricultural engineering concepts were extracted from the texts, but the approach could be applied to other domains and languages as well.
