Stemming of French words based on grammatical categories

Savoy, Jacques

Journal of the American Society for Information Science, 1993, vol. 44, issue 1, 1-9

Abstract: Automatic indexing systems use suffix stripping algorithms to cluster various words derived from a common root under the same stem. Currently, removing affixes is either a context‐free or a context‐sensitive operation, where the context refers to the remaining stem. In this article, we propose a suffixing algorithm which uses grammatical categories to enhance the stemming process. This approach supports the use of foreign languages. In our case, the language is French, and a morphological analysis is required for removing inflectional suffixes or morphosyntactic variants of a lemma. After this analysis, we implement a suffix stripping algorithm which uses a dictionary and the grammatical categories to remove derivational suffixes. Our approach always returns a linguistically correct lemma, but not necessarily the “right” one. Based on our tests, this solution is an attractive one, with a mean error rate of 16%. We finish by explaining why we cannot expect significantly better results with this approach. © 1993 John Wiley & Sons, Inc.
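The two-stage process described in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm. The lexicon, the category labels, the three derivational rules, and the function names (`lemmatize`, `strip_derivation`, `stem`) are all hypothetical, chosen only to show how a grammatical category can restrict which suffix rule applies and how a dictionary check keeps the result a real lemma.

```python
# Toy lexicon (hypothetical data): word form -> (lemma, grammatical category).
LEXICON = {
    "chanteurs": ("chanteur", "NOUN"),
    "chanteur": ("chanteur", "NOUN"),
    "chanter": ("chanter", "VERB"),
    "chantons": ("chanter", "VERB"),
    "nationales": ("national", "ADJ"),
    "national": ("national", "ADJ"),
    "nation": ("nation", "NOUN"),
    "nations": ("nation", "NOUN"),
    "rapidement": ("rapidement", "ADV"),
    "rapide": ("rapide", "ADJ"),
}

# Hypothetical derivational rules:
# (category of the lemma, suffix to strip, replacement, category of the result)
RULES = [
    ("NOUN", "eur", "er", "VERB"),  # chanteur -> chanter
    ("ADJ", "al", "", "NOUN"),      # national -> nation
    ("ADV", "ment", "", "ADJ"),     # rapidement -> rapide
]


def lemmatize(word):
    """Stage 1: map an inflected form to (lemma, category), or None if unknown."""
    return LEXICON.get(word.lower())


def strip_derivation(lemma, category):
    """Stage 2: apply the first rule whose category and suffix match,
    but only if the candidate is itself a lemma of the expected category."""
    for src_cat, suffix, replacement, tgt_cat in RULES:
        if category == src_cat and lemma.endswith(suffix):
            candidate = lemma[: -len(suffix)] + replacement
            if LEXICON.get(candidate) == (candidate, tgt_cat):
                return candidate
    return lemma


def stem(word):
    """Return a dictionary lemma for a known word, or the lowercased word."""
    analysis = lemmatize(word)
    if analysis is None:
        return word.lower()
    lemma, category = analysis
    return strip_derivation(lemma, category)


if __name__ == "__main__":
    for w in ["chanteurs", "nationales", "rapidement", "nations"]:
        print(w, "->", stem(w))
```

Because a candidate is accepted only when it appears in the lexicon with the expected category, the sketch returns an existing lemma for every known word, which mirrors the abstract's remark that the output is linguistically correct without necessarily being the intended one.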

Date: 1993

Downloads: (external link)
https://doi.org/10.1002/(SICI)1097-4571(199301)44:13.0.CO;2-1

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:44:y:1993:i:1:p:1-9

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.