Deriving term relations for a corpus by graph theoretical clusters
J. Gary Augustson and Jack Minker
Journal of the American Society for Information Science, 1970, vol. 21, issue 2, 101-111
Abstract:
We discuss how alternative methods of automatic term clustering may provide insight into how terms are related within a corpus. The work reported uses a corpus of 2267 documents that contain 3950 index terms. A similarity matrix is developed from the document–term matrix. A threshold level T is applied to the similarity matrix: entries greater than or equal to T are set to one, and the remaining entries are set to zero. Three definitions are applied to the graph corresponding to each threshold matrix to develop clusters: (1) the connected components of the graph, (2) the maximal complete subgraphs of the graph, and (3) the combined maximal complete subgraphs of the graph as described by Gotlieb and Kumar. Two examples are described that show how insight may be gained into the term relations by varying the threshold levels and the cluster definitions.
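The pipeline in the abstract (similarity matrix, threshold matrix, graph, clusters) can be illustrated with a minimal Python sketch. Several parts are assumptions rather than the authors' method: the abstract does not name the similarity measure, so Jaccard overlap of the document sets is used as a stand-in; maximal complete subgraphs are found with the Bron-Kerbosch algorithm, which postdates the paper; and the Gotlieb and Kumar combining step is not attempted. The function names and the toy matrix `DOC_TERM` are hypothetical.

```python
def similarity_matrix(doc_term):
    """Term-term similarity from a binary document-term matrix (rows = documents).

    Assumption: Jaccard overlap of the sets of documents containing each term;
    the abstract does not specify which measure the paper uses.
    """
    n_terms = len(doc_term[0])
    docs_of = [{d for d, row in enumerate(doc_term) if row[t]} for t in range(n_terms)]
    sim = [[0.0] * n_terms for _ in range(n_terms)]
    for i in range(n_terms):
        for j in range(n_terms):
            union = docs_of[i] | docs_of[j]
            sim[i][j] = len(docs_of[i] & docs_of[j]) / len(union) if union else 0.0
    return sim


def threshold_graph(sim, T):
    """Adjacency sets of the graph whose edges are entries with similarity >= T."""
    n = len(sim)
    return {i: {j for j in range(n) if j != i and sim[i][j] >= T} for i in range(n)}


def connected_components(adj):
    """Cluster definition (1): connected components, via depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def maximal_cliques(adj):
    """Cluster definition (2): maximal complete subgraphs.

    Uses Bron-Kerbosch without pivoting (a swapped-in technique, not the paper's).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(set(r))  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy corpus: 5 documents (rows) by 4 index terms (columns).
DOC_TERM = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

sim = similarity_matrix(DOC_TERM)
for T in (0.5, 0.75):
    graph = threshold_graph(sim, T)
    print(T, connected_components(graph), maximal_cliques(graph))
```

On this toy input, raising T from 0.5 to 0.75 leaves the connected components unchanged (terms 0, 1, 2 together and term 3 alone) but splits the single three-term complete subgraph into the overlapping pairs {0, 1} and {1, 2}. This is the kind of contrast between thresholds and cluster definitions that the abstract describes.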
Date: 1970
https://doi.org/10.1002/asi.4630210202
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:21:y:1970:i:2:p:101-111