Parallel, massive processing in SuperMatrix: a general tool for distributional semantic analysis of corpora

Broda, Bartosz; Piasecki, Maciej

International Journal of Data Mining, Modelling and Management, 2013, vol. 5, issue 1, 1-19

Abstract: This article presents an extended version of the SuperMatrix system, a general tool supporting automatic acquisition of lexical semantic relations from corpora. The extensions focus mainly on parallel processing of massive amounts of data. The construction of the system is discussed, and three distributed parts of the system are presented: distributed construction of co-incidence matrices from corpora, computation of the similarity matrix, and parallel solving of synonymy tests. The proposed approach to parallel processing is evaluated. Parallelisation of similarity matrix computation demonstrates almost linear speedup. The smallest improvements were achieved for the construction of matrices, as this process is mostly bound by reading huge amounts of data. Areas of application of the system are described.

Keywords: SuperMatrix; distributional semantics; parallel processing; semantic analysis; lexical semantic relations; corpora; co-incidence matrices; similarity matrix; synonyms.
Date: 2013

Downloads: (external link)
http://www.inderscience.com/link.php?id=51924 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:5:y:2013:i:1:p:1-19

More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.