From entity extraction to network analysis: a method and an application to a Portuguese textual source
Conceição Rocha,
Alípio Mário Jorge,
Márcia Oliveira,
Paula Brito,
João Gama and
Carlos Pimenta
Additional contact information
Conceição Rocha: FEP-UP; LIAAD/INESC
Alípio Mário Jorge: FCUP; LIAAD/INESC
Márcia Oliveira: FEP-UP; LIAAD/INESC
Paula Brito: FEP-UP
João Gama: FEP-UP, LIAAD/INESC
Carlos Pimenta: FEP-UP, OBEGEF
OBEGEF Working Papers from OBEGEF - Observatório de Economia e Gestão de Fraude, OBEGEF Working Papers on Fraud and Corruption
Abstract:
This paper reports advances in the entity extraction task (named entity identification) of a text mining process that aims at unveiling non-trivial semantic structures, such as relationships and interactions between entities or communities. We propose a 3-phase method that is applicable to the Portuguese language and potentially to other languages as well. The method relies on flexible pattern matching, part-of-speech tagging, lexical-based rules and distance-based entity name merging. All steps are implemented using free software, taking advantage of various existing packages. Evaluation of the efficacy of the entity extraction method on part of a book written in Portuguese indicates improved F1 results. For further evaluation and illustration of the usefulness of the proposed method, we apply it to a book on Freemasonry and observe the differences in the entity word clouds produced. We also define a social network of named entities solely from information contained in the book and extract structural insights that reveal connections, relationships and communities between entities.
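As an illustration only, the last two ideas in the abstract (distance-based entity name merging and a network built from entities that appear together) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold, the sentence-level co-occurrence rule, the function names and the sample names are all hypothetical choices made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_names(names, threshold=0.85):
    """Map each name to the first earlier name that is similar enough.

    Greedy, order-dependent merging; the threshold is an assumed value.
    """
    canonical = []  # names kept as representatives
    mapping = {}    # surface form -> representative
    for name in names:
        match = next(
            (c for c in canonical if similarity(name, c) >= threshold), None
        )
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping


def cooccurrence_edges(sentence_entities, mapping):
    """Count how often two merged entities appear in the same sentence."""
    edges = Counter()
    for entities in sentence_entities:
        merged = sorted({mapping.get(e, e) for e in entities})
        for pair in combinations(merged, 2):
            edges[pair] += 1
    return edges


# Made-up entity names, not taken from the paper or the book it analyses.
names = ["Antonio Ferreira", "António Ferreira", "Grande Loja Hipotetica"]
mapping = merge_names(names)

# Each inner list holds the entities found in one (hypothetical) sentence.
sentences = [
    ["Antonio Ferreira", "Grande Loja Hipotetica"],
    ["António Ferreira", "Grande Loja Hipotetica"],
]
edges = cooccurrence_edges(sentences, mapping)
```

Here the two spellings of the person's name collapse into one node, so both sentences contribute to the same weighted edge. A different distance (for example, edit distance) or a different co-occurrence window would change which names merge and which edges appear.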
Pages: 20 pages
Date: 2014-11
Downloads: (external link)
http://www.fep.up.pt/repec/por/obegef/files/wp032.pdf
Persistent link: https://EconPapers.repec.org/RePEc:por:obegef:032
Bibliographic data for series maintained by Rui Henrique Alves.