From entity extraction to network analysis: a method and an application to a Portuguese textual source

Conceição Rocha, Alípio Mário Jorge, Márcia Oliveira, Paula Brito, João Gama and Carlos Pimenta
Additional contact information
Conceição Rocha: FEP-UP; LIAAD/INESC
Alípio Mário Jorge: FCUP; LIAAD/INESC
Márcia Oliveira: FEP-UP; LIAAD/INESC
Paula Brito: FEP-UP
João Gama: FEP-UP; LIAAD/INESC
Carlos Pimenta: FEP-UP; OBEGEF

OBEGEF Working Papers from OBEGEF - Observatório de Economia e Gestão de Fraude, OBEGEF Working Papers on Fraud and Corruption

Abstract: This paper reports advances in the entity extraction task (named entity identification) of a text mining process that aims at unveiling non-trivial semantic structures, such as relationships and interactions between entities or communities. We propose a 3-phase method that is applicable to the Portuguese language and potentially applicable to other languages as well. The method relies on flexible pattern matching, part-of-speech tagging, lexical-based rules and distance-based entity name merging. All steps are implemented using free software, taking advantage of various existing packages. An evaluation of the entity extraction method on part of a book written in Portuguese indicates improved F1 results. For further evaluation and illustration of its usefulness, we apply the method to a book on Freemasonry and observe the differences in the entity word clouds produced. We also define a social network of named entities solely from information contained in the book and extract structural insights that reveal connections, relationships and communities among entities.
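
Illustrative sketch (not the authors' implementation): the abstract names pattern matching, distance-based name merging and a network of named entities, and the Python below shows one simple way such steps could fit together. The regular expression, the similarity measure (difflib's ratio, swapped in for whatever distance the paper uses), the 0.85 threshold, the sentence-level co-occurrence rule and all function names are assumptions made for illustration. Part-of-speech tagging and the lexical rules are omitted, so sentence-initial capitalised words may be picked up as false candidates.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

# Assumed pattern: runs of capitalised words, optionally linked by
# Portuguese particles such as "de", "da", "do", "dos", "das".
ENTITY_RE = re.compile(
    r"[A-ZÀ-ÖØ-Þ]\w+(?:\s+(?:(?:de|da|do|dos|das)\s+)?[A-ZÀ-ÖØ-Þ]\w+)*"
)


def extract_candidates(sentence):
    """Return candidate entity names found in one sentence."""
    return ENTITY_RE.findall(sentence)


def merge_names(counts, threshold=0.85):
    """Map each name to a canonical form.

    Names are visited from most to least frequent; a name is merged into
    the first canonical form whose similarity ratio reaches the threshold
    (an assumed value), otherwise it becomes a new canonical form.
    """
    canonical = []
    mapping = {}
    for name, _ in counts.most_common():
        for ref in canonical:
            if SequenceMatcher(None, name, ref).ratio() >= threshold:
                mapping[name] = ref
                break
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping


def cooccurrence_edges(sentences, threshold=0.85):
    """Count how often two merged entities share a sentence."""
    per_sentence = [extract_candidates(s) for s in sentences]
    counts = Counter(name for names in per_sentence for name in names)
    mapping = merge_names(counts, threshold)
    edges = Counter()
    for names in per_sentence:
        merged = sorted({mapping[n] for n in names})
        edges.update(combinations(merged, 2))
    return edges


if __name__ == "__main__":
    demo = [
        "João Gama met Carlos Pimenta in Porto.",
        "Carlos Pimenta wrote to João Gama.",
        "Carlos Pimentta visited Porto.",
    ]
    for (a, b), weight in cooccurrence_edges(demo).items():
        print(a, "--", b, weight)
```

The resulting weighted edge list could be loaded into any graph library for community detection; that step is left out here because the abstract does not say which tools were used.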

Pages: 20 pages
Date: 2014-11

Downloads: (external link)
http://www.fep.up.pt/repec/por/obegef/files/wp032.pdf

Persistent link: https://EconPapers.repec.org/RePEc:por:obegef:032

Bibliographic data for series maintained by Rui Henrique Alves.

 
Page updated 2025-03-19
Handle: RePEc:por:obegef:032