Combining sequence and itemset mining to discover named entities in biomedical texts: a new type of pattern
Marc Plantevit, Thierry Charnois, Jiri Klema, Christophe Rigotti and Bruno Cremilleux
International Journal of Data Mining, Modelling and Management, 2009, vol. 1, issue 2, 119-148
Abstract:
Biomedical named entity recognition (NER) is a challenging problem. In this paper, we show that mining techniques, such as sequential pattern mining and sequential rule mining, can be useful to tackle this problem but present some limitations. We demonstrate and analyse these limitations and introduce a new kind of pattern called LSR pattern that offers an excellent trade-off between the high precision of sequential rules and the high recall of sequential patterns. We formalise the LSR pattern mining problem first. Then we show how LSR patterns enable us to successfully tackle biomedical NER problems. We report experiments carried out on real datasets that underline the relevance of our proposition.
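To make the two baseline notions in the abstract concrete, the sketch below computes the support of a sequential pattern and the confidence of a sequential rule under common textbook definitions. It is not the LSR pattern definition or the authors' mining algorithm. Several choices are illustrative assumptions: each token becomes an itemset holding its word and part-of-speech tag, the helper names (to_seq, contains, support, rule_confidence) are hypothetical, and the four toy sentences are invented. Confidence uses one common definition, support(X followed by Y) / support(X).

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Seq = List[Itemset]


def to_seq(sentence: str) -> Seq:
    """Hypothetical encoding: each 'word/TAG' token becomes the itemset {word, TAG}."""
    return [frozenset(tok.split("/")) for tok in sentence.split()]


def contains(sequence: Seq, pattern: Seq) -> bool:
    """True if each pattern itemset is a subset of a distinct element of
    `sequence`, in the same order (gaps allowed). Greedy leftmost matching
    is sufficient for this containment test."""
    pos = 0
    for itemset in pattern:
        # Skip elements that do not include the current pattern itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(db: Sequence[Seq], pattern: Seq) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)


def rule_confidence(db: Sequence[Seq], antecedent: Seq, consequent: Seq) -> float:
    """Confidence of the rule antecedent => consequent: among sequences
    containing the antecedent, the fraction that also contain the
    antecedent followed by the consequent."""
    n_x = sum(contains(s, antecedent) for s in db)
    if n_x == 0:
        return 0.0
    n_xy = sum(contains(s, antecedent + consequent) for s in db)
    return n_xy / n_x


# Invented toy corpus; NAME stands for an arbitrary proper noun.
db = [to_seq(s) for s in [
    "the/DT NAME/NNP gene/NN is/VBZ expressed/VBN",
    "NAME/NNP gene/NN encodes/VBZ a/DT protein/NN",
    "the/DT protein/NN is/VBZ expressed/VBN",
    "a/DT gene/NN was/VBD cloned/VBN",
]]

nnp, gene, expressed = frozenset({"NNP"}), frozenset({"gene"}), frozenset({"expressed"})
print(support(db, [nnp, gene]))                   # 0.5
print(rule_confidence(db, [nnp], [gene]))         # 1.0
print(rule_confidence(db, [gene], [expressed]))   # 0.333...
```

The toy numbers illustrate the trade-off the abstract names. A frequent pattern such as <gene> matches many sentences, which favours recall. A rule is kept only when its consequent reliably follows its antecedent, which favours precision.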
Keywords: LSR patterns; left-sequence-right patterns; sequential patterns; biomedical NER; named entity recognition; constraint-based pattern mining; biomedical texts; sequential rule mining; gene names; protein names; text mining; information extraction.
Date: 2009
Full text: http://www.inderscience.com/link.php?id=26073 (access restricted to subscribers)
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:1:y:2009:i:2:p:119-148
Publisher: Inderscience Enterprises Ltd