
A method for optimizing text preprocessing and text classification using multiple cycles of learning with an application on shipbrokers emails

Grigorios Papageorgiou, Polychronis Economou and Sotirios Bersimis

Journal of Applied Statistics, 2024, vol. 51, issue 13, 2592-2626

Abstract: Optimizing text preprocessing and text classification algorithms is an important, everyday task in large organizations and companies and it usually involves a labor-intensive and time-consuming effort. For example, the filtering and sorting of a large number of electronic mails (emails) are crucial to keeping track of the received information and converting it automatically into useful and profitable knowledge. Business emails are often unstructured, noisy, and with many abbreviations and acronyms, which makes their handling a challenging procedure. To overcome those challenges, a two-step classification approach is proposed, along with a two-cycle labeling procedure in order to speed up the labeling process. Every step incorporates a heuristic classification approach to assign emails to predefined classes by comparing several classification and text vectorization algorithms. These algorithms are compared and evaluated using the F1 score and balanced accuracy. The implementation of the proposed algorithm is demonstrated in a shipbroker agent operating in Greece with excellent performance, improving organization and administration while reducing expenses.
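The two evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. The abstract does not say which averaging scheme is used for F1, so macro averaging is an assumption here; the function names and the toy labels ("cargo", "other") are hypothetical and are not taken from the paper.

```python
def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN).

    Macro averaging is an assumption; a class with no true or
    predicted members contributes an F1 of 0.
    """
    scores = []
    for tp, fp, fn in per_class_counts(y_true, y_pred).values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes that actually occur in y_true."""
    recalls = []
    for tp, _fp, fn in per_class_counts(y_true, y_pred).values():
        if tp + fn:  # skip labels that appear only in predictions
            recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: a classifier that always predicts
# the majority class.
y_true = ["cargo"] * 8 + ["other"] * 2
y_pred = ["cargo"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # share of emails labeled correctly
print(balanced_accuracy(y_true, y_pred))  # mean per-class recall
print(macro_f1(y_true, y_pred))           # mean per-class F1
```

On imbalanced data such as this toy example, plain accuracy rewards a classifier that ignores the minority class, while balanced accuracy and macro F1 penalize it. That is one plausible reason to prefer these metrics for email classes of very different sizes.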

Date: 2024

Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2024.2307535 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:51:y:2024:i:13:p:2592-2626

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/02664763.2024.2307535

Journal of Applied Statistics is currently edited by Robert Aykroyd

Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:japsta:v:51:y:2024:i:13:p:2592-2626