EconPapers    
Economics at your fingertips  
 

Document analysis via combined vectorization and machine learning approaches

Dinara Kaibassova (), Bigul Mukhametzhanova (), Dinara Tokseit (), Aigul Kubegenova () and Murad Kozhanov ()

International Journal of Innovative Research and Scientific Studies, 2025, vol. 8, issue 4, 2195-2204

Abstract: The purpose of this study is to develop an effective hybrid model for automatic document classification by combining statistical and semantic text vectorization techniques with machine learning algorithms. The methodology integrates Term Frequency–Inverse Document Frequency (TF-IDF) and Word2Vec embeddings with classifiers such as Support Vector Machine (SVM) and Random Forest. The proposed approach includes data preprocessing (tokenization, normalization, stop word removal, and lemmatization), feature extraction, model training, and evaluation using classification metrics such as accuracy, F1-score, Matthews Correlation Coefficient (MCC), and Cohen’s Kappa. Experimental results demonstrate that the Word2Vec + SVM model outperforms other configurations, achieving 90.2% accuracy and an F1-score of 82.52%, thus highlighting the advantage of incorporating semantic context into vector representation. The study concludes that hybrid methods combining TF-IDF and Word2Vec with robust classifiers improve both the precision and generalizability of document analysis models. Practical implications include potential applications in sentiment analysis, topic modeling, text classification for legal and healthcare domains, and multilingual contexts. This research provides a foundation for developing high-performance text analysis systems applicable to various real-world natural language processing tasks.

Keywords: Automatic document analysis; Contextual word embedding; Machine learning; Natural language processing; Semantic matching. (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:

Downloads: (external link)
https://ijirss.com/index.php/ijirss/article/view/8356/1874 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:aac:ijirss:v:8:y:2025:i:4:p:2195-2204:id:8356

Access Statistics for this article

International Journal of Innovative Research and Scientific Studies is currently edited by Natalie Jean

More articles in International Journal of Innovative Research and Scientific Studies from Innovative Research Publishing
Bibliographic data for series maintained by Natalie Jean ().

 
Page updated 2025-07-08
Handle: RePEc:aac:ijirss:v:8:y:2025:i:4:p:2195-2204:id:8356