Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Daiho Uhm and Sunghae Jun
Additional contact information
Daiho Uhm: Department of Mathematics, University of Arkansas—Fort Smith, Fort Smith, AR 72913, USA
Sunghae Jun: Department of Big Data and Statistics, Cheongju University, Chungbuk 28503, Korea

Future Internet, 2022, vol. 14, issue 7, 1-11

Abstract: Due to the expansion of the internet, we encounter various types of big data such as web documents or sensing data. Compared to traditional small data such as experimental samples, big data provide more chances to find hidden and novel patterns with big data analysis using statistics and machine learning algorithms. However, as the use of big data increases, problems also occur. One of them is a zero-inflated problem in structured data preprocessed from big data. Most count values are zeros because a specific word is found in only some documents. In particular, since most of the patent data are in the form of a text document, they are more affected by the zero-inflated problem. To solve this problem, we propose a generation of synthetic samples using statistical inference and tree structure. Using patent document and simulation data, we verify the performance and validity of our proposed method. In this paper, we focus on patent keyword analysis as text big data analysis, and we encounter the zero-inflated problem just like other text data.
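The zero-inflation described in the abstract can be illustrated with a minimal sketch. This is not the authors' method or data: the toy corpus, the whitespace tokenization, and the helper names `document_keyword_matrix` and `zero_fraction` are all assumptions made for illustration. The sketch only shows why a document-keyword count matrix tends to be dominated by zeros when each keyword appears in a few documents.

```python
from collections import Counter

# Hypothetical toy corpus standing in for patent abstracts (not from the paper).
documents = [
    "synthetic sample generation for count data",
    "patent keyword analysis with regression trees",
    "sensor data collection over the internet",
    "zero inflated count model for patent keyword data",
]


def document_keyword_matrix(docs):
    """Return (sorted vocabulary, rows of per-document keyword counts)."""
    # Assumption: keywords are plain whitespace-separated tokens.
    counts = [Counter(doc.split()) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for a keyword that is absent from the document.
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


def zero_fraction(matrix):
    """Share of cells in the count matrix that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


vocabulary, matrix = document_keyword_matrix(documents)
share_of_zeros = zero_fraction(matrix)
print(f"{len(documents)} documents x {len(vocabulary)} keywords")
print(f"share of zero counts: {share_of_zeros:.2f}")
```

Even in this tiny example most cells are zero, and the share typically grows as the vocabulary widens across more documents, which is the situation the proposed synthetic-sample approach is meant to address.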

Keywords: zero-inflated data; synthetic sample; patent analysis; count data; classification and regression trees
JEL-codes: O3
Date: 2022

Downloads: (external link)
https://www.mdpi.com/1999-5903/14/7/211/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/7/211/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:7:p:211-:d:864174

Future Internet is currently edited by Ms. Grace You

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jftint:v:14:y:2022:i:7:p:211-:d:864174