Using Probabilistic Models for Data Compression
Iuliana Iatan, Mihăiţă Drăgan, Silvia Dedu and Vasile Preda
Additional contact information
Iuliana Iatan: Department of Mathematics and Computer Science, Technical University of Civil Engineering, 020396 Bucharest, Romania
Mihăiţă Drăgan: Faculty of Mathematics and Computer Science, University of Bucharest, 010014 Bucharest, Romania
Vasile Preda: Faculty of Mathematics and Computer Science, University of Bucharest, 010014 Bucharest, Romania
Mathematics, 2022, vol. 10, issue 20, 1-29
Abstract:
Our research objective is to improve Huffman coding efficiency by adjusting the data using a Poisson distribution, which also avoids undefined entropy terms. The scientific value added by our paper lies in reducing the average code word length, which is larger when the Poisson distribution is not applied. Huffman coding is an error-free (lossless) compression method designed to remove coding redundancy by yielding the smallest number of code symbols per source symbol, which in practice may represent the intensities of an image or the output of a mapping operation. We use images from the PASCAL Visual Object Classes (VOC) data sets to evaluate our methods. Our experiments use 10,102 randomly chosen images, half for training and half for testing. The VOC data sets display significant variability in object size, orientation, pose, illumination, position and occlusion, and comprise 20 object classes: aeroplane, bicycle, bird, boat, bottle, bus, car, motorbike, train, sofa, table, chair, tv/monitor, potted plant, person, cat, cow, dog, horse and sheep. The descriptors of different objects can be compared to give a measure of their similarity, and image similarity is an important concept in many applications. This paper focuses on similarity measures in computer science, more specifically in information retrieval and data mining. Our approach uses 64 descriptors for each image in the training and test sets, so the number of source symbols is 64. Our information source differs from a finite-memory (Markov) source, whose output depends on a finite number of previous outputs. When dealing with large volumes of data, an effective way to increase information retrieval speed is to use neural networks as an artificial intelligence technique.
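A minimal sketch of the idea summarised above, not the authors' implementation: the 64 symbol probabilities are modelled with a truncated, renormalised Poisson distribution (the parameter lambda = 8 is our illustrative assumption, not a value from the paper), so every probability is strictly positive and no undefined 0*log(0) term appears in the entropy sum H = -sum_k p_k log2 p_k; a Huffman code is then built and its average code word length L = sum_k p_k l_k and efficiency H/L are reported.

import heapq
import math

def poisson_pmf(k, lam):
    # P(K = k) = lam**k * exp(-lam) / k!, strictly positive for every k >= 0,
    # which avoids the undefined entropy terms mentioned in the abstract.
    return lam ** k * math.exp(-lam) / math.factorial(k)

def huffman_code_lengths(probs):
    # Standard Huffman construction; returns one code length per symbol.
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # merging two subtrees deepens every
            lengths[i] += 1        # symbol they contain by one bit
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

n_symbols = 64                      # one source symbol per image descriptor
lam = 8.0                           # assumed Poisson parameter (illustrative)
probs = [poisson_pmf(k, lam) for k in range(n_symbols)]
total = sum(probs)
probs = [p / total for p in probs]  # renormalise the truncated pmf

lengths = huffman_code_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(f"entropy H = {entropy:.4f} bits/symbol")
print(f"average code length L = {avg_len:.4f} bits/symbol")
print(f"coding efficiency H/L = {entropy / avg_len:.4f}")

Noiseless-coding theory guarantees H <= L < H + 1 for a Huffman code, so the efficiency H/L is always at most 1; the closer it is to 1, the less coding redundancy remains.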
Keywords: data compression; descriptors; probabilistic models; entropy; Huffman coding; coding redundancy; coding efficiency; artificial intelligence
JEL-codes: C
Date: 2022
Citations: 2 (in EconPapers)
Downloads:
https://www.mdpi.com/2227-7390/10/20/3847/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/20/3847/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:20:p:3847-:d:945132