Three Metric-Based Method for Data Compatibility Calculation
Daniel Vodňanský
Acta Informatica Pragensia, 2021, vol. 2021, issue 1, 38-60
Abstract:
This article analyzes ways of calculating characteristics of data and most common data structure types that allow comparison between them or on a time axis. To achieve this, it studies the key aspects of relational databases, XML, JSON and RDF structure types. These data structure types are compared to multiple isolated approaches to data quality and other data characteristics measurements. The goals of the article are the calculation method itself and a storage structure for calculated values. The article presents a method of characterization of data and data structure types based on the calculation of three metrics: the amount of structuredness, the amount of hierarchicallity and the amount of information. This triad of metrics allows comparison between various data sets (objects), for example evaluating the complexity of the transformation of data from one data object to another, as well as with data structure types (as mentioned above). Based on the vector of three metrics, the calculation method of the compatibility between data and data structure type is proposed. This method can help select the most compatible data format for existing data. The calculated values of metrics can also detect non-optimal storage design and classify data transformations. The method was evaluated on an example case study, which showed its usability on an example demonstration data set. It can be used in the process of data modelling to help select optimal data structure type, to design a data transformation process and to optimize existing data storages.
Keywords: Data metrics; Amount of information; Metadata; Relational database; XML; JSON; RDF; Ontology; Transformation; Structuredness; Hierarchicallity; Normalization; Visualization (search for similar items in EconPapers)
Date: 2021
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
http://aip.vse.cz/doi/10.18267/j.aip.145.html (text/html)
http://aip.vse.cz/doi/10.18267/j.aip.145.pdf (application/pdf)
free of charge
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:prg:jnlaip:v:2021:y:2021:i:1:id:145:p:38-60
Ordering information: This journal article can be ordered from
Redakce Acta Informatica Pragensia, Katedra systémové analýzy, Vysoká škola ekonomická v Praze, nám. W. Churchilla 4, 130 67 Praha 3
http://aip.vse.cz
DOI: 10.18267/j.aip.145
Access Statistics for this article
Acta Informatica Pragensia is currently edited by Editorial Office
More articles in Acta Informatica Pragensia from Prague University of Economics and Business Contact information at EDIRC.
Bibliographic data for series maintained by Stanislav Vojir ().