Data Lake Management System based on Topic Modeling
Amine El Haddadi,
Oumaima El Haddadi,
Mohamed Cherradi,
Fadwa Bouhafer,
Anass El Haddadi and
Ahmed El Allaoui
Data and Metadata, 2023, vol. 2, 183
Abstract:
In a highly competitive environment, data is a valuable asset for any company seeking to grow; it is a genuine economic and strategic lever. Leading companies are concerned not only with collecting data from heterogeneous sources, but also with analyzing these datasets and transforming them into better decision-making. In this context, the data lake remains a powerful solution for storing large amounts of data and providing analytics for decision support. In this paper, we examine an intelligent data lake management system that addresses the shortcomings of traditional business intelligence, which can no longer meet data-driven demands. Data lakes are well suited to analyzing data from a variety of sources, particularly when data cleaning is time-consuming. However, ingesting heterogeneous data sources without any schema is a major issue, and a data lake can easily turn into a data swamp. In this study, we implement the Latent Dirichlet Allocation (LDA) topic model to manage the storage, processing, analysis, and visualization of big data. To assess the usefulness of our proposal, we evaluated its performance using the topic coherence metric. The experimental results showed our approach to be more accurate on the tested datasets.
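The abstract does not state which topic coherence measure was used. As an illustration only, the following is a minimal sketch of one common choice, the UMass coherence, which assumes each topic is given as its top words ordered from most to least probable and scores every word pair by how often the words co-occur in the same documents. The function name `umass_coherence` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (hypothetical helper, not the paper's code).

    top_words: the topic's top-M words, ordered from most to least probable.
    documents: iterable of tokenized documents (lists of strings).
    Returns the sum over pairs (m > l) of log((D(w_m, w_l) + 1) / D(w_l)),
    where D(.) counts the documents that contain all the given words.
    Higher (closer to zero) means a more coherent topic.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            d_l = doc_freq(w_l)
            if d_l == 0:
                # Assumption: topic words come from the corpus vocabulary.
                raise ValueError(f"word {w_l!r} does not occur in the corpus")
            score += math.log((doc_freq(w_m, w_l) + 1) / d_l)
    return score


# Toy corpus (hypothetical), e.g. tokenized metadata of data lake files.
docs = [
    ["data", "lake", "storage"],
    ["data", "lake", "metadata"],
    ["topic", "model", "lda"],
    ["data", "metadata", "swamp"],
]
coherent = umass_coherence(["data", "lake", "metadata"], docs)
incoherent = umass_coherence(["data", "lda", "swamp"], docs)
print(coherent, incoherent)
```

On this toy corpus the first topic, whose words co-occur often, scores higher than the second. In practice, libraries such as gensim expose this measure through `CoherenceModel` with `coherence='u_mass'`, which would normally be preferred over a hand-written version.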
Date: 2023
Persistent link: https://EconPapers.repec.org/RePEc:dbk:datame:v:2:y:2023:i::p:183:id:1056294dm2023183
DOI: 10.56294/dm2023183
More articles in Data and Metadata from AG Editor
Bibliographic data for series maintained by Javier Gonzalez-Argote.