EconPapers    
Augmenting Naïve Bayes Classifiers with k-Tree Topology

Fereshteh R. Dastjerdi and Liming Cai
Additional contact information
Fereshteh R. Dastjerdi: School of Computing, University of Georgia, Athens, GA 30602, USA
Liming Cai: School of Computing, University of Georgia, Athens, GA 30602, USA

Mathematics, 2025, vol. 13, issue 13, 1-16

Abstract: The Bayesian network is a directed acyclic graphical model that offers a structured description of probabilistic dependencies among random variables. As powerful tools for classification tasks, Bayesian classifiers often require computing joint probability distributions, which can be computationally intractable due to potential full dependencies among feature variables. At the other extreme, Naïve Bayes, which assumes zero dependencies among features, trades accuracy for efficiency and often underperforms. As a result, non-zero dependency structures, such as trees, are often used as more feasible probabilistic graph approximations; in particular, Tree-Augmented Naïve Bayes (TAN) has been demonstrated to outperform Naïve Bayes and has become a popular choice. For applications where a variable is strongly influenced by multiple other features, TAN has been further extended to the k-dependency Bayesian classifier (KDB), in which one feature can depend on up to k other features (for a given k ≥ 2). In such cases, however, the selection of the k parent features for each variable is often made through heuristic search methods (such as sorting), which do not guarantee an optimal approximation of the network topology. In this paper, the novel notion of k-tree Augmented Naïve Bayes (k-TAN) is introduced to augment Naïve Bayesian classifiers with k-tree topology as an approximation of Bayesian networks. It is proved that, under the Kullback–Leibler divergence measure, a k-tree topology approximation of a Bayesian classifier loses the minimum information when its topology is that of a maximum spanning k-tree, where the edge weights of the graph are the mutual information between random variables conditioned on the class label.
In addition, while finding a maximum spanning k-tree is in general NP-hard for fixed k ≥ 2, this work shows that the approximation problem can be solved in time O(n^(k+1)) if the spanning k-tree is also required to retain a given Hamiltonian path in the graph. This algorithm can therefore be employed for efficient approximation of Bayesian networks by k-tree augmented Naïve Bayesian classifiers with guaranteed minimum loss of information.
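The edge weighting the abstract describes — mutual information between feature pairs conditioned on the class label — is the same weighting used in the classical TAN construction, where a maximum spanning tree (the k = 1 case) is taken over those weights. The sketch below illustrates that baseline construction only, not the paper's maximum spanning k-tree algorithm; the function names and toy data are this editor's assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log

def cond_mutual_info(xi, xj, c):
    """Estimate I(Xi; Xj | C) from three parallel lists of discrete samples."""
    n = len(c)
    p_c = Counter(c)                 # counts of class labels
    p_ic = Counter(zip(xi, c))       # joint counts (Xi, C)
    p_jc = Counter(zip(xj, c))       # joint counts (Xj, C)
    p_ijc = Counter(zip(xi, xj, c))  # joint counts (Xi, Xj, C)
    mi = 0.0
    for (a, b, cl), n_abc in p_ijc.items():
        p_abc = n_abc / n
        # I(Xi;Xj|C) = sum p(a,b,c) log[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ]
        mi += p_abc * log(p_abc * (p_c[cl] / n)
                          / ((p_ic[(a, cl)] / n) * (p_jc[(b, cl)] / n)))
    return mi

def max_spanning_tree(n_vars, weights):
    """Kruskal-style maximum spanning tree; weights is a list of (w, i, j)."""
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u
    tree = []
    for _, i, j in sorted(weights, reverse=True):  # heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

def tan_edges(features, c):
    """Weight every feature pair by I(Xi;Xj|C), then keep a max spanning tree."""
    n = len(features)
    weights = [(cond_mutual_info(features[i], features[j], c), i, j)
               for i, j in combinations(range(n), 2)]
    return max_spanning_tree(n, weights)
```

In the k-tree setting studied in the paper, the spanning-tree step is replaced by a maximum spanning k-tree under the same conditional-mutual-information weights, which is NP-hard in general but, as the paper shows, tractable when a given Hamiltonian path must be retained.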

Keywords: Bayesian networks; Naïve Bayes augmentation; classification; mutual information; KL-divergence; treewidth; k-tree; maximum spanning k-tree
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/13/2185/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/13/2185/ (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:13:p:2185-:d:1694586


Mathematics is currently edited by Ms. Emma He

Handle: RePEc:gam:jmathe:v:13:y:2025:i:13:p:2185-:d:1694586