Accurate predictions on small data with a tabular foundation model
Noah Hollmann,
Samuel Müller,
Lennart Purucker,
Arjun Krishnakumar,
Max Körfer,
Shi Bin Hoo,
Robin Tibor Schirrmeister and
Frank Hutter
Additional contact information
Noah Hollmann: University of Freiburg
Samuel Müller: University of Freiburg
Lennart Purucker: University of Freiburg
Arjun Krishnakumar: University of Freiburg
Max Körfer: University of Freiburg
Shi Bin Hoo: University of Freiburg
Robin Tibor Schirrmeister: Faculty of Medicine, University of Freiburg
Frank Hutter: University of Freiburg
Nature, 2025, vol. 637, issue 8045, 319-326
Abstract:
Tabular data, spreadsheets organized in rows and columns, are ubiquitous across scientific fields, from biomedicine to particle physics to economics and climate science [1,2]. The fundamental prediction task of filling in missing values of a label column based on the rest of the columns is essential for various applications as diverse as biomedical risk models, drug discovery and materials science. Although deep learning has revolutionized learning from raw data and led to numerous high-profile success stories [3–5], gradient-boosted decision trees [6–9] have dominated tabular data for the past 20 years. Here we present the Tabular Prior-data Fitted Network (TabPFN), a tabular foundation model that outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time. In 2.8 s, TabPFN outperforms an ensemble of the strongest baselines tuned for 4 h in a classification setting. As a generative transformer-based foundation model, this model also allows fine-tuning, data generation, density estimation and learning reusable embeddings. TabPFN is a learning algorithm that is itself learned across millions of synthetic datasets, demonstrating the power of this approach for algorithm development. By improving modelling abilities across diverse fields, TabPFN has the potential to accelerate scientific discovery and enhance important decision-making in various domains.
Date: 2025
Downloads: (external link)
https://www.nature.com/articles/s41586-024-08328-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:637:y:2025:i:8045:d:10.1038_s41586-024-08328-6
Ordering information: This journal article can be ordered from
https://www.nature.com/
DOI: 10.1038/s41586-024-08328-6
Nature is currently edited by Magdalena Skipper
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.