TabPFN

AI foundation model for tabular data

TabPFN (Tabular Prior-data Fitted Network) is a machine learning model for tabular datasets, proposed in 2022 and built on a transformer architecture.[1] It is intended for supervised classification and regression analysis on small- to medium-sized datasets, e.g., up to 10,000 samples.[1] TabPFN-2.5 is the latest version of the foundation model.
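
For illustration, the tabpfn Python package distributed on PyPI exposes a scikit-learn-style interface. The following is a minimal sketch assuming that interface; the breast-cancer dataset is chosen only as an example of a small tabular task.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tabpfn import TabPFNClassifier  # assumes the PyPI "tabpfn" package

# A small tabular dataset (569 samples, 30 numerical features),
# well within TabPFN's intended size regime.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Fitting" stores the training data as context; prediction then runs
# a single forward pass of the pre-trained transformer.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))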

History

TabPFN was first introduced in a 2022 preprint and presented at ICLR 2023.[2] TabPFN v2 was published in 2025 in the journal Nature by Hollmann and co-authors.[1] The source code is published on GitHub under a modified Apache License and on PyPI.[4] Writing for the ICLR blog, McCarter states that the model has attracted attention due to its performance on small-dataset benchmarks.[5] TabPFN-2.5, the next generation of the foundation model, was released on November 6, 2025.[6]

Prior Labs, founded in 2024, aims to commercialize TabPFN.[7]

Overview and pre-training

TabPFN supports classification, regression, and generative tasks.[1] It applies the prior-data fitted network (PFN) approach[8] to tabular data.[1] Because its transformer is pre-trained on synthetic tabular datasets,[2][5] TabPFN avoids both benchmark contamination and the cost of curating real-world training data.[2]

TabPFN v2 was pre-trained on approximately 130 million such datasets.[1] Synthetic datasets are generated using causal models or Bayesian neural networks; this can include simulating missing values, imbalanced data, and noise.[1] Random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures.[1] During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by running a neural network forward pass.[1] At inference time, a new dataset is then processed in a single forward pass without retraining.[2] The model's transformer encoder processes features and labels by alternating attention across rows and columns.[9] TabPFN v2 handles numerical and categorical features as well as missing values, and supports tasks such as regression and synthetic data generation.[1]
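
The prior-fitting objective can be illustrated with a deliberately simplified sketch. The random-MLP data generator, the toy row-wise transformer, and all hyperparameters below are illustrative assumptions, not the published architecture, which also attends across columns and uses a far richer causal prior.

import torch
import torch.nn as nn

def sample_synthetic_dataset(n=64, d=8):
    # Toy stand-in for TabPFN's synthetic prior: a randomly
    # initialized MLP maps random inputs to noisy targets.
    X = torch.randn(n, d)
    prior = nn.Sequential(nn.Linear(d, 16), nn.Tanh(), nn.Linear(16, 1))
    with torch.no_grad():
        y = prior(X) + 0.1 * torch.randn(n, 1)
    return X, y

class ToyPFN(nn.Module):
    # Transformer over the rows of one dataset; each row embeds its
    # features plus its target, which is zero-masked for query rows.
    def __init__(self, d, width=64):
        super().__init__()
        self.embed = nn.Linear(d + 1, width)
        layer = nn.TransformerEncoderLayer(width, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(width, 1)

    def forward(self, X, y_context, n_context):
        y_in = torch.zeros(len(X), 1)
        y_in[:n_context] = y_context  # query-row targets stay masked
        h = self.encoder(self.embed(torch.cat([X, y_in], dim=1)).unsqueeze(0))
        return self.head(h.squeeze(0))

model = ToyPFN(d=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):  # a fresh synthetic dataset every step
    X, y = sample_synthetic_dataset()
    n_ctx = 32
    pred = model(X, y[:n_ctx], n_ctx)
    loss = ((pred[n_ctx:] - y[n_ctx:]) ** 2).mean()  # loss only on masked rows
    opt.zero_grad()
    loss.backward()
    opt.step()

Averaged over many such synthetic datasets, this objective pushes the network to approximate the posterior predictive distribution under the prior, which is why a single forward pass on a new dataset can behave like a learned learning algorithm.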

Because TabPFN comes pre-trained, it does not require the costly hyperparameter optimization that other deep learning methods typically need.[9]
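
As a concrete sketch, again assuming the PyPI package's scikit-learn-style interface, a regressor can be applied with default settings only, with no search over learning rates, depths, or regularization:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNRegressor  # assumes the PyPI "tabpfn" package

# A synthetic regression task of modest size.
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default settings throughout: no hyperparameter search is performed.
reg = TabPFNRegressor()
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))  # coefficient of determination (R^2)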

Research

TabPFN is the subject of ongoing research. Applications of TabPFN have been investigated in domains such as chemoproteomics,[10] insurance risk classification,[11] and metagenomics.[12]

Limitations

TabPFN has been criticized for its "one large neural network is all you need" approach to modeling problems.[5] Furthermore, its performance is limited on high-dimensional and large-scale datasets.[13]

See also

References
