nanoTabPFN

Train your own small TabPFN in less than 500 LOC and a few minutes.

The purpose of this repository is to be a good starting point for students and researchers that are interested in learning about how TabPFN works under the hood.

Clone the repository, afterwards install dependencies via:

pip install numpy torch schedulefree h5py scikit-learn openml seaborn

Our Code

model.py contains the implementation of the architecture and a sklearn-like interface in less than 200 lines of code.
train.py implements a simple training loop and prior dump data loader in under 200 lines
experiment.ipynb will recreate the experiment from the paper (requires pip install tabpfn==2.2.1)

Pretrain your own nanoTabPFN

To pretrain your own nanoTabPFN, you need to first download a prior data dump from here, then run train.py.

cd nanoTabPFN

# download data dump
curl -L -H "User-Agent: Mozilla/5.0" -H "Referer: https://figshare.com/" -o 300k_150x5_2.h5 "https://figshare.com/ndownloader/files/58932628?private_link=63fc1ada93e42e388e63"

python train.py

Step by Step explanation:

First we import our code from model.py and train.py

from model import NanoTabPFNModel
from model import NanoTabPFNClassifier
from train import PriorDumpDataLoader
from train import train, get_default_device

Then we instantiate our model

model = NanoTabPFNModel(
    embedding_size=96,
    num_attention_heads=4,
    mlp_hidden_size=192,
    num_layers=3,
    num_outputs=2
)

and our dataloader

prior = PriorDumpDataLoader(
    "300k_150x5_2.h5",
    num_steps=2500,
    batch_size=32,
)

Now we can train our model:

device = get_default_device()
model, _ = train(
    model,
    prior,
    lr = 4e-3,
    device = device
)

and finally we can instantiate our classifier:

clf = NanoTabPFNClassifier(model, device)

and use its .fit, .predict and .predict_proba:

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

clf.fit(X_train, y_train)
prob = clf.predict_proba(X_test)
pred = clf.predict(X_test)
print('ROC AUC', roc_auc_score(y_test, prob))
print('Accuracy', accuracy_score(y_test, pred))

TFM-Playground

The nanoTabPFN repository is supposed to stay ultra small and simple, but we created another repository, the TFM-Playground which we are building out to have a lot more features, like regression, multiple prior interfaces, multiple architectures, ensembling of different pre-processings and more, so check it out if you are interested!

Paper

Check out our paper. Please use the following BibTex to cite:

@article{pfefferle2025nanotabpfn,
  title={nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN},
  author={Pfefferle, Alexander and Hog, Johannes and Purucker, Lennart and Hutter, Frank},
  journal={arXiv preprint arXiv:2511.03634},
  year={2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.gitignore		.gitignore
CITATION.bib		CITATION.bib
README.md		README.md
experiment.ipynb		experiment.ipynb
model.py		model.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

nanoTabPFN

Our Code

Pretrain your own nanoTabPFN

Step by Step explanation:

TFM-Playground

Paper

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

nanoTabPFN

Our Code

Pretrain your own nanoTabPFN

Step by Step explanation:

TFM-Playground

Paper

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages