This repository contains the dataset from our paper SALT-KG: A Benchmark for Semantics-Aware Learning on Enterprise Tables, presented at the EurIPS'25 Table Representation Learning Workshop.
Building upon the SALT benchmark for relational prediction, we introduce SALT-KG, a benchmark for semantics-aware learning on enterprise tables. SALT-KG extends SALT by linking its multi-table transactional data with structured Operational Business Knowledge, represented as a Metadata Knowledge Graph (OBKG) that captures field-level descriptions, relational dependencies, and business object types. This extension enables evaluation of models that jointly reason over tabular evidence and contextual semantics, an increasingly critical capability for foundation models on structured data. Empirical analysis reveals that metadata-derived features yield only modest improvements in classical prediction metrics, yet they consistently expose gaps in models’ ability to leverage semantics in relational context. By reframing tabular prediction as semantics-conditioned reasoning, SALT-KG establishes a benchmark to advance tabular foundation models grounded in declarative knowledge, providing the first empirical step toward semantically linked tables in structured data at enterprise scale.
There is growing research on Tabular Foundation Models. Table representation learning (TRL) models are typically trained and evaluated on benchmarks that represent relational structure but lack explicit semantic grounding or declarative context. Separately, the knowledge graph (KG) and data integration communities have explored connecting tables to semantic graphs through systems such as JENTAB. We bridge this gap by enriching enterprise relational data with an explicit semantic layer that links tables, fields, and business objects through declarative knowledge in a KG.
For every relation (Table) in the underlying SALT dataset, we find a matching node in the KG (a View). We then extract the triples related to these Views, which include:
- Fields: data abstraction nodes with associated fields, labels, associations, data classes, reference fields, and other elements.
- ObjectNodeTypes: further semantic metadata in the form of technical definitions and business object descriptions.
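
The matching and extraction described above can be sketched as a small graph traversal. This is a minimal illustration, not the pipeline used to build the dataset: the triples, the predicate names (`hasField`, `hasLabel`, `hasObjectNodeType`, `hasDescription`), the node identifiers, and the table-to-View mapping are all hypothetical placeholders, and the actual OBKG vocabulary may differ.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object).
# The real OBKG identifiers and predicates may look different.
TRIPLES = [
    ("view:SalesDocument", "hasField", "field:SalesOrganization"),
    ("field:SalesOrganization", "hasLabel", "Sales Organization"),
    ("view:SalesDocument", "hasObjectNodeType", "ont:SalesOrder"),
    ("ont:SalesOrder", "hasDescription", "A customer order for goods or services"),
    ("view:Customer", "hasField", "field:CustomerName"),
]

# Hypothetical mapping from a SALT table name to its View node in the KG.
TABLE_TO_VIEW = {"salesdocument": "view:SalesDocument"}


def extract_view_triples(table, triples, table_to_view, max_hops=2):
    """Collect all triples reachable from the table's View within max_hops."""
    # Index triples by subject so each hop is a dictionary lookup.
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))

    frontier = [table_to_view[table]]
    seen = set(frontier)
    result = []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for s, p, o in by_subject.get(node, []):
                result.append((s, p, o))
                # Follow each object once; literals simply have no outgoing triples.
                if o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return result


if __name__ == "__main__":
    for triple in extract_view_triples("salesdocument", TRIPLES, TABLE_TO_VIEW):
        print(triple)
```

With two hops, the traversal picks up the View's Fields and ObjectNodeTypes in the first step and their labels and descriptions in the second, while triples belonging to other Views are left out.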
The SALT-KG dataset consists of 4 tables from the SALT benchmark, enriched with semantic metadata from an Operational Business Knowledge Graph (OBKG). The dataset includes:
- 4 relational tables with transactional data
- Metadata Knowledge Graph (OBKG) with field-level descriptions, relational dependencies, and business object types
- Train, validation, and test splits for each table
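
As a rough illustration of how the two components might be combined, the sketch below attaches field-level descriptions to the cells of a single table row before passing it to a model. The column names, descriptions, and the `describe_row` helper are hypothetical and chosen for illustration only; they are not taken from the dataset files.

```python
# Hypothetical field-level descriptions, as they might be looked up in the OBKG.
FIELD_DESCRIPTIONS = {
    "SALESORGANIZATION": "Organizational unit responsible for the sale",
    "CUSTOMERPAYMENTTERMS": "Key defining the payment conditions",
}


def describe_row(row, descriptions):
    """Render a row as 'column (description): value' pairs.

    Columns without metadata fall back to 'column: value'.
    """
    parts = []
    for column, value in row.items():
        desc = descriptions.get(column)
        header = f"{column} ({desc})" if desc else column
        parts.append(f"{header}: {value}")
    return "; ".join(parts)


if __name__ == "__main__":
    # Hypothetical row; the values are made up.
    example_row = {"SALESORGANIZATION": "1010", "CREATIONDATE": "2020-01-01"}
    print(describe_row(example_row, FIELD_DESCRIPTIONS))
```

Serializing descriptions next to values is only one option; the same lookup could equally feed a text encoder that produces per-column embeddings.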
N/A
No known issues
If you use this dataset in your research, please cite the following paper:
@inproceedings{mulang2025saltkg,
title={SALT-KG: A Benchmark for Semantics-Aware Learning on Enterprise Tables},
author={Mulang', Isaiah Onando and Sasaki, Felix and Klein, Tassilo and Kolk, Jonas and Grechanov, Nikolay and Hoffart, Johannes},
booktitle={Proceedings of the NeurIPS 2025 Table Representation Learning Workshop},
year={2025}
}
Create an issue in this repository if you find a bug or have questions about the content.
For additional support, ask a question in SAP Community.
If you wish to contribute code or offer fixes or improvements, please send a pull request. For legal reasons, contributors will be asked to accept a Developer Certificate of Origin (DCO) when they create their first pull request to this project. This happens in an automated fashion during the submission process. SAP uses the standard DCO text of the Linux Foundation.
Copyright (c) 2025 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under CC-BY-NC-SA-4.0 except as noted otherwise in the LICENSE file.
