VespaTune

A no-code tool for training tabular models.

Web UI for training, monitoring, and managing models
Tune models directly from CSV files
Real-time training progress with WebSocket updates
Export models to ONNX format for deployment

Installation

Install using pip:

pip install vespatune

Quick Start

Web UI (Recommended)

Start the web interface:

vespatune

This launches the VespaTune UI at http://127.0.0.1:9999 where you can:

Upload train/validation CSV files
Configure model type, target columns, and hyperparameters
Start training with real-time progress monitoring
View trial results and metrics
Download trained models and artifacts
Manage multiple training runs

You can also specify host and port:

vespatune --host 0.0.0.0 --port 8080

CLI

Train a model with explicit train/valid split:

vespatune train \
  --train_filename train.csv \
  --valid_filename valid.csv \
  --output outputs/my_model \
  --model xgboost

Or let VespaTune auto-split your data:

vespatune train \
  --train_filename data.csv \
  --output outputs/my_model \
  --model xgboost

Make predictions:

vespatune predict \
  --model_path outputs/my_model \
  --test_filename test.csv \
  --output_filename predictions.csv

Serve a trained model for predictions:

vespatune serve --model_path outputs/my_model --host 0.0.0.0 --port 8000

Python API

from vespatune import VespaTune

# With explicit validation file
vtune = VespaTune(
    train_filename="train.csv",
    valid_filename="valid.csv",
    output="outputs/my_model",
    model_type="xgboost",  # or "lightgbm", "catboost", or "logreg"
    targets=["target"],
    num_trials=100,
    time_limit=3600,
)
vtune.train()

# Or with auto-split (no validation file needed)
vtune = VespaTune(
    train_filename="data.csv",
    output="outputs/my_model",
    model_type="xgboost",
    targets=["target"],
    num_trials=100,
)
vtune.train()

Web UI Features

The web interface provides:

File Upload: Drag and drop CSV files for training (validation file is optional)
Auto-Split: If no validation file is provided, automatically splits training data
Auto Column Detection: Automatically detects columns for target and ID selection
Model Selection: Choose between XGBoost, LightGBM, or CatBoost
Real-time Monitoring: Watch training progress with live trial updates via WebSocket
Metrics Visualization: View loss curves and hyperparameter importance
Run Management: Start, stop, and delete training runs
Artifact Downloads: Download trained models, configs, and ONNX exports

Parameters

Required

Parameter	Description
`train_filename`	Path to training CSV file
`output`	Path to output directory for model artifacts

Optional

Parameter	Default	Description
`valid_filename`	`None`	Path to validation CSV file (auto-splits training data if not provided)
`model_type`	`"xgboost"`	Model to use: `"xgboost"`, `"lightgbm"`, `"catboost"`, or `"logreg"`
`test_filename`	`None`	Path to test CSV file (predictions saved if provided)
`task`	`None`	`"classification"` or `"regression"` (auto-detected if not specified)
`idx`	`"id"`	Name of the ID column
`targets`	`["target"]`	List of target column names
`features`	`None`	List of feature columns (all non-id/target columns if not specified)
`categorical_features`	`None`	List of categorical columns (auto-detected if not specified)
`use_gpu`	`False`	Whether to use GPU for training
`seed`	`42`	Random seed for reproducibility
`num_trials`	`1000`	Number of Optuna trials for hyperparameter tuning
`time_limit`	`None`	Time limit for optimization in seconds
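
As a rough illustration of the `features` default described in the table (all columns other than the ID and target columns), the selection rule can be sketched in plain Python. The helper `default_features` is hypothetical and not part of VespaTune; it only mirrors the documented rule and may differ from what the library does internally.

```python
def default_features(columns, idx="id", targets=("target",)):
    """Return the columns that are neither the ID column nor a target.

    Hypothetical helper: mirrors the documented default for `features`,
    not VespaTune's actual implementation.
    """
    excluded = {idx, *targets}
    return [col for col in columns if col not in excluded]


columns = ["id", "age", "city", "income", "target"]
print(default_features(columns))
```

With the default `idx` and `targets` values, only `age`, `city`, and `income` remain as features in this example.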

Supported Models

XGBoost

Default model with extensive hyperparameter search
Supports GPU acceleration
Best for general-purpose tasks

LightGBM

Native categorical feature support
Fast training on large datasets
Supports GPU acceleration

CatBoost

Best native categorical feature handling
Robust to overfitting
Supports GPU acceleration

Logistic Regression

Linear model for classification tasks only
Searches over preprocessing (imputation, scaling) and regularization
Fast training, interpretable coefficients

Data Splitting

VespaTune supports two modes:

Explicit split: Provide both train_filename and valid_filename
Auto-split: Provide only train_filename. VespaTune then creates a 5-fold split and uses fold 0 as the validation set (80% train, 20% valid)
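
To see why holding out one fold of five gives an 80/20 split, here is a minimal sketch using only the standard library. The function `fold_zero_split` is hypothetical and is not VespaTune's splitter; the real implementation may differ, for example by stratifying on the target for classification tasks.

```python
import random


def fold_zero_split(num_rows, num_folds=5, seed=42):
    """Shuffle row indices, assign folds round-robin, and hold out fold 0."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    train, valid = [], []
    for position, row in enumerate(indices):
        # Every num_folds-th shuffled row lands in fold 0, the validation set.
        (valid if position % num_folds == 0 else train).append(row)
    return train, valid


train_rows, valid_rows = fold_zero_split(num_rows=100)
print(len(train_rows), len(valid_rows))
```

For 100 rows this yields 80 training rows and 20 validation rows, and a fixed seed makes the assignment repeatable.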

For manual control over splits, use the splitter utility:

vespatune splitter \
  --data_filename data.csv \
  --output splits/ \
  --target target \
  --task classification \
  --num_folds 5

Or via Python:

from vespatune import VespaTuneSplitter

splitter = VespaTuneSplitter(
    data_filename="data.csv",
    output="splits/",
    target="target",
    task="classification",
    num_folds=5,
)
splitter.split()

This creates fold_0_train.csv, fold_0_valid.csv, etc. for k-fold cross-validation.
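
The sketch below lists the train/valid file pairs for a cross-validation loop. It assumes the naming pattern continues as `fold_<i>_train.csv` and `fold_<i>_valid.csv` for every fold, which the text above shows only for fold 0; `fold_paths` is a hypothetical helper, not part of VespaTune.

```python
from pathlib import Path


def fold_paths(output_dir, num_folds=5):
    """List (train, valid) CSV paths following the fold_<i>_train/valid.csv pattern."""
    base = Path(output_dir)
    return [
        (base / f"fold_{i}_train.csv", base / f"fold_{i}_valid.csv")
        for i in range(num_folds)
    ]


for train_path, valid_path in fold_paths("splits", num_folds=5):
    print(train_path.name, valid_path.name)
```

Each pair could then be passed as `train_filename` and `valid_filename` to a separate training run.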

Prediction

Using the trained model

from vespatune import VespaTunePredict

predictor = VespaTunePredict(model_path="outputs/my_model")

# Predict on file
predictor.predict_file("test.csv", "predictions.csv")

# Predict single sample
prediction = predictor.predict_single({"feature1": 1.0, "feature2": "A"})

Using ONNX model

from vespatune import VespaTuneONNXPredict

predictor = VespaTuneONNXPredict(model_path="onnx_model/")

# Predict on file
predictor.predict_file("test.csv", "predictions.csv")

# Predict single sample
prediction = predictor.predict_single({"feature1": 1.0, "feature2": "A"})

Standalone Preprocessing

Use VespaTuneProcessor when you want to preprocess data independently and pass it to an external ONNX runtime or inference system:

from vespatune import VespaTuneProcessor
import onnxruntime as ort
import pandas as pd

# Load preprocessor from model or ONNX export directory
processor = VespaTuneProcessor(model_path="outputs/my_model")

# Transform DataFrame
df = pd.read_csv("test.csv")
processed = processor.transform(df)  # Returns float32 numpy array

# Transform single sample
processed_single = processor.transform_single({"feature1": 1.0, "feature2": "A"})

# Get feature metadata
processor.get_feature_names()        # Input feature names
processor.get_categorical_features() # Categorical feature names
processor.get_feature_names_out()    # Output feature names after transform
processor.get_input_schema()         # Pydantic schema for API validation

# Pass to ONNX runtime
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name  # Read the input name from the model rather than assuming "input"
predictions = session.run(None, {input_name: processed})

CLI Reference

Default (UI)

vespatune [--host HOST] [--port PORT]

options:
  --host                Host to serve on (default: 127.0.0.1)
  --port                Port to serve on (default: 9999)
  --version, -v         Display VespaTune version

train

vespatune train --help

options:
  --train_filename      Path to training file (required)
  --valid_filename      Path to validation file (optional, auto-splits if not provided)
  --output              Path to output directory (required)
  --model               Model type: xgboost, lightgbm, catboost, logreg (default: xgboost)
  --test_filename       Path to test file
  --task                Task type: classification, regression
  --idx                 ID column name
  --targets             Target column(s), separate multiple by ';'
  --features            Feature columns, separate by ';'
  --use_gpu             Use GPU for training
  --seed                Random seed (default: 42)
  --num_trials          Number of Optuna trials (default: 100)
  --time_limit          Time limit in seconds
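
When launching training from a script, the options above can be assembled into an argument list. The sketch below uses only flags documented in this section; `build_train_command` is a hypothetical helper, and it only builds the command without running it.

```python
import shlex


def build_train_command(train_filename, output, model="xgboost",
                        targets=None, features=None, num_trials=None):
    """Assemble an argv list for `vespatune train` from Python values."""
    cmd = ["vespatune", "train",
           "--train_filename", train_filename,
           "--output", output,
           "--model", model]
    if targets:
        # Multiple columns are joined with ';' as the option help describes.
        cmd += ["--targets", ";".join(targets)]
    if features:
        cmd += ["--features", ";".join(features)]
    if num_trials is not None:
        cmd += ["--num_trials", str(num_trials)]
    return cmd


cmd = build_train_command("data.csv", "outputs/my_model",
                          targets=["price", "demand"], num_trials=50)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids the shell treating `;` as a command separator; when typing the command by hand, quote the value, as in `--targets 'price;demand'`.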

predict

vespatune predict --help

options:
  --model_path          Path to trained model directory (required)
  --test_filename       Path to test file (required)
  --output_filename     Path to output predictions file (required)

export

vespatune export --help

options:
  --model_path          Path to trained model directory (required)
  --output_dir          Path to ONNX output directory

serve

vespatune serve --help

options:
  --model_path          Path to ONNX export directory
  --host                Host to bind (default: 127.0.0.1)
  --port                Port to bind (default: 9999)
  --workers             Number of workers (default: 1)
  --reload              Enable auto-reload for development

splitter

vespatune splitter --help

options:
  --data_filename       Path to data file (required)
  --output              Path to output directory (required)
  --target              Target column name (required)
  --task                Task type: classification, regression (required)
  --num_folds           Number of folds (default: 5)

Output Files

After training, the following files are created in the output directory:

File	Description
`vtune_model.final`	Trained model
`vtune.config`	Model configuration
`vtune.best_params`	Best hyperparameters from Optuna
`vtune.preprocessor.joblib`	Fitted preprocessor (encoding, scaling, imputation)
`vtune.target_encoder`	Target encoder (for classification)
`params.db`	Optuna study database
`train.feather`	Processed training data
`valid.feather`	Processed validation data
`onnx/`	ONNX export directory (after export)
`_splits/`	Auto-generated train/valid splits (only if no validation file provided)
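
A quick way to confirm that a run finished is to check the output directory for the files listed above. This is a minimal sketch: `missing_artifacts` is a hypothetical helper, and the list omits the entries that the table marks as conditional (`vtune.target_encoder`, `onnx/`, `_splits/`).

```python
import tempfile
from pathlib import Path

# File names taken from the output table; conditional entries are left out.
EXPECTED_FILES = [
    "vtune_model.final",
    "vtune.config",
    "vtune.best_params",
    "vtune.preprocessor.joblib",
    "params.db",
    "train.feather",
    "valid.feather",
]


def missing_artifacts(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in output_dir."""
    base = Path(output_dir)
    return [name for name in expected if not (base / name).exists()]


# Demonstration on a temporary directory holding only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("vtune_model.final", "vtune.config"):
        (Path(tmp) / name).touch()
    print(missing_artifacts(tmp))
```

An empty list means every expected file is present; in practice the argument would be the training output directory, such as `outputs/my_model`.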

Example

from vespatune import VespaTune

# Train with LightGBM
vtune = VespaTune(
    train_filename="data/train.csv",
    valid_filename="data/valid.csv",
    output="outputs/lgb_model",
    model_type="lightgbm",
    targets=["price"],
    task="regression",
    num_trials=200,
    time_limit=1800,
    use_gpu=False,
    seed=42,
)
vtune.train()
