Experimental evaluation of Large Action Model execution paradigms for reproducible scientific workflows

suriyasureshok/LAM_Reproducible_ML_Workflows

Large Action Model Reproducible Scientific Workflows with R-LAM

Experimental evaluation repository demonstrating three execution paradigms for scientific workflows using the Breast Cancer Wisconsin dataset.

Overview

Large Action Models (LAMs) are LLM-driven systems that select and execute high-level actions ("load data," "train model") rather than generating code line-by-line.
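As a rough illustration of this idea (all names here are hypothetical, not this repository's actual API), a LAM reduces to a registry of named high-level actions plus a planner that emits a sequence of action names to execute:

```python
# Minimal sketch of the LAM idea: a registry of high-level actions and a
# dispatch loop. Identifiers and bodies are illustrative stand-ins, not
# this repository's code.
ACTIONS = {}

def action(fn):
    """Register a function as a named high-level action."""
    ACTIONS[fn.__name__] = fn
    return fn

@action
def load_data(state):
    state["data"] = [1, 2, 3]            # stand-in for a real dataset
    return state

@action
def train_model(state):
    state["model"] = sum(state["data"])  # stand-in for real training
    return state

def execute(plan, state=None):
    """Run a plan: a list of action names, e.g. produced by an LLM planner."""
    state = state or {}
    for name in plan:
        state = ACTIONS[name](state)
    return state

result = execute(["load_data", "train_model"])
```

The LLM's job is then only to choose the next action name, never to emit arbitrary code, which is what makes the execution layer tractable to trace and audit.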

R-LAM extends LAMs with reproducibility constraints:

  • Complete execution tracing
  • Deterministic replay without re-execution
  • Controlled workflow forking
  • Full provenance and auditability
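A hedged sketch of how these constraints compose (function names and the trace format are assumptions for illustration, not the rlam framework's API): each step is recorded with the state it produced, so replay needs only the trace, and a fork restarts from any recorded step.

```python
import json

def traced_execute(plan, actions, state=None):
    """Execute a plan while recording a trace of (action, state-after) steps."""
    state, trace = dict(state or {}), []
    for name in plan:
        state = actions[name](dict(state))
        # Deep-copy via JSON so the trace is immune to later mutation.
        trace.append({"action": name, "state": json.loads(json.dumps(state))})
    return state, trace

def replay(trace):
    """Deterministic replay: recover the final state from the trace alone,
    without re-running any action."""
    return trace[-1]["state"] if trace else {}

def fork(trace, at):
    """Controlled fork: resume a new plan from step `at` of a recorded run."""
    return dict(trace[at]["state"])

actions = {
    "load":  lambda s: {**s, "x": [1, 2, 3]},
    "train": lambda s: {**s, "score": sum(s["x"]) / 6},
}
final, trace = traced_execute(["load", "train"], actions)
```

Provenance then falls out of the same trace: each entry names the action and the exact state it produced.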

Three Execution Paradigms

| Feature    | Script     | Naive LAM  | R-LAM      |
|------------|------------|------------|------------|
| Control    | Hard-coded | LLM-driven | LLM-driven |
| Tracing    | ✗          | ✗          | ✓          |
| Replay     | ✗          | ✗          | ✓          |
| Fork       | ✗          | ✗          | ✓          |
| Provenance | Code only  | None       | Full DAG   |

1. Script-Based Pipeline (pipelines/naive_pipeline.py)

Traditional deterministic execution with fixed control flow. No LLM involvement.

python -m pipelines.naive_pipeline

2. Naive LAM Pipeline (pipelines/lam_pipeline.py)

The LLM plans actions dynamically but executes them directly, without any reproducibility infrastructure.

export OPENROUTER_API_KEY="your_key"
python -m pipelines.lam_pipeline

3. R-LAM Pipeline (pipelines/rlam_pipeline.py)

LLM-driven execution with full tracing, replay, and forking support via the rlam framework.

export OPENROUTER_API_KEY="your_key"
python -m pipelines.rlam_pipeline

Repository Structure

├── actions/              # Pure action functions
│   ├── load.py          # load_dataset
│   ├── analyze.py       # analyze_data
│   ├── preprocess.py    # preprocess_data
│   ├── train.py         # train_model
│   └── evaluate.py      # evaluate_model
├── lam/                 # LAM components
│   ├── action_space.py  # Action registry
│   ├── planner.py       # LLM-based planner
│   └── state.py         # Workflow state
├── pipelines/           # Three execution modes
│   ├── naive_pipeline.py
│   ├── lam_pipeline.py
│   └── rlam_pipeline.py
├── experiments/         # Evaluation harness
│   ├── run_all.py       # Execute all pipelines
│   ├── metrics.py       # Metric computation
│   └── results_table.py # Result formatting
└── config.py            # Configuration

Action Space

Each action: (inputs, params) → outputs

| Action          | Inputs                 | Parameters         | Outputs     |
|-----------------|------------------------|--------------------|-------------|
| load_dataset    | -                      | -                  | X, y        |
| analyze_data    | X                      | -                  | stats       |
| preprocess_data | X                      | -                  | X_processed |
| train_model     | X_processed, y         | C (regularization) | model       |
| evaluate_model  | model, X_processed, y  | -                  | accuracy    |
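The (inputs, params) → outputs schema above amounts to pure functions with no hidden state. A toy sketch of three of the actions (bodies are illustrative stand-ins, not the implementations in actions/):

```python
# Toy instances of the (inputs, params) -> outputs action schema.
# Bodies are stand-ins, not the repository's implementations.

def preprocess_data(X):
    """inputs: X -> outputs: X_processed (per-feature min-max scaling)."""
    cols = list(zip(*X))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - a) / (b - a) if b > a else 0.0
             for v, a, b in zip(row, lo, hi)] for row in X]

def train_model(X_processed, y, C=1.0):
    """inputs: X_processed, y; params: C -> outputs: model (a predict fn).

    Toy 'model': threshold on the mean feature value. C is accepted only
    to mirror the schema's regularization parameter."""
    def predict(row):
        return int(sum(row) / len(row) > 0.5)
    return predict

def evaluate_model(model, X_processed, y):
    """inputs: model, X_processed, y -> outputs: accuracy."""
    preds = [model(r) for r in X_processed]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

X = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0]]
y = [0, 1, 1]
Xp = preprocess_data(X)
acc = evaluate_model(train_model(Xp, y, C=1.0), Xp, y)
```

Because every action is a pure mapping from inputs and parameters to outputs, a recorded trace of those triples is sufficient to replay or fork a run.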

Installation

pip install -r requirements.txt
# or with uv:
uv pip install -r requirements.txt

Running Experiments

# Individual pipelines
python -m pipelines.naive_pipeline
python -m pipelines.lam_pipeline     # Requires OPENROUTER_API_KEY
python -m pipelines.rlam_pipeline    # Requires OPENROUTER_API_KEY

# All experiments with metrics
python -m experiments.run_all

Key Design Choices

Actions over code generation: Semantic units that are traceable, replayable, and auditable by design.

Execution constraints: R-LAM enforces determinism and provenance at the execution layer, not by modifying the LLM.
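One way to enforce this at the execution layer (a sketch under assumptions; `provenance_id` and its payload format are hypothetical, not the rlam framework's mechanism) is to content-address each step from its action name, inputs, and parameters, so identical steps are detectably identical across runs:

```python
import hashlib
import json

def provenance_id(action, inputs, params):
    """Content-address one workflow step: the same action + inputs + params
    always hash to the same node id, making re-runs verifiably deterministic.
    Illustrative sketch, not the rlam framework's actual mechanism."""
    payload = json.dumps(
        {"action": action, "inputs": inputs, "params": params},
        sort_keys=True,  # canonical ordering so the hash is stable
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

a = provenance_id("train_model", {"X": [[1, 2]], "y": [1]}, {"C": 1.0})
b = provenance_id("train_model", {"X": [[1, 2]], "y": [1]}, {"C": 1.0})
```

Linking each step's id to the ids of the steps that produced its inputs yields the provenance DAG, with no change to the LLM planner itself.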

Minimal scope: Small action space (5 actions), single dataset, single model—intentionally constrained for research clarity.

Research Context

This repository provides the experimental evaluation for:

"R-LAM: Reproducibility-Constrained Large Action Models for Scientific Workflow Automation"

Core contributions:

  1. Formal action schema for LAM-driven workflows
  2. Deterministic execution engine with provenance capture
  3. Replay and forking semantics for iterative experimentation
  4. Experimental validation on a representative ML workflow

This is a research artifact, not a production system.

What This Is Not

  • A production ML platform
  • An autonomous discovery system
  • A claim of novel ML algorithms

Contribution: Reproducibility infrastructure for LAM-driven scientific automation, not the science itself.

Limitations

  • Simple LLM planner (not optimized)
  • Minimal error handling (for clarity)
  • Single dataset and model (Breast Cancer Wisconsin + Logistic Regression)
  • Small action space (5 actions)

These are intentional constraints for isolating execution semantics.

Citation

@article{rlam2026,
  title={R-LAM: Reproducibility-Constrained Large Action Models for Scientific Workflow Automation},
  author={Suriya Sureshkumar},
  year={2026}
}

Artifacts

License

MIT License
