Experimental evaluation repository demonstrating three execution paradigms for scientific workflows using the Breast Cancer Wisconsin dataset.
Large Action Models (LAMs) are LLM-driven systems that select and execute high-level actions ("load data," "train model") rather than generating code line-by-line.
R-LAM extends LAMs with reproducibility constraints (sketched in code after this list):
- Complete execution tracing
- Deterministic replay without re-execution
- Controlled workflow forking
- Full provenance and auditability
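To make these constraints concrete, here is a minimal, illustrative sketch of a traced action record with replay and fork over it. The names (`TraceRecord`, `replay`, `fork`) are hypothetical and are not the rlam framework's API.

```python
# Illustrative only: hypothetical names, not the rlam framework's API.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TraceRecord:
    """One executed action, captured with enough context to replay it."""
    action: str                                   # e.g. "train_model"
    params: dict[str, Any]                        # e.g. {"C": 1.0}
    inputs: list[str]                             # names of upstream outputs consumed
    outputs: dict[str, Any]                       # cached results
    parents: list[int] = field(default_factory=list)  # provenance DAG edges


def replay(trace: list[TraceRecord]) -> dict[str, Any]:
    """Deterministic replay: rebuild the final workflow state from cached
    outputs without re-running any action."""
    state: dict[str, Any] = {}
    for record in trace:
        state.update(record.outputs)
    return state


def fork(trace: list[TraceRecord], at: int) -> list[TraceRecord]:
    """Controlled fork: branch a new workflow from a prefix of an existing trace."""
    return list(trace[:at])
```

Replaying a forked prefix and re-executing only the steps that changed is the idea behind cheap iterative experimentation with an intact provenance DAG.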
| Feature | Script | Naive LAM | R-LAM |
|---|---|---|---|
| Control | Hard-coded | LLM-driven | LLM-driven |
| Tracing | ❌ | ❌ | ✅ |
| Replay | ❌ | ❌ | ✅ |
| Fork | ❌ | ❌ | ✅ |
| Provenance | Code only | None | Full DAG |
Traditional deterministic execution with fixed control flow. No LLM involvement.
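In rough outline the script just calls the five actions in a hard-coded order. The sketch below is a paraphrase; exact signatures and the `C` value are assumptions, not the repository's code.

```python
# Rough outline of pipelines/naive_pipeline.py (paraphrased; signatures assumed).
from actions.load import load_dataset
from actions.analyze import analyze_data
from actions.preprocess import preprocess_data
from actions.train import train_model
from actions.evaluate import evaluate_model


def main() -> None:
    X, y = load_dataset()                        # fixed control flow, no LLM
    stats = analyze_data(X)
    X_processed = preprocess_data(X)
    model = train_model(X_processed, y, C=1.0)   # C value assumed for illustration
    accuracy = evaluate_model(model, X_processed, y)
    print(stats, accuracy)


if __name__ == "__main__":
    main()
```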
```bash
python -m pipelines.naive_pipeline
```

LLM plans actions dynamically but executes directly without reproducibility infrastructure.
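Conceptually, the plan-then-execute loop looks like the sketch below. The planner and registry interfaces are illustrative stand-ins for the `lam/` components, not their actual signatures.

```python
# Conceptual plan/execute loop; interfaces are illustrative, not the lam/ module API.
from typing import Any, Callable


def run_lam(planner: Any, registry: dict[str, Callable[..., Any]], goal: str) -> dict[str, Any]:
    """The LLM picks the next action from the registry and results go straight
    into workflow state: no trace is kept, so runs are not replayable or auditable."""
    state: dict[str, Any] = {}
    while (step := planner.next_action(goal, state)) is not None:   # LLM decides
        action = registry[step["action"]]
        inputs = [state[name] for name in step["inputs"]]           # wire upstream outputs
        state[step["output"]] = action(*inputs, **step["params"])   # execute directly
    return state
```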
```bash
export OPENROUTER_API_KEY="your_key"
python -m pipelines.lam_pipeline
```

LLM-driven execution with full tracing, replay, and forking support via the rlam framework.
```bash
export OPENROUTER_API_KEY="your_key"
python -m pipelines.rlam_pipeline
```

```
├── actions/                 # Pure action functions
│   ├── load.py              # load_dataset
│   ├── analyze.py           # analyze_data
│   ├── preprocess.py        # preprocess_data
│   ├── train.py             # train_model
│   └── evaluate.py          # evaluate_model
├── lam/                     # LAM components
│   ├── action_space.py      # Action registry
│   ├── planner.py           # LLM-based planner
│   └── state.py             # Workflow state
├── pipelines/               # Three execution modes
│   ├── naive_pipeline.py
│   ├── lam_pipeline.py
│   └── rlam_pipeline.py
├── experiments/             # Evaluation harness
│   ├── run_all.py           # Execute all pipelines
│   ├── metrics.py           # Metric computation
│   └── results_table.py     # Result formatting
└── config.py                # Configuration
```
Each action: (inputs, params) → outputs
| Action | Inputs | Parameters | Outputs |
|---|---|---|---|
| load_dataset | - | - | X, y |
| analyze_data | X | - | stats |
| preprocess_data | X | - | X_processed |
| train_model | X_processed, y | C (regularization) | model |
| evaluate_model | model, X_processed, y | - | accuracy |
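For instance, train_model can be as small as the sketch below. The signature and hyperparameter defaults are assumptions for illustration, not the repository's exact code in actions/train.py.

```python
# Sketch of one pure action obeying (inputs, params) -> outputs; details assumed.
from sklearn.linear_model import LogisticRegression


def train_model(X_processed, y, C: float = 1.0) -> LogisticRegression:
    """Consume preprocessed features and labels, return a fitted classifier."""
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_processed, y)
    return model
```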
```bash
pip install -r requirements.txt
# or with uv:
uv pip install -r requirements.txt
```

```bash
# Individual pipelines
python -m pipelines.naive_pipeline
python -m pipelines.lam_pipeline    # Requires OPENROUTER_API_KEY
python -m pipelines.rlam_pipeline   # Requires OPENROUTER_API_KEY

# All experiments with metrics
python -m experiments.run_all
```

Actions over code generation: Semantic units that are traceable, replayable, and auditable by design.
Execution constraints: R-LAM enforces determinism and provenance at the execution layer, not by modifying the LLM.
Minimal scope: Small action space (5 actions), single dataset, single model—intentionally constrained for research clarity.
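One way to picture "constraints at the execution layer": every action call passes through a wrapper that records provenance regardless of what the LLM planned. The decorator below is an illustrative sketch under that assumption, not the rlam implementation.

```python
# Illustrative sketch of execution-layer tracing; not the rlam implementation.
import hashlib
import time
from typing import Any, Callable


def traced(action: Callable[..., Any], trace: list[dict[str, Any]]) -> Callable[..., Any]:
    """Wrap an action so every call is recorded, whatever the planner decided."""
    def wrapper(*args: Any, **params: Any) -> Any:
        result = action(*args, **params)
        trace.append({
            "action": action.__name__,
            "params": params,
            "result_hash": hashlib.sha256(repr(result).encode()).hexdigest(),
            "timestamp": time.time(),
        })
        return result
    return wrapper
```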
This repository provides the experimental evaluation for:
"R-LAM: Reproducibility-Constrained Large Action Models for Scientific Workflow Automation"
Core contributions:
- Formal action schema for LAM-driven workflows
- Deterministic execution engine with provenance capture
- Replay and forking semantics for iterative experimentation
- Experimental validation on representative ML workflow
This is a research artifact, not a production system. It is not:
- A production ML platform
- An autonomous discovery system
- A claim of novel ML algorithms
Contribution: Reproducibility infrastructure for LAM-driven scientific automation, not the science itself.
- Simple LLM planner (not optimized)
- Minimal error handling (for clarity)
- Single dataset and model (Breast Cancer Wisconsin + Logistic Regression)
- Small action space (5 actions)
These are intentional constraints for isolating execution semantics.
```bibtex
@article{rlam2026,
  title={R-LAM: Reproducibility-Constrained Large Action Models for Scientific Workflow Automation},
  author={Suriya Sureshkumar},
  year={2026}
}
```

- Framework: github.com/suriyasureshok/rlam
- Evaluation: This repository
MIT License