[RFR] AIPMDM-888: Simplify OSS Metaflow core tests to be more Pythonic / pytest-friendly / tox-friendly#3151
[RFR] AIPMDM-888: Simplify OSS Metaflow core tests to be more Pythonic / pytest-friendly / tox-friendly#3151Tingting-Chang wants to merge 59 commits into
Conversation
Greptile SummaryThis PR refactors the Metaflow core test suite (90 files) to replace a 643-line custom orchestration layer ( The bulk of the previous review threads have been addressed: Confidence Score: 4/5Safe to merge; all previously flagged P1 issues are addressed and the two remaining comments are P2 style/diagnostic suggestions. No P0 or new P1 findings. The one open P1 from prior threads (env restore skipped if runner.cleanup() raises) is pre-existing and unchanged. Changes are confined to test infrastructure with no runtime behaviour change to the Metaflow library itself, except for the targeted GCS emulator client fix. test/core/test_core_pytest.py — api executor error-detail capture; test/core/conftest.py — disabled_tests case sensitivity note. Important Files Changed
Reviews (33): Last reviewed commit: "fix precommit" | Re-trigger Greptile |
b4d9457 to
8dc62b3
Compare
ac6f914 to
94d51ca
Compare
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
| for d in addl_spec.submodule_search_locations | ||
| if os.path.isdir(d) | ||
| ] | ||
| if not new_dirs: |
There was a problem hiding this comment.
When did this happen? These things seem to keep evolving but I believe the previous case does take care of editable installs.
There was a problem hiding this comment.
This changes is related to the modern pip package for CardExtensionsImportTest. Details:
The refactor changed which Python process runs the extension discovery: previously run_tests.py was a subprocess whose PYTHONPATH included only test/core/ (so the card packages were found via their finders); the new code calls run_test() in-process with the same PYTHONPATH. In both cases, the finders' find_spec('metaflow_extensions') returns the same broken result. The bug was always there, it was just masked because
CardExtensionsImportTesthad never actually been run successfully in this environment before core-local was wired up.
An alternative will be install the card packages non-editably in test/core/tox.ini:
# Remove -e prefix for the card extensions
{toxinidir}/../../test/extensions/packages/card_via_extinit
Let me WDYT and I can make changes. Thanks
There was a problem hiding this comment.
Hi @talsperre, could you take a look at ^^ and let me know WDYT? Thanks!
fb6a0ff to
d171700
Compare
10a94cb to
b6c165b
Compare
- Reset LocalMetadataProvider._INFO and LocalStorage.datastore_root class-level caches in _run_flow() finally block so deleted tempdirs from one test don't cause MetaflowNotFound in the next cli test - Save/restore _LMP._INFO in _isolated_client_globals() for same reason - Add AWS_ENDPOINT_URL_EVENTBRIDGE to core-sfn tox env so boto3 routes DisableRule calls to the local eventbridge_stub instead of real AWS - Add CI wait step for EventBridge stub readiness before sfn tests Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
DynamoDbClient uses get_aws_client("dynamodb") which reads the standard
botocore env var AWS_ENDPOINT_URL_DYNAMODB — it does not read the old
METAFLOW_SFN_DYNAMO_DB_CLIENT_PARAMS convention.
Replace the injected var so Batch containers running foreach tasks
(save_foreach_cardinality, save_parent_task_id_for_foreach_join,
get_parent_task_ids_for_foreach_join) can reach ddb-local at
host.docker.internal:8765 instead of hitting real AWS DynamoDB.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…duce argo disk BasicForeach (32 parallel pods) exhausts the 6 GB minikube node for argo and the GitHub runner RAM for sfn-batch; MergeArtifactsInclude imports pytest at module level which fails inside python:3.9 containers. Add both to [_disabled]scheduler so core-argo and core-sfn skip them (matching the existing cloud exclusion for core-batch/core-k8s). Reduce core-argo ephemeral disk request from 1024 MB to 50 MB — same value used by core-k8s — to avoid unnecessary ephemeral storage pressure even for small foreach tests. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
CardImport tests card extension packages (editable_import_test_card, non_editable_import_test_card from card_via_extinit/card_via_init) that are installed in the tox venv but not in batch/k8s container images. Without those packages, card generation silently fails and the check_results assertions on card presence would fail. Matches the pattern already in place for CardExtensionsImport (which tests the same class of packages and is already disabled for cloud/scheduler). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Argo, k8s TTL+cleanup, dump failures
Bitnami retired both the charts.bitnami.com helm repo index AND the
CDN (charts.bitnami.com/bitnami/postgresql-12.5.6.tgz), so the previous
curl-based workaround silently fails. The helm_remote call in Tilt then
tries helm pull from the live bitnami repo, which no longer lists 12.5.6,
causing the Tiltfile to error and the devstack to never become ready.
Switch postgresql.tiltfile to the bitnami archive branch on GitHub
(archive-full-index) which preserves all historical chart versions.
Update all three CI workflows (core-tests, full-stack-test, ux-tests)
to:
- add the repo under the name 'postgresql' (matching repo_name in the
Tiltfile so the Tilt helm cache path aligns)
- pre-pull via 'helm pull postgresql/postgresql --version 12.5.6'
instead of the broken CDN curl
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The previous rule accepted from MINIKUBE_GW (192.168.49.1, the host's own IP on the minikube Docker bridge) which never matched. kube-proxy on the minikube node MASQUERADE's pod traffic to the minikube node IP (192.168.49.2 = minikube ip) before it exits the container, so the host always sees the source as 192.168.49.2, not 192.168.49.1 or the pod CIDR. Switch to $(minikube ip) in the ACCEPT rule. Also make the pre-test verification hard-fail (exit 1) if localbatch is unreachable from the minikube container — previously it logged the failure silently and then all 156 tests failed with cryptic errors instead of one clear step failure. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
localbatch runs Batch containers on the CI runner's Docker daemon using python:3.10 as the default image. Without a pre-pull, each container startup hits Docker Hub (30-60s) before pip install (20s) and code-package download (10s) — 60-90s total per container. For graphs with multiple sequential container groups (simple-foreach has ~5, nested-branches has ~4+, simple_switch ~4), the per-test wall-clock time approaches or exceeds the 600 s scheduler timeout, causing ALL tests in those graphs to fail with "scheduler run timed out". Pre-caching python:3.10 in the CI runner's Docker daemon cuts container startup to ~30s, keeping every graph combination well within the 600 s budget. The k8s/argo fix (docker pull + minikube image load) already ran docker pull for those backends; the sfn/batch step does the same without the minikube load. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
PR Type
Summary
Issue
Implements AIPMDM-888: refactor test/core/ to be compatible with standard Python tooling (pytest, tox) so contributors can run tests without understanding the custom orchestration layer.
R1: Replace
contexts.jsonwith tox environments and pytest fixturesThis change removes
contexts.json, which duplicated an environment matrix that tox already supports natively.test/core/tox.iniis added as the source of truth for core test environments. It defines onetestenv:core-*per infrastructure backend: local, GCS, Azure, Batch, K8s, Argo, and SFN.setenv, including Metaflow configuration (datastore, metadata service, credentials) andMETAFLOW_CORE_*control variables (marker, top-level options, executors, disabled tests).{[testenv]setenv}inheritance, with a_disabledsection for common disabled-test lists.test/core/conftest.pynow reads all context fromos.environand usespytest_generate_teststo parametrize(graph, test, executor)combinations. No Python context file is imported anywhere.METAFLOW_CORE_CHECKSenvironment variable into a proper session-scopedcore_checksfixture, which can now be overridden per directory without touching tox.tox.inino longer containscore-*envs and now points users totest/core/tox.ini.R2: Eliminate the custom test runner
run_tests.py(643 lines) is deleted. Test execution is now handled entirely by pytest.test/core/conftest.pynow defines_iter_graphs()and_iter_tests()directly instead of importing them fromrun_tests.py.test/core/test_core_pytest.pynow defines_run_flow()directly in place ofrun_test(). It supportscli,api, and scheduler executors.This also fixes two existing bugs:
apiexecutor now catchesRuntimeErrorfromRunner.run()and converts it into a non-zero return code instead of surfacing an unhandled exceptionFileNotFoundErrorfromopen("run-id")R3: Convert test flows to standard pytest tests
The core test suite now behaves like normal pytest code instead of relying on a subprocess-heavy custom harness.
MetaflowTesthas been renamed toFlowDefinitionacross all 64 test classes and inmetaflow_test/__init__.py.Testsuffix is removed because these classes are flow templates combined with graph topologies byFlowFormatter, not pytest test cases.MetaflowTest = FlowDefinitionalias is kept for external compatibility._run_flow()dynamically imports the generatedtest_flow.py, instantiates the flow class, and callsformatter.test.check_results(flow, checker)directly.AssertionErrors with full pytest tracebacks instead of opaque subprocess exit codes.FlowFormatter._check_lines()andcheck_codeare removed.MetaflowCheckno longer depends onsys.argv:run_idandcli_optionsare now explicit constructor parameters.new_checkernow accepts either a checker class or a checker class name.assert_equals(a, b)calls across 54 files are replaced with plainassert a == b, enabling pytest assertion rewriting and better failure output.assert_exception(lambda: f(), E)are replaced withpytest.raises(E)intag_mutation,merge_artifacts,merge_artifacts_include, andmetadata_check.assert_equals_metadatais removed and replaced with inline assertions inresume_end_step.py.R4: Simplify test utilities
Test helpers are reduced to standard pytest patterns wherever possible.
assert_equals,assert_exception, andassert_equals_metadataare removed frommetaflow_test/__init__.py.ExpectationFailed,AssertArtifactFailed,AssertLogFailed, andAssertCardFailednow subclassAssertionError, so pytest reports them natively.assert_artifact,assert_log, andassert_cardare rewritten to use plainassertinternally rather than manually raising custom exceptions.artifact(step, name)is added to bothCliCheckandMetadataCheck, returning{task_id: value}so tests can make direct assertions such asassert checker.artifact(step, "data") == {"task1": "abc"}.test/core/pytest.iniis added to centralize pytest configuration, includingnorecursedirs,timeout = 1800,addopts = -v --tb=short, and the seven backend markers. Tox command lines now only need to pass the marker flag and parallelism settings.R5: tox is now the orchestration layer
Core test environments can now be run directly with tox, without any custom orchestration layer:
GCS emulator support
This PR also adds first-class support for running against a local GCS emulator.
metaflow/plugins/gcp/gs_storage_client_factory.pynow creates an anonymousstorage.Client()whenSTORAGE_EMULATOR_HOSTis set, instead of callinggoogle.auth.default(). This allows flows to run againstfake-gcs-serverwithout real GCP credentials.devtools/now includes fake-gcs-server as a first-class service, with Kubernetes deployment and service definitions, bucket-init job, secret, a dedicated Tilt file, and integration into the main Tiltfile andpick_services.sh.SERVICES_OVERRIDE=fake-gcs-servermake up.core-gcsandcore-azurenow setMETAFLOW_DEFAULT_DATASTORE=gs/azurealong with the corresponding sysroot and endpoint variables, so flows actually exercise cloud storage code paths against local emulators. This matches the existing core-batch pattern with MinIO.Test Plan
Runtime:
Commands to run:
# paste exact commandsWhere evidence shows up:
Before (error / log snippet)
After (evidence that fix works)
Root Cause
Why This Fix Is Correct
Failure Modes Considered
Tests
Non-Goals
AI Tool Usage