Db up downgrade snapshot #703 (Open)

wants to merge 23 commits into base: dev
Conversation

JustKuzya (Collaborator)

These code changes fix #474 and include both upgrade and downgrade logic for the Alembic migration.

@keithmanville changed the base branch from main to dev on January 13, 2025 16:03
JustKuzya and others added 16 commits on January 16, 2025 11:16
This commit performs code cleanup and fixes #474.
This commit introduces a Python client for interacting with the Dioptra v1 REST API. The client
provides a programmatic interface to all of Dioptra's resource collections:

- /artifacts
- /auth
- /entrypoints
- /experiments
- /groups
- /jobs
- /models
- /pluginParameterTypes
- /plugins
- /queues
- /tags
- /users
- /workflows

The Dioptra worker has been updated to use the new client to download job files. In addition, the
new REST API client has been fully integrated into the REST API integration tests, providing
thorough testing and coverage of the new v1 client code.

Co-authored-by: jtsextonMITRE <[email protected]>
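
A minimal usage sketch of the new client follows, for orientation only; the import path, class name, and method names below are assumptions for illustration, not the client's confirmed API:

from dioptra.client import DioptraClient  # assumed import path

# Assumed constructor and login flow, for illustration only.
client = DioptraClient(address="http://localhost:5000/api/v1")
client.auth.login(username="dioptra-user", password="example-password")

# Each resource collection listed above is assumed to expose CRUD-style helpers.
queues = client.queues.get()
experiment = client.experiments.get_by_id(1)
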
This commit fixes an issue preventing the modification of plugin files. The root cause is that
additional fields from the plugin file GET response were being included in PUT requests, causing
a validation error which results in a 400 BAD REQUEST error from the REST API. The fix is to
remove extraneous fields from the request constructed by the frontend.
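
In essence, the fix whitelists the fields that the PUT request schema accepts. A minimal sketch of the idea in Python (the actual fix lives in the frontend, and the field names here are assumptions):

# Field names are assumptions for illustration; the real allowed set is
# defined by the plugin file PUT request schema.
ALLOWED_PUT_FIELDS = {"filename", "contents", "description"}

def build_put_body(get_response: dict) -> dict:
    """Drop extraneous GET-only fields (ids, timestamps, snapshot metadata)
    so the PUT body validates against the request schema."""
    return {key: value for key, value in get_response.items() if key in ALLOWED_PUT_FIELDS}
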
This commit adds functionality for managing metrics via the REST API. The metrics are stored in
MLFlow, so this feature uses the MLFlow client in the service layer. Providing metrics management
via the Dioptra REST API is part of a larger effort to remove the need for the user to interact
with MLFlow directly.

It adds the following endpoints for managing metrics:
- GET `/experiments/{id}/metrics` - Gets the latest metrics for every job in the experiment
- GET `/jobs/{id}/metrics` - Gets a Job resource's latest metrics
- POST `/jobs/{id}/metrics` - Sets a metric for a Job
- GET `/jobs/{id}/metrics/{name}/snapshots` - Gets the history of a Job metric

This commit also updates the Python client for the new functionality.
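
A sketch of how these endpoints might be exercised with plain requests (the base URL and payload field names are assumptions, and a real session would first authenticate):

import requests

BASE = "http://localhost:5000/api/v1"  # assumed base URL
session = requests.Session()

# Set a metric for job 1 (payload field names are assumed).
session.post(f"{BASE}/jobs/1/metrics", json={"name": "accuracy", "value": 0.97})

# Latest metrics for one job, and for every job in experiment 2.
job_metrics = session.get(f"{BASE}/jobs/1/metrics").json()
experiment_metrics = session.get(f"{BASE}/experiments/2/metrics").json()

# Full history of the "accuracy" metric for job 1.
history = session.get(f"{BASE}/jobs/1/metrics/accuracy/snapshots").json()
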
This commit makes the following changes to the Job creation form in the frontend:

- Jobs can now be created from both the all-jobs page and the experiment -> job page.
- The Entrypoint dropdown is greyed out until an experiment is selected, and only shows
  entrypoints that are linked to the selected experiment.
- The Queue dropdown is greyed out until an entrypoint is selected, and only shows queues that
  are linked to the selected entrypoint.
- A message with a link that opens a dialog box for updating experiment/entrypoint associations
  appears under the Entrypoint dropdown.
- A message with a link that opens a dialog box for updating entrypoint/queue associations appears
  under the Queue dropdown.
- Returning to an unsaved form gives the user the option to load the previous values.
This commit adds the snapshot_created_on datetime field to all Resource response schemas. It also
updates the `build_` helper functions to include the field in responses and updates the test suite
to validate the updated response schemas.
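
As a sketch, the schema change might look like the following (the schema class and surrounding fields are illustrative stand-ins; only the snapshotCreatedOn field name comes from this PR):

from marshmallow import Schema, fields

class QueueSchema(Schema):  # illustrative stand-in for a resource response schema
    id = fields.Integer()
    name = fields.String()
    # New field: when the snapshot backing this response was created.
    snapshotCreatedOn = fields.DateTime(
        attribute="snapshot_created_on",
        metadata={"description": "Datetime the resource snapshot was created."},
    )
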
This commit adds a new UI element that allows the user to browse the history of a queue resource.
In the edit queue dialog, click the view history toggle to browse snapshots. The latest snapshot
is selected by default. Click or tab over to the list, then use the up/down arrow keys to move
between snapshots. The Submit button is disabled while viewing history.

feat: simplify snapshot list

fix: hide history toggle for drafts

fix: form edits should be cleared when toggling on history

fix: widen width of history column

fix: use snapshotCreatedOn for timestamp

fix: getSnapshots should get all results, not just first page
The current mypy test complains that Schema.from_dict only accepts the dict[str, Field] type, not
dict[str, Field | type]. Field objects are the only value seen in the dictionaries that are being
passed to Schema.from_dict, so the annotations have been updated accordingly.
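
For context, a minimal example of the pattern in question, using marshmallow's real Schema.from_dict with Field instances as the dictionary values:

from marshmallow import Schema, fields

# Schema.from_dict accepts a mapping of field names to Field instances; the
# annotation change narrows the value type from `Field | type` to `Field`,
# matching how these dictionaries are actually built in the codebase.
GeneratedSchema = Schema.from_dict(
    {"name": fields.String(required=True), "count": fields.Integer()}
)

schema = GeneratedSchema()
print(schema.load({"name": "example", "count": 3}))
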
This update adds file uploading support to the Dioptra client library through a new DioptraFile
class, supporting utility functions, and extending the client's post method to accept `data` and
`files` arguments. The changes enable users to upload single files, multiple files, or all files
within a directory while enforcing strict path validation in the associated filenames to prevent
server-side directory traversal attacks.

The new functionality includes:

- A DioptraFile class that validates filenames to ensure they are normalized, POSIX-compliant
  relative paths without traversal components (e.g., "..").
- File selection utility functions:
  - select_files_in_directory: Recursively selects files from a directory with optional regex
    pattern filtering
  - select_one_or_more_files: Selects individual files with optional renaming support to handle
    filename collisions
- Extended POST request handling in the client for multipart form data containing regular form
  fields and file uploads. All file uploads, whether a single file or multiple files, are
  streamed to the server.

The DioptraSession implementation for the requests-based client and the Flask test client have both
been updated to support the new functionality.
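
A standalone sketch of the kind of filename validation described above (DioptraFile's real implementation may differ):

from pathlib import PurePosixPath

def validate_filename(filename: str) -> str:
    """Reject filenames that are not normalized, relative POSIX paths."""
    path = PurePosixPath(filename)
    if path.is_absolute():
        raise ValueError(f"{filename!r} must be a relative path")
    if ".." in path.parts:
        raise ValueError(f"{filename!r} must not contain traversal components ('..')")
    if str(path) != filename:
        # Catches "", ".", "a/./b", trailing slashes, and other unnormalized forms.
        raise ValueError(f"{filename!r} must be a normalized path")
    return filename
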
This change introduces a monkey-patch for the flask-restx `Swagger.serialize_operation` method to
force Swagger to use "multipart/form-data" content type for multi-file uploads instead of
"application/x-www-form-urlencoded", as required by our application. The patch applies the proposed
changes from PR #542 in the flask-restx repository
(python-restx/flask-restx#542), which is unlikely to be accepted into the
main branch.

The monkey-patch includes the following safety measures:

- A hash verification system that checks the source code of
  `flask_restx.Swagger.serialize_operation` before applying the patch. This prevents potential
  issues if the underlying library code changes.
- Early exit logic that detects if the patch has already been applied, preventing double-patching
  scenarios that were causing issues during pytest execution (where imports remain in memory but
  fixtures are re-run).
- Logging to document whether the patch is successfully applied or skipped because it was already
  present.

The patch is applied during application initialization through a new `monkey_patch_flask_restx()`
function, which is called in the `create_app()` setup process. While this is not an ideal solution,
the safeguards should sufficiently minimize risk in the absence of upstream support for this
behavior.
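
A generic sketch of the hash-verification pattern described above (not the exact Dioptra code; the pinned digest and marker attribute are placeholders):

import hashlib
import inspect
import logging

from flask_restx.swagger import Swagger

logger = logging.getLogger(__name__)

# Placeholder; the real patch would pin the sha256 of the upstream source.
EXPECTED_SHA256 = "<pinned-sha256-of-serialize_operation-source>"
_PATCH_MARKER = "_patched_for_multipart"  # assumed marker attribute name

def monkey_patch_serialize_operation(patched_fn):
    original = Swagger.serialize_operation

    # Early exit if already patched (e.g. pytest re-runs fixtures while the
    # patched module stays imported).
    if getattr(original, _PATCH_MARKER, False):
        logger.info("serialize_operation already patched; skipping")
        return

    # Refuse to patch if the upstream source changed since the patch was written.
    digest = hashlib.sha256(inspect.getsource(original).encode("utf-8")).hexdigest()
    if digest != EXPECTED_SHA256:
        raise RuntimeError("Swagger.serialize_operation changed upstream; not patching")

    setattr(patched_fn, _PATCH_MARKER, True)
    Swagger.serialize_operation = patched_fn
    logger.info("Applied serialize_operation monkey-patch")
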
This update adds a new Marshmallow Schema Field type to handle scenarios where multiple files need
to be uploaded through the REST API. The changes include:

- A new MultiFileUpload field class in custom_schema_fields.py
- An update to the parameter schema generation logic that sets the "action" parameter to "append"
  when the field type is MultiFileUpload.
- A new entry for the MultiFileUpload type in the TYPE_MAP_MA_TO_REQPARSE dictionary, mapping it to
  werkzeug's FileStorage type.
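
As an illustration, the new field can be a thin marker subclass, with the mapping and parameter-generation tweak along these lines (a sketch; the real code in custom_schema_fields.py may differ):

from marshmallow import fields
from werkzeug.datastructures import FileStorage

class MultiFileUpload(fields.Field):
    """Marker field for a multipart parameter that accepts multiple files."""

# Assumed shape of the new mapping entry described above.
TYPE_MAP_MA_TO_REQPARSE = {MultiFileUpload: FileStorage}

def reqparse_kwargs(field: fields.Field) -> dict:
    """Sketch of the generation tweak: multi-file fields get action="append"
    so repeated form parts accumulate into a list."""
    kwargs = {"type": TYPE_MAP_MA_TO_REQPARSE[type(field)], "location": "files"}
    if isinstance(field, MultiFileUpload):
        kwargs["action"] = "append"
    return kwargs
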
@jkglasbrenner (Collaborator)

The migration script is mixing the ORM and Core styles of SQLAlchemy in a way that I don't think you need here, and it's making it harder to understand what is happening. Also, in downgrade(), you shouldn't need that counting operation because the resource_type field is already a primary key. The way migration scripts are applied (e.g. you cannot apply the same upgrade or downgrade twice in a row) should also prevent a duplicate row from being inserted even if the database engine ignores the uniqueness constraint.

@jkglasbrenner (Collaborator)

Here's how you can write this migration script using just the ORM. Note that you need to create both Upgrade and Downgrade versions of your ORM classes for things to work correctly (based on the comments in your code, I suspect you ran into this, and that's why you started using operations from SQLAlchemy Core). Both the upgrade() and downgrade() migrations query for the record where resource_type = 'resource_snapshot' and use the result as a guard against deleting or inserting the record when it's unexpectedly missing or present.

"""Up/Down-grade resource snapshot types as delete/insert 'resource_snapshot' [GH-Issue #474]

Revision ID: 0ca6eca33569
Revises: 6a75ede23821
Create Date: 2024-12-12 00:47:23.575103

"""

from typing import Annotated

import sqlalchemy as sa
from alembic import op
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    MappedAsDataclass,
    mapped_column,
    sessionmaker,
)

# revision identifiers, used by Alembic.
revision = "0ca6eca33569"
down_revision = "6a75ede23821"
branch_labels = None
depends_on = None

text_ = Annotated[str, mapped_column(sa.Text())]


class UpgradeBase(DeclarativeBase, MappedAsDataclass):
    pass


class DowngradeBase(DeclarativeBase, MappedAsDataclass):
    pass


class ResourceTypeUpgrade(UpgradeBase):
    __tablename__ = "resource_types"
    resource_type: Mapped[text_] = mapped_column(primary_key=True)


class ResourceTypeDowngrade(DowngradeBase):
    __tablename__ = "resource_types"
    resource_type: Mapped[text_] = mapped_column(primary_key=True)


def upgrade():
    """Alembic Upgrade Hook-Point that deletes the entry from
    resource_types table with value 'resource_snapshot'
    """
    bind = op.get_bind()
    Session = sessionmaker(bind=bind)

    with Session() as session:
        stmt = sa.select(ResourceTypeUpgrade).where(
            ResourceTypeUpgrade.resource_type == "resource_snapshot"
        )
        resource_snapshot_type = session.scalar(stmt)

        if resource_snapshot_type is not None:
            session.delete(resource_snapshot_type)

        session.commit()


def downgrade():
    """Alembic Downgrade Hook-Point that reinstates the entry to
    resource_types table with value 'resource_snapshot'
    """
    bind = op.get_bind()
    Session = sessionmaker(bind=bind)

    with Session() as session:
        stmt = sa.select(ResourceTypeDowngrade).where(
            ResourceTypeDowngrade.resource_type == "resource_snapshot"
        )
        resource_snapshot_type = session.scalar(stmt)

        if resource_snapshot_type is None:
            session.add(ResourceTypeDowngrade(resource_type="resource_snapshot"))

        session.commit()
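
With this script in place, `alembic upgrade head` deletes the 'resource_snapshot' row and `alembic downgrade -1` reinstates it; the existence checks guard each direction even if the row is unexpectedly missing or present.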

Successfully merging this pull request may close these issues.

Remove the "resource_snapshot" type from the resource_snapshot_types Enum table