Db up downgrade snapshot #703 (Open)

wants to merge 23 commits into base: dev
Conversation

JustKuzya (Collaborator)

These code changes fix #474 and include both upgrade and downgrade logic for the Alembic migration.

@keithmanville changed the base branch from main to dev on January 13, 2025 16:03
JustKuzya and others added 16 commits on January 16, 2025 11:16
This commit performs code cleanup and fixes #474.
This commit introduces a Python client for interacting with the Dioptra v1 REST API. The client
provides a programmatic interface to all of Dioptra's resource collections:

- /artifacts
- /auth
- /entrypoints
- /experiments
- /groups
- /jobs
- /models
- /pluginParameterTypes
- /plugins
- /queues
- /tags
- /users
- /workflows

The Dioptra worker has been updated to use the new client to download job files. In addition, the
new REST API client has been fully integrated into the REST API integration tests, providing
thorough testing and coverage of the new v1 client code.

Co-authored-by: jtsextonMITRE <[email protected]>
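
A minimal usage sketch of the new client follows, for orientation only; the import path, class name, and method names below are assumptions for illustration, not the client's confirmed API:

from dioptra.client import DioptraClient  # assumed import path

# Assumed constructor and login flow, for illustration only.
client = DioptraClient(address="http://localhost:5000/api/v1")
client.auth.login(username="dioptra-user", password="example-password")

# Each resource collection listed above is assumed to expose CRUD-style helpers.
queues = client.queues.get()
experiment = client.experiments.get_by_id(1)
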
This commit fixes an issue preventing the modification of plugin files. The root cause is that
additional fields from the plugin file GET response were being included in PUT requests, causing
a validation error which results in a 400 BAD REQUEST error from the REST API. The fix is to
remove extraneous fields from the request constructed by the frontend.
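
In essence, the fix whitelists the fields that the PUT request schema accepts. A minimal sketch of the idea in Python (the actual fix lives in the frontend, and the field names here are assumptions):

# Field names are assumptions for illustration; the real allowed set is
# defined by the plugin file PUT request schema.
ALLOWED_PUT_FIELDS = {"filename", "contents", "description"}

def build_put_body(get_response: dict) -> dict:
    """Drop extraneous GET-only fields (ids, timestamps, snapshot metadata)
    so the PUT body validates against the request schema."""
    return {key: value for key, value in get_response.items() if key in ALLOWED_PUT_FIELDS}
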
This commit adds functionality for managing metrics via the REST API. The metrics are stored in
MLFlow, so this feature uses the MLFlow client in the service layer. Providing metrics management
via the Dioptra REST API is part of a larger effort to remove the need for the user to interact
with MLFlow directly.

It adds the following endpoints for managing metrics:
- GET `/experiments/{id}/metrics` - Gets the latest metrics for every job in the experiment
- GET `/jobs/{id}/metrics` - Gets a Job resource's latest metrics
- POST `/jobs/{id}/metrics` - Sets a metric for a Job
- GET `/jobs/{id}/metrics/{name}/snapshots` - Gets the history of a Job metric

This commit also updates the Python client for the new functionality.
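
A sketch of how these endpoints might be exercised with plain requests (the base URL and payload field names are assumptions, and a real session would first authenticate):

import requests

BASE = "http://localhost:5000/api/v1"  # assumed base URL
session = requests.Session()

# Set a metric for job 1 (payload field names are assumed).
session.post(f"{BASE}/jobs/1/metrics", json={"name": "accuracy", "value": 0.97})

# Latest metrics for one job, and for every job in experiment 2.
job_metrics = session.get(f"{BASE}/jobs/1/metrics").json()
experiment_metrics = session.get(f"{BASE}/experiments/2/metrics").json()

# Full history of the "accuracy" metric for job 1.
history = session.get(f"{BASE}/jobs/1/metrics/accuracy/snapshots").json()
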
This commit makes the following changes to the Job creation form in the frontend:

- Jobs can now be created from both the all-jobs page and the experiment -> job page.
- The Entrypoint dropdown is greyed out until an experiment is selected, and only shows
  entrypoints that are linked to the selected experiment.
- The Queue dropdown is greyed out until an entrypoint is selected, and only shows queues that
  are linked to the selected entrypoint.
- A message with a link that opens a dialog box for updating experiment/entrypoint associations
  appears under the Entrypoint dropdown.
- A message with a link that opens a dialog box for updating entrypoint/queue associations appears
  under the Queue dropdown.
- Returning to an unsaved form gives the user the option to load the previous values.
This commit adds the snapshot_created_on datetime field to all Resource response schemas. It also
updates the `build_` helper functions to include the field in responses and updates the test suite
to validate the updated response schemas.
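
As a sketch, the schema change might look like the following (the schema class and surrounding fields are illustrative stand-ins; only the snapshotCreatedOn field name comes from this PR):

from marshmallow import Schema, fields

class QueueSchema(Schema):  # illustrative stand-in for a resource response schema
    id = fields.Integer()
    name = fields.String()
    # New field: when the snapshot backing this response was created.
    snapshotCreatedOn = fields.DateTime(
        attribute="snapshot_created_on",
        metadata={"description": "Datetime the resource snapshot was created."},
    )
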
This commit adds a new UI element that allows the user to browse the history of a queue resource.
In the edit queue dialog, click the view history toggle to browse snapshots. The latest snapshot
is selected by default. Click or tab over to the list, then use the up/down arrow keys to move
between snapshots. The Submit button is disabled while viewing history.

feat: simplify snapshot list

fix: hide history toggle for drafts

fix: form edits should be cleared when toggling on history

fix: widen width of history column

fix: use snapshotCreatedOn for timestamp

fix: getSnapshots should get all results, not just first page
The current mypy test complains that Schema.from_dict only accepts the dict[str, Field] type, not
dict[str, Field | type]. Field objects are the only value seen in the dictionaries that are being
passed to Schema.from_dict, so the annotations have been updated accordingly.
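
For context, a minimal example of the pattern in question, using marshmallow's real Schema.from_dict with Field instances as the dictionary values:

from marshmallow import Schema, fields

# Schema.from_dict accepts a mapping of field names to Field instances; the
# annotation change narrows the value type from `Field | type` to `Field`,
# matching how these dictionaries are actually built in the codebase.
GeneratedSchema = Schema.from_dict(
    {"name": fields.String(required=True), "count": fields.Integer()}
)

schema = GeneratedSchema()
print(schema.load({"name": "example", "count": 3}))
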
This update adds file uploading support to the Dioptra client library through a new DioptraFile
class, supporting utility functions, and extending the client's post method to accept `data` and
`files` arguments. The changes enable users to upload single files, multiple files, or all files
within a directory while enforcing strict path validation in the associated filenames to prevent
server-side directory traversal attacks.

The new functionality includes:

- A DioptraFile class that validates filenames to ensure they are normalized, POSIX-compliant
  relative paths without traversal components (e.g., "..").
- File selection utility functions:
  - select_files_in_directory: Recursively selects files from a directory with optional regex
    pattern filtering
  - select_one_or_more_files: Selects individual files with optional renaming support to handle
    filename collisions
- Extended POST request handling in the client for multipart form data containing regular form
  fields and file uploads. All file uploads, whether a single file or multiple files, are
  streamed to the server.

The DioptraSession implementation for the requests-based client and the Flask test client have both
been updated to support the new functionality.
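
A standalone sketch of the kind of filename validation described above (DioptraFile's real implementation may differ):

from pathlib import PurePosixPath

def validate_filename(filename: str) -> str:
    """Reject filenames that are not normalized, relative POSIX paths."""
    path = PurePosixPath(filename)
    if path.is_absolute():
        raise ValueError(f"{filename!r} must be a relative path")
    if ".." in path.parts:
        raise ValueError(f"{filename!r} must not contain traversal components ('..')")
    if str(path) != filename:
        # Catches "", ".", "a/./b", trailing slashes, and other unnormalized forms.
        raise ValueError(f"{filename!r} must be a normalized path")
    return filename
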
This change introduces a monkey-patch for the flask-restx `Swagger.serialize_operation` method to
force Swagger to use "multipart/form-data" content type for multi-file uploads instead of
"application/x-www-form-urlencoded", as required by our application. The patch applies the proposed
changes from PR #542 in the flask-restx repository
(python-restx/flask-restx#542), which is unlikely to be accepted into the
main branch.

The monkey-patch includes the following safety measures:

- A hash verification system that checks the source code of
  `flask_restx.Swagger.serialize_operation` before applying the patch. This prevents potential
  issues if the underlying library code changes.
- Early exit logic that detects if the patch has already been applied, preventing double-patching
  scenarios that were causing issues during pytest execution (where imports remain in memory but
  fixtures are re-run).
- Logging to document whether the patch is successfully applied or skipped because it was already
  present.

The patch is applied during application initialization through a new `monkey_patch_flask_restx()`
function, which is called in the `create_app()` setup process. While this is not an ideal solution,
the safeguards should sufficiently minimize risk in the absence of upstream support for this
behavior.
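
A generic sketch of the hash-verification pattern described above (not the exact Dioptra code; the pinned digest and marker attribute are placeholders):

import hashlib
import inspect
import logging

from flask_restx.swagger import Swagger

logger = logging.getLogger(__name__)

# Placeholder; the real patch would pin the sha256 of the upstream source.
EXPECTED_SHA256 = "<pinned-sha256-of-serialize_operation-source>"
_PATCH_MARKER = "_patched_for_multipart"  # assumed marker attribute name

def monkey_patch_serialize_operation(patched_fn):
    original = Swagger.serialize_operation

    # Early exit if already patched (e.g. pytest re-runs fixtures while the
    # patched module stays imported).
    if getattr(original, _PATCH_MARKER, False):
        logger.info("serialize_operation already patched; skipping")
        return

    # Refuse to patch if the upstream source changed since the patch was written.
    digest = hashlib.sha256(inspect.getsource(original).encode("utf-8")).hexdigest()
    if digest != EXPECTED_SHA256:
        raise RuntimeError("Swagger.serialize_operation changed upstream; not patching")

    setattr(patched_fn, _PATCH_MARKER, True)
    Swagger.serialize_operation = patched_fn
    logger.info("Applied serialize_operation monkey-patch")
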
This update adds a new Marshmallow Schema Field type to handle scenarios where multiple files need
to be uploaded through the REST API. The changes include:

- A new MultiFileUpload field class in custom_schema_fields.py
- An update to the parameter schema generation logic that sets the "action" parameter to "append"
  when the field type is MultiFileUpload.
- A new entry for the MultiFileUpload type in the TYPE_MAP_MA_TO_REQPARSE dictionary, mapping it to
  werkzeug's FileStorage type.
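
As an illustration, the new field can be a thin marker subclass, with the mapping and parameter-generation tweak along these lines (a sketch; the real code in custom_schema_fields.py may differ):

from marshmallow import fields
from werkzeug.datastructures import FileStorage

class MultiFileUpload(fields.Field):
    """Marker field for a multipart parameter that accepts multiple files."""

# Assumed shape of the new mapping entry described above.
TYPE_MAP_MA_TO_REQPARSE = {MultiFileUpload: FileStorage}

def reqparse_kwargs(field: fields.Field) -> dict:
    """Sketch of the generation tweak: multi-file fields get action="append"
    so repeated form parts accumulate into a list."""
    kwargs = {"type": TYPE_MAP_MA_TO_REQPARSE[type(field)], "location": "files"}
    if isinstance(field, MultiFileUpload):
        kwargs["action"] = "append"
    return kwargs
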
@jkglasbrenner (Collaborator)

The migration script is mixing the ORM and Core styles of SQLAlchemy in a way that I don't think you need here, and it's making it harder to understand what is happening. Also, in downgrade(), you shouldn't need that counting operation because the resource_type field is already a primary key. The way migration scripts are applied (e.g. you cannot apply the same upgrade or downgrade twice in a row) should also prevent a duplicate row from being inserted even if the database engine ignores the uniqueness constraint.

@jkglasbrenner (Collaborator)

Here's how you can write this migration script using just the ORM. Note that you need to create both Upgrade and Downgrade versions of your ORM classes for things to work correctly (based on the comments in your code, I suspect you ran into this, and that's why you started using operations from SQLAlchemy Core). Both the upgrade() and downgrade() migrations query for the record where resource_type = 'resource_snapshot' and use the result as a guard against deleting or inserting the record when it's unexpectedly missing or present.

"""Up/Down-grade resource snapshot types as delete/insert 'resource_snapshot' [GH-Issue #474]

Revision ID: 0ca6eca33569
Revises: 6a75ede23821
Create Date: 2024-12-12 00:47:23.575103

"""

from typing import Annotated

import sqlalchemy as sa
from alembic import op
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    MappedAsDataclass,
    mapped_column,
    sessionmaker,
)

# revision identifiers, used by Alembic.
revision = "0ca6eca33569"
down_revision = "6a75ede23821"
branch_labels = None
depends_on = None

text_ = Annotated[str, mapped_column(sa.Text())]


class UpgradeBase(DeclarativeBase, MappedAsDataclass):
    pass


class DowngradeBase(DeclarativeBase, MappedAsDataclass):
    pass


class ResourceTypeUpgrade(UpgradeBase):
    __tablename__ = "resource_types"
    resource_type: Mapped[text_] = mapped_column(primary_key=True)


class ResourceTypeDowngrade(DowngradeBase):
    __tablename__ = "resource_types"
    resource_type: Mapped[text_] = mapped_column(primary_key=True)


def upgrade():
    """Alembic Upgrade Hook-Point that deletes the entry from
    resource_types table with value 'resource_snapshot'
    """
    bind = op.get_bind()
    Session = sessionmaker(bind=bind)

    with Session() as session:
        stmt = sa.select(ResourceTypeUpgrade).where(
            ResourceTypeUpgrade.resource_type == "resource_snapshot"
        )
        resource_snapshot_type = session.scalar(stmt)

        if resource_snapshot_type is not None:
            session.delete(resource_snapshot_type)

        session.commit()


def downgrade():
    """Alembic Downgrade Hook-Point that reinstates the entry to
    resource_types table with value 'resource_snapshot'
    """
    bind = op.get_bind()
    Session = sessionmaker(bind=bind)

    with Session() as session:
        stmt = sa.select(ResourceTypeDowngrade).where(
            ResourceTypeDowngrade.resource_type == "resource_snapshot"
        )
        resource_snapshot_type = session.scalar(stmt)

        if resource_snapshot_type is None:
            session.add(ResourceTypeDowngrade(resource_type="resource_snapshot"))

        session.commit()
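
With this script in place, `alembic upgrade head` deletes the 'resource_snapshot' row and `alembic downgrade -1` reinstates it; the existence checks guard each direction even if the row is unexpectedly missing or present.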

Successfully merging this pull request may close these issues.

Remove the "resource_snapshot" type from the resource_snapshot_types Enum table