Visualise pipeline objects in notebook (kedro-org#2241)
* initial draft
* adding window config for jupyter users
* working draft
* working final draft
* working final draft
* clean window pollution
* working draft with 2 approaches
* initial bundle draft
* update webpack
* testing webpack
* ignore babel for umd
* testing with published bundle
* tested bundle
* optimization code added
* add optimization to prod bundle
* add umd to repo
* v10.3.0
* push umd bundle
* remove additional commits
* remove additional commits
* add release note
* testing esm module
* add esm ref
* add esm
* test with esm
* add esm draft
* clean bundle config
* fix lint and format checks
* temp remove gql checks
* fix lint
* fix lint
* fix tests
* fix doc test
* add granularity to notebook visualizer
* structured notebook visualizer
* updated js link
* fix lint
* restore global navigation
* add default globalNavigation
* fix cache deprecation
* fix based on comments
* address PR comments
* remove unused import
* remove test notebook
* fix lint
* address PR comments2
* change generate_html
* fix broken doc links
* fix tests
* update release

---------

Signed-off-by: ravi_kumar_pilla <[email protected]>
ravi-kumar-pilla authored Feb 25, 2025
1 parent 0abdc4e commit 1aae810
Showing 14 changed files with 541 additions and 29 deletions.
2 changes: 1 addition & 1 deletion .github/actions/install_node_dependencies/action.yml
@@ -26,7 +26,7 @@ runs:
 shell: bash
 
 - name: Cache Node.js packages
-uses: actions/cache@v2
+uses: actions/cache@v4
 with:
 path: "${{ steps.npm-cache-dir.outputs.dir }}"
 key: "${{ runner.os }}-node-${{ hashFiles(format('{0}/package-lock.json', inputs.package-path)) }}"
7 changes: 0 additions & 7 deletions Makefile
@@ -39,13 +39,6 @@ lint-check:
 mypy --config-file=package/mypy.ini package/kedro_viz package/features
 mypy --disable-error-code abstract --config-file=package/mypy.ini package/tests
 
-schema-fix:
-strawberry export-schema --app-dir=package kedro_viz.api.graphql.schema > src/apollo/schema.graphql
-graphqlviz src/apollo/schema.graphql | dot -Tpng -o .github/img/schema.graphql.png
-
-schema-check:
-strawberry export-schema --app-dir=package kedro_viz.api.graphql.schema | diff src/apollo/schema.graphql -
-
 secret-scan:
 trufflehog --max_depth 1 --exclude_path trufflehog-ignore.txt .
 
2 changes: 2 additions & 0 deletions RELEASE.md
@@ -10,6 +10,8 @@ Please follow the established format:
 ## Major features and improvements
 - Remove experiment tracking. (#2237)
 
+- Visualise pipeline objects in notebook. (#2241)
+
 ## Bug fixes and other changes
 
 - Add ESM bundle for Kedro-Viz. (#2268)
18 changes: 9 additions & 9 deletions docs/source/migrate_experiment_tracking.md
@@ -41,14 +41,14 @@ Update the dataset configurations in your `catalog.yml` to transition to `kedro-
 
 | Kedro-Viz Dataset Type | MLflow Dataset Type | Update Instructions |
 |---------------------------------|----------------------------|---------------------------------------------------------|
-| `tracking.MetricsDataset` | `MlflowMetricDataset` | Update type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/03_experiment_tracking/01_experiment_tracking/05_version_metrics.html#saving-a-single-float-as-a-metric-with-mlflowmetricdataset) |
-| `tracking.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `json.JSONDataset`. |
-| `plotly.plotlyDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `plotly.HTMLDataset`. |
-| `plotly.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `plotly.HTMLDataset`. |
-| `matplotlib.MatplotlibWriter` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset). |
+| `tracking.MetricsDataset` | `MlflowMetricDataset` | Update type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowmetricdataset). |
+| `tracking.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `json.JSONDataset`. |
+| `plotly.plotlyDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `plotly.HTMLDataset`. |
+| `plotly.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `plotly.HTMLDataset`. |
+| `matplotlib.MatplotlibWriter` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset). |
 
 ### Metrics dataset
-For `tracking.MetricsDataset`, update its type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/03_experiment_tracking/01_experiment_tracking/05_version_metrics.html#saving-a-single-float-as-a-metric-with-mlflowmetricdataset):
+For `tracking.MetricsDataset`, update its type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowmetricdataset):
 
 Before:
 ```yaml
@@ -65,7 +65,7 @@ metrics:
 ```
 ### JSON dataset
-For `tracking.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) and configure it as `json.JSONDataset`:
+For `tracking.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) and configure it as `json.JSONDataset`:
 
 Before:
 ```yaml
@@ -85,7 +85,7 @@ companies_columns:
 ```
 
 ### Plotly dataset
-For `plotly.plotlyDataset` and `plotly.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) and configure it as `plotly.HTMLDataset` to render interactive plots in the MLflow UI:
+For `plotly.plotlyDataset` and `plotly.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) and configure it as `plotly.HTMLDataset` to render interactive plots in the MLflow UI:
 
 Before:
 ```yaml
@@ -104,7 +104,7 @@ plotly_json_data:
 ```
 
 ### Matplotlib writer
-For `matplotlib.MatplotlibWriter`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/0.14.3/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset):
+For `matplotlib.MatplotlibWriter`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset):
 
 Before:
 ```yaml
9 changes: 8 additions & 1 deletion package/kedro_viz/data_access/managers.py
@@ -49,6 +49,10 @@ class DataAccessManager:
     """Centralised interface for the rest of the application to interact with data repositories."""

     def __init__(self):
+        self._initialize_fields()
+
+    def _initialize_fields(self):
+        """Initialize or reset all instance variables."""
         self.catalog = CatalogRepository()
         self.nodes = GraphNodesRepository()
         self.registered_pipelines = RegisteredPipelinesRepository()
@@ -67,12 +71,15 @@ def __init__(self):
         )
         self.dataset_stats = {}

+    def reset_fields(self):
+        """Reset all instance variables."""
+        self._initialize_fields()
+
     def add_catalog(self, catalog: DataCatalog):
         """Add the catalog to the CatalogRepository
         Args:
             catalog: The DataCatalog instance to add.
-            pipelines: A dictionary which holds project pipelines
         """
         self.catalog.set_catalog(catalog)
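
The `_initialize_fields` / `reset_fields` split above keeps construction and reset on a single code path. This matters because `data_access_manager` is imported as a shared module-level instance, and `load_and_populate_data_for_notebook_users` (below) calls `reset_fields()` so that each notebook cell renders only its own pipeline instead of accumulating nodes from earlier cells. The following is a minimal sketch of the pattern, assuming plain dicts and lists as stand-ins for the real repositories; the class name and its fields are hypothetical, not Kedro-Viz API.

```python
class RepositoryManagerSketch:
    """Hypothetical stand-in for DataAccessManager showing the reset pattern."""

    def __init__(self) -> None:
        self._initialize_fields()

    def _initialize_fields(self) -> None:
        # All mutable state is created here, so __init__ and reset_fields
        # can never drift apart when a new field is added.
        self.catalog: dict = {}
        self.nodes: list = []
        self.dataset_stats: dict = {}

    def reset_fields(self) -> None:
        """Discard all state accumulated by previous loads."""
        self._initialize_fields()


manager = RepositoryManagerSketch()
manager.nodes.append("node_a")
manager.dataset_stats["companies"] = {"rows": 10}
manager.reset_fields()
print(manager.nodes, manager.dataset_stats)  # [] {}
```

Because reset rebinds fresh objects rather than clearing the old ones in place, any reference kept from a previous cell still sees its old data while the manager itself starts empty.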

4 changes: 4 additions & 0 deletions package/kedro_viz/integrations/notebook/__init__.py
@@ -0,0 +1,4 @@
"""`kedro_viz.integrations.notebook` provides interface to integrate Kedro-Viz with Notebook."""

# alias to ease Notebook visualization import
from .visualizer import NotebookVisualizer
52 changes: 52 additions & 0 deletions package/kedro_viz/integrations/notebook/data_loader.py
@@ -0,0 +1,52 @@
"""`kedro_viz.integrations.notebook.data_loader` provides interface to
load data from a notebook. It takes care of making sure viz can
load data from pipelines created in a range of Kedro versions.
"""

from typing import Dict, Optional, Tuple, Union, cast

from kedro.io import DataCatalog
from kedro.pipeline import Pipeline

from kedro_viz.data_access import data_access_manager
from kedro_viz.server import populate_data


def load_data_for_notebook_users(
    notebook_pipeline: Union[Pipeline, Dict[str, Pipeline]],
    notebook_catalog: Optional[DataCatalog],
) -> Tuple[DataCatalog, Dict[str, Pipeline], Dict]:
    """Load data from a notebook user's pipeline"""
    # Create a dummy data catalog with all datasets as memory datasets
    catalog = DataCatalog() if notebook_catalog is None else notebook_catalog
    stats_dict: Dict = {}

    notebook_user_pipeline = notebook_pipeline

    # create a default pipeline if a dictionary of pipelines are sent
    if isinstance(notebook_user_pipeline, dict):
        notebook_user_pipeline = {
            "__default__": notebook_user_pipeline["__default__"]
            if "__default__" in notebook_user_pipeline
            else cast(Pipeline, sum(notebook_user_pipeline.values()))
        }
    else:
        notebook_user_pipeline = {"__default__": notebook_user_pipeline}

    return catalog, notebook_user_pipeline, stats_dict


def load_and_populate_data_for_notebook_users(
    notebook_pipeline: Union[Pipeline, Dict[str, Pipeline]],
    notebook_catalog: Optional[DataCatalog],
):
    """Loads pipeline data and populates Kedro Viz Repositories for a notebook user"""
    catalog, pipelines, stats_dict = load_data_for_notebook_users(
        notebook_pipeline, notebook_catalog
    )

    # make each cell independent
    data_access_manager.reset_fields()

    # Creates data repositories which are used by Kedro Viz Backend APIs
    populate_data(data_access_manager, catalog, pipelines, stats_dict)
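
`load_data_for_notebook_users` always hands `populate_data` a registry with a single `"__default__"` key. A bare pipeline is wrapped, a dictionary keeps its existing `"__default__"` entry, and otherwise every pipeline in the dictionary is summed into one. The sketch below reproduces only that branching. It uses a hypothetical `FakePipeline` that models a pipeline as a set of node names, so it runs without Kedro installed. Because `sum()` starts from the integer `0`, the stand-in implements `__radd__` so that `0 + pipeline` returns the pipeline.

```python
from typing import Dict, FrozenSet, Iterable, Union


class FakePipeline:
    """Hypothetical stand-in for kedro.pipeline.Pipeline: a set of node names."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes: FrozenSet[str] = frozenset(nodes)

    def __add__(self, other: "FakePipeline") -> "FakePipeline":
        if not isinstance(other, FakePipeline):
            return NotImplemented
        return FakePipeline(self.nodes | other.nodes)

    def __radd__(self, other: object) -> "FakePipeline":
        # sum() starts from the integer 0, so `0 + pipeline` must yield the pipeline.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented


def normalise_pipelines(
    pipeline: Union[FakePipeline, Dict[str, FakePipeline]],
) -> Dict[str, FakePipeline]:
    """Mirror the branching in load_data_for_notebook_users."""
    if isinstance(pipeline, dict):
        if "__default__" in pipeline:
            # An explicit default wins; the other entries are ignored.
            return {"__default__": pipeline["__default__"]}
        # No default: combine every registered pipeline into one.
        return {"__default__": sum(pipeline.values())}
    # A single pipeline becomes the default.
    return {"__default__": pipeline}


ingest = FakePipeline({"load", "clean"})
train = FakePipeline({"fit"})
merged = normalise_pipelines({"ingest": ingest, "train": train})
print(sorted(merged["__default__"].nodes))  # ['clean', 'fit', 'load']
```

The same logic has one edge case. For an empty dictionary, `sum()` returns `0`. Because `typing.cast` performs no runtime conversion, the original function would likewise store `0` under `"__default__"` rather than raising a clear error.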
177 changes: 177 additions & 0 deletions package/kedro_viz/integrations/notebook/visualizer.py
@@ -0,0 +1,177 @@
import json
import logging
import uuid
from contextlib import contextmanager
from typing import Any, Dict, Optional, Union

from IPython.display import HTML, display
from kedro.io.data_catalog import DataCatalog
from kedro.pipeline import Pipeline

from kedro_viz.api.rest.responses.pipelines import get_kedro_project_json_data
from kedro_viz.integrations.notebook.data_loader import (
    load_and_populate_data_for_notebook_users,
)
from kedro_viz.utils import Spinner, merge_dicts

DEFAULT_VIZ_OPTIONS = {
    "display": {
        "expandPipelinesBtn": False,
        "globalNavigation": False,
        "exportBtn": False,
        "labelBtn": False,
        "layerBtn": False,
        "metadataPanel": False,
        "miniMap": False,
        "sidebar": False,
        "zoomToolbar": False,
    },
    "expandAllPipelines": False,
    "behaviour": {
        "reFocus": False,
    },
    "theme": "dark",
    "width": "100%",
    "height": "600px",
}

DEFAULT_JS_URL = (
    "https://cdn.jsdelivr.net/gh/kedro-org/kedro-viz@main/esm/kedro-viz.production.mjs"
)


class NotebookVisualizer:
    """Represent a Kedro-Viz visualization instance in a notebook"""

    def __init__(
        self,
        pipeline: Union[Pipeline, Dict[str, Pipeline]],
        catalog: Optional[DataCatalog] = None,
        options: Optional[Dict[str, Any]] = None,
        js_url: Optional[str] = None,
    ):
        """
        Initialize NotebookVisualizer.
        Args:
            pipeline: Kedro pipeline(s) to visualize.
            catalog: Kedro data catalog.
            options: Visualization options.
                (Ref: https://github.com/kedro-org/kedro-viz/blob/main/README.npm.md#configure-kedro-viz-with-options)
            js_url: Optional URL for the Kedro-Viz JS bundle.
        Returns:
            A new ``NotebookVisualizer`` instance.
        """
        self.pipeline = pipeline
        self.catalog = catalog
        self.options = (
            DEFAULT_VIZ_OPTIONS
            if options is None
            else merge_dicts(DEFAULT_VIZ_OPTIONS, options)
        )
        # Force `globalNavigation` to always be False as it
        # breaks visualizer due to security concerns
        self.options.setdefault("display", {})["globalNavigation"] = False  # type: ignore

        self.js_url = js_url or DEFAULT_JS_URL

    def _load_viz_data(self) -> Optional[Any]:
        """Load pipeline and catalog data for visualization."""
        load_and_populate_data_for_notebook_users(self.pipeline, self.catalog)
        return get_kedro_project_json_data()

    def generate_html(self) -> str:
        """Generate HTML markup for Kedro-Viz as a string."""
        unique_id = uuid.uuid4().hex[:8]  # To isolate container for each cell execution
        json_data_str = json.dumps(self._load_viz_data())
        options_str = json.dumps(self.options)

        html_content = (
            r"""<!DOCTYPE html>
<html lang='en'>
<head>
<meta charset='UTF-8'>
<meta name='viewport' content='width=device-width, initial-scale=1.0'>
<title>Kedro-Viz</title>
</head>
<body>
<div id=kedro-viz-"""
            + unique_id
            + """ style='height: 600px'></div>
<script type="module">
import { KedroViz, React, createRoot } from '"""
            + self.js_url
            + """';
const viz_container = document.getElementById('kedro-viz-"""
            + unique_id
            + """');
if (createRoot && viz_container) {
const viz_root = createRoot(viz_container);
viz_root.render(
React.createElement(KedroViz, {
data: """
            + json_data_str
            + """,
options: """
            + options_str
            + """
})
);
}
</script>
</body>
</html>"""
        )

        return html_content

    @staticmethod
    def _wrap_in_iframe(
        html_content: str,
        width: str = str(DEFAULT_VIZ_OPTIONS.get("width", "")),
        height: str = str(DEFAULT_VIZ_OPTIONS.get("height", "")),
    ) -> str:
        """Wrap the HTML content in an iframe.
        Args:
            html_content: The HTML markup template as a string for visualization
            width: iframe width
            height: iframe height
        Returns:
            A string containing html markup embedded in an iframe
        """
        sanitized_content = html_content.replace('"', "&quot;")
        return f"""<iframe srcdoc="{sanitized_content}" style="width:{width}; height:{height}; border:none;" sandbox="allow-scripts"></iframe>"""

    @staticmethod
    @contextmanager
    def _suppress_logs():
        logger = logging.getLogger()
        previous_level = logger.level
        logger.setLevel(logging.CRITICAL)  # Suppress logs
        try:
            yield
        finally:
            logger.setLevel(previous_level)  # Restore the original level

    def show(self) -> None:
        """Display Kedro-Viz in a notebook."""
        with self._suppress_logs():
            try:
                spinner = Spinner("Starting Kedro-Viz...")
                spinner.start()

                html_content = self.generate_html()
                iframe_content = self._wrap_in_iframe(
                    html_content,
                    str(self.options.get("width", "100%")),
                    str(self.options.get("height", "600px")),
                )
                spinner.stop()
                display(HTML(iframe_content))
            except Exception as exc:  # noqa: BLE001
                spinner.stop()
                display(HTML(f"<strong>Error: {str(exc)}</strong>"))
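
Two details of `NotebookVisualizer` deserve a closer look. First, user options are layered over `DEFAULT_VIZ_OPTIONS` with `merge_dicts`, and `globalNavigation` is then forced to `False`. When `options` is `None`, the shared `DEFAULT_VIZ_OPTIONS` object itself is assigned to `self.options`. Second, `_wrap_in_iframe` escapes only `"` before placing the markup in `srcdoc`. As a result, the attribute parser decodes any character reference already present in the markup (for example a literal `&lt;` in a string inside the embedded JSON) once before the inner document sees it.

The sketch below is a hedged alternative, not the commit's code. `deep_merge` is a hypothetical recursive merge, since the exact semantics of `kedro_viz.utils.merge_dicts` are not shown here. `build_options` returns a fresh dict so the module-level defaults are never mutated. `wrap_in_iframe` replaces the single `replace` call with the standard library's `html.escape(..., quote=True)`.

```python
import copy
import html
from typing import Any, Dict, Optional

# Trimmed, illustrative subset of DEFAULT_VIZ_OPTIONS from the file above.
DEFAULTS: Dict[str, Any] = {
    "display": {"globalNavigation": False, "sidebar": False, "miniMap": False},
    "theme": "dark",
    "width": "100%",
    "height": "600px",
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge nested dicts key by key; other values are replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_options(user_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Layer user options over the defaults without mutating DEFAULTS."""
    options = deep_merge(DEFAULTS, user_options or {})
    # Same rule as the visualizer: global navigation is always forced off.
    options.setdefault("display", {})["globalNavigation"] = False
    return options


def wrap_in_iframe(html_content: str, width: str, height: str) -> str:
    """Embed markup in a sandboxed iframe via srcdoc."""
    # html.escape replaces &, <, > and, with quote=True, both quote characters,
    # so the attribute parser decodes the value back to the original markup.
    escaped = html.escape(html_content, quote=True)
    return (
        f'<iframe srcdoc="{escaped}" '
        f'style="width:{width}; height:{height}; border:none;" '
        'sandbox="allow-scripts"></iframe>'
    )


opts = build_options({"display": {"sidebar": True}, "height": "800px"})
print(opts["display"])  # {'globalNavigation': False, 'sidebar': True, 'miniMap': False}
print(wrap_in_iframe('<p title="a&amp;b">x</p>', opts["width"], opts["height"]))
```

With this escaping, a literal `&amp;` in the markup reaches the attribute as `&amp;amp;` and is decoded back to `&amp;`, so the inner document receives exactly the original text. Whether the simpler replacement causes visible problems depends on the pipeline data. Treat this as an optional hardening step.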