Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: modifying code generation to reduce bundle size #4978

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

gvwilson
Copy link
Contributor

@gvwilson gvwilson commented Jan 22, 2025

feat: modify code generation to reduce bundle size

branch format bytes %age
master .whl 14803768
codegen2 .whl 12215667 -18%
master .tar.gz 8100014
codegen2 .tar.gz 6114458 -24%
  1. Assign an empty string to the data_docs field of generated
    validators. (This has a major impact because those docs are
    duplicated many times.)

  2. Alias name of base validator during import in
    codegen/validators.py.

  3. Remove the long list of CSS colors from help strings for color
    properties.

  4. Replace super(Parent, self) with super() in generated code.

  5. Modify commands.py to run code generation.

  6. Remove comments from generated code.

  7. Replaced named arguments in constructors with positional arguments.

  8. Drop use of sys.version_info and TYPE_CHECKING. Removed the check
    for Python < 3.7 using sys.version_info and as a backup checking
    typing.TYPE_CHECKING; this saves a little space and also cleans
    up the code.

  9. Introduce _init_provided() for BaseFigure and BasePlotlyType
    that calls a helper function _initialize_provided() to replace
    repetitions of:

_v = arg.pop("something", None)
_v = something if something is not None else _v
if _v is not None:
    self["something"] = _v

Used in #5008.

@gvwilson gvwilson added P2 considered for next cycle feature something new labels Jan 22, 2025
@gvwilson gvwilson self-assigned this Jan 22, 2025
@gvwilson gvwilson force-pushed the codegen2 branch 2 times, most recently from 0c7ea5e to 2b3d47b Compare January 23, 2025 17:56
gvwilson added a commit that referenced this pull request Jan 24, 2025
Removing the check for Python < 3.7 using `sys.version_info` and as a
backup checking `typing.TYPE_CHECKING`; this saves us a little space
and also cleans up the code. Proposing this as an enhancement beyond
what's in the `codegen2` branch / PR #4978.
@gvwilson gvwilson force-pushed the codegen2 branch 2 times, most recently from 000a5e9 to dbbbaa8 Compare January 27, 2025 17:12
1.  Add `bin/get_size.py` so that `python bin/get_size.py plotly build`
    reports the number of files and total size in bytes of the `plotly`
    directory (where generated code is put) and the `build` directory
    that is populated by `python setup.py build`.

1.  Modify `codegen/__init__.py` and `./setup.py` so that
    `python setup.py --reformat=false` disables reformatting.

1.  Assign an empty string to the `data_docs` field of generated
    validators.  (This has a major impact because those docs are
    duplicated many times.)

1.  Alias name of base validator during import in
    `codegen/validators.py`.

1.  Remove the long list of CSS colors from help strings for color
    properties.

1.  Replace `super(Parent, self)` with `super()` in generated code.

1.  Drop use of sys.version_info and TYPE_CHECKING.  Removed the check
    for Python < 3.7 using `sys.version_info` and as a backup checking
    `typing.TYPE_CHECKING`; this saves a little space and also cleans
    up the code.

1.  Remove mention of Chart Studio and explicit enumeration of system
    font names from plotly.js / plot-schema.json so that this text
    isn't copied dozens of times into the plotly.py bundle.

1.  Introduce `_init_provided()` for `BaseFigure` and `BasePlotlyType`
    that calls a helper function `_initialize_provided()` to replace
    repetitions of:

```
_v = arg.pop("something", None)
_v = something if something is not None else _v
if _v is not None:
    self["something"] = _v
```

Original size of plotly/**/*.py: 42283582 bytes
Current size of plotly/**/*.py:  31931739 bytes
Change: -25%
1.  Modify `commands.py` to run code generation.
1.  Remove comments from generated code.
1.  Replaced named arguments in constructors with positional arguments.
1.  Regenerate all code.

Notes:

The generated code is reformatted once again: this slightly increases
size but makes it more readable.

There is one flaky test:

`tests/test_plotly_utils/validators/test_colorscale_validator.py::test_acceptance_named[Inferno_r]`

It fails when the entire test suite is run but does *not* fail when
only `test_colorscale_validator.py` is run (using Python 3.11.9).

| branch   | format  | bytes    | %age |
| -------- | ------- | -------- | ---- |
| master   | .whl    | 14803768 |      |
| codegen2 | .whl    | 12215667 | -18% |
| master   | .tar.gz |  8100014 |      |
| codegen2 | .tar.gz |  6114458 | -24% |
@@ -267,36 +267,24 @@ def perform_codegen():
root_datatype_imports.append(f"._deprecations.{dep_clas}")

optional_figure_widget_import = f"""
if sys.version_info < (3, 7) or TYPE_CHECKING:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcjohnson commented on this in a Slack thread saying "I would have thought we’d still want the TYPE_CHECKING case (just not the sys.version_info < (3, 7) case)?" Our code doesn't make mypy happy right now, and I suspect that fixing that will involve (a) adding type annotations (started in #5008) and (b) re-thinking and updating code like this. My thought was therefore to take this out for now and fix it properly as part of type annotation work, but if @emilykl or @marthacryan thinks we should keep the if TYPE_CHECKING code, easy to add it back into this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gvwilson Have your thoughts on this TYPE_CHECKING case changed at all following the work you've been doing more recently with type annotations?

I don't have enough context here to understand the implications of removing the TYPE_CHECKING case. But would be happy to discuss more with you or @alexcjohnson or @T4rk1n .

@gvwilson
Copy link
Contributor Author

see also #4951

1.  Update required version of `black` in `pyproject.toml` and `.circleci/config.yml`.
2.  Update Python version identifiers in `tool.black` section of `pyproject.toml` to include `py311` and `py312`.
3.  Modify invocation of `black` in `commands.py` to format with `py311`.
4.  Regenerate and reformat code.
@emilykl
Copy link
Contributor

emilykl commented Apr 3, 2025

@gvwilson High-level question -- what is the practical impact of some of these changes on the docs / docstrings, for example, removing the list of CSS colors or the list of dict properties?

Completely agree that having those large chunks of text repeated many times over is a very poor use of bundle bytes. But are there cases where it makes the docstrings less useful, and if so are there any changes we need to make in the docs to mitigate the impact?

@@ -26,6 +26,10 @@
get_data_validator_instance,
)

# Target Python version for code formatting with Black.
# Must be one of the values listed in pyproject.toml.
BLACK_TARGET_VERSION = "py311"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Black docs say to pass ALL Python versions supported by the code as a --target-version — should we follow that recommendation?

I guess the configuration in the pyproject.toml doesn't play nice with the codegen, which is why we need to redefine target_version here?

@emilykl
Copy link
Contributor

emilykl commented Apr 3, 2025

@gvwilson Could we add a single line at the top of all codegen files saying # THIS FILE IS AUTO-GENERATED?

I realize that runs exactly opposite to the goal of reducing bundle size, but it would make development a lot easier to have all code-generated files identified in the file itself.

(Could be a separate PR.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this file showing up in the diff? Bad merge? Should probably just defer to whatever version of this file is on main.

params["data_docs"] = (
'"""' + self.get_constructor_params_docstring() + '\n"""'
)
params["data_docs"] = '"""\n"""'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if removing this docstring has any impact on the docs, or otherwise impacts the dev process? I am assuming not, but I do wonder why it was there in the first place.

if "updateplotlyjsdev" in sys.argv:
def main():
if len(sys.argv) != 2:
print(USAGE, file=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -834,6 +847,14 @@ def _ipython_display_(self):
else:
print(repr(self))

def _init_provided(self, name, arg, provided):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this function be called something like _set_property(self, prop_name, value, defaults)?

I find the existing name hard to parse.

@emilykl
Copy link
Contributor

emilykl commented Apr 3, 2025

@gvwilson Looks good, see my comments for some questions I have, but none of them are blockers to merging — except possibly the removal of the TYPE_CHECKING if clause, which I don't fully understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new P2 considered for next cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants