Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use plotly.js base64 API to store and pass typed arrays declared by numpy, pandas, etc. #4470

Merged
merged 128 commits into from
Oct 21, 2024

Conversation

archmoj
Copy link
Contributor

@archmoj archmoj commented Dec 21, 2023

This PR make use of this plotly.js PR and changes the default behavior to pass numpy arrays as plotly.js base64 spec.

Code Changes

  • Adds to_typed_array_spec function to use for converting numpy arrays to the "typed array spec" type that plotly.js expects when receiving base64 encoded arrays.
  • Updates the validate_coerce function of the DataArrayValidator class to base64 encode numpy arrays and store them in the "typed array spec" format (by calling to_typed_array_spec)
  • Adds thorough testing to validators
  • Adds performance (size and speed) tests

Co-authored-by: Liam Connors <liam@plot.ly>
doc/python/b64.md Outdated Show resolved Hide resolved
doc/python/b64.md Outdated Show resolved Hide resolved
doc/python/b64.md Outdated Show resolved Hide resolved
doc/python/b64.md Outdated Show resolved Hide resolved
archmoj and others added 3 commits February 29, 2024 11:10
 - fixed Conflicts in
 packages/python/plotly/plotly/tests/test_optional/test_figure_factory/test_figure_factory.py
 packages/python/plotly/plotly/tests/test_optional/test_px/test_px_input.py
Convert base64 in BaseFigure.to_dict instead of validate_coerce
@archmoj
Copy link
Contributor Author

archmoj commented Oct 15, 2024

@marthacryan Please resolve the conflicts with master; then merge master into this branch.
Then you may possibly copy the version of tests from master into this branch to see if any tests need any adjustments.
Thank you!

@gvwilson
Copy link
Contributor

thank you everyone

- Removes a speed comparison that is no longer needed because the base64 encoding makes it faster
- Updates a test to use numpy arrays to account for plotly express automatically converting numerical lists to pandas arrays.
- Remove unnecessary enumerate
@@ -372,38 +372,6 @@ def test_invalid_encode_exception(self):
with self.assertRaises(TypeError):
_json.dumps({"a": {1}}, cls=utils.PlotlyJSONEncoder)

def test_fast_track_finite_arrays(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marthacryan
Could we revert this change now or we should drop the test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this because it was failing. Presumably it's just faster now?

Copy link
Member

@ndrezn ndrezn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Psyched for this to land! Killer work team.
ezgif-4-d5a21d0496

if any(l1 != l2 for l1, l2 in zip(fig1, fig2)):
print("".join(difflib.unified_diff(fig1, fig2)))
raise ValueError(f"Pandas 1/2 difference in {filename}")
if filename not in [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marthacryan / @archmoj What is the reason for excluding these particular files?

are skipped for conversion to the typed array spec
"""
skipped_keys = ["geojson", "layer", "range"]
return any(skipped_key in key for skipped_key in skipped_keys)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marthacryan Shouldn't the test be skipped_key == key rather than skipped_key in key ?

Or we really do want to be checking that the skipped key is a substring of key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think the docstring comment is no longer accurate

v = copy_to_readonly_numpy_array(v)

np = get_module("numpy", should_load=False)
if not isinstance(v, np.ndarray):
Copy link
Contributor

@emilykl emilykl Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this line raise an error if numpy is not installed? (Since get_module will return None in that case.)

Is that ok?

@marthacryan marthacryan merged commit 8c75004 into master Oct 21, 2024
2 of 4 checks passed
@marthacryan marthacryan deleted the pass-b64 branch October 21, 2024 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new P1 needed for current cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants