Speed up graph objs #678

theengineear · 2017-01-30T22:33:34Z

Introduces a utility decorator to memoize functions along with some other performance improvements surrounding `graph_reference`.

This was something I tinkered on over the weekend. I spent a bit more time today formalizing it so that the work didn't go to waste.

Closes #497

^^ I'm closing that issue, but it's likely that there are other improvements to be had. My suggestion would be to separate make_subplots so that users can choose whether they'd like to use graph_objs or native Python objects to create a plot. That can/should be handled as a different issue though.

theengineear · 2017-01-30T22:37:24Z

@Kully, wanna put some 👀 on this?

//cc @depettit, thanks for the clear issue documentation. This targets the make_subplots portion. It's quite possible that append_trace is still going to be problematic for you. I'll likely ask for your help in manually testing this after this PR is merged into master.

Kully · 2017-01-30T22:43:12Z

> @Kully, wanna put some 👀 on this?

Sure, tomorrow most likely 👍

dhirschfeld · 2017-01-30T23:37:45Z

FWIW I always use the decorator module to implement signature/docstring-preserving decorators. If the decorated functions aren't user-facing then it's possibly not a big issue; however, I've found it useful when debugging my own code for the decorated functions to have meaningful signatures and docstrings.

The added utility does have to be weighed up against another dependency, but these days with conda I don't see that as such a big issue, and it's already a dep for several other widely used packages.

theengineear · 2017-01-30T23:48:49Z

Oops, thanks @dhirschfeld , I meant to preserve the docstrings but forgot 😓.

I typically use something like wraps. What benefits does the decorator package provide beyond something like wraps?

theengineear · 2017-01-30T23:49:48Z

FWIW, these decorated functions aren't really meant to be user-facing, but, it was my intention to make a more general-purpose decorator that would preserve names/docs.

theengineear · 2017-01-31T00:10:38Z

OK, just read through the docs for decorate. Looks great! I'll try to rewrite using this, seems like the dep is pretty small.

dhirschfeld · 2017-01-31T00:18:53Z

The decorator module will preserve docstring and signature on both py2/3 and will ensure inspect.getargspec also works correctly.

From a quick test, much of this is mitigated on py3.5 where wraps also preserves the signature and where getargspec is deprecated in favour of inspect.signature which does work with wrapped functions.
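As a rough illustration of the Python 3 behaviour described here, the sketch below wraps a function with `functools.wraps` and reads its signature back with `inspect.signature`. The `passthrough` decorator and the `add` function are made up for this example and are not part of the PR.

```python
import functools
import inspect


def passthrough(func):
    """Hypothetical decorator that just forwards the call."""
    @functools.wraps(func)  # copies __name__, __doc__ and sets __wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@passthrough
def add(a, b=2):
    """Add two numbers."""
    return a + b


# On Python 3, inspect.signature follows __wrapped__, so this reports the
# signature of `add` rather than the generic (*args, **kwargs) of `wrapper`.
signature_text = str(inspect.signature(add))
```

On Python 2 there is no `inspect.signature`, which is the gap the decorator module fills.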

My mantra of always using the decorator module comes from a long history of Py2 usage 😛 . Still, if you have to support both it's probably worth considering...

The bottleneck for the `make_subplots` function was due to excessive lookups in the `plot_schema`. These lookups are actually pretty computation-intensive so caching these computations can give us a large performance boost. Note that Python 3.2+ has a new [`functools.lru_cache`](https://docs.python.org/3/library/functools.html#functools.lru_cache) which can be used for this. HOWEVER, we support Python 2.7+ and I didn’t see a backport for it. There are numerous `memoize` packages on PyPI, so many that I didn’t want to commit to one. It’s fairly simple to write this and then we don’t need another dependency.
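A minimal sketch of what such a hand-rolled memoizer could look like, assuming a positive `maxsize` (or `None`) and a simple oldest-entry-first eviction. This is not the code from the PR: the names `memoize` and `slow_lookup` are illustrative, and it uses `functools.wraps` from the standard library rather than the decorator module discussed above.

```python
import functools
from collections import OrderedDict


def memoize(maxsize=128):
    """Cache results of a function whose arguments are all hashable."""
    def decorator(func):
        cache = OrderedDict()  # insertion-ordered, so the oldest key is first

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a hashable key from positional and keyword arguments.
            key = (args, tuple(sorted(kwargs.items())))
            if key in cache:
                return cache[key]
            if maxsize is not None and len(cache) >= maxsize:
                cache.popitem(last=False)  # evict the oldest entry (FIFO)
            result = cache[key] = func(*args, **kwargs)
            return result

        return wrapper
    return decorator


calls = []


@memoize(maxsize=2)
def slow_lookup(name):
    calls.append(name)  # record each real computation
    return name.upper()


slow_lookup('xaxis')
slow_lookup('xaxis')  # served from the cache; `calls` is unchanged
```

Note that eviction here is first-in-first-out, which is simpler than the least-recently-used policy of `functools.lru_cache`.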

The `value` argument currently allows for any value type. This isn’t very helpful for memoization. Instead, the top-level function now categorizes `value` as a string so that we *can* memoize the function. Also, we need hashable args, so we `tuple` the `parent_object_names`.

The `__init__` method ends up doing some additional validation, so it's more performant (and functionally identical) to set an item in layout as a dict instead of as a *graph_obj*.

theengineear · 2017-01-31T18:04:25Z

@dhirschfeld , I took you up on the suggestion. Thanks!

@chriddyp the decorator module is a dep for ipython, so it's likely that folks will have it already. It's fairly small and provides some nice Python-version-independent functionality. Given its size, I'm a proponent of adding it as a dep, but I figured I'd ping you on it.

theengineear · 2017-01-31T18:06:09Z

Additionally, I should note that instead of letting callers specify a key_function, I decided it was better form to nudge developers to just create functions that are naturally memoizable (all args/kwargs hashable).

theengineear · 2017-01-31T23:17:18Z

@Kully unless there's pushback on the new decorator dep, I'm done futzing with this PR.

Kully · 2017-02-01T18:29:22Z

plotly/tools.py

@@ -1000,8 +1000,7 @@ def _get_anchors(r, c, x_cnt, y_cnt, shared_xaxes, shared_yaxes):
    # Function pasting x/y domains in layout object (2d case)
    def _add_domain(layout, x_or_y, label, domain, anchor, position):
        name = label[0] + 'axis' + label[1:]
-        graph_obj = '{X_or_Y}Axis'.format(X_or_Y=x_or_y.upper())
-        axis = getattr(graph_objs, graph_obj)(domain=domain)
+        axis = {'domain': domain}


I suppose you did this because axis will never be equal to anything other than {'domain': domain}, and so it's like a trivial case of memoization? :)

this isn't about memoization as much as it's about performance. the line graph_obj = '{X_or_Y}Axis'.format(X_or_Y=x_or_y.upper()) was eating up a ton of cpu time because it was instantiating that graph_object and then immediately setting it inside layout, which effectively was instantiating it again.
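A toy model of the double instantiation, under the assumption that the container re-wraps whatever is assigned to it. `ToyAxis` and `ToyLayout` are invented for this sketch and only imitate the behaviour described above; they are not plotly classes.

```python
class ToyAxis(dict):
    """Toy stand-in for a validated graph object; counts constructions."""
    constructed = 0

    def __init__(self, *args, **kwargs):
        ToyAxis.constructed += 1  # pretend this is expensive validation
        super().__init__(*args, **kwargs)


class ToyLayout(dict):
    """Toy container that re-wraps every assigned value in a ToyAxis."""

    def __setitem__(self, key, value):
        super().__setitem__(key, ToyAxis(value))


layout = ToyLayout()

# Explicit object: constructed here, then constructed again on assignment.
layout['xaxis1'] = ToyAxis(domain=[0.0, 0.5])
explicit_cost = ToyAxis.constructed

ToyAxis.constructed = 0

# Plain dict: constructed only once, inside __setitem__.
layout['xaxis2'] = {'domain': [0.0, 0.5]}
implicit_cost = ToyAxis.constructed
```

The stored values compare equal either way; only the number of constructions differs.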

Kully · 2017-02-01T18:31:53Z

plotly/utils.py

+
+    :param (int|None) maxsize: Limit the number of cached results. This is a
+                               simple way to prevent memory leaks. Setting this
+                               to `None` will remember *all* calls. The 128


Do you think it's a good idea to allow None - i.e. remembering all calls? Perhaps to avoid huge memory leaks, maybe have the cap at a really high number.

Yah, this mimics lru_cache. I'd have tried to use this, but it's not available in Python < 3.2.
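For reference, this is how the standard-library `lru_cache` treats `maxsize` on Python 3.2+. The two functions below are throwaway examples, not code from the PR.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded: every distinct call is remembered
def square_unbounded(n):
    return n * n


@lru_cache(maxsize=2)  # bounded: least recently used entries are dropped
def square_bounded(n):
    return n * n


for n in range(5):
    square_unbounded(n)
    square_bounded(n)

unbounded_size = square_unbounded.cache_info().currsize
bounded_size = square_bounded.cache_info().currsize
```

With `maxsize=None` the cache grows with the number of distinct argument sets, so it is only safe when that set is known to be small.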

Kully · 2017-02-01T18:56:59Z

plotly/utils.py

+        if key in keys:
+            return cache[key]
+
+        if maxsize is not None and len(keys) == maxsize:


This condition is set out very clearly! 👍

Kully · 2017-02-01T19:12:42Z

plotly/tests/test_core/test_utils/test_utils.py

+
+        self.assertEqual(name_space.call_count, 0)
+        for i, (inputs, result) in enumerate(tests, 1):
+            for _ in range(10):


Out of curiosity, is _ the canonical (or a typical) variable used for loops where the looping variable is not used inside?

yup, _ is the common placeholder when you don't intend to use the value.

theengineear · 2017-02-02T17:25:56Z

@Kully , what's the word, 🐦? Can this 💃?

@chriddyp , can I get a confirmation that adding the decorator dep is OK? Again, IPython uses it, it's PY2/3 agnostic, and it's quite small (nudge, nudge, nudge).

chriddyp · 2017-02-02T17:29:33Z

> can I get a confirmation that adding the decorator dep is OK? Again, IPython uses it, it's PY2/3 agnostic, and it's quite small (nudge, nudge, nudge).

yup!

Kully · 2017-02-02T18:29:16Z

> Can this 💃?

Yup! 💃 🕺 👯

Use TestCase for test_make_subplots.

adce1bb

theengineear added 6 commits January 31, 2017 12:45

Memoize some graph_reference functions.

0f171dd

Pre-compile a regex to improve performance.

1061b24

Implicitly create graph_objs in make_subplots.

e2e6bb9

Update CHANGELOG.md.

28c6fe0

theengineear force-pushed the speed_up_graph_objs branch from 9ee5042 to 28c6fe0 Compare January 31, 2017 18:01

Kully reviewed Feb 1, 2017

theengineear merged commit 6f9621a into master Feb 2, 2017

theengineear deleted the speed_up_graph_objs branch February 2, 2017 21:03
