Skip to content

Support calling positional arguments by keyword (fix #998) #2081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 17, 2018

Conversation

AdamGleave
Copy link
Contributor

@AdamGleave AdamGleave commented May 17, 2018

What do these changes do?

Python positional arguments can also be called via their keyword, with Python allowing the caller to choose between these two styles. (Exception: some built-in and C-bound functions only support positional arguments.) Ray only supported calling keyword arguments when they had a default value, a surprising semantic for Python users.

This PR adds support keyword arguments in all common cases. It also adds unit tests to verify Ray accepts the keyword arguments, and rejects invalid uses.

There are a few unusual cases it does not support (but will warn the user of):

  • We do not handle arguments that can only be specified by keyword; this happens when keyword arguments are defined after * or *args. Ideally we'd support this too -- I've intentionally written code like def f(*, kwd1, kwd2) to force callers to use keyword arguments. This doesn't seem difficult conceptually, but requires changing the return type of extend_args to be args, kwargs rather than just. This impacts a number of call sites I do not understand fully, so I decided not to touch this.
  • We do not support **kwargs (an existing limitation in Ray).

Related issue number

Addresses #998 (apart from the caveats described above).

@AdamGleave
Copy link
Contributor Author

I also note that funcsigs has a function Signature.bind that does exactly what extend_args is built to do, and should handle all the weird edge cases for us. I didn't make this change as there is some Cython-related magic going on in get_signature_params that I think might break this. Could someone who understands the related code better comment?

@robertnishihara
Copy link
Collaborator

Thanks @AdamGleave, this looks really great!

I think you're right that we need to switch to having extend_args return args and kwargs. The only challenge here is that we will need a slightly different way of encoding that in our C++ "TaskSpecification" object.

Could you quickly run something like

import ray
import cProfile

ray.init()

@ray.remote
def f(a, b, c, x=1, y=2, z=3):
    pass

ray.get(f.remote(1, 2, 3))

cProfile.run('for _ in range(10000): f.remote(1, 2, 3)')

before and after this change and see if the amount of time spent in extend_args changes much? It doesn't look like it should.

@danielsuo implemented support for Cython in #1193.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5448/
Test PASSed.

@AdamGleave
Copy link
Contributor Author

Thanks for the profiling code @robertnishihara . The changes don't seem to have had much impact on performance -- it's hovering around 130ms spent in extend_args with or without changes.

For completeness, I tested it on a 6-core i7-3930K, and the full results are below. I tried running it several times (not reported) but the results were pretty consistent from run to run.

# Before changes
         570003 function calls in 1.718 seconds                                                                                                                                                       

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.023    0.023    1.718    1.718 <string>:1(<module>)
    10000    0.026    0.000    1.694    0.000 remote_function.py:112(remote)
    10000    0.139    0.000    1.668    0.000 remote_function.py:116(_submit)
    10000    0.129    0.000    0.184    0.000 signature.py:158(extend_args)
    20000    0.030    0.000    0.030    0.000 threading.py:1076(name)
    20000    0.045    0.000    0.074    0.000 threading.py:1150(getName)
    20000    0.048    0.000    0.069    0.000 threading.py:1230(current_thread)
    10000    0.016    0.000    0.016    0.000 utils.py:168(resources_from_resource_arguments)
    20000    0.075    0.000    0.218    0.000 worker.py:1081(check_main_thread)
    10000    0.010    0.000    0.010    0.000 worker.py:236(check_connected)
    10000    0.015    0.000    0.015    0.000 worker.py:2365(__init__)
    10000    0.025    0.000    0.109    0.000 worker.py:2371(__enter__)
    10000    0.030    0.000    0.118    0.000 worker.py:2378(__exit__)
    10000    0.032    0.000    0.047    0.000 worker.py:2395(log_span)
    20000    0.110    0.000    0.173    0.000 worker.py:2403(log)
    20000    0.022    0.000    0.022    0.000 worker.py:2424(<dictcomp>)
    10000    0.010    0.000    0.010    0.000 worker.py:2587(get_global_worker)
    10000    0.500    0.000    1.185    0.000 worker.py:505(submit_task)
    20000    0.021    0.000    0.021    0.000 {built-in method _thread.get_ident}
        1    0.000    0.000    1.718    1.718 {built-in method builtins.exec}
    80000    0.086    0.000    0.086    0.000 {built-in method builtins.isinstance}
    40000    0.038    0.000    0.038    0.000 {built-in method builtins.len}
    60000    0.061    0.000    0.061    0.000 {built-in method liblocal_scheduler.check_simple_value}
    90000    0.084    0.000    0.084    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.011    0.000    0.011    0.000 {method 'id' of 'common.ObjectID' objects}
    20000    0.019    0.000    0.019    0.000 {method 'items' of 'dict' objects}
    10000    0.015    0.000    0.015    0.000 {method 'returns' of 'task.Task' objects}
    10000    0.098    0.000    0.098    0.000 {method 'submit' of 'local_scheduler.LocalSchedulerClient' objects}

# After changes
         570003 function calls in 1.750 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.024    0.024    1.749    1.749 <string>:1(<module>)
    10000    0.027    0.000    1.726    0.000 remote_function.py:112(remote)
    10000    0.142    0.000    1.699    0.000 remote_function.py:116(_submit)
    10000    0.130    0.000    0.188    0.000 signature.py:158(extend_args)
    20000    0.022    0.000    0.022    0.000 threading.py:1076(name)
    20000    0.045    0.000    0.067    0.000 threading.py:1150(getName)
    20000    0.049    0.000    0.071    0.000 threading.py:1230(current_thread)
    10000    0.017    0.000    0.017    0.000 utils.py:168(resources_from_resource_arguments)
    20000    0.074    0.000    0.213    0.000 worker.py:1081(check_main_thread)
    10000    0.011    0.000    0.011    0.000 worker.py:236(check_connected)
    10000    0.016    0.000    0.016    0.000 worker.py:2365(__init__)                                                                                                                                
    10000    0.025    0.000    0.113    0.000 worker.py:2371(__enter__)                                                                                                                               
    10000    0.030    0.000    0.122    0.000 worker.py:2378(__exit__)
    10000    0.033    0.000    0.049    0.000 worker.py:2395(log_span)
    20000    0.114    0.000    0.181    0.000 worker.py:2403(log)
    20000    0.023    0.000    0.023    0.000 worker.py:2424(<dictcomp>)
    10000    0.010    0.000    0.010    0.000 worker.py:2587(get_global_worker)
    10000    0.516    0.000    1.213    0.000 worker.py:505(submit_task)
    20000    0.022    0.000    0.022    0.000 {built-in method _thread.get_ident}
        1    0.000    0.000    1.750    1.750 {built-in method builtins.exec}
    80000    0.094    0.000    0.094    0.000 {built-in method builtins.isinstance}
    40000    0.039    0.000    0.039    0.000 {built-in method builtins.len}                                                                                                                          
    60000    0.062    0.000    0.062    0.000 {built-in method liblocal_scheduler.check_simple_value}                                                                                                 
    90000    0.086    0.000    0.086    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}                                                                                                        
    10000    0.011    0.000    0.011    0.000 {method 'id' of 'common.ObjectID' objects}                                                                                                              
    20000    0.020    0.000    0.020    0.000 {method 'items' of 'dict' objects}                                                                                                                      
    10000    0.015    0.000    0.015    0.000 {method 'returns' of 'task.Task' objects}
    10000    0.093    0.000    0.093    0.000 {method 'submit' of 'local_scheduler.LocalSchedulerClient' objects}

@robertnishihara
Copy link
Collaborator

Ok, sounds good.

I'm seeing the following test failure from python test/runtest.py APITest.testVariableNumberOfArgs

======================================================================
FAIL: testVariableNumberOfArgs (__main__.APITest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/runtest.py", line 607, in testVariableNumberOfArgs
    self.assertTrue(test_functions.varargs_and_kwargs_exception_thrown)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 3.626s

FAILED (failures=1)

Looks like that test needs to be updated.

Also, can you run

yapf --style=pep8 -i python/ray/signature.py

and commit the resulting change to that file? We have some fairly strict linting checks.

@AdamGleave
Copy link
Contributor Author

I've updated signature.py to conform to code style. yapf made some awkward line breaks so I manually split out some of the deeply nested booleans to make the structure clearer.

Apologies for not catching that failing test. I think Ray should accept functions of that form (*args after a keyword argument), it is valid Python albeit dubious coding style, so I have removed that test. If there is a reason for us forbidding this case then I can add a check in signature.py to forbid this.

@robertnishihara
Copy link
Collaborator

Ok thanks! Yeah, we should support functions of that form. The only reason for the test was that they didn't work before this PR I think.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5453/
Test PASSed.

@robertnishihara
Copy link
Collaborator

I had to push a small change, I think our versions of yapf may differ. Anyway, will merge soon. Thanks again!

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5456/
Test PASSed.

@robertnishihara robertnishihara merged commit 470887c into ray-project:master May 17, 2018
@robertnishihara
Copy link
Collaborator

Thanks again, this is a huge help!

alok added a commit to alok/ray that referenced this pull request May 18, 2018
* master: (22 commits)
  [xray] Fix bug in updating actor execution dependencies (ray-project#2064)
  [DataFrame] Refactor __delitem__ (ray-project#2080)
  [xray] Better error messaging when pulling from self. (ray-project#2068)
  Use source code in hash where possible (fix ray-project#2089) (ray-project#2090)
  Functions for flushing done tasks and evicted objects. (ray-project#2033)
  Fix compilation error for RAY_USE_NEW_GCS with latest clang. (ray-project#2086)
  [xray] Corrects Error Handling During Push and Pull. (ray-project#2059)
  [xray] Sophisticated task dependency management (ray-project#2035)
  Support calling positional arguments by keyword (fix ray-project#998) (ray-project#2081)
  [DataFrame] Improve performance of iteration methods (ray-project#2026)
  [DataFrame] Implement to_csv (ray-project#2014)
  [xray] Lineage cache only requests notifications about remote parent tasks (ray-project#2066)
  [rllib] Add magic methods for rollouts (ray-project#2024)
  [DataFrame] Allows DataFrame constructor to take in another DataFrame (ray-project#2072)
  Pin Pandas version for Travis to 0.22 (ray-project#2075)
  Fix python linting (ray-project#2076)
  [xray] Fix GCS table prefixes (ray-project#2065)
  Some tests for _submit API. (ray-project#2062)
  [rllib] Queue lib for python 2.7 (ray-project#2057)
  [autoscaler] Remove faulty assert that breaks during downscaling, pull configs from env (ray-project#2006)
  ...
alok added a commit to alok/ray that referenced this pull request May 21, 2018
* master: (24 commits)
  Performance fix (ray-project#2110)
  Use flake8-comprehensions (ray-project#1976)
  Improve error message printing and suppression. (ray-project#2104)
  [rllib] [doc] Broken link in ddpg doc
  YAPF, take 3 (ray-project#2098)
  [rllib] rename async -> _async (ray-project#2097)
  fix unused lambda capture (ray-project#2102)
  [xray] Use pubsub instead of timeout for ObjectManager Pull. (ray-project#2079)
  [DataFrame] Update _inherit_docstrings (ray-project#2085)
  [JavaWorker] Changes to the build system for support java worker (ray-project#2092)
  [xray] Fix bug in updating actor execution dependencies (ray-project#2064)
  [DataFrame] Refactor __delitem__ (ray-project#2080)
  [xray] Better error messaging when pulling from self. (ray-project#2068)
  Use source code in hash where possible (fix ray-project#2089) (ray-project#2090)
  Functions for flushing done tasks and evicted objects. (ray-project#2033)
  Fix compilation error for RAY_USE_NEW_GCS with latest clang. (ray-project#2086)
  [xray] Corrects Error Handling During Push and Pull. (ray-project#2059)
  [xray] Sophisticated task dependency management (ray-project#2035)
  Support calling positional arguments by keyword (fix ray-project#998) (ray-project#2081)
  [DataFrame] Improve performance of iteration methods (ray-project#2026)
  ...
@AdamGleave AdamGleave deleted the kwargs-fix branch June 14, 2018 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants