Skip to content

add optional arguments to add_str() #287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

ftyers
Copy link

@ftyers ftyers commented Jun 28, 2021

Responding as per request on Matrix. I've added the optional arguments I think are relevant. If this looks good, I'm happy to add the same processing for add_json and add_bytes.

Copy link
Contributor

@c0llab0rat0r c0llab0rat0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs automated tests to exercise new parameters.

@ntninja I would like to hold off on merging this until 0.8.0 is released.

Comment on lines 347 to 359
opts = {
"trickle": trickle,
"only-hash": only_hash,
"pin": pin,
"raw-leaves": raw_leaves if raw_leaves is not None else nocopy,
} # type: ty.Dict[str, ty.Union[str, bool]]
for option_name, option_value in [
("chunker", chunker),
("cid-version", cid_version),
]:
if option_value is not None:
opts[option_name] = option_value
kwargs.setdefault("opts", {}).update(opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code block looks duplicated with add_bytes - can it be refactored into a function called from both add_bytes and add_str ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably can, but I don't know how.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the extract function refactor.

It looks like the extracted function would take as arguments:

  • tricke
  • only_hash
  • pin
  • raw_leaves
  • nocopy
  • chunker
  • cid_version

...and return a new dict (or a new CommonArgs) that would be passed to self._client.request

The implementation in files.py in Base::add is different than this new implementation so it wouldn't be a part of the refactor.

]:
if option_value is not None:
opts[option_name] = option_value
kwargs.setdefault("opts", {}).update(opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid mutating kwargs and copy into a new dict instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the implementation from files.py, so presumably it works. presumably what you ask can be done, I don't know how to do it. But if you provide the code I am happy to copy/paste.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It "works" because changes to the kwargs dict - when declared as **kwargs - don't mutate the values of the variables at the call site in the function that called the function declaring **kwargs - however if it was declared as kwargs (no asterisks) you would mutate the call site's dictionary.

It's bad style - mutating mutable arguments can cause sneaky defects in functions that call the function doing the mutating (unless the documentation or name of the function states or strongly suggests that's the expected behavior).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if you refactor this into a new function that returns dict (as mentioned in another comment) the issue of kwargs being mutated disappears.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c0llab0rat0r: I don't believe that mutating the kwargs is a problem, since it is always a shallow copy of the caller's provided arguments.

I've seen plenty of code doing things like:

def func(a, *a, **kw):
    xtra_arg = kw.pop("xtra", "default")
    ...

This was/is a very common trick to emulate kw-only args with writing Python 2 compatible code. Until we dropped Python 2 support py-ipfs-api-client had similar code itself…

Note: kwargs is the exception here, not the rule. You are right that in general mutating arguments like this is indeed very bad unless the function/method is clearly meant to do so.

@@ -240,7 +240,13 @@ def apply_workarounds(self):

@utils.return_field('Hash')
@base.returns_single_item(dict)
def add_bytes(self, data: bytes, **kwargs):
def add_bytes(self, data: bytes
trickle: bool = False, only_hash: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put each argument on a separate line (i.e. trickle and only_hash should be on separate lines)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, although this is how it is in the place I copied it from.

chunker: ty.Optional[str] = None,
pin: bool = True, raw_leaves: bool = None,
cid_version: ty.Optional[int] = None,
**kwargs: base.CommonArgs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pre-existing problem, but add_bytes (and add_str) lack a return type declaration.

Copy link
Author

@ftyers ftyers Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you tell me what to put I'll copy/paste it :) I've never used types in Python before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. There's some examples in this library of how to do it. The library pre-dates Python's typing support so there's also lots of code that doesn't use it yet, or uses it a legacy way (e.g. with comments).

Function that doesn't return anything:

def foo(bar: str) -> None:

Function that returns a value:

def foo(bar: str) -> int:

Function that returns a tuple:

def foo(bar: str) -> ty.Tuple[int, bytes]:

Note that the ty. prefix is unique to this library since all the typing imports are imported as follows:

import typing as ty

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pin
Pin this object when adding
raw_leaves
Use raw blocks for leaf nodes. (experimental). (Default: ``True``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for enabling the experimental raw blocks feature when nocopy is enabled? Since it's experimental, can raw blocks be left disabled by default even when nocopy is being used?

Note - I'm not yet suggesting the behavior must change, but rather seeing to understand the choice first.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no rationale, I'm working by copy/pasting here. I was just trying to scratch an itch (e.g. why add() has only_hash=True but add_str() did not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we need @ntninja to provide some more background on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_leaves is a prerequisite for nocopy on the daemon side because of how no-copy blocks are stored there.

Also I already asked @ftyers to remove the nocopy argument because it does not make sense for content not added from local files or URLs. So all that is really needed here is:

  1. Remove any mentioning of nocopy in the docs (the entire Default: part can go)
  2. Make the argument default to False instead of None
  3. Change "raw-leaves": raw_leaves if raw_leaves is not None else nocopy, to just "raw-leaves": raw_leaves, below (that should also fix the test failure)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is raw blocks still an experimental feature? (ignoring that the entire IPFS is v0.x lol)

Is some adjustment needed to both this new docstring and the existing docstrings mentioning raw blocks / raw leaves?

"only-hash": only_hash,
"pin": pin,
"raw-leaves": raw_leaves if raw_leaves is not None else nocopy,
} # type: ty.Dict[str, ty.Union[str, bool]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Python 3.6 forward, it's preferable to declare the type on the variable rather than in a comment.

Change to opts: ty.Dict[str, ty.Union[str, bool]] = { ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 305 to 310
def add_str(self, string,
trickle: bool = False, only_hash: bool = False,
chunker: ty.Optional[str] = None,
pin: bool = True, raw_leaves: bool = None,
cid_version: ty.Optional[int] = None,
**kwargs: base.CommonArgs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issues as with add_bytes - parameter on each line, and add return type declaration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@c0llab0rat0r
Copy link
Contributor

How were these changes tested?

There's some basic errors in the build failures:

https://travis-ci.com/github/ipfs-shipyard/py-ipfs-http-client/jobs/520226458

NameError: name 'nocopy' is not defined

Also, add some new pytest tests to exercise the new parameters and behavior.

@ftyers
Copy link
Author

ftyers commented Jun 29, 2021

How were these changes tested?

There's some basic errors in the build failures:

https://travis-ci.com/github/ipfs-shipyard/py-ipfs-http-client/jobs/520226458

NameError: name 'nocopy' is not defined

Also, add some new pytest tests to exercise the new parameters and behavior.

They weren't tested.

Copy link
Contributor

@ntninja ntninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ftyers: Please don't be discouraged by @c0llab0rat0r's feedback! They don't want to make you feel bad, just point out some issues and possible areas of improvement they noticed.

(Also: Just because it doesn't pass the tests – which it really does have to in the end – doesn't mean its horrible code or not a good start. It's normal that tests don't pass on first try; that's pretty much why we have them actually.)

pin
Pin this object when adding
raw_leaves
Use raw blocks for leaf nodes. (experimental). (Default: ``True``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_leaves is a prerequisite for nocopy on the daemon side because of how no-copy blocks are stored there.

Also I already asked @ftyers to remove the nocopy argument because it does not make sense for content not added from local files or URLs. So all that is really needed here is:

  1. Remove any mentioning of nocopy in the docs (the entire Default: part can go)
  2. Make the argument default to False instead of None
  3. Change "raw-leaves": raw_leaves if raw_leaves is not None else nocopy, to just "raw-leaves": raw_leaves, below (that should also fix the test failure)

]:
if option_value is not None:
opts[option_name] = option_value
kwargs.setdefault("opts", {}).update(opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c0llab0rat0r: I don't believe that mutating the kwargs is a problem, since it is always a shallow copy of the caller's provided arguments.

I've seen plenty of code doing things like:

def func(a, *a, **kw):
    xtra_arg = kw.pop("xtra", "default")
    ...

This was/is a very common trick to emulate kw-only args with writing Python 2 compatible code. Until we dropped Python 2 support py-ipfs-api-client had similar code itself…

Note: kwargs is the exception here, not the rule. You are right that in general mutating arguments like this is indeed very bad unless the function/method is clearly meant to do so.

@ntninja
Copy link
Contributor

ntninja commented Jun 29, 2021

@ftyers: If you really want to stop working on this please do tell and we'll probably complete this eventually. (No pressure or anything, I just feel there has been some kind of bad social interaction here, so I don't want you to quit for the wrong reasons.)

@ftyers
Copy link
Author

ftyers commented Jun 29, 2021

@ntninja I think I don't see the point of contributing if someone already knows how to do it. It seems a lot less efficient going through the motions here than @c0llab0rat0r just doing it themselves as they clearly have a much better idea of Python and the py-ipfs codebase than I do. The reason I opened the PR was to save everyone time, it doesn't feel like we're saving time here. This should be an easy PR and I feel like I'm just wasting everyone's time (mine included).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants