Add support for namespaced method call #25052

Closed
yuyichao
Contributor

@yuyichao yuyichao commented Dec 13, 2017

This patch adds preliminary support for calling a function (a method in the OOP sense)
without having to export the function, and therefore solves the namespace
issue for end users, i.e. one can write

a = M.A()
a-->f1(...)
a-->f2(...)

instead of

a = M.A()
M.f1(a, ...)
M.f2(a, ...)

It doesn't solve the issue for abstract types or for implementers of a concrete
type extending an abstract type. All the functions that belong to the
abstract type still need to be imported in order for them to work on the
concrete type. (See below.)
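
For readers without the patch applied, the rewrite described above can be approximated in ordinary Julia. This is only a sketch: `nscall` is a hypothetical helper standing in for `obj-->name(args...)`, and the module `M` with `A`, `f1` and `f2` mirrors the placeholder names in the example above rather than any real code from the PR.

```julia
module M
struct A
    x::Int
end
# Not exported: callers normally have to write M.f1(a, ...).
f1(a::A, y) = a.x + y
f2(a::A) = 2 * a.x
end

# Hypothetical stand-in for `obj-->name(args...)`: resolve `name` in the
# module that defines typeof(obj), then call it with obj as first argument.
function nscall(obj, name::Symbol, args...)
    mod = parentmodule(typeof(obj))
    f = getfield(mod, name)
    return f(obj, args...)
end

a = M.A(3)
r1 = nscall(a, :f1, 4)   # same function as M.f1(a, 4)
r2 = nscall(a, :f2)      # same function as M.f2(a)
```

Nothing is imported into the caller's scope here; the only name the caller needs is the object itself.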

Breakage/impact

  1. With this change, the recommended argument order for functions
    that operate on an object
    and happen to take a callable argument changes.
    As mentioned elsewhere, finalizer is the only ambiguous case
    and can't be changed in a backward-compatible way
    once the recommendation changes, which is why this has to be done now.

    If the finalizer change was not done initially,
    this feature can be added later as a non-breaking change
    in some way, which was my original plan.

  2. Parsing of -->

    The selection of the current syntax is breaking for macro writers.
    There are ways to do this without any such breakage but those
    may require more testing in real use to know whether they are
    good/useful for the users. (See below.)

This also means that only the first commit and the parser change
need to be in before the release. The actual implementation
of the feature can be added later if needed.

Syntax

There are a few different new and old features that require similar
syntax (i.e. automatic quoting of the RHS of the operator).
The list, with their properties/requirements:

  1. getfield

    Or the operation that actually returns the field of a composite type.
    This will not be overloadable (or need to be accessed by the
    overloaded version).

  2. Accessing methods that are bound to an object without export (this PR)

    This doesn't need overloading either.

  3. getproperty

    Or the overloadable version of getfield

  4. gepfield

    For mutating an embedded immutable. Should also be overloadable.

The first two are (low level) operations to provide certain features
while the last two are syntax sugars to give better syntax for an
otherwise existing feature (4 is kind of in between,
it adds a new low level operation but the user facing API/syntax
can have user defined meaning). The separation suggests that
the first two could use the same syntax while the latter two could
use another syntax, given that, IMHO, the overload for 3 and 4
generally shouldn't have overlapping meanings.
This would mean using . for getfield and this PR while using @
for the overloadable version to mean the overloadable
gepfield by default.
However, I kind of doubt that everyone would be happy with the syntax
and I would agree such a decision should not be made in a rush.
Therefore, given that I don't really care what syntax this PR uses
(as long as it provides the per-object/type namespace), I just picked
one of the unused syntactic operators to minimize the change
as well as the interference with other features.
Changing the operator on top of this can easily be done as long as
we keep the old one in the same major release.

OTOH, I do like that the method call is visually distinct from
normal getfield. In my experience, it almost never helps to give the two
the same syntax, and in the rare case where it does (like swapping out a
method definition for an instance in Python), it can still be supported
by going through the overloadable version.

To conclude, the alternative syntaxes I've considered are:

  1. Using @

    No fundamental difference from -->. Closer to MATLAB syntax.

  2. Using .

    Or as getfield internal.
    This is basically what Python has.
    I kind of feel like it's confusing sometimes but YMMV.

  3. Using . but only in a.b(...)

    The parser change is actually surprisingly easy.
    A special case is needed for Module (included in current impl).
    I don't really like it but it's an option and it's not that hard
    to explain. It will make it harder to access the closure
    version of the function but we have anonymous function for that
    already...

Semantics

The whole point of this feature is namespacing (avoiding importing/
namespace pollution). There are at least two aspects of it,

  1. For end-users, it should be possible to call functions on the object
    without having to import them (edit: or use fully qualified names).
  2. For implementations of abstract type, extending and using
    functions defined for the abstract type shouldn't require
    importing either. (edit: use of fully qualified name is less of an issue here since this should happen less often(see below))

1 is the most important case and is what this implementation solves.
Implementing 2 and making it inferable is harder, but since there are
hopefully fewer creators of types than users of them, it's less
important than 1 (p.s. if this is not true, it implies that
some types are not being used and should probably be deleted...).
Additionally, a significant fraction of abstract types that are not defined
in the same module as concrete types are base abstract types.
As long as we keep the function exported, those will just work.

This leaves the question of what namespace to use for the type.
The simplest solution, and what is used in the current implementation, is
the module (which is our only namespace) the type is defined in.
It does not provide 2 naturally but is the simplest and safest fallback.
Extensions can be added and I'm pretty convinced that those should
generally not interfere with a fallback/default like this.
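
To make the limitation in point 2 concrete, here is a small sketch in current Julia. The modules `Shapes` and `Circles` and the functions `area` and `describe` are invented for illustration; the point is only that a lookup keyed on the module of the concrete type does not see functions that live in the abstract type's module.

```julia
module Shapes
abstract type Shape end
function area end                       # generic function owned by Shapes
describe(s::Shape) = string("shape with area ", area(s))
end

module Circles
import ..Shapes                         # only the module name is brought in
struct Circle <: Shapes.Shape
    r::Float64
end
Shapes.area(c::Circle) = pi * c.r^2     # extend via the qualified name
end

c = Circles.Circle(1.0)
ns = parentmodule(typeof(c))            # the namespace the lookup would use
found = isdefined(ns, :describe)        # false: `describe` lives in Shapes
text = Shapes.describe(c)               # the qualified call still works
```

Under the fallback rule described above, `describe` would only become reachable from a `Circle` if `Circles` imported it explicitly.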

The current implementation also only concerns function calls.
Function definition syntax can also be added later as part of 2.

Two types can be special

  1. Module

    This is kind of only needed if we want to use the . syntax.
    In principle it is not needed for the current implementation
    but I decided to keep it to be more forward compatible.
    It's unlikely to cause much of an issue in practice.

  2. DataType

    The obvious alternative meaning would be to access the
    function for the given type. The special case is not included
    only because the current implementation was based on using the .
    syntax before I decided --> is a better one for now.
    An error for it can easily be added for now to make future
    extension easier.

Implementation

It is true that this can be implemented by the user with getfield overload.
However, it's non-trivial, requires accessing internal data structures
and will not have the same semantics for all possible implementations.
The getfield overload feature is basically not designed for this
usecase since this (the lookup) is static in nature.
Having a base implementation will make sure this is available
for all types, making it easier to reason about.
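
As a rough illustration of the user-level route mentioned above: in released Julia 1.x the overloadable hook is `Base.getproperty`, and a hand-rolled emulation might look like the sketch below. The module `PropDemo` and its contents are hypothetical, and every package doing this could choose slightly different lookup rules, which is the inconsistency the paragraph above warns about.

```julia
module PropDemo
struct A
    x::Int
end
f1(a::A, y) = getfield(a, :x) + y

# Hand-rolled emulation: real fields first, then names from this module.
function Base.getproperty(a::A, name::Symbol)
    name in fieldnames(A) && return getfield(a, name)
    f = getfield(@__MODULE__, name)
    return (args...) -> f(a, args...)    # bind the object as first argument
end
end

a = PropDemo.A(3)
field = a.x        # plain field access still works
result = a.f1(4)   # resolves PropDemo.f1 and calls PropDemo.f1(a, 4)
```

Note that the lookup here happens at run time on a `Symbol`, whereas the PR describes its lookup as static in nature.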

The current version can almost certainly be implemented with a normal
function. It is implemented in its current form since the low level
implementation was done before such an implementation was inferable, and
it is kept this way due to the uncertainty about the desired future
semantics, which may need special typeinf support again.

Given the current selection of syntax, I also want to avoid adding
another overloadable syntax before we believe it's really necessary...

There's a special case for Core types, which is needed so they can
access functions that are otherwise not defined in their module.
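
Why Core types need special handling can be seen in current Julia. The fallback rule in `lookup_module` below is an assumption made for illustration (the PR text does not spell out the exact rule); the facts it relies on are only that `Float64` is defined in `Core` while `sin` is defined in `Base`.

```julia
# Hypothetical lookup rule: use the module that defines the type, but fall
# back to Base for types that live in Core.
function lookup_module(obj)
    mod = parentmodule(typeof(obj))
    return mod === Core ? Base : mod
end

x = 0.0
defined_in_core = parentmodule(typeof(x)) === Core   # Float64 lives in Core
core_has_sin = isdefined(Core, :sin)                 # Core does not define sin
y = getfield(lookup_module(x), :sin)(x)              # found via the fallback
```

Without some fallback of this kind, a lookup keyed strictly on the defining module would find almost nothing useful for the most basic types.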

@ararslan ararslan added the triage This should be discussed on a triage call label Dec 13, 2017
@vtjnash
Member

vtjnash commented Dec 13, 2017

Interesting approach. Could we also make a-->b mean a-->b() / A.b(a)?

@vtjnash
Member

vtjnash commented Dec 13, 2017

Although I'm not sure I understand how finalizer is related here. Why would you ever write x-->finalizer(close) (or close-->finalizer(x) or x-->finalizer do y; close(y); end) instead of finalizer(close, x)? The scope of the finalizer function lookup seems like it would be incorrect.

@yuyichao
Contributor Author

Could we also make a-->b mean a-->b() / A.b(a)?

You mean making the call implicit? I think that's too "matlab".....

I'm not sure I understand how finalizer is related here

Because it's an example of a function that operates on the object. As explained in the comment, it is true that the current approach doesn't work with functions defined on abstract types yet, though it will just work with Base functions.

@StefanKarpinski
Member

If the finalizer change was not done initially, this feature can be added later as a non-breaking change in some way, which was my original plan.

Mentioning this plan during the very long discussion of that change would have been helpful.

@StefanKarpinski
Member

StefanKarpinski commented Dec 13, 2017

I do think this is a problem that needs a solution and had pondered and occasionally suggested similar solutions. I'd prefer the x.f(...) syntax with the following reasoning:

  1. This is what other languages with this kind of feature use. If we use the quite different x-->f(...) syntax, it will just seem weird and off-putting. IMO, it's better to just not have the feature and tell people they have to write M.f(x, ...) than to offer the feature with an unpalatable syntax.

  2. With a.b property overloading on the horizon, one of the main use cases will be letting people use familiar syntax when interfacing with Python and Java libraries via PyCall and JavaCall. It will be even weirder that you can write x.f(...) when x is a Python/Java object to call its f method, but that you can't write that when x is a Julia object. Makes Julia feel second-class in Julia itself.

On the whole, I'd prefer to think through what we might need to reserve in order to make this possible to opt into in the future.

@KristofferC
Member

Just a check, is this effectively something similar to https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax?

@ararslan
Member

it's better to just not have the feature and tell people they have to write M.f(x, ...)

💯

@StefanKarpinski
Member

@KristofferC, it's similar as far as I can tell, but there's the additional namespacing consideration. In Julia at least (and maybe some of these other languages), you can't just say that it makes x.f(...) and f(x, ...) equivalent syntaxes since the meaning of f is quite different in those two: in the former it's something that's looked up relative to x whereas in f it's whatever f means in the current scope – which is actually the whole point of this feature. In other languages, I get the impression that it's really a way to appease both people who prefer x.f() and f(x).

@yuyichao
Contributor Author

Mentioning this plan during the very long discussion of that change would have been helpful.

I did. #16307 (comment)

IMO, it's better to just not have the feature and tell people they have to write M.f(x, ...) than to offer the feature with an unpalatable syntax.
On the whole, I'd prefer to think through what we might need to reserve in order to make this possible to opt into in the future.

As mentioned above, I'm fine with that too as long as the breaking part is merged. I also explained why I kind of prefer a distinct syntax than making it look similar to field access. It really isn't field access after all.

but that you can't write that when x is a Julia object. Makes Julia feel second-class in Julia itself.

That's exactly the argument about making @ the only overloadable version.
I don't think having the two use the same syntax is very nice either, since there'll be cases where a Julia operation on the object will conflict with the Python/Java operation.

@yuyichao
Contributor Author

yuyichao commented Dec 13, 2017

I get the impression that it's really a way to appease both people who prefer x.f() and f(x).

Not just that. x.f() solves the namespace issue. On top of that, accessing f itself is still important and is not just a way to accommodate people who prefer f(x). Such a need mainly shows up when passing the function as a callback. It's more common in C due to the lack of closures, but the use case still exists in higher level languages (e.g. when multiple callbacks are passed in to operate on the same object).

@JeffBezanson
Member

I hugely dislike this. It basically splits the system into two, where you need to constantly wonder which style of function call to use. This doesn't really solve the namespace problem; rather it adds a different kind of dispatch that has a different kind of namespace problem. It doesn't work for operators and it doesn't work for functions dispatched on a non-first argument.

There are cases where this feature seems nice, but I believe there is a net increase in complexity and confusion.

@yuyichao
Contributor Author

It basically splits the system into two, where you need to constantly wonder which style of function call to use.

Based on the wide availability of this feature in other languages and the existence of namespace in julia, I disagree.

This doesn't really solve the namespace problem; rather it adds a different kind of dispatch that has a different kind of namespace problem.

It doesn't add a different kind of dispatch, nothing more than allowing modules to have/be their own namespace, which we already have now. I don't see what new namespace problem this adds, since all this does is remove cases where import (namespace) is needed, so it'll have strictly fewer namespace problems. Any namespace issue on the object's namespace would be exactly the same as the namespace issue of the corresponding module and will never be a different kind of/new problem.

It doesn't work for operators and it doesn't work for functions dispatched on a non-first argument.

Correct. I'm not saying it does. That's why the M.f(x, y, z) syntax should still be used in those cases.
There is a very big difference in terms of readability though since in a-->f(...) it should be easy to see that the name f is tightly related to a so the M in M.f(a, ...) is often redundant. OTOH, if neither x, y, z are special, there isn't much one can do apart from explicitly requesting the namespace M.
In other words, this will not make all explicit namespace lookups unnecessary and is not intended to do that. It does, however, cover an important case in practice, which is also the only case that one can solve with a syntax that's easy to reason about.
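
A small sketch of the non-first-argument case being conceded here, in current Julia. The module `Report` and the function `render` are made up for illustration; the check simply asks which module a first-argument lookup would search.

```julia
module Report
struct Table
    rows::Int
end
# The argument that "owns" this function is the second one.
render(io::IO, t::Table) = print(io, "table with ", t.rows, " rows")
end

io = IOBuffer()
t = Report.Table(2)

# A lookup keyed on the first argument searches the module of the IO type,
# which knows nothing about `render`.
via_first = isdefined(parentmodule(typeof(io)), :render)
# The module of `t` does define it, but `t` is not the first argument.
via_second = isdefined(parentmodule(typeof(t)), :render)

Report.render(io, t)        # the qualified call remains the way to write this
out = String(take!(io))
```

So for functions like this one, the explicit `Report.render(io, t)` form stays necessary with or without the new syntax.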

There are cases where this feature seems nice, but I believe there is a net increase in complexity and confusion.

  1. As long as getfield overloading is implemented, there'll be no way to stop people from actually doing this in different ways and that will be even more confusing.
  2. The complexity is pretty low and it doesn't interfere with anything else AFAICT. This is an argument to not treat a.f() or a-->f() differently than (a.f)() or (a-->f)() since that will actually increase frontend complexity.

Also, to

Just a check, is this effectively something similar to https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax?

after reading that wiki page, no, not even a little bit. IIUC that concept is about making something like a.f(...) identical to f(a, ...), which necessarily means either a.f or f will have a different meaning in a non-function-call context. Making f have a different meaning would be very confusing IMHO, and while making a.f have a different meaning seems less confusing (and is given as one option above), it is unrelated to what's being solved here. As I mentioned in the title as well as in the top post and comments, the whole point is to make a.f (or a-->f) do the lookup in a's namespace, i.e. so that it can be something completely different and therefore won't conflict with the value of a local f. The a.f or a-->f syntax is merely a way to make it obvious that a name lookup was done on a. .f(a, ...) or -->f(a, ...) would be as good for solving the issue but would probably be a less intuitive syntax.

@StefanKarpinski
Member

I'm somewhat sympathetic to this feature, but @JeffBezanson's concerns are very legit and I really don't think we should rush this sort of thing. One observation: if we ever add generic function merging, it also addresses this problem and in many ways in a better way than this does.

@yuyichao
Contributor Author

if we ever add generic function merging, it also addresses this problem and in many ways in a better way than this does.

No, that will not solve conflicts with local variables or other non-function objects.

@ararslan
Member

No, that will not solve conflicts with local variables or other non-function objects.

For those we have module prefixing.

@yuyichao
Contributor Author

For those we have module prefixing.

And that's exactly the problem this solves, a syntax sugar for just that. See the "instead of" part in the very first post.

@timholy
Member

timholy commented Dec 14, 2017

Overall I love the symmetry of Julia's dispatch rules. While I understand some of the attractions of this feature, to me the symmetry is not something to give up on lightly.

@yuyichao
Contributor Author

the symmetry is not something to give up on lightly.

You don't need to though. This is purely a syntax sugar to change namespace lookup and it doesn't affect dispatch at all. You can still use the same dispatch on this function and they behave exactly the same as normal dispatch.

julia> struct A
       end

julia> struct B
       end

julia> f(::A) = 1
f (generic function with 1 method)

julia> f(::B) = 2
f (generic function with 2 methods)

julia> A()-->f()
1

julia> B()-->f()
2

In fact, adding custom dispatch to the lookup is invalid under the current implementation. As such, it doesn't add any asymmetry or complexity more than allowing the same name to be bound to different functions/objects in different modules.

@vtjnash
Member

vtjnash commented Dec 14, 2017

This doesn't really solve the namespace problem; rather it adds a different kind of dispatch that has a different kind of namespace problem

As Yichao just posted, this adds a new namespace/scope and doesn't use dispatch. But whether this "solves" the problem is certainly a fair question, regardless of the nomenclature.

It basically splits the system into two, where you need to constantly wonder which style of function call to use

The advantage here is that it divides it according to a namespacing question, while leaving dispatch unchanged. With this PR, you can always use the existing variable scope to call the same function. I'm not sure if I saw Yichao emphasizing this above, but one of the goals of this PR is to dissuade people from using getproperty dispatch on Val to simulate OO-style syntax (by providing the feature as a uniform syntax-rewrite rule, rather than an ad-hoc overload of getfield-overloading on some types).

@iamed2
Contributor

iamed2 commented Dec 14, 2017

a.f(...) to me says the method is a property (semantically, not as a field) of a. However, that isn't how Julia code is organized. Functions belong to modules. While this PR is just providing access to certain functions related to a, the syntax suggests a different form of organization.

@Keno Keno removed the triage This should be discussed on a triage call label Dec 14, 2017
@Keno
Member

Keno commented Dec 14, 2017

triage: Not for 1.0

@yuyichao
Contributor Author

Functions belong to modules.

And types too.

triage: Not for 1.0

So just to merge the breaking part would be good enough, as I said in the top post.

@JeffBezanson
Member

it doesn't affect dispatch at all

It's true that this leaves the current dispatch system alone, but my objection is that it adds a new single dispatch OO system. The module of an object's type is used as the class pointer, which is used for dispatch. It's implemented using something we previously called a namespace lookup, but since the namespace lookup is hidden inside a type-based lookup, in reality it's a method lookup.

Another way to look at it is that currently, which module a function or type is defined in is just a matter of code organization and API organization. But with this feature, it's semantically significant.

Here's an example scenario. User code has a bunch of calls to length(x), but the user decides they want length to be their own private function instead. So they rewrite the calls to x-->length() in an attempt to use the correct length function for all objects instead of the local length. But this is not guaranteed to work: if the module containing typeof(x) happens to have its own private length function it won't work as expected. The only way out is for everybody to program in the "module = class" style all the time. It becomes effectively "unsafe" to define methods for a type in multiple modules.
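
The scenario can be reproduced in current Julia by emulating the lookup by hand. The module `Geometry` and its private `length` are hypothetical; `getfield(parentmodule(typeof(s)), :length)` is only a stand-in for what `s-->length` would resolve to under the PR's module-based rule.

```julia
module Geometry
struct Segment
    a::Float64
    b::Float64
end
# A private function that merely shares its name with Base.length.
length(s::Segment) = abs(s.b - s.a)
end

s = Geometry.Segment(1.0, 4.0)

# Stand-in for `s-->length`: resolve the name in the type's own module.
f = getfield(parentmodule(typeof(s)), :length)
is_base_length = f === Base.length   # false: Geometry has its own `length`
value = f(s)                         # geometric length, not an element count
```

A caller who switched from `length(x)` to the type-based lookup expecting Base's `length` would silently get `Geometry.length` for this type.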

@yuyichao I know you didn't claim that this works for operators or non-first arguments; my point is that a true "solution" would need to handle everything. As I said, this doesn't solve a problem for julia, but rather introduces a different, non-julia object system that doesn't have the problem. I claim that in the long run, having two object systems is worse than having one with a slight problem (introduced by its more powerful model). I don't want my code to be a random mix of f(x) and x-->f().

@yuyichao
Contributor Author

yuyichao commented Dec 15, 2017

But with this feature, it's semantically significant.

No it's not, no more than it is now, where looking up a function in the wrong namespace doesn't return the right thing. The user always needs to know the namespace in which the function is to be found and properly use import/qualified names for that. With this, the module the function and type are defined in is still insignificant; how and which functions are brought into the scope of the type's namespace is an implementation detail/organization decision for the type's author.

User code has a bunch of calls to length(x), but the user decides they want length to be their own private function instead. So they rewrite the calls to x-->length() in an attempt to use the correct length function for all objects instead of the local length. But this is not guaranteed to work: if the module containing typeof(x) happens to have its own private length function it won't work as expected.

As any API for the type, it needs to be well defined and documented. It's unclear what scenario you are talking about but here's a few guesses,

  1. If x is always a type defined in this module/package, (as is the primary target for this feature) (edit: i.e. typeof(x) is defined in the same module)

    Everything should be under control and work.

  2. If this is a user of a type x from somewhere else. (edit: i.e. typeof(x) is defined in an upstream module)

    This works if the x is documented to have length method defined. I doubt this is the case you are talking about though since in this case the user can't decide they suddenly want to switch.

  3. If x has to be an abstract type the module defines and can be extended by the user (edit: i.e. typeof(x) is defined in a downstream module)

    In this case, if the package "suddenly" decides that length needs to be a private function, it must be documented as such and all users of the code need to follow that. Failure to do that on the part of the user of the code is no different from any other API breakage. Making this process easier is exactly what the second point in the semantics section above is talking about (i.e. a future version should improve on that, but it's not needed as much as what the current PR covers).

The only way out is for everybody to program in the "module = class" style all the time.

So I don't see why this is true. The availability of a method call can be relied on as much and as little as the documentation. Nothing different from what we have in every other case.

It becomes effectively "unsafe" to define methods for a type in multiple modules.

There's nothing unsafe about it; nowhere in this does it say that a function that takes the object as the first argument has to be called with the new syntax. If you don't like it and don't want your functions to be possibly called this way, don't put them in the same module. If you want to use modules to further organize the functions but still want to make them callable this way, simply import them into the right module. It is a new feature so it'll obviously affect the best practice for doing things, but nothing more than that.

Finally, I should point out that every single thing mentioned above about reliability applies directly to getfield overloading after replacing --> by . and removing the ().

my point is that a true "solution" would need to handle everything. As I said, this doesn't solve a problem for julia, but rather introduces a different, non-julia object system that doesn't have the problem.

And I claim that I'm introducing a standard way of doing what can be done anyway in a messier way that will have more problems. It is also as Julia as I think it can be. In its current form, it doesn't introduce a new namespace, it doesn't introduce new dispatch rules, and after the lookup dispatch works exactly the same as it would otherwise, and that can even be improved if functions defined on abstract types can be handled better. It's entirely built on the current dispatch system and can't work without it. It is not even hiding anything about it (everything about dispatch works exactly the same after the lookup; you can even get a method error due to a mismatch on the first argument). As such I will definitely not call this a completely different non-Julia object system.

I claim that in the long run, having two object systems is worse than having one with a slight problem (introduced by its more powerful model). I don't want my code to be a random mix of f(x) and x-->f().

And the point here is not how powerful it is. It is true that the current dispatch system can mimic everything that can be done elsewhere. However, given code organization and other constraints, it is clear that it can sometimes be less convenient to actually express those. That's why, as mentioned above, this is strictly only syntax sugar for a name lookup on top of dispatch and is intentionally made not powerful so that it won't fight or replace dispatch. I would not like this syntax to be sensitive to anything that's not equivalent to a name lookup in the type's namespace.

The namespace is AFAICT a fundamental issue. Nothing else mentioned in this thread or anywhere else is even close to resolving that. It is certainly not a big problem in Base, but I've been running into the issue in packages over and over again.

@yuyichao
Contributor Author

yuyichao commented Dec 15, 2017

The module of an object's type is used as the class pointer, which is used for dispatch. It's implemented using something we previously called a namespace lookup, but since the namespace lookup is hidden inside a type-based lookup, in reality it's a method lookup.

introduced by its more powerful model

And yet another way to see this is that the current system is powerful enough to support what may look like single dispatch without introducing it. Or, put differently, this is basically a proof that the less powerful use case is indeed covered and can also be done in real code easily.

If you want to call any "type based lookup" a method lookup or dispatch, I guess I can't argue with that, since a type-sensitive lookup is the core of the problem. But I think that's too broad a definition and, other than that, there are a few more pieces of evidence (or maybe at least a few different perspectives on the same property) that this is not single dispatch or a new dispatch system by any means.

  1. (As mentioned above), dispatch works after the lookup.

    As concrete examples, you can get dispatch on the rest of the arguments. You can even get symmetric dispatch including the first arguments, which could be a good API if all types of the function are in the same module (it won't be the target usage of this feature and even if they are all in the same module, it's still only marginally what this should be used for since there isn't a special argument for the function, which is really the primary target for this feature)

  2. This doesn't introduce new ways to extend dispatch.

    There's no new syntax for defining new methods introduced in the current version. Future new related syntax should only be about accessing the namespace at method definition time and nothing else. Adding names (without any type info) and accessing names (also without any type info) are valid operations on namespaces. These are the only allowed operations currently, and that constraint must be satisfied by any future extension of this feature. If either of the two operations had type information attached to it, that would indeed be dispatch, so I agree those should not be added.

I'll also highlight again that this does not enable you to do anything that you can't possibly do today, only to make doing something easier. This alone already implies that it's a feature you can use if it makes your code easier to use and ignore if not. Combined with the fact that this is useful for many types (i.e. the positive effect is non-zero) and that even the sugar part will be implementable in a non-standard way (i.e. the negative effects all exist without this, so there's no new negative effect), this should not encourage more misuse than what will be there otherwise and it should have a net positive effect overall.

@JeffBezanson
Member

The big-picture view of this is that you're advocating for julia code to contain a random mix of method call styles. It makes the language uglier and harder to explain. In fact I don't think there is any crisp explanation for when to use the other style. I don't know how to identify functions that "operate on an object". Every function operates on an object. And since method calls are very nearly the only thing in the language, some vague guidelines won't cut it. The bar for changing something so fundamental is way higher.

I don't understand how finalizer is related. Do you intend to tell people to write x-->finalizer(f)? As @vtjnash said, that should not ever be necessary. If there's a conflict with the name finalizer, you can write Base.finalizer, which is no worse than needing to switch to --> syntax. Either way, the caller has to know to do something to switch the name lookup.
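
For reference, the qualified-name route mentioned here looks like the following in Julia 1.x, where the signature is `finalizer(f, x)`. The module `Resources`, the `Handle` type and `release` are invented for illustration, and the explicit `finalize` call is only there to make the effect observable without waiting for the garbage collector.

```julia
module Resources
# A local binding that hides Base.finalizer inside this module.
finalizer = :not_the_function

mutable struct Handle
    open::Bool
end
release(h::Handle) = (h.open = false; nothing)

function make_handle()
    h = Handle(true)
    Base.finalizer(release, h)   # the qualified name sidesteps the local binding
    return h
end
end

h = Resources.make_handle()
was_open = h.open
finalize(h)          # run the registered finalizers now, for demonstration only
now_open = h.open
```

Either way the caller has to do something explicit to pick the right `finalizer`; the disagreement in the thread is about which spelling of that choice is preferable.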

after the lookup dispatch works exactly the same as it would otherwise

"After the lookup" is much more significant than it sounds at first --- this is a double dispatch system, where first one kind of dispatch is used and then another.

I understand the argument that it's possible to do things like this anyway, so we might as well have a standard way to do it. However, it doesn't seem likely to me that lots of packages will start abusing dot overloading (or other syntax) to provide a.b(c)-style APIs. If (1) it's a bit awkward to do and (2) we officially discourage it, it won't happen much. But adding a feature designed to enable this, and deliberately using it in important packages is a different story. The question is not so much what is possible, as what we want the language to look like and how we want it to be used.

@yuyichao
Contributor Author

Do you intend to tell people to write x-->finalizer(f)

No, I intend to tell people they can write x-->finalizer(f).

@JeffBezanson
Member

I intend to tell people they can write x-->finalizer(f)

How do you know when to write that? The expression basically means "call the finalizer function in the module where x's type is defined", but there are a lot of unknowns there. What module is that, and does it contain an appropriate finalizer function? The worst part is that finalizer can be called on almost any object, so it's possible that x's type's module didn't even consider that you might want to add a finalizer to one of its objects. So the correct thing to write in case of conflicts is just Base.finalizer.

@yuyichao
Contributor Author

yuyichao commented Dec 15, 2017

The big-picture view of this is that you're advocating for julia code to contain a random mix of method call styles.

I don't see how random this is. I'm not advocating a style. If anything, I said that the new feature should be used only when it is useful so any case that's ambiguous should stay the same.

How do you know when to write that?

See above. And also as I explained in the concrete case above in #25052 (comment). The primary target of this feature is when you know that this is an API of the type. And as I also said in the same comment, I agree the current implementation does not make propagating this API from an abstract type easy but it can still be done and can be improved in future version. As that is improved, the usecase of this will expand and the way to decide when to write that will also change.

The worst part is that finalizer can be called on almost any object, so it's possible that x's type's module didn't even consider that you might want to add a finalizer to one of its objects.

That's why I mentioned in the top post that the module special case is actually not needed for this.

It makes the language uglier and harder to explain.

I'll not comment on whether --> is ugly or not, but I don't see how this makes explaining anything harder.
Any feature needs documentation and that's the minimum amount of explanation required. However, what I think would distinguish a feature that is easy to explain from a feature that isn't is that you don't need to explain it at all, not even in corner cases, unless you need it, and the semantics of it match what other features with the same syntax are allowed to do. This feature satisfies all that AFAICT. It doesn't interfere with any other features. It operates as a completely different step that will only kick in with the special syntax, and it can only do what getproperty can do with very similar syntax. As long as we say that this feature should only be used in code for the cases that are well supported as mentioned above, it will increase readability for code where this is used and not affect code that doesn't use this.

In fact I don't think there is any crisp explanation for when to use the other style. I don't know how to identify functions that "operate on an object".

Practically I don't find that to be an issue at all, see above for current limitation and what we can allow/encourage later.

Every function operates on an object.

I disagree, 1 + 2 does not operate just on 1 or 2. It's symmetric and I wouldn't say either of them is more special than the other. OTOH, the [] in size([], 1) is clearly special so it is ok to call it as []-->size(1), which by no means suggests that all code should be changed to do that.

where first one kind of dispatch

As I said, if any lookup that's automatically based on the type is called dispatch then sure, you can call it that way. However, I don't really see what practical significance it has. It's a dispatch that doesn't allow customization. It's evaluating a single expression (a-->b) independent of the surroundings, so it's not like it's trying to do two dispatches with a single syntax either.

@JeffBezanson
Member

JeffBezanson commented Dec 15, 2017

I'm not advocating a style.

I don't see how you can say that. If this feature gets used, julia code will contain a mix of f(x) and x-->f() calls. So by arguing for this feature you're arguing that's how julia code should look.

The specific --> syntax is not the problem; the mix of method call styles is the problem. Although in some sense --> might be better than . since it's less confusing to those used to class-based OO.

the new feature should be used only when it is useful

Unfortunately language design doesn't work that way. Adding features has a complexity cost that is hard to contain. This is also nearly a tautology: if features are only used in ways that cause no problems, then it's not possible to argue that any feature is bad. We might as well just add every possible feature.

How do you know when to write that?

See above. And also as I explained in the concrete case above in #25052 (comment). The primary target of this feature is when you know that this is an API of the type.

Ok, I can imagine a package documenting that its functions should be called with -->, but finalizer is not like that, so I'm still missing something. What is the answer to "when do I write x-->finalizer(f)"?
There's a lot of text here so maybe I'm not seeing it. If it's there somewhere, could you copy and paste it (or reword it)?

@yuyichao
Contributor Author

I don't see how you can say that. If this feature gets used, julia code will contain a mix of f(x) and x-->f() calls. So by arguing for this feature you're arguing that's how julia code should look.

What I mean is that there are some cases that the new syntax is unambiguously better. I'm not saying any case that's not clear enough should switch.

Adding features has a complexity cost that is hard to contain. This is also nearly a tautology: if features are only used in ways that cause no problems, then it's not possible to argue that any feature is bad. We might as well just add every possible feature.

I agree, and that's exactly why I've been repeating the constraint I want to have on such a feature. I want it to be equivalent to a namespace lookup, not customizable other than that, and be a context-independent evaluation of a single operator. Not all possible features have this property.

What is the answer to "when do I write x-->finalizer(f)"?

The safe answer now for any "when should I write x-->f(a)", as mentioned above in the targeting case, is when you know the type(name) of x (obviously assuming f is an applicable function). It's not finalizer specific but 1) finalizer is such a function that is operating on f and 2) in the majority of uses the caller indeed knows exactly what type(name) x is. And also as mentioned above, the known-type restriction on the recommendation can be lifted in a future version.

@yuyichao
Contributor Author

yuyichao commented Dec 15, 2017

And also to clarify more,

In terms of when I want to have this, I agree this doesn't have to be in 1.0. As long as it (along with the change in style recommendation) can be added without breakage I think this can be polished more before it is merged, which is the original plan anyway. (And the recommendation change is more about the final version with improved support for abstract type, the guide for the current version will mainly affect functions for leaftypes. For functions on abstract it's only a heads up for how to make those more compatible with future changes)

In terms of complexity, there are two aspects that I can think of,

  1. On the syntax level, or how easy it is for the programmer to understand it.

    I believe this is satisfied by giving it a distinct and context independent syntax. a-->f will return the same result in any context, including g(a-->f) and a-->f(b). This makes sure it has zero interference with other syntax structures and can be completely treated as an independent black box.

  2. On the implementation level, or how easy it is for the runtime to infer and support this

    I believe this is satisfied essentially in the same way. Basically anything that's equivalent to a function call after lowering can be supported equally easily by the runtime.

I would in general agree with

Adding features has a complexity cost that is hard to contain

but by making sure it can be treated completely independently by both the frontend and the runtime, the complexity is entirely contained. This is actually a much harder requirement on a (low level) feature and a sufficient one for containing the complexity so it does not contradict with the hardness of adding a feature with contained complexity.

@StefanKarpinski
Member

StefanKarpinski commented Dec 15, 2017

I would be ok with reserving the --> operator as syntax, although I don't like it for this usage. I do, however, like it better than |> for writing data pipelines, e.g.:

data --> filter(iseven) --> map(_ ÷ 2) --> join(_, other) --> out

or even like the following, if we allow continuing a pipeline when the following line starts with -->:

data
--> filter(iseven)
--> map(_ ÷ 2)
--> join(_, other)
--> out

I know that you may say that this proposal already addresses that use case, but I will preemptively say that nice syntax for data pipelining and whether filter is looked up in the "module of" data are completely unrelated and should not be baked into a single feature. Similarly, I think that if we're going to have a feature that simulates single-dispatch OO (which the jury is clearly very much out on), then it should look similar to how it looks in single-dispatch OO languages, i.e. x.f(y). Of course, if we reserve the --> syntax now, then we can use it for whatever we choose. It would also make me feel better about not reserving |> as syntax.

@ChrisRackauckas
Member

ChrisRackauckas commented Dec 15, 2017

Just gathering some empirical evidence for when popular packages could use this to solve namespacing issues. Let's look at the top however many:

  • IJulia is kind of an outlier where mostly it's just notebook that's called, but that wouldn't make use of this.
  • Gadfly and Plots. These have a namespacing issue because they, like pretty much all plotting libraries, use plot. However, they typically don't dispatch on internal types since what's natural is to plot things like Base arrays. So this doesn't fix the plotting namespace issues.
  • This would work with Mocha.jl's solver method, i.e. SGD()-->make_solver_parameters(...), but not with its model creation parts.
  • For DataFrames, it wouldn't be useful on the constructor because those use Base types. Most of the functionality extends Base functions, so this syntax wouldn't be used there. But its special new functions like join(names, jobs, on = :ID) could make use of it (these cases are odd though because it's not just the first argument that's a DataFrame).
  • JuMP's macros have the first argument as the model, but I am not sure that applies. Its solver setup uses a keyword, so this special syntax doesn't apply there.
  • KNet is hard to describe because so much of using it involves user-defined functions, but I cannot think of a place where the dispatch is on the first argument which is a KNet-defined argument. Its train function dispatches on the last argument, which could be moved to be compatible with this.
  • I'm not personally familiar with DSGE.jl, but most of its functionality (according to a quick perusal of the docs) is based on high level functions with keyword arguments, so dispatching on the first arg seems not compatible with their current API.
  • DifferentialEquations.jl always dispatches solve(prob,alg) on both the problem and the algorithm type, with the algorithm choosing the package and the problem possibly choosing a subset of the code. It could in theory be swapped (though then that makes optional algorithms a little odd) to use alg-->solve(prob,... effectively
  • Not sure this affects PyCall.jl at all. Or Cxx.jl
  • I think Escher.jl could use it quite a bit.
  • It works perfectly for just about every function in Distributions.jl
  • TensorFlow.jl's placeholder type of stuff dispatch on Base types, so they are ineligible. run doesn't need it since it just overloads a Base function with a package specific type.
  • None of MXNet.jl's chain creation or model creation can use the syntax, but its fit function can.
  • Optim.jl's optimize function dispatches on the third argument. It could in theory be swapped to support -->.

This analysis highlights a few things. First of all, the things that clearly benefit from this syntax are mostly data-oriented (Distributions.jl, DataFrames.jl).

However, it doesn't seem to capture most of the namespace pollution. A lot of these libraries have their namespace pollution in terms of their algorithm types. DiffEq has a few hundred algorithms yet always calls solve, JuMP and Optim.jl have tens of algorithms that all call optimize, Mocha.jl and MXNet.jl have plenty of learners that all call fit, GLM.jl has a ton of different links with models that all call fit. But these generic names (solve, fit, optimize) are always dispatching on a package-specific type, so instead of making breaking API changes to namespace a specific version of them, they would all actually be compatible with extending the same function, which seems like a better solution anyway.

And none of this addresses the issue with plot, which I am not entirely sure is even solvable without just doing Plots.plot etc.

To me, this points out that the biggest problems with namespace pollution is the amount of short names for the types, not the functions on which they are called. Solving this seems like it would require two things:

  1. Defining common scientific verbs as stub functions (solve, fit, optimize) somewhere that everyone can safely extend (and disallow type-piracy).
  2. Have some way to "import as" for the algorithms. DifferentialEquations.Tsit5() (or actually OrdinaryDiffEq.Tsit5()) is overly verbose, and since it's not well known that you can use const to shorten that to ode.Tsit5(), not exporting the algorithm types is a heavy syntactic burden. One approach that packages could take here is similar to what SIUnits.jl does by having a submodule that you can use to export all of the short names, but doesn't export them by default. MXNet.jl somewhat does this with its mx.
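
To make the two points above concrete, here is a minimal sketch in plain Julia. Every module and type name in it (CommonVerbs, PackageA, PackageB, ProblemA, ProblemB, pa) is hypothetical and stands in for real packages; it only illustrates the pattern, not how any existing package is organized.

```julia
# (1) A shared stub: a generic function with no methods that packages extend.
module CommonVerbs
function solve end
end

module PackageA
import ..CommonVerbs: solve          # extend the shared verb
struct ProblemA end
solve(::ProblemA) = "solved by PackageA"
end

module PackageB
import ..CommonVerbs: solve
struct ProblemB end
solve(::ProblemB) = "solved by PackageB"
end

# (2) A const alias gives a short prefix for names that are not exported.
const pa = PackageA

using .CommonVerbs: solve
r1 = solve(pa.ProblemA())            # dispatches to PackageA's method
r2 = solve(PackageB.ProblemB())      # dispatches to PackageB's method
```

Because each method is defined on a package-owned type, the two packages extend the same function without conflict and without type piracy. Later Julia releases also accept `import SomePackage as sp` as a built-in way to get the short alias.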

Altogether, looking at the most starred Julia packages, this new syntax doesn't seem to really address the real issue, and when it does, that issue is better addressed in other ways. Thus if thinking about it in a cost vs benefit sense, I don't think that it gives a major advantage while it would reduce the readability of cases where it exists (or at least, it would make the syntax different).

@vtjnash
Member

vtjnash commented Dec 15, 2017

Those are not the entire set of use cases this had in mind. I would say the primary goal was to assist with places that need getter/setter-like APIs, where there is a soft distinction between "the API of the type provided by the package" and "functions you can call on it (provided elsewhere)". Some places this might be useful include: HTTP.jl (request-->status), QRFactorization (qr-->Q), FFTW (plan-->execute(data)), GUIs (this is the classic OO example: window-->resize(10, 10), etc.).

This can be beneficial anytime it makes more sense to use the scope from the object to find the method. And is not a big feature since it doesn't alter dispatch, just name-binding-resolution in the scope. (For the DifferentialEquations.jl example, solve(prob, alg) could be used as prob-->solve(alg). Since dispatch is unchanged, this PR doesn't affect it. It would just be an alternate syntax.)
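
The name-resolution step described here can be approximated in current Julia without new syntax. This is only a sketch: `nscall` is a hypothetical helper (not part of this PR or of Base), the `Shapes` module is made up for illustration, and it assumes that "the scope from the object" means the module in which the object's type is defined, as reported by `parentmodule`.

```julia
module Shapes
struct Circle
    r::Float64
end
# Not exported: callers outside the module must normally qualify these names.
area(c::Circle) = pi * c.r^2
scale(c::Circle, k) = Circle(c.r * k)
end

# Hypothetical stand-in for `x-->f(args...)`:
# look up `f` in the module that defines typeof(x), then make an ordinary
# call with `x` as the first argument. Dispatch itself is unchanged.
nscall(x, f::Symbol, args...) = getfield(parentmodule(typeof(x)), f)(x, args...)

c = Shapes.Circle(1.0)
big = nscall(c, :scale, 2)     # same as Shapes.scale(c, 2)
a = nscall(big, :area)         # same as Shapes.area(big)
```

The helper only changes where the function name is resolved; once the function is found, the call goes through normal multiple dispatch on all arguments.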

@andyferris
Member

I do, however, like it better than |> for writing data pipelines,

I'm getting a bit far from the OP, but please note that I already find piping visually too similar to anonymous functions, making it hard to decode:

data |> x -> map(f, x) |> x -> groupby(by, g, x) |> x -> reduce(h, x)
data --> x -> map(f, x) --> x -> groupby(by, g, x) --> x -> reduce(h, x)
data |> map(f, _) |> groupby(by, g, _) |> reduce(h, _)      # currying, for comparison's sake

The second is even worse, IMO. (Of course, users can solve this with extra carriage returns).

@StefanKarpinski
Member

StefanKarpinski commented Dec 15, 2017

@vtjnash, you've really hit on all the examples where this syntax would make us look ridiculous:

| traditional | Julia | this PR |
| --- | --- | --- |
| request.status | status(request) | request-->status |
| qr.Q | qr[:Q] | qr-->Q |
| plan.execute(data) | execute(plan, data) | plan-->execute(data) |
| window.resize(10, 10) | resize(window, 10, 10) | window-->resize(10, 10) |

Telling people coming from OO languages that you write status(request) is a lot less laughable than telling them to write request-->status. Seriously, if this is what we tell people to write, some huge portion of people will just dismiss Julia as a crank language. Yes, there are other languages that use different syntaxes for this, but they all have the excuse of long preceding the x.f(y) notation being ubiquitous – and many of them have bolted it on after the fact.

@StefanKarpinski
Member

too similar to anonymous functions

Definitely—had not thought of that.

@andyferris
Member

Sorry, I feel like I've missed something, because I'm confused what this solves that #24960 does not.

If you want method namespacing on object classes like C++, wouldn't you just have getproperty(object, methodname) return a closure over object? And if you want to put some method into a namespace attached to your abstract types, you can just as easily define getproperty(object::AbstractClass, methodname)? (Give us multiple inheritance and we are basically a superset of C++ 😝). The namespacing will follow the usual chain of .s, just like every other language that I know, and will let us either use functions namespaced by module, or class methods which are (effectively) namespaced by their (abstract) type.
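
A minimal sketch of the closure-returning `getproperty` idea, using the property-overloading hook from #24960 as it exists in current Julia. The `Window` type and the `resize` function are hypothetical examples, not taken from any package.

```julia
struct Window
    width::Int
    height::Int
end

# An ordinary generic function taking the object as its first argument.
resize(w::Window, width, height) = Window(width, height)

# Overload property access so that `w.resize` returns a closure over `w`.
function Base.getproperty(w::Window, name::Symbol)
    if name === :resize
        return (args...) -> resize(w, args...)
    end
    return getfield(w, name)   # fall back to the real fields
end

w = Window(640, 480)
w2 = w.resize(10, 10)          # equivalent to resize(w, 10, 10)
```

Note that this version hard-codes the method name per type; whether that is desirable is exactly what the surrounding comments debate.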

Beyond that, this PR seems to add unnecessary additional syntax and complexity (but like I said, I may have missed something important).

@ararslan
Member

AFAIK the getproperty PR will (rightfully) be unfriendly to hacky attempts at OOP. Also, as Jeff said in #16307 (comment),

that when we allow more explicit namespace control per object/type, we can write object.f(function)

We are never doing that.

@andyferris
Member

AFAICT, Jameson's PR is already quite friendly to hacky attempts at OOP, minus the vtable version of dynamic dispatch. (But maybe I'm wrong... I'll have to try it out more fully to confirm that).

@vtjnash
Member

vtjnash commented Dec 15, 2017

data --> x -> map(f, x) --> x -> groupby(by, g, x) --> x -> reduce(h, x)

I think this is either a straw man, or you got misled by the PR description somewhere. This expression is simply a syntax error. The proposed syntax of this PR follows roughly the same syntax patterns as .: expression --> symbol(restargs...).

For a data pipeline, there could be a package providing Lazy(Eager)SequenceProtocol (as known in Swift) designed for this that renders the above pipeline as follows (where each step of the pipeline returns a new eager (lazy) wrapper):

eager(data) --> map(f) --> groupby(by, g) --> reduce(h)
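
As a rough illustration of how such a wrapper could behave, here is a sketch in current Julia. Everything in it is hypothetical: the `EagerSeq` module, the `Eager` type, and the `via` helper, which stands in for the proposed `-->` by resolving the function name in the module that defines the wrapper's type. It is not the package or protocol referred to above.

```julia
module EagerSeq
struct Eager{T}
    data::T
end
eager(data) = Eager(data)
# These deliberately reuse Base's names inside this module's own namespace;
# they are new functions, not extensions of Base.filter / Base.map / Base.reduce.
filter(e::Eager, f) = Eager(Base.filter(f, e.data))
map(e::Eager, f) = Eager(Base.map(f, e.data))
reduce(e::Eager, op) = Base.reduce(op, e.data)
end

# Hypothetical stand-in for `x --> f(args...)`.
via(x, f::Symbol, args...) = getfield(parentmodule(typeof(x)), f)(x, args...)

evens  = via(EagerSeq.eager(1:6), :filter, iseven)   # wraps [2, 4, 6]
halves = via(evens, :map, x -> x ÷ 2)                # wraps [1, 2, 3]
total  = via(halves, :reduce, +)                     # unwrapped result
```

Each intermediate step returns a new wrapper, so the next name is again looked up in `EagerSeq` rather than in the caller's scope, and none of the names need to be imported by the caller.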

@yuyichao
Contributor Author

yuyichao commented Dec 15, 2017

I know that you may say that this proposal already addresses that use case

No, that's a completely different issue. AFAICT what you have still requires importing all those names.

Just gathering some empirical evidence for when popular packages could use this to solve namespacing issues. Let's look at the top however many:

As Jameson said, this is useful for any library that uses an OOP style in other languages; this includes basically all GUI toolkits and many other operations that have a "main object" to operate on (yes, this is vague, but it's always pretty easy to decide that for each case). There's no way this can solve all the problems in all those packages, and FWIW some of the problems mentioned aren't even related. Many of the packages from the list you give are obviously packages that fit the current julia syntax very well, particularly because they don't fit the OOP method well and so they are not implemented nicely in other languages. I've certainly opted to do things in other languages since the syntax fits the problem so much better. In other words, the list you have is a biased selection that does not include use cases that julia doesn't have good support for.

Honestly, I think that telling people coming from OO languages that you write status(request) is a lot less laughable than telling them to write request-->status.

Honestly I don't care what people from OO languages like. I only care about being able to do the same thing. I don't care about people not very familiar with the real advantage of a.f() syntax (though it is clear from "The Zen of Python") and insist on the . syntax rather than a different syntax with the same semantics though I also believe it is very easy to inform those people and make them understand what the advantage of the syntax really is and why they can have exactly the same with the julia syntax.

Also, as Jeff said in #16307 (comment),

Bringing that up is not helpful. What's needed is explanation as Jeff gave above and I'm arguing against.

Jameson's PR is already quite friendly to hacky attempts at OOP, minus the vtable version of dynamic dispatch. (But maybe I'm wrong... I'll have to try it out more fully to confirm that).

Correct, it's a PR not designed for it but is powerful enough to allow people to write bad implementations. As mentioned above many times, that is exactly why a more standard way is better.

@andyferris
Member

I think this is either a straw man, or you got misled by the PR description somewhere

@vtjnash To clarify, I was merely responding to #25052 (comment) (Stefan's suggestion of --> as a straight-up piping operator rather than the usage from the OP).

I do like the idea of being able to switch between eager and lazy like that :)

@andyferris
Member

Correct, it's a PR not designed for it but is powerful enough to allow people to write bad implementations. As mentioned above many times, that is exactly why a more standard way is better.

The "standard" OOP way is that member methods are namespaced by types and classes (or even objects). The "standard" Julia / procedural / functional programming way is to use functions namespaced by modules. This PR seems to be syntax sugar emulating an entirely new hybrid of the two.

@yuyichao
Contributor Author

This PR seems to be syntax sugar emulating an entirely new hybrid of the two.

I kind of agree. (Though I wouldn't attribute namespace to julia/functional/etc languages) but as I said above, it simply relates types and functions through the module, which is a concept that already exists in julia, so I don't see much issue with it.

I should also note that this idea was not proposed by me and I had the same thought but I now think it is a pretty safe fallback (mentioned in the top post) and it maintains the syntax for how functions are defined which is a property that I'd like to maintain.

@JeffBezanson
Member

The safe answer now for any "when should I write x-->f(a)", as mentioned above in the targeting case, is when you know the type(name) of x (obviously assuming f is an applicable function). It's not finalizer specific but 1) finalizer is such a function that is operating on f and 2) in the majority of uses the caller indeed knows exactly what type(name) x is.

I don't get it. When I know the type of x, I should write x-->finalizer(f), and when I don't know the type of x I should write finalizer(x, f)? That can't be right.

I know this isn't specific to finalizer but it would help to fully dispense with that example. Let's keep it as simple and specific as possible: with this feature, will code ever contain literally x-->finalizer(f)? I suspect the answer is just "no", since it's very unlikely a package will have a function called finalizer, but I'd like to be sure.

Another way I look at this is to compare how to address the namespace problem before and after this change.
Currently:

  • Package authors collaborate to extend the same functions.
  • A package might document that you should use e.g. CSV.read(...).
  • In case of unanticipated conflicts, users write Module.f(x).

After:

  • Package authors collaborate to put the most useful distinguishing argument first.
  • A package might document that you should use x-->f().
  • In case of unanticipated conflicts, users write either Module.f(x) or x-->f(). But --> will only work if there is an appropriate argument to dispatch on, and package authors have gone through step (1).

I don't see a clear win here.

@yuyichao
Contributor Author

I don't get it. When I know the type of x, I should write x-->finalizer(f), and when I don't know the type of x I should write finalizer(x, f)? That can't be right.

This is only a limitation of the current implementation so that's not really a fundamental issue.
This is also just saying that generic code sometimes cannot make some assumptions and therefore cannot be written in some ways. That's nothing new and there's nothing wrong with it.

will code ever contain literally x-->finalizer(f)

Yes.

since it's very unlikely a package will have a function called finalizer, but I'd like to be sure.

Packages shouldn't have a function called finalizer. It's not needed in most cases (including the target case) for x-->finalizer to work with the current implementation and won't be needed in any case with future versions.

Package authors collaborate to put the most useful distinguishing argument first.

This is wrong to start with. As I hope that I have made this very clear, this is for OOP patterns. This means that, even after abstract type support is improved, I would not recommend trying to use this to fix all namespace problems. There are namespace problems that I identified above as not possible to have a similar solution and this is never intended to solve those.

Also, there's no need to collaborate in this case, since it belongs to your own namespace.

But --> will only work if there is an appropriate argument to dispatch on

Correct.

and package authors have gone through step (1).

So no.

The comparison seems to be entirely based on asking this to solve all tangentially related problems. It does not, and I don't think that's even possible. I'll repeat that this is purely syntax sugar for a simple syntax transformation (i.e. x-->f() to M.f(x) with the proper M), so as far as the user of the type is concerned, it will be an improvement. It will certainly not help at all for cases where this doesn't apply, but it will be of great help for cases where it does. Trying to force this to be used for the wrong cases will certainly not help.

@Keno
Member

Keno commented Dec 25, 2020

I'm gonna close this. I think it's fair to say that it won't land in its current form and any such language feature will require a fair amount of design work for which this issue is probably not the right place. It'll still be here as a reference of course.
