rename Parallel to Distributed #20486
Conversation
this isn't strictly multi-node though. |
The purpose of all those primitives is to do distributed computing. I think "parallel" is too broad, and this is good. |
Going forward, I think most of the needs of single-node parallelism will be addressed via multi-threading and GPU computing. Distributed will be used largely for true multi-node work or single-node distributed computing (like the testing infrastructure). We can recover the "Parallel" name for a framework that abstracts over all forms of parallelism, which is a distant goal at this point. |
"Single node distributed" is a contradiction. It's about multiple processes, coordination between them and data movement, whether single-node or multi-node. Most of the current usage of this is likely single node for which this would be a misnomer - that may change a bit if our threading capabilities improve, but it'll be at the cost of a smaller portion of people using the multi-process model overall. |
Considering the unit of distribution as physical processors and not OS processes, this can be presented as: threading leverages multiple processors on a single node, while "single-node distributed" is not a contradiction if Distributed is presented as distribution across CPU cores, single- or multi-node. It is up to how the literature presents it. Let us also take a vote for |
It isn't really physical processors either, unless you're very careful about managing affinity, OS-level scheduling, hyperthreading, etc., which we don't do a whole lot of (we do some thread-level affinity, but not process-level as far as I'm aware). The literature does distinguish distributed memory from shared memory parallelism, so I'd be fine with DistributedMemory as a more specific module name. That would also address the part-of-speech problem. |
Just FYI, not related to this conversation: we do have some process-CPU affinity capability via https://github.com/JuliaParallel/ClusterManagers.jl#using-localaffinitymanager-for-pinning-local-workers-to-specific-cores, Linux only for now. |
I usually see that referred to in the literature as "partitioned global address space," whereas "distributed memory" is referring to conventional message passing or the underlying communication layer between workers that are used to implement a PGAS model. Global consistency and "single view" aren't implied by distributed memory (usually the opposite), as that can be expensive and isn't always needed. |
The API exposed by this module, |
I think it's nearly obvious that one can use a distributed computing library on a single machine as a degenerate special case. One would also expect such a library to support running N processes on each of M machines. The fact that that special case exists doesn't mean the library is misnamed. If you're into the whole brevity thing, we could call it |
In favor of not conflating abstractions and hardware one way or another: threads and processes are the standard language for the respective abstractions. Those abstractions stand independent of hardware, and the common association of processes with multi-node concurrency and threads with single-node concurrency often fails in practice:

- Expressing concurrency with processes is, and will continue to be, common on single-node (and even single physical execution pipeline) systems, as in e.g. CSP.
- Expressing concurrency with threads on systems traditionally considered physically distributed can also occur, with e.g. PGAS and RDMA.

Moreover, the concepts of physical nodes and physically shared/distributed resources are progressively blurring (with, for example, the proliferation and tight integration of daughterboards with DMA, generalized NUMA, and RDMA); those concepts are tied to hardware, subject to change at the pace of hardware evolution, and likely will become only hazier with time (particularly as we march towards heterogeneity and virtualization, as is happening now in HPC). On the other hand, threads and processes are largely atemporal models. Basing a system's models and terminology on the abstractions of threads and processes, and maintaining clear separation between those abstractions and hardware, is a future-proof decision. Best! |
@Sacha0, that's really convincing – but I'm not sure what I'm convinced of... 😬 |
(Edit: This post reflects a misunderstanding of this module's long-term purpose. Please see #20486 (comment).) I would hope the above convinces you to avoid names that risk conflation of abstractions and hardware (e.g. distributed), and favor instead names that eschew that conflation insofar as reasonable in practice (e.g. threads and processes). Apart from that sentiment I lack strong feelings. |
We can interpret the word "distributed" as meaning distributed across processes, rather than distributed across physical machines and then the name is fine. Bonus that it's often both. |
But then can't you also interpret |
That's a bit like saying
You can say that, but such usage is sufficiently removed from common usage that confusion is inevitable (and particularly conflation of the process-based-concurrency/parallelism abstraction with distribution across multiple nodes).

Re. common usage: Googling "distributed" yields two hits related to computer science on the first page: Wikipedia's distributed computing entry ("... a model in which components located on networked computers communicate and coordinate their actions ...") and distributed.net, a web-scale distributed computing system (in the preceding sense). Googling "computer science distributed" yields hits related to "distributed computing", "distributed systems", and "distributed storage", with all hits but one on the first page using "distributed" in the manner of the Wikipedia entry insofar as I see. Googling "distributed parallelism" yields similar results, though also with references to "distributed memory".

Re. top "distributed memory" hits, LLNL's Introduction to Parallel Computing page states: "Distributed Memory. In hardware, refers to network based memory access for physical memory that is not common. As a programming model, tasks can only logically "see" local machine memory and must use communications to access memory on other machines where other tasks are executing." The common usage this googling suggests matches my experience in scientific computing.

Simultaneously, the existing documentation on this functionality uses the term "multiprocessing". From the first paragraph of those docs: "... Julia provides a multiprocessing environment based on message passing to allow programs to run on multiple processes in separate memory domains at once. ..." (For the same reasons I would argue that e.g.

(Edit: To clarify, this was not an argument for the term multiprocessing, but rather points out the inconsistency between this module's present use / documentation and the name distributed. Please see #20486 (comment).)
It's often just the one as well :). Best! |
Yes I agree, "distributed" means among multiple machines, coordinated over a network. Now, does this module implement such a thing? Yes it does. What should it do when the number of machines equals 1? Should that be disallowed? The airship analogy: unlike a dirigible, a car is incapable of flying through the air. But this module is capable of doing distributed computing. What we're talking about here is more like saying "the word 'airplane' is wrong, since airplanes also drive along the ground using their landing gear". |
I concur with the comments arguing that 'distributed' implies moving information around a network. Using the term to refer to multiple threads or processes on a single node is at best counter-intuitive. |
This package does in fact use message passing over a network. If you add the ability to use e.g. unix domain sockets as a communication layer in a distributed computing package, can you no longer call it a distributed computing package? |
Distributed means that there are one or more nodes, which is precisely the case here – one is just a degenerate case of distributed computing. Multiprocessing, on the other hand, actually does generally imply a single node – yes, with multiple processes. That implication is actively wrong here since there are potentially many nodes. As Jeff said, this is like insisting on calling a plane "a car" because it can roll around on wheels. |
You folks do understand that this is for real, actual distributed functionality, right? Just like Hadoop or Spark. It does also, as a degenerate case, function as a multiprocessing library on a single node. But you can run frameworks like Hadoop and Spark on a single node. Would the objectors here all describe Spark or Hadoop as "multiprocessing frameworks"? If so, perhaps we should write to them and let them know that they should stop calling their projects "distributed". |
re: "Just like Hadoop or Spark." vs. "Let’s try this out. Starting with julia -p n provides n worker processes on the local machine. " I add the added the emphasis on local, because I think it is important. I spent part of most days working on Hadoop -- generally not the happiest part. On our cluster, the default is distributed, not local when working on e.g. pig or hive. This matters. |
It's perfectly possible to install julia on a cluster and provide a 1-line script that starts julia on every node via this library. Hadoop also requires a non-zero amount of configuration. |
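For concreteness, a minimal sketch of the two modes being argued about, using the documented `addprocs` API of the era (the hostnames and worker counts are made up for illustration):

```julia
# Degenerate single-node case: equivalent to starting with `julia -p 2`.
addprocs(2)

# Genuinely distributed case: the same call launches workers over SSH
# on remote machines (hostnames "node1"/"node2" are hypothetical).
addprocs([("node1", 4), ("node2", 4)])

# Either way the programming model is identical: work and data move
# between processes via explicit remote calls.
r = remotecall_fetch(() -> sum(rand(1000)), workers()[1])
```

The library is the same in both cases; only the cluster configuration differs, which is the point being made about the one-node case being a degenerate configuration rather than a different kind of library.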
Note also that we're talking about the name of the module, and you can't rename it based on how it's used, or how the defaults are set, or on how a majority of users use it. Presumably we're not going to rename it if the common style of use shifts over time, or if we change what the |
If you look at section 1.8 of the MPI 3.0 standard (and I hope we can all agree that MPI is well-established prior art for parallel computing), it makes the point that a standard designed for distributed memory computing can be implemented for a shared memory system. The distinction between the distributed memory and shared memory programming models is whether there is a fundamental assumption that different lines of execution (processes/threads/whatever you want to call them) have access to some common memory. This determines which constructs can be implemented efficiently. Where processes actually run is irrelevant. The important distinction is what kind of programming model the user has to work with. Renaming
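To make the programming-model distinction concrete, here is a small sketch in Julia itself (Base API of the era; the threaded loop assumes Julia was started with multiple threads, e.g. via JULIA_NUM_THREADS):

```julia
# Shared-memory model: multiple threads read and write one array in a
# common address space; no explicit communication is needed.
a = zeros(Int, 8)
Threads.@threads for i in 1:8
    a[i] = Threads.threadid()
end

# Distributed-memory model: each worker process has its own address
# space, so data moves only via explicit messages (a remote call here),
# even when the worker runs on the same machine.
addprocs(1)
b = remotecall_fetch(() -> fill(myid(), 8), workers()[1])
```

The second half works identically whether the worker is local or on another node, which is exactly the "where processes run is irrelevant to the model" argument above.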
@StefanKarpinski I'm not sure if this is how you meant it, but this reads rather snide to me. I enjoy a good debate as much as the next person, but lets not go overboard. |
@JeffBezanson re "Hadoop also requires a non-zero amount of configuration." That's an understatement. The defaults on Hadoop are often poor as well. If anything, Spark is worse. Julia can and should do much better. The name of the module is an indicator to users of its intended use. This code may or may not be used on a cluster, and it seems reasonable to expect further development of nodes with 64+ CPUs that can handle large jobs without the hassle of cluster management. @JaredCrean2, I cannot agree that "Where processes actually run is irrelevant." Good luck making this argument to programmers working on satellites or interplanetary missions. Distance matters, and physics is a real thing. |
Agreed, but for that you'd probably want multithreading, and not this package. Such hardware shifts are part of why I'd rather name this package based on what it inherently does, and not how it will be used. And what it does is implement a distributed memory, message-based programming model. @tkelman suggested
I believe the point is that the purpose of a package like this is to provide a particular API, and the name mostly applies to the API and not to where your processes run. |
Let's take a look at how Wikipedia defines "multiprocessing":
It seems that calling what we have "multiprocessing" would be actively misleading since it directly contradicts the definition of the term. Well, maybe it's not actually used that way in practice. Let's take a look at some top Google hits for the word, e.g. Python's multiprocessing library:
Ok, that sounds promising – maybe they support real distributed computing like this module does! Oh wait, no, it's a client-server model where all jobs are run on a single node. But jobs can be submitted by other nodes. Not really the same thing. It's useful, but it's not a distributed computing framework. Let's look at some others, like Node: no sign of any ability to run on multiple machines. How about Lisp? Surely if the term "multiprocessing" encompasses distributed computing, then Lisp implementations will have this feature! Nope: there is no mention of anything distributed or of multiple machines there either. How about some Google search stats:
People – Wikipedia, other programming languages, the internet – do not use the term "multiprocessing" to refer to systems designed to support distributed computing. They do, however, use the term "distributed" for systems that support many machines interacting with features like futures, promises and remote references. I find it hard to understand why we should use a term that is actively misleading instead of one that accurately describes what this code supports. |
(Force-pushed from 35e23e8 to bb6c0cb, then from bb6c0cb to f448222.)
Merging after a green CI. |
Those results merely suggest to me that Distributed is an overly broad term.
How so? I don't see that implication at all. |
This change is internal and does not need a mention in NEWS just yet. The symbols are re-exported from Base. |
CI has passed. I'll let the BDFLs take a call on the name and request them to merge (or not). |
People have been using unexported symbols via |
These are the ones - julia/base/parallel/Parallel.jl, lines 55 to 60 at 379f18e.
Still available from Base. |
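A sketch of what "still available from Base" means in practice; `init_worker` here merely stands in for an unexported internal, so treat the exact names as illustrative:

```julia
# Exported names are re-exported from Base, so existing user code
# keeps working unqualified after the rename (fresh session assumed):
addprocs(2)
@assert nworkers() == 2

# Unexported internals are reached via the new module path
# (previously Base.Parallel.*; `init_worker` is just an example):
Base.Distributed.init_worker
```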
Even the renaming of the module (see lines 1344 to 1378 at 379f18e).
At this time the name change is more relevant to contributors, with zero impact on users. |
Wasn't complete, you missed |
Well, those should be fixed independently, and NEWS.md should not carry non-user-facing changes at this time. |
This was the crux of the matter on my end: I fundamentally misunderstood the long-term vision for this module (as opposed to its present state, use, and documentation). Rereading all concurrency / parallelism issues and mailing list threads remedied that misunderstanding. To avoid future confusion,
A similarly descriptive alternative that does not imply data parallelism might be
💯 That the programming model / abstraction rather than realization in hardware is the important part is the sentiment I hoped to convey with #20486 (comment). These statements convey that sentiment better. Best! |
DistributedProcessing would be a better name for this. You don't type module names in full all that often, and we have tab completion. If this were a package (which it should be made into anyway before 1.0), Distributed would be too general according to the naming guidelines - "Err on the side of clarity, even if clarity seems long-winded to you."
So much for that then? |
My $0.02 FWIW is that threads offer shared memory parallelism, and anything that crosses a process boundary is distributed memory parallelism. Whether the processes are local to a node or remote should only be relevant to the communication layer. Programmatically, a distributed memory application should not concern itself with the locality of the participating processes (excluding outlier situations like embedded platforms and such). Having said that, I like @Sacha0's suggestion best: |
This PR renames module Parallel to Distributed as discussed here - #20428 (comment)
The thinking is to differentiate multi-node distributed computation from other types of parallelism - threads, tasks for IO, GPU, etc.
Will keep this open for a few days.
Note that the manual needs to be updated too, to better differentiate between the different types of parallelism. We have an open issue for that (#19579), and it is not in the scope of this PR.