
Commit 83aa6c2

Wrapping up
1 parent 7a497ba commit 83aa6c2

35 files changed: +33, -2837 lines

DequeOrdering (binary, -65.8 KB, not shown)
DequeOrdering_real (binary, -5 MB, not shown)
DistributedBag (binary, -65.8 KB, not shown)
DistributedBag_real (binary, -5.32 MB, not shown)

README.md

Lines changed: 33 additions & 57 deletions
@@ -14,70 +14,46 @@ cluster), and a way to learn more exciting and useful knowledge. As well, I woul
mentors, [**@e-kayrakli**](https://github.com/e-kayrakli) and [**@mppf**](https://github.com/mppf), who I
have had the honor to serve under. Finally, I would like to thank the Chapel project itself.

-### Pull Requests & Discussions
+### Issues, Pull Requests & Discussions

-Below I will list all Pull Requests. Not all are guaranteed to be merged at the time of this posting.
+For documentation purposes, I will list the most important information that should be taken into account
+for the GSoC final evaluations.

-#### GlobalAtomicObject
+#### Issues: Honorable Mentions

-[Discussion](https://github.com/chapel-lang/chapel/issues/6663) **(On Hold)**
+While I have hit many issues with Chapel's compiler so far, I will list only the relatively significant ones.

-[Pull Request](https://github.com/chapel-lang/chapel/pull/6717) **(Closed)**
+1. LICM (Loop-Invariant Code Motion), a compiler optimization, improperly hoisted tuples, causing undefined
+   behavior. This issue caused me a large amount of grief and countless hours (an estimated week of lost
+   development time), and it deserves to be mentioned first. All in all, I am glad it was caught during
+   development and not by an unsuspecting user. [Link](https://github.com/chapel-lang/chapel/issues/7003)
+2. The code generation phase inaccurately performed comparisons directly on references. In the compiler,
+   references are actually pointers, so such a comparison compares the pointer itself against something else,
+   which could lead to code taking branches it was never meant to take. This one only cost me a day.
+   [Link](https://github.com/chapel-lang/chapel/pull/7065)
+3. Returning a tuple from an overloaded method, which results in the method being dynamically dispatched at
+   runtime, would crash the compiler if the result was not captured at the call site. This is no normal
+   internal error, where you get a decent error message and a line number; it resulted in an actual
+   segmentation fault during compilation. This bug is still not patched, and a careless user can trigger it
+   by not capturing the return value of `remove`, as sketched below. It cost me a weekend of development
+   time. [Link](https://github.com/chapel-lang/chapel/issues/6542)
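
An illustrative sketch of the crash pattern from issue 3. The `DistBag` type and the `(success, element)`
shape of `remove` here are assumptions based on this project's Collections work, not the exact reproducer
from the issue:

```chpl
use Collections;                     // assumed module name

var bag = new DistBag(int);          // hypothetical distributed bag
bag.add(1);

// Capturing the returned tuple compiles fine:
var (hasElem, elem) = bag.remove();

// Discarding the tuple at the call site is what could segfault the
// *compiler* itself when `remove` is dynamically dispatched:
bag.remove();
```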

-Currently, the `GlobalAtomicObject` is an actual solution to a very big problem in distributed computing.
-Atomic operations on remote memory are a very tricky topic. There are multiple approaches, but as HPC demands
-high performance above all else, the number of valid choices dwindles to next to nothing. While specialized
-hardware may be developed in the future, I sought to develop a software solution that works in the here and now.
-However, to understand the actual problem, some background knowledge is required.
+#### GlobalAtomicObject - Spin-Off Project

-##### Remote Atomic Operations
+A spin-off project born from the desire to solve a problem at the core of creating scalable data structures:
+performing atomic operations on class instances. It has played an important role in experimentation, it would
+be invaluable as a tool for creating more scalable distributed data structures, and it is useful even for
+arbitrary applications. This sub-project could not be completed, as it would require Chapel runtime and
+compiler changes to work efficiently, but even now it proves to be a scalable solution and as such deserves
+to be mentioned here. More improvements are planned: it will become my next (unfortunately non-funded)
+project, and it will be another first and novel solution.

-As a PGAS (Partitioned Global Address Space) language, Chapel allows operations on memory to be transparent
-with respect to which node the memory is allocated on. Hence, if you can perform atomic operations on local
-memory, you can perform them on remote memory too. There are two ways in which these atomic operations are handled:
+[Discussion](https://github.com/chapel-lang/chapel/issues/6663) - Currently on hold, as the unfinished GSoC
+project takes higher priority.

-1) **Remote Execution Atomic Operations**
-
-This is the most naive approach, but it is performed when nodes lack NICs like Aries that support network
-atomics at the hardware level, and it is most common when applications run locally. For example, imagine that
-the user wants to perform a 'wait-free' atomic operation, such as a `fetchAdd`, on some remote memory
-location. Without a NIC supporting network atomics, it boils down to the below...
-
-```chpl
-var _value : atomic int;
-on _value do _value.fetchAdd(1);
-```
-
-In this case, while it is technically wait-free, as it is bounded only by network latency, it must spawn a
-remote task on the target node and block the current task until it returns. This is performed implicitly, but
-the performance penalty is severe enough to bottleneck any application. Furthermore, spawning a remote task
-deprives the target node of valuable resources, and as such degrades performance.

-2) **Network Atomic Operations**
-
-This requires very specific hardware, such as the Aries NIC, which is Cray-proprietary and top-of-the-line.
-Such hardware is required for scalable (or even acceptable) performance for ordered data structures. Using
-the same example as before, a `fetchAdd` in this case is 'wait-free' enough to allow scalable performance.
-Scalable performance can only be achieved via an algorithm that is also bounded in its number of 'retry'
-operations, which rules out certain synchronization patterns, such as the lock-free 'CAS retry loop', where
-the cost of retrying is too expensive, hence ruling out any lock-free algorithms and methodologies.
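
To make the ruled-out pattern concrete, below is a minimal sketch of such a lock-free 'CAS retry loop'. The
`head`/`push` names are illustrative, not from the project, and `compareAndSwap` is present-day Chapel's
atomic CAS; each failed CAS costs another network round trip, so the retry count (and cost) is unbounded:

```chpl
var head : atomic int;               // descriptor of the current list head

proc push(newDesc : int) {
  do {
    const old = head.read();
    // ... link newDesc so that it refers to old ...
  } while !head.compareAndSwap(old, newDesc);   // retry on contention
}
```
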
-##### Atomic Operations on 'Wide' Pointers
-
-As memory can be accessed transparently from the user's perspective, the runtime must keep track of it.
-Hence, to determine which node the memory belongs to, the pointer is 'widened' into a 128-bit value that
-keeps track of both the memory address and the node id. The next issue is that the majority of hardware, even
-the Aries NIC, does not support 128-bit network atomic operations. With the second approach (above) ruled out
-by the lack of hardware support, only the first approach remains. However, as mentioned before, that leads to
-degrading performance and is not feasible as an actual solution.

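Conceptually, a wide pointer pairs a node id with a local address, which is what pushes it to 128 bits and
beyond the reach of 64-bit network atomics. An illustrative record, not the Chapel runtime's actual layout:

```chpl
// Illustrative only: the runtime's real wide-pointer struct differs.
record WidePtr {
  var node : int(64);    // id of the locale that owns the memory
  var addr : uint(64);   // virtual address within that locale
}
```
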
-One approach to solve the problem while keeping the second strategy is a technique called 'pointer
-compression', which takes advantage of the fact that operating systems only make use of the first 48 bits of
-the virtual address space, leaving the most significant 16 bits free to store the node id. This approach
-works very well for clusters with fewer than 2^16 nodes, but it is a short-term solution and not fitting for
-a language that prides itself on portability.

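A minimal sketch of the pointer-compression idea (the function names are mine; the 48-bit address assumption
is the one stated above):

```chpl
// Low 48 bits hold the virtual address; high 16 bits hold the node id.
const addrMask = 0xFFFFFFFFFFFF : uint(64);

proc compress(node : uint(64), addr : uint(64)) : uint(64) {
  return (node << 48) | (addr & addrMask);
}

proc nodeOf(ptr : uint(64)) : uint(64) {
  return ptr >> 48;        // recover the node id
}

proc addrOf(ptr : uint(64)) : uint(64) {
  return ptr & addrMask;   // recover the 48-bit address
}
```

This is also why the technique caps out at 2^16 nodes: once the address claims 48 bits, only 16 remain.
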
-My approach aims to solve the problem for any number of nodes. In my solution, I use descriptors to denote
-objects by id, and a table to store said objects (the descriptor being the index into the table). This way,
-we may use the second approach by performing the atomic operations on 64-bit descriptors. This approach,
-while scalable, is orders of magnitude slower than 'pointer compression', but it will work for up to 2^32
-nodes. Furthermore, a more practical solution, involving the Chapel runtime, is planned to significantly
-improve performance.
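
A minimal sketch of the descriptor-table scheme (the table layout, size, and helper names are assumptions for
illustration, not the actual GlobalAtomicObject implementation):

```chpl
class Node { var next : int; }       // example payload class

config const tableSize = 1024;

// Registered objects; a 64-bit descriptor is simply an index into this table.
var table : [0..#tableSize] unmanaged Node?;
var top : atomic int;

proc register(obj : unmanaged Node) : int {
  const d = top.fetchAdd(1);         // claim the next free slot
  table[d] = obj;                    // descriptor d now denotes obj
  return d;
}

proc lookup(d : int) : unmanaged Node? {
  return table[d];
}

// Atomics then operate on 64-bit descriptors, which network hardware
// supports, instead of on 128-bit wide pointers, which it does not:
var slot : atomic int;
proc swapObject(expected : int, desired : int) : bool {
  return slot.compareAndSwap(expected, desired);
}
```
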
+[Pull Request](https://github.com/chapel-lang/chapel/pull/6717) - Currently closed, as this requires a lot
+more work.

+[Repository](https://github.com/LouisJenkinsCS/Chapel-Atomic-Objects) - The GlobalAtomicObject has been
+moved to its own repository and stripped from this project.

#### Collections Module

@@ -128,7 +104,7 @@ SynchronizedList | 1x
DistributedDeque | 63x
DistributedBag | 403x

-![](Results/Collections_Add.png)
+![](results/Collections_Add.png)

#### Remove

@@ -138,4 +114,4 @@ SynchronizedList | 1x
DistributedDeque | 123x
DistributedBag | 651x

-![](Results/Collections_Remove.png)
+![](results/Collections_Remove.png)

benchmark/graph500/bin/graph500

-101 KB (binary file not shown)
-7.85 MB (binary file not shown)

benchmark/uts/.chpl-expect-29585

Lines changed: 0 additions & 51 deletions
This file was deleted.

benchmark/uts/COMPOPTS

Lines changed: 0 additions & 1 deletion
This file was deleted.

benchmark/uts/Makefile

Lines changed: 0 additions & 18 deletions
This file was deleted.
