1. LICM (Loop-Invariant Code Motion), a compiler optimization, caused tuples to exhibit undefined behavior due to improper hoisting. This issue caused me a large amount of grief and countless hours (an estimated week of lost development time), and it deserves to be mentioned first. All in all, I am glad it was caught during development and not, say, by an unsuspecting user. [Link](https://github.com/chapel-lang/chapel/issues/7003).
2. The code generation phase incorrectly performed comparisons directly on references. In the compiler, references are actually pointers, so the comparison would be performed on the pointer itself rather than the value it refers to; such a comparison could lead code to take branches it was never meant to take. This only cost me a day. [Link](https://github.com/chapel-lang/chapel/pull/7065).
3. Returning a tuple from an overloaded method, which is dynamically dispatched at runtime, would crash the compiler if the result was not captured at the call site. This is no ordinary internal error, where you get a decent error message and a line number; it resulted in an actual segmentation fault during compilation. This bug is still not patched, and a careless user can trigger it by not capturing the return value of `remove` (see the sketch after this list). This cost me a weekend of development time. [Link](https://github.com/chapel-lang/chapel/issues/6542).

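To make the third failure mode concrete, below is a minimal hypothetical sketch of the kind of code that trips it. The class names are invented, and `remove` merely stands in for the method named in the issue; this is not code taken from the project.

```chpl
class List {
  // Returns (removed element, success flag).
  proc remove() : (int, bool) { return (0, false); }
}

class Stack : List {
  proc remove() : (int, bool) { return (1, true); }
}

var list : List = new Stack();  // call is dynamically dispatched at runtime
list.remove();                  // result discarded: segfaults the compiler
var (elt, ok) = list.remove();  // capturing the tuple avoids the crash
```
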
#### GlobalAtomicObject - Spin-Off Project

A spin-off project born from the desire to solve a core problem in creating scalable data structures: performing atomic operations on class instances. It has played an important role in experimentation, would be invaluable as a tool for building more scalable distributed data structures, and is useful even for arbitrary applications. This sub-project could not be completed, as it would require Chapel runtime and compiler changes to work efficiently, but even now it proves to be a scalable solution and as such deserves to be mentioned here. More improvements will be made, as it will become my next (unfortunately non-funded) project, and it will be another first and novel solution.

Currently, the `GlobalAtomicObject` is an actual solution to a very big problem in distributed computing: atomic operations on remote memory, a very tricky topic. There are multiple approaches, but as HPC demands high performance above all else, the number of valid choices dwindles to next to nothing. While specialized hardware may be developed in the future, I sought to develop a software solution that works in the here and now. However, to understand the actual problem, some background knowledge is required.

##### Remote Atomic Operations

As a PGAS (Partitioned Global Address Space) language, Chapel allows operations on memory to be transparent with respect to the node on which the memory is allocated. Hence, if you can perform atomic operations on local memory, you can perform them on remote memory too. These atomic operations are handled in one of two ways:

1) **Remote Execution Atomic Operations**

This is the most naive approach. It is used when nodes lack a NIC, such as Aries, that supports network atomics at the hardware level, and it is most common when applications run locally. For example, imagine the user wants to perform a 'wait-free' atomic operation, such as `fetchAdd`, on some remote memory location. Without a NIC supporting network atomics, it boils down to the below...

```chpl
var _value : atomic int;
on _value do _value.fetchAdd(1);
```

In this case, while the operation is technically wait-free, in that it is bounded only by network latency, it must spawn a remote task on the target node and block the current task until it returns. This is performed implicitly, but the performance penalty is severe enough to bottleneck any application. Furthermore, spawning a remote task deprives the target node of valuable resources, further degrading performance.

2) **Network Atomic Operations**

This requires very specific hardware, such as the Aries NIC, which is proprietary, top-of-the-line Cray hardware, and it is required for scalable (or even acceptable) performance for ordered data structures. Using the same example as before, a `fetchAdd` in this case is 'wait-free' enough to allow scalable performance. Scalable performance can only be achieved by an algorithm that is also bounded in terms of 'retry' operations; this rules out certain synchronization patterns, such as the lock-free 'CAS retry loop' shown below, where the cost of retrying is too expensive, and hence rules out lock-free algorithms and methodologies in general.

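To illustrate why such patterns are ruled out, here is a minimal sketch of a CAS retry loop; the `casAdd` helper is hypothetical, not part of the project. When `counter` lives on a remote node, every failed `compareAndSwap` costs another network round trip, and under contention the number of retries is unbounded.

```chpl
var counter : atomic int;

// Hypothetical helper: add `x` using a lock-free CAS retry loop.
proc casAdd(x : int) {
  while true {
    const old = counter.read();                          // remote read
    if counter.compareAndSwap(old, old + x) then break;  // remote CAS
    // CAS failed: another task raced us; pay another round trip and retry
  }
}

casAdd(5);
writeln(counter.read()); // 5
```
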
##### Atomic Operations on 'Wide' Pointers

As memory can be accessed transparently from the user's perspective, the runtime must keep track of it. Hence, to determine which node a piece of memory belongs to, the pointer is 'widened' into a 128-bit value that keeps track of both the memory address and the node id. The next issue is that the majority of hardware, even the Aries NIC, does not support 128-bit network atomic operations. With the second approach (above) ruled out by the lack of hardware support, only the first approach remains; however, as mentioned before, that leads to degraded performance and is not feasible as an actual solution.

One way to salvage the second approach is a technique called 'pointer compression', which takes advantage of the fact that operating systems only make use of the first 48 bits of the virtual address space, allowing the most significant 16 bits to store the node id. This approach works very well for clusters with fewer than 2^16 nodes, but it is a short-term solution and not fitting for a language that prides itself on portability.

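For illustration, here is a minimal sketch of pointer compression under the 48-bit address assumption described above; the function names are hypothetical and not part of `GlobalAtomicObject`.

```chpl
// Pack a 16-bit node id into the unused upper bits of a 48-bit
// virtual address, so the 'wide' pointer fits in one 64-bit word
// that the NIC can operate on atomically.
proc compress(nodeId : uint(16), addr : uint(64)) : uint(64) {
  return ((nodeId : uint(64)) << 48) | (addr & 0xFFFFFFFFFFFF);
}

// Recover the (node id, address) pair from the packed word.
proc decompress(packed : uint(64)) : (uint(16), uint(64)) {
  return ((packed >> 48) : uint(16), packed & 0xFFFFFFFFFFFF);
}

const packed = compress(3, 0x7F0000001000);
writeln(decompress(packed)); // node id 3 and the original address
```
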
My approach aims to solve the problem for any number of nodes. In my solution, I use descriptors to denote objects by id, and a table to store said objects (the descriptor being the index into the table). This way, we may use the second approach by performing the atomic operations on 64-bit descriptors. This approach, while scalable, is orders of magnitude slower than 'pointer compression', but it will work for up to 2^32 nodes. Furthermore, a more practical solution involving the Chapel runtime is planned, which should significantly improve performance.

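As a rough illustration of the descriptor idea, here is a minimal sketch; the fixed-size table, the names, and the registration scheme are assumptions for exposition, not the actual `GlobalAtomicObject` implementation.

```chpl
class Obj { var x : int; }

var table : [0..#1024] Obj;  // hypothetical fixed-size object table
var nextSlot : atomic int;   // next free slot in the table
var current : atomic int;    // holds a 64-bit descriptor, not a pointer

// Register an object and return its descriptor (its table index).
proc register(obj : Obj) : int {
  const idx = nextSlot.fetchAdd(1);
  table[idx] = obj;
  return idx;
}

const a = register(new Obj(1));
const b = register(new Obj(2));
current.write(a);
// A 64-bit CAS on descriptors stands in for a CAS on 128-bit wide
// pointers, so ordinary network atomics suffice.
if current.compareAndSwap(a, b) then
  writeln(table[current.read()].x); // prints 2
```
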
[Discussion](https://github.com/chapel-lang/chapel/issues/6663) - Currently on hold, as the GSoC project is unfinished and takes higher priority.

[Pull Request](https://github.com/chapel-lang/chapel/pull/6717) - Currently closed, as this requires a lot more work.

[Repository](https://github.com/LouisJenkinsCS/Chapel-Atomic-Objects) - It has been moved to its own repository and stripped