Commit e08df99

Avoid creating illegal byref pointers
Byref pointers need to point within their "host" object -- thus the alternate name "interior pointers". If the JIT creates and reports a pointer as a "byref", but it points outside the host object, and a GC occurs that moves the host object, the byref pointer will not be updated. If a subsequent calculation puts the byref "back" into the host object, it will actually be pointing to garbage, since the host object has moved.

This occurred on ARM with array index calculations, in particular because ARM doesn't have a single-instruction "base + scale*index + offset" addressing mode. Thus, for the `ProcessJagged3DArray()` function in the jaggedarr_cs_do test case, we were generating:

```
// r0 = array object, r6 = computed index offset. We mark r4 as a byref.
add r4, r0, r6
// r4 - 32 is the offset of the object we care about. Then we load the
// array element. In this case, the loaded element is a gcref, so r4
// becomes a gcref.
ldr r4, [r4-32]
```

We get this math because the user code uses `a[i - 10]`, which is essentially `a + (i - 10) * 4 + 8` for element size 4 and first-element offset 8. This is optimized to `a + i * 4 - 32`. In the code above, `r6` is `i * 4`, so after the first instruction, `r4` can point beyond the end of the array. If a GC happens at that point, `r4` isn't updated, and the second instruction loads garbage.

There are two fixes:

1. Change array morphing in `fgMorphArrayIndex()` to rearrange the array index IR node creation so that the only byref pointer created is the precise address of the array element, with no "intermediate" byref pointers that don't represent the actual array element address being computed.

2. Change `fgMoveOpsLeft()` to prevent the left-weighted reassociation optimization `[byref]+ (ref, [int]+ (int, int)) => [byref]+ ([byref]+ (ref, int), int)`. This optimization creates "incorrect" byrefs that don't necessarily point within the host object.

These fixes apply to all platforms.

Fixes #17517.

There are many, many diffs. They are, perhaps surprisingly, overwhelmingly positive. For AMD64 SuperPMI, the diffs are a 7.6% size win across 5194 functions! This appears to be due to less code cloning, and sometimes better optimization. For ARM32 ngen-based desktop asm diffs, it is a 0.30% improvement across all framework assemblies. Many of the diffs seem to be because we now CSE the entire array address offset expression, not just the index expression.
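To make the arithmetic concrete, here is a small standalone C++ model of the hazard (illustrative only, not JIT code; the 28-byte object size is an assumed stand-in for a small `int[]`):

```cpp
#include <cstdint>
#include <cstdio>

// Model of the folded address math for a[i - 10] with 4-byte elements and
// the first element at offset 8: the element address base + 8 + (i - 10) * 4
// folds to base + i * 4 - 32, and the intermediate value base + i * 4 can
// land outside the object.
int main()
{
    unsigned char obj[28] = {};   // stand-in for a small array object (assumed size)
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(obj);
    int i = 10;                   // a[i - 10] == a[0]

    std::uintptr_t intermediate = base + static_cast<std::uintptr_t>(i) * 4; // base + 40
    std::uintptr_t element      = intermediate - 32;                         // base + 8

    // 'intermediate' lies outside [base, base + 28). If it were reported as
    // a byref and a GC moved the object, it could not be updated correctly,
    // even though the final 'element' address is back in range.
    std::printf("object:       [%p, %p)\n", (void*)base, (void*)(base + sizeof(obj)));
    std::printf("intermediate: %p (outside)\n", (void*)intermediate);
    std::printf("element:      %p (inside)\n", (void*)element);
    return 0;
}
```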
1 parent 4eb8b37 commit e08df99

File tree

1 file changed (+30 additions, −6 deletions)

src/jit/morph.cpp

Lines changed: 30 additions & 6 deletions
```diff
@@ -5810,6 +5810,21 @@ void Compiler::fgMoveOpsLeft(GenTree* tree)
                 break;
         }
 
+        // Don't split up a byref calculation and create a new byref. E.g.,
+        // [byref]+ (ref, [int]+ (int, int)) => [byref]+ ([byref]+ (ref, int), int).
+        // Doing this transformation could create a situation where the first
+        // addition (that is, [byref]+ (ref, int) ) creates a byref pointer that
+        // no longer points within the ref object. If a GC happens, the byref won't
+        // get updated. This can happen, for instance, if one of the int components
+        // is negative. It also requires the address generation be in a fully-interruptible
+        // code region.
+        //
+        if (varTypeIsGC(op1->TypeGet()) && op2->TypeGet() == TYP_I_IMPL)
+        {
+            noway_assert(varTypeIsGC(tree->TypeGet()) && (oper == GT_ADD));
+            break;
+        }
+
        /* Change "(x op (y op z))" to "(x op y) op z" */
        /* ie. "(op1 op (ad1 op ad2))" to "(op1 op ad1) op ad2" */
```
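To spell out what the new guard rejects, here are the two tree shapes as a hypothetical, comment-only sketch (`b`, `i`, and `c` are not names from the commit):

```cpp
// Hypothetical operands: byref b, ints i and c.
//
//   GT_ADD(TYP_BYREF, b, GT_ADD(TYP_INT, i, c))        // original tree
//   GT_ADD(TYP_BYREF, GT_ADD(TYP_BYREF, b, i), c)      // reassociated; now skipped
//
// With c == -32, as in the a[i - 10] example, the inner (b + i) byref can
// point past the end of b's host object even though b + i + c is in range.
```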

```diff
@@ -6186,7 +6201,7 @@ GenTree* Compiler::fgMorphArrayIndex(GenTree* tree)
 
     // Create the "addr" which is "*(arrRef + ((index * elemSize) + elemOffs))"
 
-    GenTree* addr;
+    GenTree* scaledIndex;
 
 #ifdef _TARGET_64BIT_
     // Widen 'index' on 64-bit targets
@@ -6217,22 +6232,31 @@
         size->gtFlags |= GTF_DONT_CSE;
 
         /* Multiply by the array element size */
-        addr = gtNewOperNode(GT_MUL, TYP_I_IMPL, index, size);
+        scaledIndex = gtNewOperNode(GT_MUL, TYP_I_IMPL, index, size);
     }
     else
     {
-        addr = index;
+        scaledIndex = index;
     }
 
-    /* Add the object ref to the element's offset */
+    // Be careful to only create the byref pointer when the full index expression is added to the array reference.
+    // We don't want to create a partial byref address expression that doesn't include the full index offset:
+    // a byref must point within the containing object. It is dangerous (especially when optimizations come into
+    // play) to create a "partial" byref that doesn't point exactly to the correct object; there is risk that
+    // the partial byref will not point within the object, and thus not get updated correctly during a GC.
+    // This is mostly a risk in fully-interruptible code regions.
 
-    addr = gtNewOperNode(GT_ADD, TYP_BYREF, arrRef, addr);
+    GenTree* addr;
 
     /* Add the first element's offset */
 
     GenTree* cns = gtNewIconNode(elemOffs, TYP_I_IMPL);
 
-    addr = gtNewOperNode(GT_ADD, TYP_BYREF, addr, cns);
+    addr = gtNewOperNode(GT_ADD, TYP_I_IMPL, scaledIndex, cns);
+
+    /* Add the object ref to the element's offset */
+
+    addr = gtNewOperNode(GT_ADD, TYP_BYREF, arrRef, addr);
 
 #if SMALL_TREE_NODES
     assert((tree->gtDebugFlags & GTF_DEBUG_NODE_LARGE) || GenTree::s_gtNodeSizes[GT_IND] == TREE_NODE_SZ_SMALL);
```
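Condensing the `fgMorphArrayIndex()` hunks above, the node-creation order changes as follows (the "before" lines are rewritten here with `scaledIndex` in place of the old code's reuse of `addr`, purely for readability):

```cpp
// Before: a byref was created as soon as the scaled index was added to the
// array reference; the intermediate (arrRef + scaledIndex) byref need not
// point within the array object.
addr = gtNewOperNode(GT_ADD, TYP_BYREF, arrRef, scaledIndex);
addr = gtNewOperNode(GT_ADD, TYP_BYREF, addr, cns);

// After: fold the scaled index and the first-element offset as integer math
// first, then create the byref exactly once, at the precise element address.
addr = gtNewOperNode(GT_ADD, TYP_I_IMPL, scaledIndex, cns);
addr = gtNewOperNode(GT_ADD, TYP_BYREF, arrRef, addr);
```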
