Skip to content

Conversation

@parth-07
Copy link
Contributor

This commit changes how symbol expression nodes gets evaluated for the absolute symbols. The key goal is to improve the layout convergence when forward references is involved. This design change is a step in the direction to improve the layout convergence.

When the linker script contains forward references, then we may need to re-evaluate the linker script and the layout multiple times to reach convergence. For example:

SECTIONS {
  u = bar_end; // A1
  .foo (u) : { *(.text.foo) }
  v = w; // A2
  .bar (v) : { *(.text.bar) }
  bar_end = .; // A3
  w = 0x2000; // A4
}

The above linker script contains 4 assignments: A1, A2, A3, and A4, and it needs more than one pass of layout evaluation for reaching convergence. In the 1st pass, the assignment A1 is evaluted incorrectly because bar_end has not been assigned a value yet.

We also need to stop the layout pass in between and recompute the layout from the beginning whenever some initial assumption changes such as a new segment needs to be inserted.

There is an important difference between these two types of layout recomputation. In the first type, that is, layout recomputation to achieve layout convergence, we must use the symbol values from the previous pass.

In the second type, that is, layout recomputation due to an initial assumption change such as a new segment needs to be inserted, we must reset the symbol values before recomputing the layout. Let's see why:

SECTIONS {
  u = v; // A1
  .foo : { *(.text.foo) }
  v = 0x1000; // A2
  .data : { *(.text.data) }}
  v = 0x2000; // A3
}

In the first evaluation of A1, u is assigned the value 0. On encountering '.data' output section, the layout is recomputed, and this time when A1 is recomputed, u is assigned the value 0x1000 because A2 was evaluated in the last pass. The value 0x1000 is incorrect for u because the value of v that is to be used in A1 must come from the last evaluation of v, that is, A3. Reusing values from the last pass in this case is error-prone and can lead to incorrect layouts.

It is difficult to add logic for when to reuse symbol values and when to reset them because these two types of layout recomputations can be intermixed.

Adding a source assignment node with the symbol expression node makes the recomputation simpler by obviating the need to reset the symbol values in both the cases. The key idea is that the value of a symbol node of an absolute symbol is not the value of the corresponding symbol, but instead is the result of the last assignment node for that symbol.

Adding the source assignment with a symbol node has additional benefits as well:

  1. It makes it easier to determine the culprit / closest assignment to
    use in diagnostics when a symbol value is not converging.

  2. It makes it easier to add heuristics such as constant expression
    evaluation to speed up the layout convergence. For example:

u = v;
.foo (u) : { *(.text.foo) }
v = 0x2000;

In this case, if the symbol v encodes the source assignment, then we
can easily add heuristic to determine if v source assignment can be
evaluated early.

  1. Selectively recompute only those assignment nodes which needs to be recomputed.
    If we reset symbol values, then all the assignment nodes always needs
    to be recomputed in each layout pass.

Resolves #468

This commit changes how symbol expression nodes gets evaluated for
the absolute symbols. The key goal is to improve the layout convergence
when forward references is involved. This design change is a step in the
direction to improve the layout convergence.

When the linker script contains forward references, then we may need to
re-evaluate the linker script and the layout multiple times to reach
convergence. For example:

```
SECTIONS {
  u = bar_end; // A1
  .foo (u) : { *(.text.foo) }
  v = w; // A2
  .bar (v) : { *(.text.bar) }
  bar_end = .; // A3
  w = 0x2000; // A4
}
```

The above linker script contains 4 assignments: A1, A2, A3, and A4, and
it needs more than one pass of layout evaluation for reaching
convergence. In the 1st pass, the assignment A1 is evaluted incorrectly
because `bar_end` has not been assigned a value yet.

We also need to stop the layout pass in between and recompute the layout
from the beginning whenever some initial assumption changes such as a
new segment needs to be inserted.

There is an important difference between these two types of layout
recomputation. In the first type, that is, layout recomputation to
achieve layout convergence, we must use the symbol values from the
previous pass.

In the second type, that is, layout recomputation due to
an initial assumption change such as a new segment needs to be inserted,
we must reset the symbol values before recomputing the layout. Let's see
why:

```
SECTIONS {
  u = v; // A1
  .foo : { *(.text.foo) }
  v = 0x1000; // A2
  .data : { *(.text.data) }}
  v = 0x2000; // A3
}
```

In the first evaluation of A1, u is assigned the value `0`.
On encountering '.data' output section, the layout is recomputed, and
this time when A1 is recomputed, u is assigned the value `0x1000`
because A2 was evaluated in the last pass. The value `0x1000` is
incorrect for `u` because the value of `v` that is to be used in A1 must
come from the last evaluation of `v`, that is, A3. Reusing values from
the last pass in this case is error-prone and can lead to incorrect
layouts.

**It is difficult to add logic for when to reuse symbol values and when to
reset them because these two types of layout recomputations can be intermixed.**

Adding a source assignment node with the symbol expression node makes
the recomputation simpler by obviating the need to reset the symbol
values in both the cases. *The key idea is that the value of a
symbol node of an absolute symbol is not the value of the corresponding
symbol, but instead is the result of the last assignment node for that
symbol.*

Adding the source assignment with a symbol node has additional benefits
as well:

1) It makes it easier to determine the culprit / closest assignment to
   use in diagnostics when a symbol value is not converging.

2) It makes it easier to add heuristics such as constant expression
   evaluation to speed up the layout convergence. For example:

```
u = v;
.foo (u) : { *(.text.foo) }
v = 0x2000;
```

  In this case, if the symbol v encodes the source assignment, then we
  can easily add heuristic to determine if `v` source assignment can be
  evaluated early.

3) Selectively recompute only those assignment nodes which needs to be recomputed.
   If we reset symbol values, then all the assignment nodes always needs
   to be recomputed in each layout pass.

Resolves qualcomm#468

Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
Copy link
Contributor

@quic-seaswara quic-seaswara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not follow which case does this patch handle ?

if (!ThisSymbol)
ThisSymbol = ThisModule.getNamePool().findSymbol(Name);

if (!SourceAssignment && ThisSymbol->resolveInfo()->isAbsolute()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be isScriptDefined() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think isScriptDefined would work as well. Is there any case where an absolute symbol may not be script defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use assembly using .set

return false;
}

// FIXME: Adding more symbols this late can cause layout issues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic symbols are defined before layout, why would this cause issue ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are defining these symbols after the layout is performed, right? The layout is performed above this, at line 3079, in the relax function call.

}

auto &Backend = CurModule.getBackend();
Backend.updateLatestAssignment(Name, this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wont this cause issues if the symbol assignment is PROVIDE ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this does not change how PROVIDE assignments are evaluated. Why do you think it would cause issues with PROVIDE?


void updateLatestAssignment(llvm::StringRef SymName, const Assignment *A) {
SymbolNameToLatestAssignment[SymName] = A;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why backend ? can we store this in LinkerScript ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are used during the layout computation. Currently, the entire logic for layout computation is in GNULDBackend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest assignment would then become PROVIDE which will not be provided because of the preceding assignment ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assignment::assign for a PROVIDE assignment is only called if the assignment is indeed provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect linker script symbol value due to layout reiteration

2 participants