Skip to content
This repository was archived by the owner on Jan 10, 2025. It is now read-only.

elf: avoid copying read only sections #284

Merged
merged 2 commits into from
Apr 1, 2022

Conversation

alessandrod
Copy link

@alessandrod alessandrod commented Mar 15, 2022

Only copy read only sections if there's gaps between the sections. If all sections are contiguous in memory, slice them from the original ELF data.

@alessandrod alessandrod force-pushed the no-ro-copy branch 4 times, most recently from e1be28c to d07f86f Compare March 16, 2022 08:17
Section::Borrowed(range) => (range.start, &elf[range.clone()]),
};

// If offset > 0, the region will start at MM_PROGRAM_START + the offset of
Copy link
Author

@alessandrod alessandrod Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the only slightly controversial change. Given the following ELF layout:

[leading data][ro data][trailing data]
  • Accessing [trailing data] (1 byte past the last readonly byte), always returns EbpfError::InvalidVirtualAddress.
  • When there's gaps between the sections that form [ro data], the data is copied and the gaps, including [leading data] are filled with zeroes.
  • When there's no gaps between the sections that form [ro data], the data is borrowed and accessing [leading data] returns EbpfError::InvalidVirtualAddress. This is so that we don't expose bits of the input ELF that are not part of rodata.

So to sum up the difference: owned sections fill the gap until the beginning of rodata with zeros; when borrowing we can't fill, so we return an error.

I'm now going to add a config switch to toggle borrowed ro sections and my question is: if the switch is on, should I also go ahead and return EbpfError::InvalidVirtualAddress when accessing [leading data] with owned sections too?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds consistent and also throws a legit error because the program should not be accessing elf data outside of the text or ro sections.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is now implemented behind Config::optimize_rodata.

@alessandrod alessandrod marked this pull request as ready for review March 16, 2022 08:35
@alessandrod alessandrod force-pushed the no-ro-copy branch 4 times, most recently from 1bbff85 to 50c7bf8 Compare March 17, 2022 11:52
@alessandrod alessandrod force-pushed the no-ro-copy branch 3 times, most recently from 70f1063 to e27789b Compare March 24, 2022 10:35
@@ -1490,8 +1490,10 @@ impl JitCompiler {
X86Instruction::cmp_immediate(OperandSize::S8, RAX, 0, Some(X86IndirectAccess::Offset(MemoryRegion::IS_WRITABLE_OFFSET))).emit(self)?; // region.is_writable == 0
emit_jcc(self, 0x84, TARGET_PC_MEMORY_ACCESS_VIOLATION + target_offset)?;
}
X86Instruction::load_immediate(OperandSize::S64, RCX, (1i64 << ebpf::VIRTUAL_ADDRESS_BITS) - 1).emit(self)?; // RCX = (1 << ebpf::VIRTUAL_ADDRESS_BITS) - 1;
emit_alu(self, OperandSize::S64, 0x21, RCX, R11, 0, None)?; // R11 &= (1 << ebpf::VIRTUAL_ADDRESS_BITS) - 1;
X86Instruction::load(OperandSize::S64, RAX, RCX, X86IndirectAccess::Offset(MemoryRegion::VM_ADDR_OFFSET)).emit(self)?; // RCX = region.vm_addr
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional load operation here could be problematic and needs to be benchmarked to estimate the performance impact.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very unscientific comparison:

  • elf load diff (old is main, new is this PR)
 running 3 tests
-test bench_load_elf                 ... bench:       2,868 ns/iter (+/- 57)
-test bench_load_elf_with_syscall    ... bench:       2,866 ns/iter (+/- 104)
-test bench_load_elf_without_syscall ... bench:       2,869 ns/iter (+/- 26)
+test bench_load_elf                 ... bench:       2,745 ns/iter (+/- 12)
+test bench_load_elf_with_syscall    ... bench:       2,742 ns/iter (+/- 15)
+test bench_load_elf_without_syscall ... bench:       2,763 ns/iter (+/- 124)
  • memory mapping diff
@@ -160,24 +169,24 @@
 test bench_gapped_randomized_access_with_1024_entries  ... bench:           6 ns/iter (+/- 0)
 test bench_mapping_with_1024_entries                   ... bench:           4 ns/iter (+/- 0)
 test bench_prng                                        ... bench:           1 ns/iter (+/- 0)
-test bench_randomized_access_with_0001_entry           ... bench:           6 ns/iter (+/- 0)
+test bench_randomized_access_with_0001_entry           ... bench:           5 ns/iter (+/- 0)
 test bench_randomized_access_with_1024_entries         ... bench:           7 ns/iter (+/- 0)
-test bench_randomized_mapping_access_with_0004_entries ... bench:          20 ns/iter (+/- 0)
+test bench_randomized_mapping_access_with_0004_entries ... bench:          19 ns/iter (+/- 0)
 test bench_randomized_mapping_access_with_0016_entries ... bench:          12 ns/iter (+/- 0)
 test bench_randomized_mapping_access_with_0064_entries ... bench:           8 ns/iter (+/- 0)
 test bench_randomized_mapping_access_with_0256_entries ... bench:           7 ns/iter (+/- 0)
 test bench_randomized_mapping_access_with_1024_entries ... bench:           7 ns/iter (+/- 0)
 test bench_randomized_mapping_with_1024_entries        ... bench:           4 ns/iter (+/- 0)

So pretty much the same

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and I just realized that the mapping benches don't exercise the jit. All the other benches are the same so I'm fairly sure there's no regressions, but I'll write some JIT mapping tests.

Copy link
Author

@alessandrod alessandrod Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some benches and this is the diff:

 name                                                        main ns/iter  no_ro ns/iter  diff ns/iter  diff %  speedup 
 bench_jit_vs_interpreter_address_translation                269,260       274,264               5,004   1.86%   x 0.98 
 bench_jit_vs_interpreter_address_translation_stack_dynamic  216,047       220,829               4,782   2.21%   x 0.98 
 bench_jit_vs_interpreter_address_translation_stack_fixed    251,375       256,889               5,514   2.19%   x 0.98 

There's a 1-2% diff. Keeping in mind that I'm running on my laptop, and that sometimes - probably because of scheduling - I get diff noise up to ~8% for the same code, I think it's fine.

@alessandrod alessandrod force-pushed the no-ro-copy branch 2 times, most recently from f93efc9 to 41eb28c Compare March 25, 2022 04:44
Only copy read only sections if there's gap between them. If sections are
contiguous in memory, slice them from the original ELF data.
@alessandrod alessandrod merged commit d4cfd40 into solana-labs:main Apr 1, 2022
@Lichtso Lichtso mentioned this pull request Jun 5, 2023
18 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants