Section::Borrowed(range) => (range.start, &elf[range.clone()]),
};

// If offset > 0, the region will start at MM_PROGRAM_START + the offset of
This should be the only slightly controversial change. Given the following ELF layout:

[leading data][ro data][trailing data]

- Accessing [trailing data] (1 byte past the last readonly byte) always returns EbpfError::InvalidVirtualAddress.
- When there are gaps between the sections that form [ro data], the data is copied and the gaps, including [leading data], are filled with zeroes.
- When there are no gaps between the sections that form [ro data], the data is borrowed, and accessing [leading data] returns EbpfError::InvalidVirtualAddress. This is so that we don't expose bits of the input ELF that are not part of rodata.

So to sum up the difference: owned sections fill the gap up to the beginning of rodata with zeros; when borrowing we can't fill, so we return an error.

I'm now going to add a config switch to toggle borrowed ro sections, and my question is: if the switch is on, should I also go ahead and return EbpfError::InvalidVirtualAddress when accessing [leading data] with owned sections too?
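For reference, here is a minimal sketch of the copy-vs-borrow rule described in this comment. It is not the crate's actual loader code: the Section shape mirrors the diff context above, but build_rodata and its (offset, len) inputs are illustrative.

```rust
use std::ops::Range;

// Illustrative stand-in for the rodata handling described above.
enum Section {
    // Offset of rodata within the region plus a copy of the bytes, with any
    // gaps between the original ELF sections (and the leading gap) zeroed.
    Owned(usize, Vec<u8>),
    // Byte range borrowed directly from the input ELF.
    Borrowed(Range<usize>),
}

// `sections` holds (offset, len) pairs, assumed sorted and non-overlapping.
fn build_rodata(elf: &[u8], sections: &[(usize, usize)]) -> Section {
    let contiguous = sections
        .windows(2)
        .all(|pair| pair[0].0 + pair[0].1 == pair[1].0);
    if contiguous {
        // No gaps: borrow a single slice straight from the ELF. Accessing
        // [leading data] then yields InvalidVirtualAddress at map time.
        let start = sections.first().map(|&(off, _)| off).unwrap_or(0);
        let end = sections.last().map(|&(off, len)| off + len).unwrap_or(0);
        Section::Borrowed(start..end)
    } else {
        // Gaps: copy and zero-fill everything up to the end of rodata,
        // including [leading data], so the owned buffer starts at offset 0.
        let end = sections.last().map(|&(off, len)| off + len).unwrap_or(0);
        let mut data = vec![0u8; end];
        for &(off, len) in sections {
            data[off..off + len].copy_from_slice(&elf[off..off + len]);
        }
        Section::Owned(0, data)
    }
}
```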
That sounds consistent and also throws a legit error, because the program should not be accessing ELF data outside of the text or ro sections.
Ok this is now implemented behind Config::optimize_rodata.
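A rough sketch of the semantics behind the flag, tying the two comments above together. Only the optimize_rodata name comes from this thread; the function and its parameters are hypothetical.

```rust
// Hypothetical bounds check for an offset into the rodata region; not the
// crate's actual code. `rodata_start`/`rodata_end` are offsets within the
// region, and the error string stands in for EbpfError::InvalidVirtualAddress.
fn check_rodata_access(
    optimize_rodata: bool,
    offset: usize,
    rodata_start: usize,
    rodata_end: usize,
) -> Result<usize, &'static str> {
    // [trailing data]: one byte past the last readonly byte is always an error.
    if offset >= rodata_end {
        return Err("InvalidVirtualAddress");
    }
    // With optimize_rodata on, [leading data] is rejected for owned sections
    // too, matching what borrowed sections already do.
    if optimize_rodata && offset < rodata_start {
        return Err("InvalidVirtualAddress");
    }
    Ok(offset)
}
```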
@@ -1490,8 +1490,10 @@ impl JitCompiler {
X86Instruction::cmp_immediate(OperandSize::S8, RAX, 0, Some(X86IndirectAccess::Offset(MemoryRegion::IS_WRITABLE_OFFSET))).emit(self)?; // region.is_writable == 0
emit_jcc(self, 0x84, TARGET_PC_MEMORY_ACCESS_VIOLATION + target_offset)?;
}
X86Instruction::load_immediate(OperandSize::S64, RCX, (1i64 << ebpf::VIRTUAL_ADDRESS_BITS) - 1).emit(self)?; // RCX = (1 << ebpf::VIRTUAL_ADDRESS_BITS) - 1;
emit_alu(self, OperandSize::S64, 0x21, RCX, R11, 0, None)?; // R11 &= (1 << ebpf::VIRTUAL_ADDRESS_BITS) - 1;
X86Instruction::load(OperandSize::S64, RAX, RCX, X86IndirectAccess::Offset(MemoryRegion::VM_ADDR_OFFSET)).emit(self)?; // RCX = region.vm_addr
The additional load operation here could be problematic and needs to be benchmarked to estimate the performance impact.
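To make the comparison concrete, here is a rough Rust model of what the emitted sequence computes. This is a sketch, not the JIT's generated x86: the constant and the rebasing step after the new load are assumptions based on the inline comments in the hunk.

```rust
// Illustrative model of the translation path; the extra load discussed above
// corresponds to reading `vm_addr` from the region descriptor.
struct MemoryRegion {
    vm_addr: u64,   // where the region starts in the VM's address space
    host_addr: u64, // where its backing memory starts on the host
}

// Assumption: same role as ebpf::VIRTUAL_ADDRESS_BITS in the hunk above.
const VIRTUAL_ADDRESS_BITS: u64 = 32;

fn translate(region: &MemoryRegion, vm_addr: u64) -> u64 {
    let mask = (1u64 << VIRTUAL_ADDRESS_BITS) - 1;
    // load_immediate + AND in the diff: strip the region-index bits.
    let offset = vm_addr & mask;
    // The new load: rebase against region.vm_addr so a region that does not
    // start at the beginning of its address space still maps correctly.
    region.host_addr + offset - (region.vm_addr & mask)
}
```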
Very unscientific comparison:
- elf load diff (old is main, new is this PR)
running 3 tests
-test bench_load_elf ... bench: 2,868 ns/iter (+/- 57)
-test bench_load_elf_with_syscall ... bench: 2,866 ns/iter (+/- 104)
-test bench_load_elf_without_syscall ... bench: 2,869 ns/iter (+/- 26)
+test bench_load_elf ... bench: 2,745 ns/iter (+/- 12)
+test bench_load_elf_with_syscall ... bench: 2,742 ns/iter (+/- 15)
+test bench_load_elf_without_syscall ... bench: 2,763 ns/iter (+/- 124)
- memory mapping diff
@@ -160,24 +169,24 @@
test bench_gapped_randomized_access_with_1024_entries ... bench: 6 ns/iter (+/- 0)
test bench_mapping_with_1024_entries ... bench: 4 ns/iter (+/- 0)
test bench_prng ... bench: 1 ns/iter (+/- 0)
-test bench_randomized_access_with_0001_entry ... bench: 6 ns/iter (+/- 0)
+test bench_randomized_access_with_0001_entry ... bench: 5 ns/iter (+/- 0)
test bench_randomized_access_with_1024_entries ... bench: 7 ns/iter (+/- 0)
-test bench_randomized_mapping_access_with_0004_entries ... bench: 20 ns/iter (+/- 0)
+test bench_randomized_mapping_access_with_0004_entries ... bench: 19 ns/iter (+/- 0)
test bench_randomized_mapping_access_with_0016_entries ... bench: 12 ns/iter (+/- 0)
test bench_randomized_mapping_access_with_0064_entries ... bench: 8 ns/iter (+/- 0)
test bench_randomized_mapping_access_with_0256_entries ... bench: 7 ns/iter (+/- 0)
test bench_randomized_mapping_access_with_1024_entries ... bench: 7 ns/iter (+/- 0)
test bench_randomized_mapping_with_1024_entries ... bench: 4 ns/iter (+/- 0)
So pretty much the same.
...and I just realized that the mapping benches don't exercise the JIT. All the other benches are the same, so I'm fairly sure there are no regressions, but I'll write some JIT mapping benches.
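For readers unfamiliar with the ns/iter output above: it comes from Rust's nightly #[bench] harness. Below is a minimal sketch of the shape such a bench takes; the workload helper is a placeholder, not the crate's API.

```rust
#![feature(test)] // nightly-only bench harness that produces the ns/iter output above
extern crate test;
use test::{black_box, Bencher};

// Placeholder workload; a real bench here would build a JIT-compiled program
// with the crate's VM types and drive address translations through it.
fn jit_address_translation_workload(iterations: u64) -> u64 {
    (0..iterations).fold(0u64, |acc, i| acc.wrapping_add(black_box(i)))
}

#[bench]
fn bench_jit_address_translation(bencher: &mut Bencher) {
    bencher.iter(|| black_box(jit_address_translation_workload(1_000)));
}
```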
I've added some benches and this is the diff:
name main ns/iter no_ro ns/iter diff ns/iter diff % speedup
bench_jit_vs_interpreter_address_translation 269,260 274,264 5,004 1.86% x 0.98
bench_jit_vs_interpreter_address_translation_stack_dynamic 216,047 220,829 4,782 2.21% x 0.98
bench_jit_vs_interpreter_address_translation_stack_fixed 251,375 256,889 5,514 2.19% x 0.98
There's a 1-2% diff. Keeping in mind that I'm running on my laptop, and that sometimes (probably because of scheduling) I get diff noise of up to ~8% for the same code, I think it's fine.
Only copy read-only sections if there are gaps between them. If all sections are contiguous in memory, slice them from the original ELF data.