Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform a checked add in the ptrace dumper ModuleMemory impl. #129

Merged
merged 2 commits into from
Sep 3, 2024

Conversation

afranchuk
Copy link
Contributor

Without this, integer overflow will cause a panic when we should gracefully fail. It's very possible the start address or offset could be incorrect/corrupted.

@afranchuk
Copy link
Contributor Author

I'm augmenting this with a bunch of other arithmetic checks throughout the module_reader code.

@mstange
Copy link

mstange commented Aug 7, 2024

Do you have the /proc/pid/maps from when you saw the incorrect / corrupted state? I wonder if it's not corruption but something else going on.

@afranchuk
Copy link
Contributor Author

I do not have the inputs, though I would like to have them. It was on Android in CI testing: https://treeherder.mozilla.org/logviewer?job_id=469290206&repo=autoland&lineNumber=8170.

@gabrielesvelto
Copy link
Contributor

This is bizarre and might be hiding a deeper problem. Is the issue reproducible? Can you check what values are involved and where did they come from?

@afranchuk
Copy link
Contributor Author

afranchuk commented Aug 8, 2024

I am going to try some runs with debug logging to try to track it down.

@afranchuk
Copy link
Contributor Author

Regardless of the underlying issue, we are doing arithmetic with values derived from inputs, so I think we should not panic if overflow occurs.

@gabrielesvelto
Copy link
Contributor

Regardless of the underlying issue, we are doing arithmetic with values derived from inputs, so I think we should not panic if overflow occurs.

Yes, I agree.

@afranchuk
Copy link
Contributor Author

The problem is reproducible, now I'm trying to figure out how to exfiltrate some debug information.

@gabrielesvelto
Copy link
Contributor

The problem is reproducible, now I'm trying to figure out how to exfiltrate some debug information.

eprintln!() FTW

@afranchuk
Copy link
Contributor Author

afranchuk commented Aug 9, 2024

eprintln!() did not work! After that I took a shotgun approach with eprintln!, println! and log::error! (on the off chance it could pick up the log crate registration in gkrust). Nothing showed up in logs :(

@afranchuk
Copy link
Contributor Author

I've still been unable to figure out why this fails, however the fix here does remedy the error. While I'd like to know the root cause, it'd be nice to get this merged.

@gabrielesvelto
Copy link
Contributor

I just re-checked the failed workload and I think there might be a way to surface the contents of the memory map. You could try using the android_log-sys or android-logger crate we have vendored to dump the contents of the memory map directly to the logcat. Our automation tasks capture the entire logcat output, so we could try that to exfiltrate the information we need.

@afranchuk
Copy link
Contributor Author

I'm not sure whether android-logger will work, as I had assumed that when linking it links to the same log crate instance in gkrust (which presumably has a logger registered). However, it's entirely possible that's not happening, so I'm going to try it out.

@gabrielesvelto
Copy link
Contributor

Worst case android_log-sys should allow you to invoke __android_log_print() directly, that will surely end up in the logcat output.

@afranchuk
Copy link
Contributor Author

afranchuk commented Aug 27, 2024

android-logger didn't work; its documentation says that if a logger is already installed it will silently do nothing, so I can only guess that's what's going on. I'll try directly printing.

@afranchuk
Copy link
Contributor Author

Okay, so it looks like android-logger and __android_log_print() are both working, which means android-logger was working before and the taskcluster UI for searching the log is broken (I downloaded the log this time when I still didn't see the logs I expected).

@afranchuk
Copy link
Contributor Author

The overflow occurs here:

start address=129746557718528, offset=18446738911158816776

This offset occurs when trying to get the build id of /system/lib64/libGLESv2.so from the notes section. Based on the addresses being read according to the log and the crash backtrace, I believe the strtab section offset is being read incorrectly, though I don't see any obvious reasons for this. However we can check this offset against the known size of the module. I'm going to try to get my hands on that file to confirm this.

@afranchuk
Copy link
Contributor Author

The file works as expected when our code runs on it directly, so something odd is happening when it is loaded in memory. I noticed that the section header offset in the file is 0x16170, however according to the logs the offset during the failing test was 0x1616f, which I find a bit suspicious (though it's possible that's how it's loaded, but even the size in memory is the same as the file size). I can dump the loaded module to see what's up there. I also can't find that value in the file (if the section offset was being misread).

@gabrielesvelto
Copy link
Contributor

Could you try checking if the issue presents itself with elfhack disabled?

@afranchuk
Copy link
Contributor Author

I've failed to reproduce the test failure locally (which is fairly annoying, I'm not sure what's different), so I'm going to do a terrible thing to get the memory contents of that library on try.

@afranchuk
Copy link
Contributor Author

afranchuk commented Aug 29, 2024

It appears as if the section header table is not loaded (but a bunch of junk memory is there). This is not surprising, I've seen this before, though it's uncommon. I'll double-check the program headers to make sure this is expected. So in this case, there's two things we can do: verify the type of section headers we read/use (it's likely to be junk) before doing anything with them, and do checked arithmetic to avoid garbage values (which is already implemented).

@afranchuk
Copy link
Contributor Author

afranchuk commented Aug 29, 2024

The program headers confirm that the section header table wasn't expected to be loaded. What's disappointing is that the program header PT_NOTE segment doesn't contain the build id (which is what we're trying to get), only the section has it. The section is loaded into memory: it happens to be right after the PT_NOTE data.

@mstange
Copy link

mstange commented Aug 29, 2024

Can you paste the program header table and the section table here? (I mean the output of readelf that displays those tables)
Which program header is the section table in? How is it different in the normal case?

@afranchuk
Copy link
Contributor Author

Can you paste the program header table and the section table here? (I mean the output of readelf that displays those tables) Which program header is the section table in? How is it different in the normal case?

There are 25 section headers, starting at offset 0x16170:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000238  00000238
       0000000000000015  0000000000000000   A       0     0     1
  [ 2] .note.androi[...] NOTE             000000000000024e  0000024e
       0000000000000018  0000000000000000   A       0     0     2
  [ 3] .note.gnu.bu[...] NOTE             0000000000000268  00000268
       0000000000000020  0000000000000000   A       0     0     4
  [ 4] .dynsym           DYNSYM           0000000000000288  00000288
       00000000000044d0  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           0000000000004758  00004758
       0000000000003ccb  0000000000000000   A       0     0     1
  [ 6] .gnu.hash         GNU_HASH         0000000000008428  00008428
       0000000000001590  0000000000000000   A       4     0     8
  [ 7] .gnu.version      VERSYM           00000000000099b8  000099b8
       00000000000005bc  0000000000000002   A       4     0     2
  [ 8] .gnu.version_d    VERDEF           0000000000009f74  00009f74
       000000000000001c  0000000000000000   A       5     1     4
  [ 9] .gnu.version_r    VERNEED          0000000000009f90  00009f90
       0000000000000020  0000000000000000   A       5     1     4
  [10] .rela.dyn         RELA             0000000000009fb0  00009fb0
       0000000000000030  0000000000000018   A       4     0     8
  [11] .rela.plt         RELA             0000000000009fe0  00009fe0
       0000000000000090  0000000000000018  AI       4    12     8
  [12] .plt              PROGBITS         000000000000a070  0000a070
       0000000000000070  0000000000000010  AX       0     0     16
  [13] .text             PROGBITS         000000000000a0e0  0000a0e0
       0000000000005c45  0000000000000000  AX       0     0     16
  [14] .eh_frame         PROGBITS         000000000000fd28  0000fd28
       00000000000044f4  0000000000000000   A       0     0     8
  [15] .eh_frame_hdr     PROGBITS         000000000001421c  0001421c
       00000000000016d4  0000000000000000   A       0     0     4
  [16] .fini_array       FINI_ARRAY       0000000000016d60  00015d60
       0000000000000008  0000000000000000  WA       0     0     8
  [17] .dynamic          DYNAMIC          0000000000016d68  00015d68
       0000000000000250  0000000000000010  WA       5     0     8
  [18] .got              PROGBITS         0000000000016fb8  00015fb8
       0000000000000000  0000000000000000  WA       0     0     8
  [19] .got.plt          PROGBITS         0000000000016fb8  00015fb8
       0000000000000048  0000000000000000  WA       0     0     8
  [20] .data             PROGBITS         0000000000017000  00016000
       0000000000000008  0000000000000000  WA       0     0     8
  [21] .comment          PROGBITS         0000000000000000  00016008
       0000000000000026  0000000000000001  MS       0     0     1
  [22] .note.gnu.go[...] NOTE             0000000000000000  00016030
       000000000000001c  0000000000000000           0     0     4
  [23] .gnu_debuglink    PROGBITS         0000000000000000  0001604c
       0000000000000014  0000000000000000           0     0     1
  [24] .shstrtab         STRTAB           0000000000000000  00016060
       0000000000000109  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

Elf file type is DYN (Shared object file)
Entry point 0x0
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000001f8 0x00000000000001f8  R      0x8
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x0000000000000015 0x0000000000000015  R      0x1
      [Requesting program interpreter: /system/bin/linker64]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000158f0 0x00000000000158f0  R E    0x1000
  LOAD           0x0000000000015d60 0x0000000000016d60 0x0000000000016d60
                 0x00000000000002a8 0x00000000000002a8  RW     0x1000
  DYNAMIC        0x0000000000015d68 0x0000000000016d68 0x0000000000016d68
                 0x0000000000000250 0x0000000000000250  RW     0x8
  NOTE           0x000000000000024e 0x000000000000024e 0x000000000000024e
                 0x0000000000000018 0x0000000000000018  R      0x4
  GNU_EH_FRAME   0x000000000001421c 0x000000000001421c 0x000000000001421c
                 0x00000000000016d4 0x00000000000016d4  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x0
  GNU_RELRO      0x0000000000015d60 0x0000000000016d60 0x0000000000016d60
                 0x00000000000002a0 0x00000000000002a0  RW     0x8

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.android.ident .note.gnu.build-id .dynsym .dynstr .gnu.hash .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .plt .text .eh_frame .eh_frame_hdr 
   03     .fini_array .dynamic .got .got.plt .data 
   04     .dynamic 
   05     .note.android.ident 
   06     .eh_frame_hdr 
   07     
   08     .fini_array .dynamic .got .got.plt 

The section header table is not in a program header (which is the problem). The .note.gnu.build-id section is within the read/execute PT_LOAD header (header 2). Most binaries look like that, with an RX PT_LOAD from 0 to the end of the section contents which should be RX (they seem to be ordered intentionally to keep those sections all together).

It seems like this is the common case, I was wrong earlier to say that it's uncommon (though on linux I definitely recall having seen some binaries loaded wholly into memory, including the section header table). I think we normally succeed in getting the build id from the program headers, so we don't attempt the section headers (or we do but just get an access violation which we handle fine, rather than this overflow).

@afranchuk
Copy link
Contributor Author

afranchuk commented Aug 29, 2024

Maybe if we can't get the build id by reading the in-memory binary we could try to read it from the file (if present)? We changed to reading in-memory because in some cases the files are not present, but it's not an unreasonable fallback. It might incur a lot of extra IO though, I'm unsure how commonly we fail to find the build id. Note that we don't have this issue for SONAME because we can get everything we need from the PT_DYNAMIC segment.

The other no-good dirty-rotten hack we could do is just look ahead after the PT_NOTE segment(s) if we don't find the build id in the hopes that the note section is there (I'm willing to bet it's commonly found there).

Copy link
Contributor

@gabrielesvelto gabrielesvelto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small change to make the code leaner.

src/linux/module_reader.rs Outdated Show resolved Hide resolved
This also bumps the version of `goblin` to at least 0.8.2, which allows
us to clean up some code.
This avoids continuing the logic if the section header table isn't
present (except for the contrived case where the random bytes happen to
be the correct section type).
Copy link
Contributor

@gabrielesvelto gabrielesvelto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gabrielesvelto gabrielesvelto merged commit 2fbb435 into rust-minidump:main Sep 3, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants