Commit 555c888

Address PEM crash caused by parsing certain older Go application binaries (#1976)
Summary: Address PEM crash caused by parsing certain older Go application binaries

This PR fixes a crash caused by certain older Go application binaries. It also adds the `//src/stirling/binaries:go_binary_parse_profiling` CLI tool, which was helpful for debugging the previous 32-bit issue and aided in debugging this problem (see the Background section for more details).

This change is best reviewed commit by commit.

**Background**

Our Golang binary parsing was revamped in #1605 to support applications built with Go 1.20.4 and later (#1318), and to fix a PEM crash caused by 32-bit Go binaries (#1300). While this solved those issues, it introduced a new crash that we weren't able to reproduce (#1646). Working with a Pixie Community Slack user, I was able to track down where one of these crashes originates.

In short, Go embeds virtual addresses within the `.go.buildinfo` ELF section. In certain cases these virtual addresses are used to read the build settings recorded when the binary was created (toolchain version, Go experiments, etc.). To read these strings correctly, the virtual addresses must be converted into file offsets (binary addresses).

The bug presents itself when the `LOAD` ELF segments in the binary are not contiguous or are not ordered by increasing virtual memory address. For example, given LOAD segments 1, 2 and 3, the bug occurs if those segments aren't adjacent to each other or don't have increasing virtual memory addresses (vaddr of segment 1 < vaddr of segment 2 < vaddr of segment 3). Instead, the virtual address being looked up should be matched against the segment that contains it, and the conversion should use that segment's own virtual address and file offset.

Relevant Issues: Partially addresses #1646 -- there is one more known case, which must be investigated further

Type of change: /kind bug

Test Plan: Verified this change through the following
- [x] User from the community Slack [verified](https://pixie-community.slack.com/archives/CQ63KEVFY/p1722271309767939?thread_ts=1721315312.198319&cid=CQ63KEVFY) that the issue was fixed.
- [x] New ElfReader function is covered with a test
- [x] Go 1.17 test case ("[little endian](https://github.com/pixie-io/pixie/blob/50ddcd32eb217e1aa5e87124883ee284a36052a1/src/stirling/obj_tools/go_syms_test.cc#L51)" case) still works despite it not triggering this bug
  - I was unable to recreate a binary that had the segments in an unordered fashion.

Changelog Message: Fixed an issue with Go uprobe attachment that previously caused crashes for a subset of older Go applications (Go 1.17 and earlier)

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
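The difference between the old and new address translation described in the Background can be sketched as follows. This is a minimal illustration, not the Pixie implementation: `LoadSegment`, `TranslateWithSingleBase`, `TranslatePerSegment`, and the segment layout in `ExampleSegments` are hypothetical names and made-up values chosen so that the two approaches disagree.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an ELF LOAD program header (not the ELFIO type).
struct LoadSegment {
  uint64_t vaddr;      // Virtual address the segment is mapped at (p_vaddr).
  uint64_t offset;     // Offset of the segment within the file (p_offset).
  uint64_t file_size;  // Number of bytes the segment occupies in the file (p_filesz).
};

// Old behavior: derive one constant from the first LOAD segment and apply it
// to every virtual address, regardless of which segment contains it.
uint64_t TranslateWithSingleBase(const std::vector<LoadSegment>& segments, uint64_t vaddr) {
  assert(!segments.empty());
  const uint64_t base = segments.front().vaddr - segments.front().offset;
  return vaddr - base;
}

// New behavior: find the segment whose file-backed range contains the address
// and use that segment's own virtual address and file offset.
std::optional<uint64_t> TranslatePerSegment(const std::vector<LoadSegment>& segments,
                                            uint64_t vaddr) {
  for (const LoadSegment& seg : segments) {
    if (vaddr >= seg.vaddr && vaddr - seg.vaddr < seg.file_size) {
      return vaddr - seg.vaddr + seg.offset;
    }
  }
  return std::nullopt;  // Address is not backed by any segment's file contents.
}

// Made-up layout: the second segment is adjacent in the file but not in
// virtual memory, so a single base offset does not hold for it.
std::vector<LoadSegment> ExampleSegments() {
  return {
      {0x400000, 0x0, 0x1000},
      {0x601000, 0x1000, 0x800},
  };
}
```

With this layout, the address `0x601010` maps to file offset `0x1010` under the per-segment lookup, while the single-base calculation yields `0x201010`, which lies far outside the roughly 6 KiB of file contents described by the two segments.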
1 parent c6e18a9 commit 555c888

8 files changed

+177
-9
lines changed

src/stirling/binaries/BUILD.bazel

Lines changed: 8 additions & 0 deletions
@@ -76,6 +76,14 @@ pl_cc_binary(
     ],
 )

+pl_cc_binary(
+    name = "go_binary_parse_profiling",
+    srcs = ["go_binary_parse_profiling.cc"],
+    deps = [
+        "//src/stirling:cc_library",
+    ],
+)
+
 cc_image(
     name = "stirling_dt_image",
     base = ":stirling_binary_base_image",

src/stirling/binaries/go_binary_parse_profiling.cc

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+/*
+ * Copyright 2018- The Pixie Authors.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#include "src/common/base/base.h"
+#include "src/common/base/env.h"
+#include "src/stirling/source_connectors/socket_tracer/uprobe_symaddrs.h"
+
+using px::StatusOr;
+using px::stirling::PopulateGoTLSDebugSymbols;
+using px::stirling::obj_tools::DwarfReader;
+using px::stirling::obj_tools::ElfReader;
+
+//-----------------------------------------------------------------------------
+// This utility is designed to isolate parsing the debug symbols of a Go binary. This
+// verifies that the go version detection code is functioning as well. This is useful
+// for debugging when the Go elf/DWARF parsing is not working correctly and has been the
+// source of a few PEM crashes (gh#1300, gh#1646). This makes it easy for asking end users to run
+// against their binaries when they are sensitive (proprietary) and we can't debug them ourselves.
+//-----------------------------------------------------------------------------
+
+int main(int argc, char** argv) {
+  px::EnvironmentGuard env_guard(&argc, argv);
+
+  if (argc < 2) {
+    LOG(FATAL) << absl::Substitute("Expected binary argument to be provided. Instead received $0",
+                                   *argv);
+  }
+
+  std::string binary(argv[1]);
+
+  StatusOr<std::unique_ptr<ElfReader>> elf_reader_status = ElfReader::Create(binary);
+  if (!elf_reader_status.ok()) {
+    LOG(WARNING) << absl::Substitute(
+        "Failed to parse elf binary $0 with"
+        "Message = $1",
+        binary, elf_reader_status.msg());
+  }
+  std::unique_ptr<ElfReader> elf_reader = elf_reader_status.ConsumeValueOrDie();
+
+  StatusOr<std::unique_ptr<DwarfReader>> dwarf_reader_status =
+      DwarfReader::CreateIndexingAll(binary);
+  if (!dwarf_reader_status.ok()) {
+    VLOG(1) << absl::Substitute(
+        "Failed to get binary $0 debug symbols. "
+        "Message = $1",
+        binary, dwarf_reader_status.msg());
+  }
+  std::unique_ptr<DwarfReader> dwarf_reader = dwarf_reader_status.ConsumeValueOrDie();
+
+  struct go_tls_symaddrs_t symaddrs;
+  auto status = PopulateGoTLSDebugSymbols(elf_reader.get(), dwarf_reader.get(), &symaddrs);
+
+  if (!status.ok()) {
+    LOG(ERROR) << absl::Substitute("debug symbol parsing failed with: $0", status.msg());
+  }
+}

src/stirling/obj_tools/elf_reader.cc

Lines changed: 13 additions & 0 deletions
@@ -605,6 +605,19 @@ StatusOr<ELFIO::section*> ElfReader::SectionWithName(std::string_view section_na
   return error::NotFound("Could not find section=$0 in binary=$1", section_name, binary_path_);
 }

+StatusOr<uint64_t> ElfReader::VirtualAddrToBinaryAddr(uint64_t virtual_addr) {
+  for (int i = 0; i < elf_reader_.segments.size(); i++) {
+    ELFIO::segment* segment = elf_reader_.segments[i];
+    uint64_t virt_addr = segment->get_virtual_address();
+    uint64_t offset = segment->get_offset();
+    uint64_t size = segment->get_file_size();
+    if (virtual_addr >= virt_addr && virtual_addr < virt_addr + size) {
+      return virtual_addr - virt_addr + offset;
+    }
+  }
+  return error::Internal("Could not find binary address for virtual address=$0", virtual_addr);
+}
+
 StatusOr<utils::u8string> ElfReader::SymbolByteCode(std::string_view section,
                                                     const SymbolInfo& symbol) {
   PX_ASSIGN_OR_RETURN(ELFIO::section * text_section, SectionWithName(section));

src/stirling/obj_tools/elf_reader.h

Lines changed: 12 additions & 0 deletions
@@ -160,6 +160,18 @@ class ElfReader {
    */
   StatusOr<u8string> SymbolByteCode(std::string_view section, const SymbolInfo& symbol);

+  /**
+   * Returns the binary address that corresponds to the given virtual address.
+   * This virtual address will not be subject to ASLR since the calculation is based entirely on the
+   * ELF file and its section and segment information. Given this, most of the time
+   * ElfAddressConverter::VirtualAddrToBinaryAddr is a more appropriate utility to use.
+   *
+   * Certain use cases may require this function, such as cases where the Go toolchain
+   * embeds virtual addresses within a binary and must be parsed (See ReadGoBuildVersion and
+   * ReadGoString in go_syms.cc).
+   */
+  StatusOr<uint64_t> VirtualAddrToBinaryAddr(uint64_t virtual_addr);
+
   /**
    * Returns the virtual address in the ELF file of offset 0x0. Calculated by finding the first
    * loadable segment and returning its virtual address minus its file offset.
* loadable segment and returning its virtual address minus its file offset.

src/stirling/obj_tools/elf_reader_test.cc

Lines changed: 60 additions & 0 deletions
@@ -40,6 +40,55 @@ using ::testing::UnorderedElementsAre;

 using ::px::operator<<;

+// Models ELF section output information from objdump -h.
+// ELFIO::section's do not contain virtual memory addresses like ELFIO::segment's do
+// so this struct is used to store the information from objdump.
+// Example objdump output:
+//
+// $ objdump -j .bss -h bazel-bin/src/stirling/obj_tools/testdata/cc/test_exe/test_exe
+//
+// bazel-bin/src/stirling/obj_tools/testdata/cc/test_exe/test_exe: file format elf64-x86-64
+//
+// Sections:
+// Idx Name          Size      VMA               LMA               File off  Algn
+//  27 .bss          00002068  00000000000bd100  00000000000bd100  000ba100  2**5
+//                   ALLOC
+struct Section {
+  std::string name;
+  int64_t size;
+  int64_t vma;
+  int64_t lma;
+  int64_t file_offset;
+};
+
+// TODO(ddelnano): Make this function hermetic by providing the objdump output via bazel
+StatusOr<Section> ObjdumpSectionNameToAddr(const std::string& path,
+                                           const std::string& section_name) {
+  Section section;
+  std::string objdump_out =
+      px::Exec(absl::StrCat("objdump -h -j ", section_name, " ", path)).ValueOrDie();
+  std::vector<absl::string_view> objdump_out_lines = absl::StrSplit(objdump_out, '\n');
+  for (auto& line : objdump_out_lines) {
+    if (line.find(section_name) != std::string::npos) {
+      std::vector<absl::string_view> line_split = absl::StrSplit(line, ' ', absl::SkipWhitespace());
+      CHECK(!line_split.empty());
+
+      section.name = std::string(line_split[1]);
+      section.size = std::stol(std::string(line_split[2]), nullptr, 16);
+      section.vma = std::stol(std::string(line_split[3]), nullptr, 16);
+      section.lma = std::stol(std::string(line_split[4]), nullptr, 16);
+      section.file_offset = std::stol(std::string(line_split[5]), nullptr, 16);
+      break;
+    }
+  }
+
+  if (section.name != section_name) {
+    return error::Internal("Unable to find section with name $0", section_name);
+  }
+
+  return section;
+}
+
 StatusOr<int64_t> NmSymbolNameToAddr(const std::string& path, const std::string& symbol_name) {
   // Extract the address from nm as the gold standard.
   int64_t symbol_addr = -1;
@@ -133,6 +182,17 @@ TEST(ElfReaderTest, SymbolAddress) {
   }
 }

+TEST(ElfReaderTest, VirtualAddrToBinaryAddr) {
+  const std::string path = kTestExeFixture.Path().string();
+  const std::string kDataSection = ".data";
+  ASSERT_OK_AND_ASSIGN(const Section section, ObjdumpSectionNameToAddr(path, kDataSection));
+
+  ASSERT_OK_AND_ASSIGN(std::unique_ptr<ElfReader> elf_reader, ElfReader::Create(path));
+  const int64_t offset = 1;
+  ASSERT_OK_AND_ASSIGN(auto binary_addr, elf_reader->VirtualAddrToBinaryAddr(section.vma + offset));
+  EXPECT_EQ(binary_addr, section.file_offset + offset);
+}
+
 TEST(ElfReaderTest, AddrToSymbol) {
   const std::string path = kTestExeFixture.Path().string();
   const std::string kSymbolName = "CanYouFindThis";
const std::string kSymbolName = "CanYouFindThis";

src/stirling/obj_tools/go_syms.cc

Lines changed: 7 additions & 7 deletions
@@ -55,15 +55,14 @@ std::string_view kGoBuildInfoMagic =

 // Reads a Go string encoded within a buildinfo header. This function is meant to provide the same
 // functionality as
-// https://github.com/golang/go/blob/master/src/debug/buildinfo/buildinfo.go#L244C37-L244C44
+// https://github.com/golang/go/blob/aa97a012b4be393c1725c16a78b92dea81632378/src/debug/buildinfo/buildinfo.go#L282
 StatusOr<std::string> ReadGoString(ElfReader* elf_reader, uint64_t ptr_size, uint64_t ptr_addr,
                                    read_ptr_func_t read_ptr) {
   PX_ASSIGN_OR_RETURN(u8string_view data_addr, elf_reader->BinaryByteCode(ptr_addr, ptr_size));
   PX_ASSIGN_OR_RETURN(u8string_view data_len,
                       elf_reader->BinaryByteCode(ptr_addr + ptr_size, ptr_size));

-  PX_ASSIGN_OR_RETURN(uint64_t vaddr_offset, elf_reader->GetVirtualAddrAtOffsetZero());
-  ptr_addr = read_ptr(data_addr) - vaddr_offset;
+  PX_ASSIGN_OR_RETURN(ptr_addr, elf_reader->VirtualAddrToBinaryAddr(read_ptr(data_addr)));
   uint64_t str_length = read_ptr(data_len);

   PX_ASSIGN_OR_RETURN(std::string_view go_version_bytecode,
@@ -136,10 +135,11 @@ StatusOr<std::string> ReadGoBuildVersion(ElfReader* elf_reader) {
     }
   }

-  PX_ASSIGN_OR_RETURN(uint64_t vaddr_offset, elf_reader->GetVirtualAddrAtOffsetZero());
-
-  PX_ASSIGN_OR_RETURN(auto s, binary_decoder.ExtractString<u8string_view::value_type>(ptr_size));
-  uint64_t ptr_addr = read_ptr(s) - vaddr_offset;
+  // Reads the virtual address location of the runtime.buildVersion symbol.
+  PX_ASSIGN_OR_RETURN(auto runtime_version_vaddr,
+                      binary_decoder.ExtractString<u8string_view::value_type>(ptr_size));
+  PX_ASSIGN_OR_RETURN(uint64_t ptr_addr,
+                      elf_reader->VirtualAddrToBinaryAddr(read_ptr(runtime_version_vaddr)));

   return ReadGoString(elf_reader, ptr_size, ptr_addr, read_ptr);
 }
}

src/stirling/source_connectors/socket_tracer/uprobe_symaddrs.cc

Lines changed: 2 additions & 2 deletions
@@ -464,6 +464,8 @@ Status PopulateHTTP2DebugSymbols(DwarfReader* dwarf_reader, std::string_view ven
   return Status::OK();
 }

+}  // namespace
+
 Status PopulateGoTLSDebugSymbols(ElfReader* elf_reader, DwarfReader* dwarf_reader,
                                  struct go_tls_symaddrs_t* symaddrs) {
   PX_ASSIGN_OR_RETURN(std::string build_version, ReadGoBuildVersion(elf_reader));
@@ -510,8 +512,6 @@ Status PopulateGoTLSDebugSymbols(ElfReader* elf_reader, DwarfReader* dwarf_reade
   return Status::OK();
 }

-}  // namespace
-
 StatusOr<struct go_common_symaddrs_t> GoCommonSymAddrs(ElfReader* elf_reader,
                                                        DwarfReader* dwarf_reader) {
   struct go_common_symaddrs_t symaddrs;
struct go_common_symaddrs_t symaddrs;

src/stirling/source_connectors/socket_tracer/uprobe_symaddrs.h

Lines changed: 4 additions & 0 deletions
@@ -73,5 +73,9 @@ StatusOr<struct openssl_symaddrs_t> OpenSSLSymAddrs(obj_tools::RawFptrManager* f
 StatusOr<struct node_tlswrap_symaddrs_t> NodeTLSWrapSymAddrs(const std::filesystem::path& node_exe,
                                                              const SemVer& ver);

+px::Status PopulateGoTLSDebugSymbols(obj_tools::ElfReader* elf_reader,
+                                     obj_tools::DwarfReader* dwarf_reader,
+                                     struct go_tls_symaddrs_t* symaddrs);
+
 }  // namespace stirling
 }  // namespace px
} // namespace px
