Skip to content

Conversation

@paulnoalhyt
Copy link
Collaborator

@paulnoalhyt paulnoalhyt commented Sep 26, 2025

  • I have reviewed the OFRAK contributor guide and attest that this pull request is in accordance with it.
  • I have made or updated a changelog entry for the changes in this pull request.

One sentence summary of this PR (This should go in the CHANGELOG!)

Add ProgramAttributes from the OFRAK GUI. Rewrote the ihex components.

Please describe the changes in your request.

Allow users to add ProgramAttributes from the GUI:

Allow users to add ProgramAttributes to a resource in the OFRAK GUI. This is particularly useful when dealing with Intel Hex files which do not contain info about the CPU architecture, making them impossible to analyze in the GUI before.

  • added 2 endpoints in the server to:
    • get all possible ProgramAttributes fields (InstructionSet, SubInstructionSet, BitWidth, Endianness, ProcessorType).
    • add ProgramAttributes to a resource.
  • added tests for these 2 endpoints.
  • test_pyghidra_components.py: added test for unpacking ihex file with pyghidra, after adding ProgramAttributes.
Rewriting of the ihex components (analyzer/packer/unpacker):
some context about how ELF files are handled in OFRAK:
  • ELF file has the Program tag, so when we analyze an ELF with a backend, we give the backend the full ELF with headers, etc.
  • The ELF unpacker creates CodeRegions that can be unpacked by the backend.
some context about how ihex files were handled before this PR:
  • the Ihex is just considered as a GenericFile.
  • the IhexProgram gets the Program tag. It's the binary content of all sections present in the ihex (binary blob of concatenated sections). This is given to the backend. The issue is that the backend doesn't receive the ihex file, so it has no idea about the sections present in this binary blob.
  • the IhexUnpacker doesn't create CodeRegions but an IhexProgram (binary blob mentioned above).
    • the IhexProgramUnpacker then creates ProgramSections which are not really CodeRegions.
  • it's harder to map the backend segments to these ProgramSections. For example, the PyGhidraCodeRegionUnpacker is made to run on CodeRegions and not ProgramSections.
proposed rewriting:
  • the Ihex file gets the Program tag. That way the full ihex file is given to the backend for analysis.
  • the IhexUnpacker created CodeRegions, like the ElfUnpacker. This makes sense because when loading an ihex file in Ghidra, it assumes that all sections are RWX.
  • remove the binary blob intermediary representation, which doesn't add any value. The binary content is contained in the CodeRegions data.
  • tests were updated.

Happy to discuss this rewriting in comments!

Bug fixes/improvements:
  • pyghidra_components.py:
    • when running the PyGhidraAutoAnalyzer, the resource was flushed to disk, so it was packed. This removes all child resources that were created before running the PyGhidraAutoAnalyzer. This was an issue if for example, you load an ihex file, unpack it (so it creates CodeRegions), then run unpack_recursively(). As the binary was packed by the PyGhidraAutoAnalyzer, the CodeRegions were be deleted, and the unpack_recursively failed because the CodeRegions it was supposed to unpack disapeared. To avoid that, I put pack=False in flush_data_to_disk. This solves my issue, but is not ideal (what if the user modified some data in the ihex before running the PyGhidraAutoAnalyzer?!). Happy to discuss other ideas to solve this issue!
    • the PyGhidraAutoAnalyzer now checks for ProgramAttributes before running unpack.
  • pyghidra_analysis.py: added better error message when hitting ghidra.app.util.opinion.LoadException: No load spec found, to give the user a hint that they should add ProgramAttributes.
  • AddTagView.svelte and AddProgramAttributesView.svelte: updated refreshResource() so that the attribute panel in the GUI is actually refreshed with the new tag/ProgramAttributes.

Anyone you think should look at this, specifically?

No

@whyitfor whyitfor requested a review from rbs-afflitto October 20, 2025 13:55
Copy link
Collaborator

@rbs-afflitto rbs-afflitto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Approved

Copy link
Member

@rbs-jacob rbs-jacob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of minor suggestions

throw Error(JSON.stringify(await r.json(), undefined, 2));
}
//console.log(r.json())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//console.log(r.json())

Comment on lines +138 to +140
r.json().then((ofrakProgramAttributes) => {
ofrakProgramAttributesPromise = ofrakProgramAttributes;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this whole function is async, you can await r.json() here without using a .then callback.

onMount(async () => {
try {
await fetch(`${$settings.backendUrl}/get_all_program_attributes`).then(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this whole function is async, you can await fetch here without using a .then callback.

Comment on lines +109 to +121
selected_attribute["isa"] &&
selected_attribute["bit_width"] &&
selected_attribute["endianness"]
) {
modifierView = undefined;
let program_attributes = JSON.stringify([
"ofrak.core.architecture.ProgramAttributes",
{
isa: selected_attribute["isa"],
sub_isa: selected_attribute["sub_isa"],
bit_width: selected_attribute["bit_width"],
endianness: selected_attribute["endianness"],
processor: selected_attribute["processor"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In JavaScript, object field access is equivalent to dictionary-style field access. So selected_attribute.isa is the same as selected_attribute["isa"].

Might be worth changing these to the more concise syntax?

Comment on lines +126 to +133
except Exception as e:
if "toString" in dir(e) and "No load spec found" in e.toString():
raise Exception(
e.toString()
+ "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!"
)
else:
raise e
Copy link
Member

@rbs-jacob rbs-jacob Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
except Exception as e:
if "toString" in dir(e) and "No load spec found" in e.toString():
raise Exception(
e.toString()
+ "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!"
)
else:
raise e
except <SOMETHING MORE SPECIFIC> as e:
if "toString" in dir(e) and "No load spec found" in e.toString():
raise <SOMETHING MORE SPECIFIC>(
e.toString()
+ "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!"
)
else:
raise

We want to avoid catching raw Exceptions if possible. We also want to avoid raising them. We should probably replace these with a more specific Exception instance such as RuntimeError, ValueError, or something even more specific.

Worth changing in other parts of the code where you see this, as well.

Comment on lines 19 to +22
def unpack(program_file, decompiled, language=None, base_address=None):
with pyghidra.open_program(program_file, language=language) as flat_api:
# Java packages must be imported after pyghidra.start or pyghidra.open_program
from ghidra.app.decompiler import DecompInterface, DecompileOptions
from ghidra.util.task import TaskMonitor
from ghidra.program.model.block import BasicBlockModel
from ghidra.program.model.symbol import RefType
from java.math import BigInteger

# If base_address is provided, rebase the program
if base_address is not None:
# Convert base_address to int if it's a string
if isinstance(base_address, str):
if base_address.startswith("0x"):
base_address = int(base_address, 16)
else:
base_address = int(base_address)

# Rebase the program to the specified base address
program = flat_api.getCurrentProgram()
address_factory = program.getAddressFactory()
new_base_addr = address_factory.getDefaultAddressSpace().getAddress(hex(base_address))
program.setImageBase(new_base_addr, True)

main_dictionary = {}
code_regions = _unpack_program(flat_api)
main_dictionary["metadata"] = {}
main_dictionary["metadata"]["backend"] = "ghidra"
main_dictionary["metadata"]["decompiled"] = decompiled
main_dictionary["metadata"]["path"] = program_file
if base_address is not None:
main_dictionary["metadata"]["base_address"] = base_address
with open(program_file, "rb") as fh:
data = fh.read()
md5_hash = hashlib.md5(data)
main_dictionary["metadata"]["hash"] = md5_hash.digest().hex()

for code_region in code_regions:
seg_key = f"seg_{code_region['virtual_address']}"
main_dictionary[seg_key] = code_region
func_cbs = _unpack_code_region(code_region, flat_api)
code_region["children"] = []

decomp_interface = DecompInterface()
prog_options = DecompileOptions()
prog_options.grabFromProgram(flat_api.getCurrentProgram())
decomp_interface.setOptions(prog_options)
init = decomp_interface.openProgram(flat_api.getCurrentProgram())
if not init:
raise RuntimeError("Could not open program for decompilation")

for func, cb in func_cbs:
cb_key = f"func_{cb['virtual_address']}"
code_region["children"].append(cb_key)
if decompiled:
try:
decompilation = _decompile(func, decomp_interface, TaskMonitor.DUMMY)
except Exception as e:
print(e, traceback.format_exc())
decompilation = ""
cb["decompilation"] = decompilation
bb_model = BasicBlockModel(flat_api.getCurrentProgram())
basic_blocks, data_words = _unpack_complex_block(
func, flat_api, bb_model, BigInteger.ONE
try:
with pyghidra.open_program(program_file, language=language) as flat_api:
# Java packages must be imported after pyghidra.start or pyghidra.open_program
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not worth blocking this PR for this, but I wonder if this function can be refactored to be a bit smaller? I think that might make it a bit easier to read.

There are helper functions like _unpack_basic_block that get called, but there is still a lot of logic before and after those calls. Can any of that logic before and after the helper function calls be moved into the helper functions themselves?

### Changed
- Reduce the decompilation time of PyGhidra by reusing cached unpacking results. ([#623](https://github.com/redballoonsecurity/ofrak/pull/623))
- Improve `ofrak_pyghidra` decompilation: more strings and symbol names for cross-references in decompilation. ([#633](https://github.com/redballoonsecurity/ofrak/pull/633))
- Look for `ProgramAttributes` before unpacking. Do not pack when flushing to disk. Better error message when Ghidra raises "No load spec found". Added test for unpacking ihex with `ofrak_pyghidra` ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Look for `ProgramAttributes` before unpacking. Do not pack when flushing to disk. Better error message when Ghidra raises "No load spec found". Added test for unpacking ihex with `ofrak_pyghidra` ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))
- Improve unpacking logic, error messages, and testing for `ofrak_pyghidra` auto analyzer ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

Comment on lines +56 to +71
children = (CodeRegion,)

async def unpack(self, resource: Resource, config=None):
ihex_program = await resource.view_as(IhexProgram)
for seg_vaddr_range in ihex_program.segments:
# Segment is mapped into the program at an offset starting at the difference between
# the segment's vaddr range and the program's base address
segment_data_range = seg_vaddr_range.translate(-ihex_program.address_limits.start)
_, binfile = _binfile_analysis(await resource.get_data(), self)

for segment in binfile.segments:
segment_data = bytes(binfile.as_binary())[
segment.minimum_address
- binfile.minimum_address : segment.maximum_address
- binfile.minimum_address
]
await resource.create_child_from_view(
ProgramSection(seg_vaddr_range.start, seg_vaddr_range.length()),
data_range=segment_data_range,
CodeRegion(
segment.minimum_address, segment.maximum_address - segment.minimum_address
),
data=segment_data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessarily the case that all unpacked IHEX segments are code regions? No data segments? I don't know enough about IHEX to say for sure, but wanted to confirm.

I generally like that Ihex now inherits from Program, but am uncertain about this particular change.

- Update `DataSummary` handling so that it is no longer stored in `ResourceModel`, but rather retrievable via `DataSummaryAnalyzer.get_data_summary` ([#598](https://github.com/redballoonsecurity/ofrak/pull/598))
- Update dependencies: aiohttp>=3.12.14, beartype~=0.20.0, ubi-reader==0.8.12 ([#613](https://github.com/redballoonsecurity/ofrak/pull/613))
- Add `ofrak_pyghidra` to OFRAK GUI script builder ([#631](https://github.com/redballoonsecurity/ofrak/pull/631))
- rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))
- Rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

Might be worth adding a couple words on what justified the rewrite and/or what changed. Thinking from the perspective of someone reading the CHANGELOG, they're going to wonder whether and how this affects them, so we want to clarify that as much as possible up-front.

assert project["session_id"] == id


async def test_get_all_program_attributes(ofrak_client: TestClient):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need docstrings for these tests. Look at examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants