Add `ProgramAttributes` from the GUI #637

paulnoalhyt · 2025-09-26T15:43:02Z

I have reviewed the OFRAK contributor guide and attest that this pull request is in accordance with it.
I have made or updated a changelog entry for the changes in this pull request.

One sentence summary of this PR (This should go in the CHANGELOG!)

Add ProgramAttributes from the OFRAK GUI. Rewrote the ihex components.

Please describe the changes in your request.

Allow users to add `ProgramAttributes` from the GUI:

Allow users to add ProgramAttributes to a resource in the OFRAK GUI. This is particularly useful when dealing with Intel Hex files which do not contain info about the CPU architecture, making them impossible to analyze in the GUI before.

added 2 endpoints in the server to:
- get all possible ProgramAttributes fields (InstructionSet, SubInstructionSet, BitWidth, Endianness, ProcessorType).
- add ProgramAttributes to a resource.
added tests for these 2 endpoints.
test_pyghidra_components.py: added test for unpacking ihex file with pyghidra, after adding ProgramAttributes.

Rewriting of the ihex components (analyzer/packer/unpacker):

some context about how ELF files are handled in OFRAK:

ELF file has the Program tag, so when we analyze an ELF with a backend, we give the backend the full ELF with headers, etc.
The ELF unpacker creates CodeRegions that can be unpacked by the backend.

some context about how ihex files were handled before this PR:

the Ihex is just considered as a GenericFile.
the IhexProgram gets the Program tag. It's the binary content of all sections present in the ihex (binary blob of concatenated sections). This is given to the backend. The issue is that the backend doesn't receive the ihex file, so it has no idea about the sections present in this binary blob.
the IhexUnpacker doesn't create CodeRegions but an IhexProgram (binary blob mentioned above).
- the IhexProgramUnpacker then creates ProgramSections which are not really CodeRegions.
it's harder to map the backend segments to these ProgramSections. For example, the PyGhidraCodeRegionUnpacker is made to run on CodeRegions and not ProgramSections.

proposed rewriting:

the Ihex file gets the Program tag. That way the full ihex file is given to the backend for analysis.
the IhexUnpacker created CodeRegions, like the ElfUnpacker. This makes sense because when loading an ihex file in Ghidra, it assumes that all sections are RWX.
remove the binary blob intermediary representation, which doesn't add any value. The binary content is contained in the CodeRegions data.
tests were updated.

Happy to discuss this rewriting in comments!

Bug fixes/improvements:

pyghidra_components.py:
- when running the PyGhidraAutoAnalyzer, the resource was flushed to disk, so it was packed. This removes all child resources that were created before running the PyGhidraAutoAnalyzer. This was an issue if for example, you load an ihex file, unpack it (so it creates CodeRegions), then run unpack_recursively(). As the binary was packed by the PyGhidraAutoAnalyzer, the CodeRegions were be deleted, and the unpack_recursively failed because the CodeRegions it was supposed to unpack disapeared. To avoid that, I put pack=False in flush_data_to_disk. This solves my issue, but is not ideal (what if the user modified some data in the ihex before running the PyGhidraAutoAnalyzer?!). Happy to discuss other ideas to solve this issue!
- the PyGhidraAutoAnalyzer now checks for ProgramAttributes before running unpack.
pyghidra_analysis.py: added better error message when hitting ghidra.app.util.opinion.LoadException: No load spec found, to give the user a hint that they should add ProgramAttributes.
AddTagView.svelte and AddProgramAttributesView.svelte: updated refreshResource() so that the attribute panel in the GUI is actually refreshed with the new tag/ProgramAttributes.

Anyone you think should look at this, specifically?

No

rbs-afflitto

Looks good to me. Approved

rbs-jacob

Couple of minor suggestions

rbs-jacob · 2025-10-22T17:38:38Z

frontend/src/views/AddProgramAttributesView.svelte

+            throw Error(JSON.stringify(await r.json(), undefined, 2));
+          }
+
+          //console.log(r.json())


Suggested change

//console.log(r.json())

rbs-jacob · 2025-10-22T17:39:22Z

frontend/src/views/AddProgramAttributesView.svelte

+          r.json().then((ofrakProgramAttributes) => {
+            ofrakProgramAttributesPromise = ofrakProgramAttributes;
+          });


Since this whole function is async, you can await r.json() here without using a .then callback.

rbs-jacob · 2025-10-22T17:40:08Z

frontend/src/views/AddProgramAttributesView.svelte

+
+  onMount(async () => {
+    try {
+      await fetch(`${$settings.backendUrl}/get_all_program_attributes`).then(


Since this whole function is async, you can await fetch here without using a .then callback.

rbs-jacob · 2025-10-22T18:09:52Z

frontend/src/views/AddProgramAttributesView.svelte

+      selected_attribute["isa"] &&
+      selected_attribute["bit_width"] &&
+      selected_attribute["endianness"]
+    ) {
+      modifierView = undefined;
+      let program_attributes = JSON.stringify([
+        "ofrak.core.architecture.ProgramAttributes",
+        {
+          isa: selected_attribute["isa"],
+          sub_isa: selected_attribute["sub_isa"],
+          bit_width: selected_attribute["bit_width"],
+          endianness: selected_attribute["endianness"],
+          processor: selected_attribute["processor"],


In JavaScript, object field access is equivalent to dictionary-style field access. So selected_attribute.isa is the same as selected_attribute["isa"].

Might be worth changing these to the more concise syntax?

rbs-jacob · 2025-10-22T18:15:05Z

disassemblers/ofrak_pyghidra/src/ofrak_pyghidra/standalone/pyghidra_analysis.py

+    except Exception as e:
+        if "toString" in dir(e) and "No load spec found" in e.toString():
+            raise Exception(
+                e.toString()
+                + "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!"
+            )
+        else:
+            raise e


Suggested change

except Exception as e:

if "toString" in dir(e) and "No load spec found" in e.toString():

raise Exception(

e.toString()

+ "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!"

)

else:

raise e

except <SOMETHING MORE SPECIFIC> as e:

if "toString" in dir(e) and "No load spec found" in e.toString():

raise <SOMETHING MORE SPECIFIC>(

e.toString()

+ "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!"

)

else:

raise

We want to avoid catching raw Exceptions if possible. We also want to avoid raising them. We should probably replace these with a more specific Exception instance such as RuntimeError, ValueError, or something even more specific.

Worth changing in other parts of the code where you see this, as well.

rbs-jacob · 2025-10-22T18:17:47Z

disassemblers/ofrak_pyghidra/src/ofrak_pyghidra/standalone/pyghidra_analysis.py

 def unpack(program_file, decompiled, language=None, base_address=None):
-    with pyghidra.open_program(program_file, language=language) as flat_api:
-        # Java packages must be imported after pyghidra.start or pyghidra.open_program
-        from ghidra.app.decompiler import DecompInterface, DecompileOptions
-        from ghidra.util.task import TaskMonitor
-        from ghidra.program.model.block import BasicBlockModel
-        from ghidra.program.model.symbol import RefType
-        from java.math import BigInteger
-
-        # If base_address is provided, rebase the program
-        if base_address is not None:
-            # Convert base_address to int if it's a string
-            if isinstance(base_address, str):
-                if base_address.startswith("0x"):
-                    base_address = int(base_address, 16)
-                else:
-                    base_address = int(base_address)
-
-            # Rebase the program to the specified base address
-            program = flat_api.getCurrentProgram()
-            address_factory = program.getAddressFactory()
-            new_base_addr = address_factory.getDefaultAddressSpace().getAddress(hex(base_address))
-            program.setImageBase(new_base_addr, True)
-
-        main_dictionary = {}
-        code_regions = _unpack_program(flat_api)
-        main_dictionary["metadata"] = {}
-        main_dictionary["metadata"]["backend"] = "ghidra"
-        main_dictionary["metadata"]["decompiled"] = decompiled
-        main_dictionary["metadata"]["path"] = program_file
-        if base_address is not None:
-            main_dictionary["metadata"]["base_address"] = base_address
-        with open(program_file, "rb") as fh:
-            data = fh.read()
-            md5_hash = hashlib.md5(data)
-            main_dictionary["metadata"]["hash"] = md5_hash.digest().hex()
-
-        for code_region in code_regions:
-            seg_key = f"seg_{code_region['virtual_address']}"
-            main_dictionary[seg_key] = code_region
-            func_cbs = _unpack_code_region(code_region, flat_api)
-            code_region["children"] = []
-
-            decomp_interface = DecompInterface()
-            prog_options = DecompileOptions()
-            prog_options.grabFromProgram(flat_api.getCurrentProgram())
-            decomp_interface.setOptions(prog_options)
-            init = decomp_interface.openProgram(flat_api.getCurrentProgram())
-            if not init:
-                raise RuntimeError("Could not open program for decompilation")
-
-            for func, cb in func_cbs:
-                cb_key = f"func_{cb['virtual_address']}"
-                code_region["children"].append(cb_key)
-                if decompiled:
-                    try:
-                        decompilation = _decompile(func, decomp_interface, TaskMonitor.DUMMY)
-                    except Exception as e:
-                        print(e, traceback.format_exc())
-                        decompilation = ""
-                    cb["decompilation"] = decompilation
-                bb_model = BasicBlockModel(flat_api.getCurrentProgram())
-                basic_blocks, data_words = _unpack_complex_block(
-                    func, flat_api, bb_model, BigInteger.ONE
+    try:
+        with pyghidra.open_program(program_file, language=language) as flat_api:
+            # Java packages must be imported after pyghidra.start or pyghidra.open_program


Not worth blocking this PR for this, but I wonder if this function can be refactored to be a bit smaller? I think that might make it a bit easier to read.

There are helper functions like _unpack_basic_block that get called, but there is still a lot of logic before and after those calls. Can any of that logic before and after the helper function calls be moved into the helper functions themselves?

rbs-jacob · 2025-10-22T18:26:31Z

disassemblers/ofrak_pyghidra/CHANGELOG.md

 ### Changed
 - Reduce the decompilation time of PyGhidra by reusing cached unpacking results. ([#623](https://github.com/redballoonsecurity/ofrak/pull/623))
 - Improve `ofrak_pyghidra` decompilation: more strings and symbol names for cross-references in decompilation. ([#633](https://github.com/redballoonsecurity/ofrak/pull/633))
+- Look for `ProgramAttributes` before unpacking. Do not pack when flushing to disk. Better error message when Ghidra raises "No load spec found". Added test for unpacking ihex with `ofrak_pyghidra` ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))


Suggested change

- Look for `ProgramAttributes` before unpacking. Do not pack when flushing to disk. Better error message when Ghidra raises "No load spec found". Added test for unpacking ihex with `ofrak_pyghidra` ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

- Improve unpacking logic, error messages, and testing for `ofrak_pyghidra` auto analyzer ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

rbs-jacob · 2025-10-22T18:27:24Z

ofrak_core/src/ofrak/core/ihex.py

+    children = (CodeRegion,)

    async def unpack(self, resource: Resource, config=None):
-        ihex_program = await resource.view_as(IhexProgram)
-        for seg_vaddr_range in ihex_program.segments:
-            # Segment is mapped into the program at an offset starting at the difference between
-            # the segment's vaddr range and the program's base address
-            segment_data_range = seg_vaddr_range.translate(-ihex_program.address_limits.start)
+        _, binfile = _binfile_analysis(await resource.get_data(), self)
+
+        for segment in binfile.segments:
+            segment_data = bytes(binfile.as_binary())[
+                segment.minimum_address
+                - binfile.minimum_address : segment.maximum_address
+                - binfile.minimum_address
+            ]
            await resource.create_child_from_view(
-                ProgramSection(seg_vaddr_range.start, seg_vaddr_range.length()),
-                data_range=segment_data_range,
+                CodeRegion(
+                    segment.minimum_address, segment.maximum_address - segment.minimum_address
+                ),
+                data=segment_data,


Is it necessarily the case that all unpacked IHEX segments are code regions? No data segments? I don't know enough about IHEX to say for sure, but wanted to confirm.

I generally like that Ihex now inherits from Program, but am uncertain about this particular change.

rbs-jacob · 2025-10-22T18:34:36Z

ofrak_core/CHANGELOG.md

 - Update `DataSummary` handling so that it is no longer stored in `ResourceModel`, but rather retrievable via `DataSummaryAnalyzer.get_data_summary` ([#598](https://github.com/redballoonsecurity/ofrak/pull/598))
 - Update dependencies: aiohttp>=3.12.14, beartype~=0.20.0, ubi-reader==0.8.12 ([#613](https://github.com/redballoonsecurity/ofrak/pull/613))
 - Add `ofrak_pyghidra` to OFRAK GUI script builder ([#631](https://github.com/redballoonsecurity/ofrak/pull/631))
+- rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))


Suggested change

- rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

- Rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

Might be worth adding a couple words on what justified the rewrite and/or what changed. Thinking from the perspective of someone reading the CHANGELOG, they're going to wonder whether and how this affects them, so we want to clarify that as much as possible up-front.

whyitfor · 2025-10-31T13:42:29Z

ofrak_core/tests/unit/test_ofrak_server.py

    assert project["session_id"] == id
+
+
+async def test_get_all_program_attributes(ofrak_client: TestClient):


We need docstrings for these tests. Look at examples.

Paul Noalhyt added 7 commits September 26, 2025 11:06

Add ProgramAttributes from the OFRAK GUI. Rewrote the ihex components

3a49e41

lint

3c13b3a

Fix icon

8f6f9da

Update changelogs

7658da9

Merge branch 'master' into feature/add-programattributes-gui

cbc8d61

Lint frontend

a8742c2

Fix mypy issue

550db43

whyitfor requested a review from rbs-afflitto October 20, 2025 13:55

rbs-afflitto approved these changes Oct 21, 2025

View reviewed changes

rbs-jacob requested changes Oct 22, 2025

View reviewed changes

whyitfor reviewed Oct 31, 2025

View reviewed changes

Merge branch 'master' into feature/add-programattributes-gui

6e38d0f

	- Look for `ProgramAttributes` before unpacking. Do not pack when flushing to disk. Better error message when Ghidra raises "No load spec found". Added test for unpacking ihex with `ofrak_pyghidra` ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))
	- Improve unpacking logic, error messages, and testing for `ofrak_pyghidra` auto analyzer ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

	- rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))
	- Rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637))

		assert project["session_id"] == id


		async def test_get_all_program_attributes(ofrak_client: TestClient):

Add ProgramAttributes from the GUI #637

Are you sure you want to change the base?

Add ProgramAttributes from the GUI #637

Uh oh!

Conversation

paulnoalhyt commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Allow users to add ProgramAttributes from the GUI:

Rewriting of the ihex components (analyzer/packer/unpacker):

some context about how ELF files are handled in OFRAK:

some context about how ihex files were handled before this PR:

proposed rewriting:

Bug fixes/improvements:

Uh oh!

rbs-afflitto left a comment

Choose a reason for hiding this comment

Uh oh!

rbs-jacob left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rbs-jacob Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Add `ProgramAttributes` from the GUI #637

Add `ProgramAttributes` from the GUI #637

paulnoalhyt commented Sep 26, 2025 •

edited

Loading

Allow users to add `ProgramAttributes` from the GUI:

rbs-jacob Oct 22, 2025 •

edited

Loading