- 
                Notifications
    
You must be signed in to change notification settings  - Fork 145
 
          Add ProgramAttributes from the GUI
          #637
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Approved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor suggestions
| throw Error(JSON.stringify(await r.json(), undefined, 2)); | ||
| } | ||
| //console.log(r.json()) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| //console.log(r.json()) | 
| r.json().then((ofrakProgramAttributes) => { | ||
| ofrakProgramAttributesPromise = ofrakProgramAttributes; | ||
| }); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this whole function is async, you can await r.json() here without using a .then callback.
| onMount(async () => { | ||
| try { | ||
| await fetch(`${$settings.backendUrl}/get_all_program_attributes`).then( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this whole function is async, you can await fetch here without using a .then callback.
| selected_attribute["isa"] && | ||
| selected_attribute["bit_width"] && | ||
| selected_attribute["endianness"] | ||
| ) { | ||
| modifierView = undefined; | ||
| let program_attributes = JSON.stringify([ | ||
| "ofrak.core.architecture.ProgramAttributes", | ||
| { | ||
| isa: selected_attribute["isa"], | ||
| sub_isa: selected_attribute["sub_isa"], | ||
| bit_width: selected_attribute["bit_width"], | ||
| endianness: selected_attribute["endianness"], | ||
| processor: selected_attribute["processor"], | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In JavaScript, object field access is equivalent to dictionary-style field access. So selected_attribute.isa is the same as selected_attribute["isa"].
Might be worth changing these to the more concise syntax?
| except Exception as e: | ||
| if "toString" in dir(e) and "No load spec found" in e.toString(): | ||
| raise Exception( | ||
| e.toString() | ||
| + "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!" | ||
| ) | ||
| else: | ||
| raise e | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| except Exception as e: | |
| if "toString" in dir(e) and "No load spec found" in e.toString(): | |
| raise Exception( | |
| e.toString() | |
| + "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!" | |
| ) | |
| else: | |
| raise e | |
| except <SOMETHING MORE SPECIFIC> as e: | |
| if "toString" in dir(e) and "No load spec found" in e.toString(): | |
| raise <SOMETHING MORE SPECIFIC>( | |
| e.toString() | |
| + "\nTry adding ProgramAttributes to you binary before running a Ghidra analyzer/unpacker!" | |
| ) | |
| else: | |
| raise | 
We want to avoid catching raw Exceptions if possible. We also want to avoid raising them. We should probably replace these with a more specific Exception instance such as RuntimeError, ValueError, or something even more specific.
Worth changing in other parts of the code where you see this, as well.
| def unpack(program_file, decompiled, language=None, base_address=None): | ||
| with pyghidra.open_program(program_file, language=language) as flat_api: | ||
| # Java packages must be imported after pyghidra.start or pyghidra.open_program | ||
| from ghidra.app.decompiler import DecompInterface, DecompileOptions | ||
| from ghidra.util.task import TaskMonitor | ||
| from ghidra.program.model.block import BasicBlockModel | ||
| from ghidra.program.model.symbol import RefType | ||
| from java.math import BigInteger | ||
| 
               | 
          ||
| # If base_address is provided, rebase the program | ||
| if base_address is not None: | ||
| # Convert base_address to int if it's a string | ||
| if isinstance(base_address, str): | ||
| if base_address.startswith("0x"): | ||
| base_address = int(base_address, 16) | ||
| else: | ||
| base_address = int(base_address) | ||
| 
               | 
          ||
| # Rebase the program to the specified base address | ||
| program = flat_api.getCurrentProgram() | ||
| address_factory = program.getAddressFactory() | ||
| new_base_addr = address_factory.getDefaultAddressSpace().getAddress(hex(base_address)) | ||
| program.setImageBase(new_base_addr, True) | ||
| 
               | 
          ||
| main_dictionary = {} | ||
| code_regions = _unpack_program(flat_api) | ||
| main_dictionary["metadata"] = {} | ||
| main_dictionary["metadata"]["backend"] = "ghidra" | ||
| main_dictionary["metadata"]["decompiled"] = decompiled | ||
| main_dictionary["metadata"]["path"] = program_file | ||
| if base_address is not None: | ||
| main_dictionary["metadata"]["base_address"] = base_address | ||
| with open(program_file, "rb") as fh: | ||
| data = fh.read() | ||
| md5_hash = hashlib.md5(data) | ||
| main_dictionary["metadata"]["hash"] = md5_hash.digest().hex() | ||
| 
               | 
          ||
| for code_region in code_regions: | ||
| seg_key = f"seg_{code_region['virtual_address']}" | ||
| main_dictionary[seg_key] = code_region | ||
| func_cbs = _unpack_code_region(code_region, flat_api) | ||
| code_region["children"] = [] | ||
| 
               | 
          ||
| decomp_interface = DecompInterface() | ||
| prog_options = DecompileOptions() | ||
| prog_options.grabFromProgram(flat_api.getCurrentProgram()) | ||
| decomp_interface.setOptions(prog_options) | ||
| init = decomp_interface.openProgram(flat_api.getCurrentProgram()) | ||
| if not init: | ||
| raise RuntimeError("Could not open program for decompilation") | ||
| 
               | 
          ||
| for func, cb in func_cbs: | ||
| cb_key = f"func_{cb['virtual_address']}" | ||
| code_region["children"].append(cb_key) | ||
| if decompiled: | ||
| try: | ||
| decompilation = _decompile(func, decomp_interface, TaskMonitor.DUMMY) | ||
| except Exception as e: | ||
| print(e, traceback.format_exc()) | ||
| decompilation = "" | ||
| cb["decompilation"] = decompilation | ||
| bb_model = BasicBlockModel(flat_api.getCurrentProgram()) | ||
| basic_blocks, data_words = _unpack_complex_block( | ||
| func, flat_api, bb_model, BigInteger.ONE | ||
| try: | ||
| with pyghidra.open_program(program_file, language=language) as flat_api: | ||
| # Java packages must be imported after pyghidra.start or pyghidra.open_program | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not worth blocking this PR for this, but I wonder if this function can be refactored to be a bit smaller? I think that might make it a bit easier to read.
There are helper functions like _unpack_basic_block that get called, but there is still a lot of logic before and after those calls. Can any of that logic before and after the helper function calls be moved into the helper functions themselves?
| ### Changed | ||
| - Reduce the decompilation time of PyGhidra by reusing cached unpacking results. ([#623](https://github.com/redballoonsecurity/ofrak/pull/623)) | ||
| - Improve `ofrak_pyghidra` decompilation: more strings and symbol names for cross-references in decompilation. ([#633](https://github.com/redballoonsecurity/ofrak/pull/633)) | ||
| - Look for `ProgramAttributes` before unpacking. Do not pack when flushing to disk. Better error message when Ghidra raises "No load spec found". Added test for unpacking ihex with `ofrak_pyghidra` ([#637](https://github.com/redballoonsecurity/ofrak/pull/637)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Look for `ProgramAttributes` before unpacking. Do not pack when flushing to disk. Better error message when Ghidra raises "No load spec found". Added test for unpacking ihex with `ofrak_pyghidra` ([#637](https://github.com/redballoonsecurity/ofrak/pull/637)) | |
| - Improve unpacking logic, error messages, and testing for `ofrak_pyghidra` auto analyzer ([#637](https://github.com/redballoonsecurity/ofrak/pull/637)) | 
| children = (CodeRegion,) | ||
| 
               | 
          ||
| async def unpack(self, resource: Resource, config=None): | ||
| ihex_program = await resource.view_as(IhexProgram) | ||
| for seg_vaddr_range in ihex_program.segments: | ||
| # Segment is mapped into the program at an offset starting at the difference between | ||
| # the segment's vaddr range and the program's base address | ||
| segment_data_range = seg_vaddr_range.translate(-ihex_program.address_limits.start) | ||
| _, binfile = _binfile_analysis(await resource.get_data(), self) | ||
| 
               | 
          ||
| for segment in binfile.segments: | ||
| segment_data = bytes(binfile.as_binary())[ | ||
| segment.minimum_address | ||
| - binfile.minimum_address : segment.maximum_address | ||
| - binfile.minimum_address | ||
| ] | ||
| await resource.create_child_from_view( | ||
| ProgramSection(seg_vaddr_range.start, seg_vaddr_range.length()), | ||
| data_range=segment_data_range, | ||
| CodeRegion( | ||
| segment.minimum_address, segment.maximum_address - segment.minimum_address | ||
| ), | ||
| data=segment_data, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessarily the case that all unpacked IHEX segments are code regions? No data segments? I don't know enough about IHEX to say for sure, but wanted to confirm.
I generally like that Ihex now inherits from Program, but am uncertain about this particular change.
| - Update `DataSummary` handling so that it is no longer stored in `ResourceModel`, but rather retrievable via `DataSummaryAnalyzer.get_data_summary` ([#598](https://github.com/redballoonsecurity/ofrak/pull/598)) | ||
| - Update dependencies: aiohttp>=3.12.14, beartype~=0.20.0, ubi-reader==0.8.12 ([#613](https://github.com/redballoonsecurity/ofrak/pull/613)) | ||
| - Add `ofrak_pyghidra` to OFRAK GUI script builder ([#631](https://github.com/redballoonsecurity/ofrak/pull/631)) | ||
| - rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637)) | |
| - Rewrote intel hex components ([#637](https://github.com/redballoonsecurity/ofrak/pull/637)) | 
Might be worth adding a couple words on what justified the rewrite and/or what changed. Thinking from the perspective of someone reading the CHANGELOG, they're going to wonder whether and how this affects them, so we want to clarify that as much as possible up-front.
| assert project["session_id"] == id | ||
| 
               | 
          ||
| 
               | 
          ||
| async def test_get_all_program_attributes(ofrak_client: TestClient): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need docstrings for these tests. Look at examples.
One sentence summary of this PR (This should go in the CHANGELOG!)
Add ProgramAttributes from the OFRAK GUI. Rewrote the ihex components.
Please describe the changes in your request.
Allow users to add
ProgramAttributesfrom the GUI:Allow users to add
ProgramAttributesto a resource in the OFRAK GUI. This is particularly useful when dealing with Intel Hex files which do not contain info about the CPU architecture, making them impossible to analyze in the GUI before.ProgramAttributesfields (InstructionSet,SubInstructionSet,BitWidth,Endianness,ProcessorType).ProgramAttributesto a resource.test_pyghidra_components.py: added test for unpacking ihex file with pyghidra, after addingProgramAttributes.Rewriting of the ihex components (analyzer/packer/unpacker):
some context about how ELF files are handled in OFRAK:
Programtag, so when we analyze an ELF with a backend, we give the backend the full ELF with headers, etc.CodeRegionsthat can be unpacked by the backend.some context about how ihex files were handled before this PR:
Ihexis just considered as aGenericFile.IhexProgramgets theProgramtag. It's the binary content of all sections present in the ihex (binary blob of concatenated sections). This is given to the backend. The issue is that the backend doesn't receive the ihex file, so it has no idea about the sections present in this binary blob.IhexUnpackerdoesn't createCodeRegionsbut anIhexProgram(binary blob mentioned above).IhexProgramUnpackerthen createsProgramSectionswhich are not reallyCodeRegions.ProgramSections. For example, thePyGhidraCodeRegionUnpackeris made to run onCodeRegionsand notProgramSections.proposed rewriting:
Ihexfile gets theProgramtag. That way the full ihex file is given to the backend for analysis.IhexUnpackercreatedCodeRegions, like theElfUnpacker. This makes sense because when loading an ihex file in Ghidra, it assumes that all sections are RWX.CodeRegionsdata.Happy to discuss this rewriting in comments!
Bug fixes/improvements:
pyghidra_components.py:PyGhidraAutoAnalyzer, the resource was flushed to disk, so it was packed. This removes all child resources that were created before running thePyGhidraAutoAnalyzer. This was an issue if for example, you load an ihex file, unpack it (so it createsCodeRegions), then rununpack_recursively(). As the binary was packed by thePyGhidraAutoAnalyzer, theCodeRegionswere be deleted, and theunpack_recursivelyfailed because theCodeRegionsit was supposed to unpack disapeared. To avoid that, I putpack=Falseinflush_data_to_disk. This solves my issue, but is not ideal (what if the user modified some data in the ihex before running thePyGhidraAutoAnalyzer?!). Happy to discuss other ideas to solve this issue!PyGhidraAutoAnalyzernow checks forProgramAttributesbefore runningunpack.pyghidra_analysis.py: added better error message when hittingghidra.app.util.opinion.LoadException: No load spec found, to give the user a hint that they should addProgramAttributes.AddTagView.svelteandAddProgramAttributesView.svelte: updatedrefreshResource()so that the attribute panel in the GUI is actually refreshed with the new tag/ProgramAttributes.Anyone you think should look at this, specifically?
No