@TwoUnderscorez

I attempted to export a fairly large idb of a jar file and it was taking forever, as well as filling my RAM and swap to 100%, at which point the Linux kernel decided to terminate IDA Pro.

I started digging around the code and realized that either an invalid address was being passed to GetString, or the loop was running into invalid addresses because of a string that was not null-terminated. In either case, it was causing an infinite loop.

It turns out that if we pass an invalid address to get_byte, it returns 0xFF, so I added that to the break condition which allowed me to export my database.
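To illustrate the failure mode, here is a minimal Python model of the loop, not the actual BinExport code. The `make_get_byte` helper, the `read_c_string` function, and the `max_len` cap are hypothetical names introduced only so the unguarded case terminates in the demo; the only behavior taken from the description above is that the byte reader yields 0xFF for an invalid address.

```python
INVALID = 0xFF  # value the simulated get_byte yields for unmapped addresses


def make_get_byte(memory):
    """Return a get_byte stand-in backed by a dict of address -> byte."""
    def get_byte(ea):
        return memory.get(ea, INVALID)
    return get_byte


def read_c_string(get_byte, ea, stop_on_invalid=True, max_len=4096):
    """Collect bytes from ea until a NUL; optionally also stop on 0xFF.

    max_len is a hypothetical safety cap so this demo always terminates.
    """
    out = bytearray()
    while len(out) < max_len:
        b = get_byte(ea + len(out))
        if b == 0 or (stop_on_invalid and b == INVALID):
            break
        out.append(b)
    return bytes(out)


# "hi" at 0x1000 with no terminator: every following address is unmapped.
get_byte = make_get_byte({0x1000: ord("h"), 0x1001: ord("i")})

guarded = read_c_string(get_byte, 0x1000)
unguarded = read_c_string(get_byte, 0x1000, stop_on_invalid=False, max_len=64)

print(guarded)         # the string stops at the first invalid address
print(len(unguarded))  # without the guard, only the cap stops the loop
```

Without the cap, the unguarded variant would keep appending 0xFF bytes indefinitely, which matches the memory growth described above. Note that in this model a legitimate 0xFF byte inside a string would also end the read early.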

I'm not sure this is the best fix because I'm getting the following error when trying to load the BinExport into python:

IPython traceback
Python 3.13.5 (main, Jun 11 2025, 15:36:57) [GCC 14.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.3.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: The `%timeit` magic has a `-o` flag, which returns the results, making it easy to plot. See `%timeit?`.

In [1]: from bindiff import BinDiff

In [2]: diff = BinDiff('artifacts/...BinExport', 'artifacts/...BinExport', 'artifacts/...BinDiff')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[2], line 1
----> 1 diff = BinDiff('artifacts/...BinExport', 'artifacts/...BinExport', 'artifacts/...BinDiff'

File .../venv/lib/python3.13/site-packages/bindiff/bindiff.py:95, in BinDiff.__init__(self, primary, secondary, diff_file)
     92 super(BinDiff, self).__init__(diff_file)
     94 #: Primary BinExport object
---> 95 self.primary = ProgramBinExport(primary) if isinstance(primary, str) else primary
     96 #: Secondary BinExport object
     97 self.secondary = ProgramBinExport(secondary) if isinstance(secondary, str) else secondary

File .../venv/lib/python3.13/site-packages/binexport/program.py:39, in ProgramBinExport.__init__(self, file)
     36 self.path: pathlib.Path = pathlib.Path(file)  #: Binexport file path
     38 with open(file, "rb") as f:
---> 39     self._pb.ParseFromString(f.read())
     40 self.mask = 0xFFFFFFFF if self.architecture.endswith("32") else 0xFFFFFFFFFFFFFFFF
     41 self.fun_names: dict[str, FunctionBinExport] = {}  #: dictionary function name -> name

File .../venv/lib/python3.13/site-packages/google/protobuf/message.py:230, in Message.ParseFromString(self, serialized)
    222 """Parse serialized protocol buffer data in binary form into this message.
    223
    224 Like :func:`MergeFromString()`, except we clear the object first.
   (...)    227   message.DecodeError if the input cannot be parsed.
    228 """
    229 self.Clear()
--> 230 return self.MergeFromString(serialized)

File .../venv/lib/python3.13/site-packages/google/protobuf/internal/python_message.py:1189, in _AddMergeFromStringMethod.<locals>.MergeFromString(self, serialized)
   1187 length = len(serialized)
   1188 try:
-> 1189   if self._InternalParse(serialized, 0, length) != length:
   1190     # The only reason _InternalParse would return early is if it
   1191     # encountered an end-group tag.
   1192     raise message_mod.DecodeError('Unexpected end-group tag.')
   1193 except (IndexError, TypeError):
   1194   # Now ord(buf[p:p+1]) == ord('') gets TypeError.

File .../venv/lib/python3.13/site-packages/google/protobuf/internal/python_message.py:1248, in _AddMergeFromStringMethod.<locals>.InternalParse(self, buffer, pos, end, current_depth)
   1246 _MaybeAddDecoder(cls, field_des)
   1247 field_decoder = field_des._decoders[is_packed]
-> 1248 pos = field_decoder(
   1249     buffer, new_pos, end, self, field_dict, current_depth
   1250 )
   1251 if field_des.containing_oneof:
   1252   self._UpdateOneofState(field_des)

File .../venv/lib/python3.13/site-packages/google/protobuf/internal/decoder.py:774, in MessageDecoder.<locals>.DecodeRepeatedField(buffer, pos, end, message, field_dict, current_depth)
    769 if current_depth > _recursion_limit:
    770   raise _DecodeError(
    771       'Error parsing message: too many levels of nesting.'
    772   )
    773 if (
--> 774     value.add()._InternalParse(buffer, pos, new_pos, current_depth)
    775     != new_pos
    776 ):
    777   # The only reason _InternalParse would return early is if it
    778   # encountered an end-group tag.
    779   raise _DecodeError('Unexpected end-group tag.')
    780 current_depth -= 1

File .../venv/lib/python3.13/site-packages/google/protobuf/internal/python_message.py:1248, in _AddMergeFromStringMethod.<locals>.InternalParse(self, buffer, pos, end, current_depth)
   1246 _MaybeAddDecoder(cls, field_des)
   1247 field_decoder = field_des._decoders[is_packed]
-> 1248 pos = field_decoder(
   1249     buffer, new_pos, end, self, field_dict, current_depth
   1250 )
   1251 if field_des.containing_oneof:
   1252   self._UpdateOneofState(field_des)

File .../venv/lib/python3.13/site-packages/google/protobuf/internal/decoder.py:629, in StringDecoder.<locals>.DecodeField(***failed resolving arguments***)
    627   field_dict.pop(key, None)
    628 else:
--> 629   field_dict[key] = _ConvertToUnicode(buffer[pos:new_pos])
    630 return new_pos

File .../venv/lib/python3.13/site-packages/google/protobuf/internal/decoder.py:585, in StringDecoder.<locals>._ConvertToUnicode(memview)
    583 byte_str = memview.tobytes()
    584 try:
--> 585   value = str(byte_str, 'utf-8')
    586 except UnicodeDecodeError as e:
    587   # add more information to the error message and re-raise it.
    588   e.reason = '%s in field: %s' % (e, key.full_name)

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 1-2: 'utf-8' codec can't decode bytes in position 1-2: unexpected end of data in field: BinExport2.Expression.symbol
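For reference, the same kind of `UnicodeDecodeError` can be reproduced in isolation. The sketch below assumes a string that was cut in the middle of a multi-byte UTF-8 character; whether that is what actually happened to the exported `BinExport2.Expression.symbol` value is not confirmed, and the payload is made up for illustration.

```python
# Hypothetical payload: "a" followed by the first two bytes of a
# three-byte UTF-8 sequence, i.e. a string cut mid-character.
truncated = "a\u4e2d".encode("utf-8")[:-1]

try:
    truncated.decode("utf-8")
    message = None
    span = None
except UnicodeDecodeError as exc:
    message = str(exc)
    span = (exc.start, exc.end)  # half-open byte range that failed to decode

print(message)
print(span)
```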

It's not too bad because you can just:

import codecs
codecs.register_error("strict", codecs.ignore_errors)

But it's not ideal. Let me know if there is a better way to fix this. Perhaps a different function from the IDA API, such as get_bytes, would be a better fit?
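One possible direction, sketched in Python rather than in the exporter's own code: read a bounded chunk in a single call, cut it at the first NUL, and sanitize the result so that only valid UTF-8 is written out. Everything here is an assumption made for illustration: `fake_get_bytes` is a stand-in that returns `None` for unmapped addresses (the real `get_bytes` behavior should be checked against the IDA documentation), and `read_string_chunked`, `to_valid_utf8`, and the 1024-byte limit are hypothetical.

```python
BASE = 0x2000
# Simulated segment: "ok", a truncated multi-byte sequence, a NUL, then junk.
IMAGE = b"ok\xe4\xb8\x00tail"


def fake_get_bytes(ea, size):
    """Stand-in for a bulk read: bytes available at ea, or None if unmapped."""
    offset = ea - BASE
    if offset < 0 or offset >= len(IMAGE):
        return None
    return IMAGE[offset:offset + size]


def read_string_chunked(get_bytes, ea, max_len=1024):
    """Read at most max_len bytes in one call and cut at the first NUL."""
    chunk = get_bytes(ea, max_len) or b""
    nul = chunk.find(b"\x00")
    return chunk if nul < 0 else chunk[:nul]


def to_valid_utf8(raw):
    """Replace undecodable bytes with U+FFFD so the result is valid UTF-8."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


raw = read_string_chunked(fake_get_bytes, BASE)
clean = to_valid_utf8(raw)
missing = read_string_chunked(fake_get_bytes, 0x9999)

print(raw)      # bytes up to, but not including, the NUL
print(clean)    # same text with the broken sequence replaced
print(missing)  # empty result for an unmapped address
```

Sanitizing on the export side would mean consumers no longer need to override the global `strict` error handler, at the cost of losing the original invalid bytes.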

GetString calls IDA's `get_byte` in a loop until a null byte is reached.

Since `get_byte` returns 0xFF when an invalid address is requested, the loop may never terminate.

To fix that, break out of the loop when `get_byte` returns 0xFF.