Skip to content

Conversation

@myd7349
Copy link

@myd7349 myd7349 commented Jan 7, 2026

When I attempted to read the patient metadata using the following code, an exception was raised:

from pprint import pprint

import mffpy

reader = mffpy.Reader(r"./examples/example_2.mff") 
with reader.directory.filepointer("subject") as fp:
    subject_xml = mffpy.XML.from_file(fp)
    fields = subject_xml.get_content()
    pprint(fields)

This results in the following traceback:

Traceback (most recent call last):
  File "d:\mffpy\mffpy\cached_property.py", line 34, in _cached_property
    return getattr(self, cached_name)
AttributeError: 'Patient' object has no attribute '_cached_fields'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\mffpy\test.py", line 6, in <module>
    fields = subject_xml.get_content()
  File "d:\mffpy\mffpy\xml_files.py", line 462, in get_content
    'fields': self.fields
              ^^^^^^^^^^^
Traceback (most recent call last):
  File "d:\mffpy\test.py", line 6, in <module>
    fields = subject_xml.get_content()
  File "d:\mffpy\mffpy\xml_files.py", line 462, in get_content
    'fields': self.fields
              ^^^^^^^^^^^
  File "d:\mffpy\test.py", line 6, in <module>
    fields = subject_xml.get_content()
  File "d:\mffpy\mffpy\xml_files.py", line 462, in get_content
    'fields': self.fields
              ^^^^^^^^^^^
    fields = subject_xml.get_content()
  File "d:\mffpy\mffpy\xml_files.py", line 462, in get_content
    'fields': self.fields
              ^^^^^^^^^^^
  File "d:\mffpy\mffpy\xml_files.py", line 462, in get_content
    'fields': self.fields
              ^^^^^^^^^^^
    'fields': self.fields
              ^^^^^^^^^^^
  File "d:\mffpy\mffpy\cached_property.py", line 36, in _cached_property
    ans = fn(self)
              ^^^^^^^^^^^
  File "d:\mffpy\mffpy\cached_property.py", line 36, in _cached_property
    ans = fn(self)
  File "d:\mffpy\mffpy\cached_property.py", line 36, in _cached_property
    ans = fn(self)
  File "d:\mffpy\mffpy\xml_files.py", line 439, in fields
    ans = fn(self)
  File "d:\mffpy\mffpy\xml_files.py", line 439, in fields
  File "d:\mffpy\mffpy\xml_files.py", line 439, in fields
    data = self._type_converter[data.get('dataType')](data.text)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'date'

The root cause of the issue can be found in the following section of the code:

mffpy/mffpy/xml_files.py

Lines 420 to 441 in 2509423

class Patient(XML):
_xmlns = r'{http://www.egi.com/subject_mff}'
_xmlroottag = r'patient'
_default_filename = 'subject.xml'
_type_converter = {
'string': str,
None: lambda x: x
}
@cached_property
def fields(self):
ans = {}
for field in self.find('fields'):
assert self.nsstrip(field.tag) == 'field', f"""
Unknown field with tag {self.nsstrip(field.tag)}"""
name = self.find('name', field).text
data = self.find('data', field)
data = self._type_converter[data.get('dataType')](data.text)
ans[name] = data
return ans

As shown there, _type_converter only defines converters for the keys 'string' and None. However, patient metadata fields are not limited to these two data types. As a result, when encountering a field with a dataType such as 'date', the following line:

data = self._type_converter[data.get('dataType')](data.text)

attempts to access a non-existent key in _type_converter, which leads to a KeyError.

In this PR, dict.get(key, default) is used to make the conversion more robust. When the specified dataType is not found in _type_converter, the logic now falls back to the default converter associated with None (i.e., lambda x: x), thereby avoiding the exception and preserving the original value.

@myd7349
Copy link
Author

myd7349 commented Jan 7, 2026

In addition, I noticed another related issue in mffpy regarding the handling of empty string values in XML.

Currently, when a <string> field in the XML is empty, mffpy converts it to the literal string "None" rather than the Python value None. This behavior appears to originate from the following converter definition:

'string': str,

Since str(None) evaluates to "None", an empty or missing XML value that is represented as None ends up being converted into the string "None".

I am wondering whether one of the following approaches would be more appropriate:

'string': lambda x: x if x is None else str(x),

or even simply:

'string': lambda x: x,

Both options preserve None values as None, instead of converting them into the string "None". The latter approach also avoids unnecessary type coercion, since XML text content is already a string when present.

I have observed the same issue in other XML string fields as well, not just in patient metadata. Unifying the string handling logic in this way may therefore improve overall consistency and reduce surprising behavior across the XML parsing layer.

@myd7349 myd7349 force-pushed the fix-type-converter branch from 4a7eea3 to a95b212 Compare January 7, 2026 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant