Skip to content

Conversation

tomolopolis
Copy link
Member

  • merge_cdb: migrated from v1, refactored and tested for v2
  • other utils: ported from trainer / medcat_utils.py, documented and tested , as they are to be used elsewhere

@mart-r
Copy link
Collaborator

mart-r commented Oct 15, 2025

Looks like now it's just a failure of the new test. Seems like it was failing before as well, but it got buried under everything else that was happening workflow wise.

@tomolopolis
Copy link
Member Author

tomolopolis commented Oct 15, 2025

ah yep will fix. Looks like an issue with the test CDB.

Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments. But looks good overall.

)

# Should have 3 concepts total
self.assertEqual(len(merged_cdb.cui2info), 3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do this check dynamic - add together the sets of keys from both and len them for comparison.

all_links = []
if cui not in cuis2nodes:
# Get preferred name from the new CDB structure
preferred_name = cdb.cui2info.get(cui, {}).get('preferred_name', cui)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just use cdb.get_name for this.

'links': all_links
}
except KeyError as e:
logger.warning(f'Cannot find path concept path:{e}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to include exception in exc_info=e instead of printing out only the message.
Or if only the message is important, then format in a lazy way (i.e include %s in string and pass e as argument like "...path: %s", e)).

# Should have 3 concepts total
self.assertEqual(len(merged_cdb.cui2info), 3)

# CUI1 should be merged
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's all hard coded. Couldn't we just iterate over a) both the merged CDBs and b) all CUIs in each, and then do the checks under a self.subTest?
Wouldn't need a separate section for each CUI that way.

)

# Should be identical to cdb2
self.assertEqual(len(merged_cdb.cui2info), len(self.cdb2.cui2info))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the entire dicts be equal If there's nothing added from the empty one?

@tomolopolis tomolopolis merged commit 260d63c into main Oct 16, 2025
21 of 22 checks passed
@tomolopolis tomolopolis deleted the cdb_utils branch October 16, 2025 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants