mart-r
Collaborator

@mart-r mart-r commented Sep 19, 2025

When a save directory path is provided, the user (probably) expects the data to be saved to disk as soon as the method is called. But the current implementation forces the user to iterate over the results before any annotation actually happens.

So this change allows the method to materialise the list internally, which forces the annotation to happen and the results to be saved to disk. Additionally, it adds 2 other options:

  1. Lazy iteration (what happens when no save directory path is provided), where iterating over the data is left to the user
  2. The combined save-and-return option, where the results are materialised but also yielded. Notably, this will take up a lot of memory when used with large data sets
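
The laziness problem described above can be reproduced with a small, self-contained sketch. Nothing here is MedCAT code: `annotate_lazily`, its `save_dir` parameter, and the placeholder "annotation" are hypothetical stand-ins, used only to show that a generator-based method writes nothing to disk until someone iterates over it.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, Optional, Tuple


def annotate_lazily(
    texts: Iterable[str], save_dir: Optional[Path] = None
) -> Iterator[Tuple[int, dict]]:
    """Hypothetical stand-in for a generator-based annotation method."""
    for index, text in enumerate(texts):
        # Placeholder for the real annotation work (assumption).
        result = {"text": text, "length": len(text)}
        if save_dir is not None:
            (save_dir / f"{index}.json").write_text(json.dumps(result))
        yield index, result


with tempfile.TemporaryDirectory() as tmp:
    save_dir = Path(tmp)
    # Calling the generator function runs none of its body yet.
    pending = annotate_lazily(["first note", "second note"], save_dir)
    files_before = sorted(p.name for p in save_dir.iterdir())
    # Only iterating (here via list) triggers the annotation and the saving.
    results = list(pending)
    files_after = sorted(p.name for p in save_dir.iterdir())
```

In this sketch `files_before` is empty even though a save directory was passed, which is the surprise the description refers to; the files only appear once `list(pending)` has consumed the generator.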

Instead of the above, the current version of the PR introduces a new method:
CAT.save_entities_multi_texts that:
a) requires a save_dir_path
b) always saves data on disk
c) never returns anything

The save + return option is still (kind of) available with get_entities_multi_texts, but it still requires manual iteration over the result.

This approach moves the save-to-disk concern into its own method and makes the get_entities_multi_texts method less complex.
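
A minimal sketch of that split, under stated assumptions: `get_entities_sketch` and `save_entities_sketch` are hypothetical names that only mirror the roles of the two methods, and draining the generator with `collections.deque(maxlen=0)` is one possible way to force the work without holding results in memory. How the merged code actually consumes the iterator is not shown in this conversation.

```python
import json
import tempfile
from collections import deque
from pathlib import Path
from typing import Iterable, Iterator, Optional, Tuple


def get_entities_sketch(
    texts: Iterable[str], save_dir: Optional[Path] = None
) -> Iterator[Tuple[int, dict]]:
    """Hypothetical lazy method: yields results, saving them if asked to."""
    for index, text in enumerate(texts):
        result = {"text": text, "length": len(text)}  # placeholder annotation
        if save_dir is not None:
            (save_dir / f"{index}.json").write_text(json.dumps(result))
        yield index, result


def save_entities_sketch(texts: Iterable[str], save_dir: Path) -> None:
    """Hypothetical eager method: requires a path, always saves, returns nothing."""
    # deque(maxlen=0) consumes the iterator and discards every item,
    # so memory use stays flat regardless of the number of texts.
    deque(get_entities_sketch(texts, save_dir), maxlen=0)


with tempfile.TemporaryDirectory() as tmp:
    returned = save_entities_sketch(["first note", "second note"], Path(tmp))
    saved_files = sorted(p.name for p in Path(tmp).iterdir())
```

Because the eager function never returns the results, it avoids the memory cost of the combined save-and-return option while still guaranteeing that the files exist once the call returns.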

@tomolopolis
Member

The previous implementation always treated the method as a generator. As such, no work was ever done at call time, regardless of whether a save directory path was provided. This commit fixes that by making the wrapper method a regular method that sometimes returns the iterator and at other times just a (potentially empty) list.
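
The wrapper idea in this commit message can be sketched as follows. In Python, any function whose body contains `yield` is a generator function, so the `yield` has to live in a helper for the wrapper to do work at call time. The names `_annotate_iter`, `annotate`, and the in-memory `sink` (standing in for saving to disk) are assumptions made for illustration, not the PR's actual code.

```python
from typing import Iterable, Iterator, List, Optional, Tuple, Union

Result = Tuple[int, str]


def _annotate_iter(
    texts: Iterable[str], sink: Optional[List[Result]]
) -> Iterator[Result]:
    """Hypothetical helper: the only place that contains `yield`."""
    for index, text in enumerate(texts):
        result = (index, text.upper())  # placeholder for real annotation
        if sink is not None:
            sink.append(result)  # stands in for writing to disk
        yield result


def annotate(
    texts: Iterable[str], sink: Optional[List[Result]] = None
) -> Union[Iterator[Result], List[Result]]:
    """Hypothetical wrapper: a regular function, so it can act at call time."""
    results = _annotate_iter(texts, sink)
    if sink is None:
        return results  # lazy: iteration is left to the caller
    return list(results)  # eager: the work and the saving happen right here


store: List[Result] = []
eager = annotate(["a", "b"], sink=store)
lazy = annotate(["c"])
```

Here the eager call fills `store` before it returns, while the lazy call hands back an unconsumed iterator.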
Collaborator

@alhendrickson alhendrickson left a comment

lgtm

Collaborator

@alhendrickson alhendrickson left a comment

lgtm, thanks!

@mart-r mart-r merged commit 228aab9 into main Sep 24, 2025
20 checks passed
@mart-r mart-r deleted the feat/medcat/CU-869ahw0mw-allow-mp-to-realise-iterator branch September 24, 2025 09:10