Skip to content
This repository was archived by the owner on Jul 28, 2025. It is now read-only.

Conversation

@mart-r
Copy link
Collaborator

@mart-r mart-r commented Sep 9, 2024

Previously usage monitoring wouldn't work for multiprocessing.

The issue was 2-fold.

  1. For CAT.get_entities_multi_texts and CAT.multiprocessing_batch_docs_size (which uses the former internally)
    • Uses the CAT.pipe and Pipe.batch_multi_processHad directly
    • Had to add logging to the input/output separately
  2. For CAT.multiprocessing_batch_char_size
    • Runs on different processes, which means that different instances are used
      • And (for some reason) cleanup (i.e a __del__ call - which would flush automatically) is not done
    • Had to flush the usage monitor (i.e write to disk) at the end of each multiprocessing method (CAT._mp_cons) explicitly

The PR also adds tests to make sure the monitoring actually works as well. The 3 added tests failed in the previous state of the project.

When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of multiprocessing method.
When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output
@tomolopolis
Copy link
Member

…sig.

Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mart-r mart-r merged commit eb912d6 into master Sep 17, 2024
mart-r added a commit to mart-r/MedCAT that referenced this pull request Oct 14, 2024
* CU-8695pvhfe: Rename a test class

* CU-8695pvhfe: Add tests for multiprocessig usage monitoring

* CU-8695pvhfe: Fix usage monitor for multiprocessig.

When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of multiprocessing method.
When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output

* CU-8695pvhfe: Fix remaining issues with usage monitor for multiprocessig.

Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
@mart-r mart-r deleted the CU-8695pvhfe-fix-usage-monitor-mp branch November 18, 2024 16:23
alhendrickson pushed a commit to CogStack/cogstack-nlp that referenced this pull request Jul 1, 2025
…T#488)

* CU-8695pvhfe: Rename a test class

* CU-8695pvhfe: Add tests for multiprocessig usage monitoring

* CU-8695pvhfe: Fix usage monitor for multiprocessig.

When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of multiprocessing method.
When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output

* CU-8695pvhfe: Fix remaining issues with usage monitor for multiprocessig.

Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
alhendrickson pushed a commit to CogStack/cogstack-nlp that referenced this pull request Jul 1, 2025
…T#488)

* CU-8695pvhfe: Rename a test class

* CU-8695pvhfe: Add tests for multiprocessig usage monitoring

* CU-8695pvhfe: Fix usage monitor for multiprocessig.

When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of multiprocessing method.
When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output

* CU-8695pvhfe: Fix remaining issues with usage monitor for multiprocessig.

Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants