This repository was archived by the owner on Jul 28, 2025. It is now read-only.

Comparing changes

base repository: CogStack/MedCAT
base: v1.13.2
head repository: CogStack/MedCAT
compare: v1.14.0
  • 20 commits
  • 29 files changed
  • 2 contributors

Commits on Aug 30, 2024

  1. Production/master sync (#483)

    mart-r authored Aug 30, 2024
    b8bb4e3

Commits on Sep 2, 2024

  1. b28fa05

Commits on Sep 5, 2024

  1. Pushing bug fix for metacat (#487)

    * Pushing bug fix for metacat
    
    2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data
    
    * Pushing change for lazy logging
    
    * Pushing update for lazy logging
    
    * Pushing lint fix
    shubham-s-agarwal authored Sep 5, 2024
    6127f77
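
The evaluation fix in #487 comes down to which dataset the evaluation step reads. The following is a minimal sketch under stated assumptions, not MetaCAT's actual code: `undersample` and `label_counts` are hypothetical helpers, and `full_data` and `data_undersampled` are modelled as plain lists of `(text, label)` pairs.

```python
import random
from collections import Counter


def undersample(data, seed=0):
    """Hypothetical helper: keep as many samples per label as the rarest label has."""
    rng = random.Random(seed)
    by_label = {}
    for sample in data:
        by_label.setdefault(sample[1], []).append(sample)
    smallest = min(len(samples) for samples in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return balanced


def label_counts(data):
    """Count how many samples each label has."""
    return Counter(label for _, label in data)


# Toy, made-up dataset: 8 samples of class "A" and 2 of class "B".
full_data = [(f"t{i}", "A") for i in range(8)] + [(f"u{i}", "B") for i in range(2)]
data_undersampled = undersample(full_data)

# Training in the first phase may use the balanced subset ...
train_counts = label_counts(data_undersampled)
# ... but evaluation should see the real class distribution.
eval_counts = label_counts(full_data)
```

Evaluating on the balanced subset would report metrics for a class distribution that does not occur in practice, which is the kind of mismatch the commit message describes.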

Commits on Sep 16, 2024

  1. 2588670
  2. CU-8695uhe5n: Update docs dependency pins (#491)

    * CU-8695uhe5n: Update docs dependency pins
    
    * CU-8695uhe5n: Fix typo in fsspec version pin
    mart-r authored Sep 16, 2024
    56a2856

Commits on Sep 17, 2024

  1. 394e17b
  2. CU-8695pvhfe fix usage monitoring for multiprocessing (#488)

    * CU-8695pvhfe: Rename a test class
    
    * CU-8695pvhfe: Add tests for multiprocessing usage monitoring
    
    * CU-8695pvhfe: Fix usage monitor for multiprocessing.
    
    When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of the multiprocessing method.
    When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output.
    
    * CU-8695pvhfe: Fix remaining issues with usage monitor for multiprocessing.
    
    Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
    mart-r authored Sep 17, 2024
    eb912d6
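
The usage-monitor fix in #488 can be pictured as a buffered logger that has to be flushed explicitly once a batch run ends. This is a rough sketch only: `BufferedUsageMonitor` and `process_all` are hypothetical stand-ins and do not reflect MedCAT's real classes or signatures.

```python
class BufferedUsageMonitor:
    """Hypothetical monitor that batches usage records before writing them."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = []
        self.written = []  # stands in for the log file

    def log(self, input_len, n_entities):
        self.buffer.append((input_len, n_entities))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        self.written.extend(self.buffer)
        self.buffer.clear()


def process_all(texts, monitor):
    """Consume `texts` one item at a time, then flush whatever is still buffered."""
    results = []
    for text in texts:  # works for generators; nothing is materialised early
        if text is None:
            # Avoid calling len() on a text that does not exist.
            results.append(None)
            continue
        monitor.log(len(text), n_entities=0)
        results.append(text.upper())  # placeholder for real processing
    monitor.flush()  # without this, the last partial batch is never written
    return results


monitor = BufferedUsageMonitor(batch_size=3)
output = process_all(iter(["ab", None, "cde", "f", "gh"]), monitor)
```

Here the final record would stay in the buffer if `flush` were not called at the end of the run.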

Commits on Sep 30, 2024

  1. CU-8695knfbg Add name edits to regression suite (#486)

    * CU-8695knfbg: Decouple the edit finder methods from the spell checker
    
    * CU-8695knfbg: Add methods for random edit picking and variant estimation to utils; Plus a few tests
    
    * CU-8695knfbg: Add edit distance option and use to CLI
    
    * CU-8695knfbg: Allow retaining order of elements in generator when getting edits for run-to-run consistency
    
    * CU-8695knfbg: Add safeguard for name order to be consistent across runs
    
    * CU-8695knfbg: Sort names when getting from CDB to avoid run to run variance
    
    * CU-8695knfbg: Move edit finding methods back to BasicSpellChecker class, but make the 1-distance method a class method
    
    * CU-8695knfbg: Move validation earlier in edit finder
    
    * CU-8695knfbg: Simplify edit finder somewhat
    mart-r authored Sep 30, 2024
    cbae5b3
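
The name-edit work in #486 relies on generating variants one edit away from a name and keeping their order stable between runs. Below is a minimal sketch in the style of a classic spell-checker edit generator; `edits1` and `pick_random_edits` are hypothetical names, and the real BasicSpellChecker methods may differ.

```python
import random
import string


def edits1(word, letters=string.ascii_lowercase):
    """Yield each distinct string one edit away from `word`, in a fixed order."""
    seen = set()  # used for membership checks only, never iterated
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        candidates = []
        if right:
            candidates.append(left + right[1:])  # deletion
        if len(right) > 1:
            candidates.append(left + right[1] + right[0] + right[2:])  # transposition
        if right:
            candidates.extend(left + c + right[1:] for c in letters)  # replacement
        candidates.extend(left + c + right for c in letters)  # insertion
        for candidate in candidates:
            if candidate != word and candidate not in seen:
                seen.add(candidate)
                yield candidate


def pick_random_edits(word, num, seed=42):
    """Pick `num` edits reproducibly by seeding a private random generator."""
    rng = random.Random(seed)
    all_edits = list(edits1(word))
    return rng.sample(all_edits, min(num, len(all_edits)))


variants = list(edits1("ab"))
picked = pick_random_edits("ab", 5)
```

Yielding candidates in construction order, rather than iterating over a set, is one way to get the run-to-run consistency the commits mention; sorting the input names beforehand serves the same purpose.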
  2. CU-869574kvp update snomed preprocessing naming (#469)

    * CU-869574kvp: Add pattern based release version identifying for Snomed preprocessing
    
    * CU-869574kvp: Add tests for pattern-based snomed release identification
    
    * CU-869574kvp: Update Snomed preprocessing:
    
    Separate extensions into an Enum.
    Do the release/paths check at init to allow for early failures in case of issues
    
    * CU-869574kvp: Simplify mappings somewhat.
    
    Move common avoids to a common location.
    Fix UK Drug relationship name
    
    * CU-869574kvp: Simplify mappings somewhat more.
    
    Remove some clutter by separating common prefixes for release types and file names.
    
    * CU-869574kvp: Simplify mappings somewhat more, again.
    
    Remove some clutter by separating common suffixes for release types.
    
    * CU-869574kvp: Update preprocessing.
    
    New abstraction. Use supported extensions, which describe their file formats, along with bundles, which give some further insight and control.
    
    * CU-869574kvp: Fix data class init
    
    * CU-869574kvp: Fix issue with file paths
    
    * CU-869574kvp: Fix a UK Clinical description file path
    
    * CU-869574kvp: Add (optional) 2nd part of folder name to extension.
    
    For AU models, the folder name seems to be 'SnomedCT_Release_AU1000036_20240630T120000Z', so the 1st part is just 'Release' and the 2nd part is indicative of AU.
    Add usage of this where relevant.
    
    * CU-869574kvp: Fix preprocessing tests.
    
    Add patch for files/folders where applicable.
    Change the paths of attributes where applicable.
    mart-r authored Sep 30, 2024
    a9544f7
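
Pattern-based release identification, as described in #469, could look roughly like the sketch below. The regular expression, `ReleaseInfo`, and `parse_release_folder` are assumptions made for illustration; only the AU folder name is taken from the commit message, and the real preprocessing code may match folders differently.

```python
import re
from typing import NamedTuple

# Assumed layout: SnomedCT_<first part>_<second part>_<YYYYMMDD>T<HHMMSS>Z
FOLDER_PATTERN = re.compile(
    r"^SnomedCT_(?P<first>[A-Za-z0-9]+)_(?P<second>[A-Za-z0-9]+)"
    r"_(?P<release>\d{8})T\d{6}Z$"
)


class ReleaseInfo(NamedTuple):
    first: str    # e.g. 'Release' for the AU folder name
    second: str   # e.g. the part that indicates AU
    release: str  # release date as YYYYMMDD


def parse_release_folder(name: str) -> ReleaseInfo:
    """Split a release folder name into its parts, failing early if it does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised Snomed release folder name: {name!r}")
    return ReleaseInfo(match["first"], match["second"], match["release"])


info = parse_release_folder("SnomedCT_Release_AU1000036_20240630T120000Z")
```

Running this check at initialisation, rather than midway through preprocessing, matches the commit's stated aim of failing early when the release or paths are wrong.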
  3. b433195

Commits on Oct 7, 2024

  1. CU-8695ucw9b deid transformers fix (#490)

    * CU-8695ucw9b: Fix older DeID models due to changes in transformers.
    
    Since transformers 4.42.0, the tokenizer is expected to have the 'split_special_tokens' attribute. But the version we've saved does not. So when it's loaded, this causes an exception to be raised (which is currently caught and logged by medcat).
    
    * CU-8695ucw9b: Add functionality for transformers NER to spectacularly fail upon consistent consecutive exceptions.
    
    The idea is that this way, if something in the underlying models is consistently failing, the exception is raised rather than simply logged
    
    * CU-8695ucw9b: Add tests for exception raising after a pre-defined number of failed document processes
    
    * CU-8695ucw9b: Change conditions for raising exception on consecutive failure.
    
    Now only raise the exception if the consecutive failure is identical (or similar). We determine that from the type and string-representation of the exception being raised.
    
    * CU-8695ucw9b: Small additional cleanup on successful TNER processing
    
    * CU-8695ucw9b: Use custom exception when failing due to consecutive exceptions
    
    * CU-8695ucw9b: Remove try-except when processing transformers NER to force immediate raising of exception
    mart-r authored Oct 7, 2024
    44db08b
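
The intermediate policy described in #490, raising only after several consecutive failures that look the same, can be sketched as follows. `FailureTracker` and `ConsecutiveFailureError` are hypothetical names, the threshold is arbitrary, and the last bullet of the commit indicates that the try-except around transformers NER processing was later removed so that exceptions are raised immediately.

```python
class ConsecutiveFailureError(Exception):
    """Hypothetical custom exception for repeated, similar failures."""


class FailureTracker:
    """Count consecutive failures that share a type and string representation."""

    def __init__(self, max_consecutive=3):
        self.max_consecutive = max_consecutive
        self._last_key = None
        self._count = 0

    def record_success(self):
        # A successful document resets the streak.
        self._last_key = None
        self._count = 0

    def record_failure(self, exc):
        key = (type(exc), str(exc))
        if key == self._last_key:
            self._count += 1
        else:
            # A different kind of failure starts a new streak.
            self._last_key = key
            self._count = 1
        if self._count >= self.max_consecutive:
            raise ConsecutiveFailureError(
                f"{self._count} consecutive identical failures: {exc!r}"
            ) from exc


tracker = FailureTracker(max_consecutive=3)
tracker.record_failure(AttributeError("missing attribute"))
tracker.record_failure(AttributeError("missing attribute"))
tracker.record_success()  # streak is reset, so no exception so far
```

Comparing both the exception type and its string form keeps unrelated one-off errors from triggering the hard failure.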

Commits on Oct 9, 2024

  1. 909cfad

Commits on Oct 14, 2024

  1. MetaCAT fixes and upgrades (#495)

    * MetaCAT fixes and upgrades
    
    Pushing for 3 updates:
    1) Removed the check and update for labels with zero data, as this was causing issues during evaluation
    2) Resolved an issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3)
    3) Updated the attention mask creation to dynamically use the actual pad_idx value instead of assuming it to be 0
    
    * Pushing type fix
    
    * Pushing for type fix
    
    * Fixing type issues
    
    * Pushing change
    
    * Pushing update w/o try except block
    
    For the issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3), pushing an optimized version w/o the try except block
    shubham-s-agarwal authored Oct 14, 2024
    3e01747
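
The third update in #495 concerns building the attention mask from the actual padding index. A minimal pure-Python sketch, assuming token IDs are plain integers and using a hypothetical `attention_mask` helper (MetaCAT itself works on tensors):

```python
def attention_mask(batch, pad_idx):
    """Return 1 for real tokens and 0 for padding, based on the given pad index."""
    return [[0 if token == pad_idx else 1 for token in seq] for seq in batch]


# Made-up batch in which token ID 0 is a real token and padding uses ID 3.
batch = [[5, 0, 7, 3, 3], [0, 9, 3, 3, 3]]

mask_with_pad_idx = attention_mask(batch, pad_idx=3)
# → [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]

mask_assuming_zero = attention_mask(batch, pad_idx=0)
# → [[1, 0, 1, 1, 1], [0, 1, 1, 1, 1]]
```

When the padding index is not 0, assuming 0 masks out real tokens and attends to padding, as the second result shows.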

Commits on Oct 28, 2024

  1. CU-869671bn4: Update requirements to fix workflow issue due to mypy (#497)
    
    * CU-869671bn4: Update requirements (GHA should fail due to mypy)
    
    * CU-869671bn4: Update mypy dev requirement to be less than 1.12
    mart-r authored Oct 28, 2024
    976adc2
  2. CU-86967nnra Drop python 3.8 support (EoL) (#498)

    * CU-86967nnra: Remove python 3.8 from GHA
    
    * CU-86967nnra: Remove python 3.8 from classifiers
    
    * CU-86967nnra: Add python version requirements to setup.py (allowing from 3.9 to 3.11)
    
    * CU-86967nnra: Remove upper bound from python requirements.
    
    The upper bound could be lifted as soon as `spacy` releases a compatible version, and that _shouldn't_ require any changes from our side. It also isn't (currently) possible to install on higher versions, since no `spacy` is available for them.
    mart-r authored Oct 28, 2024
    04efda5

Commits on Nov 1, 2024

  1. CU-86964zm4d fix preprocessing (#496)

    * CU-86964zm4d: Use ignore tag correctly to ignore certain parts of UK release
    
    * CU-86964zm4d: Use OPCS4 later refset ID by default (and switch to older if needed)
    
    * CU-86964zm4d: Fix OPCS4 refset ID tests.
    
    Fix the default value being tested for (i.e. the value shown in the case of an international release).
    Add a test for old UK extension.
    
    * CU-86964zm4d: Add note regarding OPCS refset ID relevance only for UK extensions.
    
    * CU-86964zm4d: Fix checking of extension outside loops.
    
    I.e. determine if a UK release/bundle is used for OPCS4/ICD10 mappings splitting.
    Always return separate refsets for ICD10 and OPCS4 internally, even if the latter is None.
    mart-r authored Nov 1, 2024
    e924798
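
The refset handling in #496 can be sketched as "default to the later OPCS4 refset ID, fall back to the older one only when needed, and always return a pair". Everything below is hypothetical: the IDs are placeholders rather than real refset IDs, `pick_refset_ids` is an invented name, and the actual rule for deciding when the older ID is needed is not stated in the commit message.

```python
from typing import Iterable, Optional, Tuple

# Placeholder values only; these are NOT real SNOMED refset IDs.
ICD10_REFSET_ID = "ICD10-PLACEHOLDER"
OPCS4_REFSET_ID_NEW = "OPCS4-NEW-PLACEHOLDER"
OPCS4_REFSET_ID_OLD = "OPCS4-OLD-PLACEHOLDER"


def pick_refset_ids(
    is_uk_release: bool, found_refset_ids: Iterable[str]
) -> Tuple[str, Optional[str]]:
    """Return (ICD10 refset, OPCS4 refset); the second is None outside UK releases."""
    if not is_uk_release:
        # OPCS4 mappings are only relevant for UK extensions.
        return ICD10_REFSET_ID, None
    found = set(found_refset_ids)
    opcs4 = OPCS4_REFSET_ID_NEW  # later ID by default
    if OPCS4_REFSET_ID_NEW not in found and OPCS4_REFSET_ID_OLD in found:
        opcs4 = OPCS4_REFSET_ID_OLD  # assumed fallback condition
    return ICD10_REFSET_ID, opcs4


international = pick_refset_ids(False, [])
old_uk = pick_refset_ids(True, [OPCS4_REFSET_ID_OLD])
new_uk = pick_refset_ids(True, [OPCS4_REFSET_ID_NEW, OPCS4_REFSET_ID_OLD])
```

Returning a two-element tuple in every case keeps the calling code uniform, which is the design choice the last bullet of the commit describes.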
  2. CU-8695hghww backwards compatibility workflow (#478)

    * CU-8695hghww: Add bash script to run backwards compatibility
    
    * CU-8695hghww: Rename backwards compatibility running bash script
    
    * CU-8695hghww: Add new step to workflow to run model backwards compatibility
    
    * CU-8695hghww: Fix model compatibility regression suite path
    
    * CU-8695hghww: Simplify creation and removal of fake model folder
    mart-r authored Nov 1, 2024
    c0082ef

Commits on Nov 15, 2024

  1. df3df66

Commits on Nov 19, 2024

  1. CU-8696m1mch: Remove versioning utility since all its parts were deprecated (#500)
    
    * CU-8696m1mch: Remove versioning utility since all its parts were deprecated
    
    * CU-8696m1mch: Remove tests for versioning utility
    
    * CU-8696m1mch: Remove unused test-specific binary (CDB)
    mart-r authored Nov 19, 2024
    37a8a63
  2. Merge pull request #506 from CogStack/master

    v1.14.0 release PR
    mart-r authored Nov 19, 2024
    ceb74b1