medcat/v2.0.0
We’re excited to announce the release of MedCAT v2. This is a major refactor that brings a more modular, flexible, and maintainable foundation for clinical NLP, while staying compatible with existing v1 models.
This release focuses on:
- Refactored structure for lower coupling and greater extensibility
- Modularity via optional install extras (install only what you need)
- Improved flexibility in tokenization, NER, and annotation pipelines
- Backwards compatibility for v1 models, with automatic conversion
✨ What’s New
- Decoupled from `spacy`: it is now possible to use a lightweight regex tokenizer or other (custom) backends
- Optional extras: install support only for the components you need (`spacy`, `meta-cat`, `deid`, `rel-cat`, `dict-ner`)
- Training is now structured around dedicated classes for clearer workflows
- Tutorials and scripts have been rebuilt from the ground up for v2
- Added support for a supervised training web service (experimental, under development)
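The optional extras use standard pip extras syntax (`package[extra1,extra2]`). As a minimal sketch, the helper below builds such a requirement string. It assumes the distribution is published as `medcat` and treats the five extras named in these notes as the full set; neither is confirmed here. `KNOWN_EXTRAS` and `requirement_for` are hypothetical names used only for illustration.

```python
# Extras named in these release notes; the real package may define others.
KNOWN_EXTRAS = ("spacy", "meta-cat", "deid", "rel-cat", "dict-ner")


def requirement_for(extras, package="medcat", min_version="2.0.0"):
    """Build a pip requirement string such as 'medcat[spacy]>=2.0.0'.

    The package name and version floor are assumptions, not taken
    from the MedCAT documentation.
    """
    unknown = sorted(set(extras) - set(KNOWN_EXTRAS))
    if unknown:
        raise ValueError(f"Unknown extras: {', '.join(unknown)}")
    # Keep the caller's order while dropping duplicates.
    unique = list(dict.fromkeys(extras))
    extras_part = f"[{','.join(unique)}]" if unique else ""
    return f"{package}{extras_part}>={min_version}"


if __name__ == "__main__":
    # Quote the requirement so the shell does not interpret brackets or '>'.
    print("pip install", f'"{requirement_for(["spacy", "meta-cat"])}"')
```

Running the script prints a command that can be pasted into a shell. Passing an empty list gives the bare requirement, which corresponds to the default install without spacy or the advanced components.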
⚠️ Breaking Changes
- Saving/Loading:
  - Save method has a new name (`CAT.save_model_pack`)
  - v2 saves models in a new format (but still loads v1 models, with slower load times due to conversion)
- Training:
  - Training APIs now go through separate trainer classes
- Defaults:
  - Default install no longer includes spacy or advanced components (see migration guide for how to enable them)
For a complete list, see: BREAKING_CHANGES.md
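Because v1 models are converted on every load, one way to avoid paying that cost repeatedly is to load a v1 pack once and re-save it in the v2 format. The sketch below illustrates the idea. Only `CAT.save_model_pack` is named in these notes; the import path `medcat.cat`, the `CAT.load_model_pack` loader, and the single-argument call to `save_model_pack` are assumptions that should be checked against the Migration Guide. `resave_as_v2` is a hypothetical helper name.

```python
from pathlib import Path


def resave_as_v2(model_pack_path, target_folder):
    """Load a model pack and save it again in the v2 format.

    Assumes CAT lives in medcat.cat, that load_model_pack accepts a
    path, and that save_model_pack accepts a target folder.
    """
    # Imported lazily so this module can be imported without MedCAT installed.
    from medcat.cat import CAT

    # Loading a v1 pack is where the automatic (slower) conversion happens.
    cat = CAT.load_model_pack(str(model_pack_path))

    # Make sure the output folder exists before saving.
    Path(target_folder).mkdir(parents=True, exist_ok=True)

    # New save method name in v2; the exact signature is an assumption.
    return cat.save_model_pack(str(target_folder))
```

Subsequent loads would then point at the re-saved pack rather than the original v1 file.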
📖 Migration Guide
If you’re upgrading from v1, please read the dedicated Migration Guide. It covers:
- Installation instructions
- Changes to saving/loading
- v1 model compatibility notes
- Updated tutorials and example scripts
- FAQ and troubleshooting
🔗 Useful Links
📦 PyPI
🛠️ Repository
Feedback
v2 is a big step forward, and we’d love your input!
Please open a GitHub issue or join the discussion forum for:
- Missing documentation
- Bugs or breaking behaviour
- Feedback on error/log messages
- Suggestions for future improvements