Releases: terrier-org/pyterrier
0.13.0
What's Changed
A new feature release, introducing the new Artifact API and fixing when fields are created in Terrier indexes by IterDictIndexer.
Significant improvements:
- New feature: Artifact API by @seanmacavaney in #436 - we'll be improving this feature and its documentation in future releases.
- Improvement: terrier.IterDictIndexer doesn't create a field-index when not requested by @cmacdonald in #525
Minor changes:
- bibliography file in documentation by @seanmacavaney in #519
- ruff github action by @seanmacavaney in #523
- Remaining mypy errors by @seanmacavaney in #524
- RemoteDataset: get one file from a zip by @cmacdonald in #529
- use decorate_batch properly in Terrier by @cmacdonald in #528
- save the process name so that jps can show a name for PyTerrier processes by @cmacdonald in #527
Full Changelog: 0.12.1...0.13.0
0.12.1
0.12.1 - released 19/12/2024
Wrapping up various improvements developed in the last few weeks.
New Feature:
- Common prefix pipeline computation in pt.Experiment by @cmacdonald, with @seanmacavaney and @Parry-Parry in #514
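The idea behind common-prefix computation can be sketched in plain Python. This is a conceptual illustration only, not PyTerrier's implementation: each pipeline is modelled as a hypothetical list of stage names, and the longest run of leading stages shared by every pipeline is the part that only needs to be executed once.

```python
from typing import List, Sequence


def common_prefix(pipelines: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest run of leading stages shared by every pipeline."""
    if not pipelines:
        return []
    prefix: List[str] = []
    # zip stops at the shortest pipeline, so we never index past its end
    for stages in zip(*pipelines):
        first = stages[0]
        if all(stage == first for stage in stages):
            prefix.append(first)
        else:
            break
    return prefix


# Hypothetical pipelines: the same first stages followed by different rerankers
pipelines = [
    ["bm25", "get_text", "monot5"],
    ["bm25", "get_text", "colbert"],
]
shared = common_prefix(pipelines)
remainders = [p[len(shared):] for p in pipelines]
print(shared, remainders)  # ['bm25', 'get_text'] [['monot5'], ['colbert']]
```

In this sketch the shared stages would run once, and only the remainders would differ per pipeline.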
Improvements:
- PRF compilation by @cmacdonald in #504
- More compilation improvements by @cmacdonald in #506
- pt.debug.pdb by @seanmacavaney in #510
- Guessing batch size for indexing pipelines by @seanmacavaney in #510
Minor:
- Dataset Decompression Improvements by @cmacdonald in #512
- support verbose=True on by_query iter by @cmacdonald in #513
- pass kwargs through to the underlying open functions by @cmacdonald in #518
Documentation:
- more extension docs by @seanmacavaney in #511
- Transformer docs by @cmacdonald in #507
Full Changelog: 0.12.0...0.12.1
0.12.0
What's Changed in PyTerrier 0.12.0
0.12.0 has an API change for Transformer, making it easier to both implement and call Transformers using iter-dicts rather than DataFrames (DataFrame support is also maintained). This release also completely refreshes the .compile() implementation, making it easier to implement pipelines that can be optimised.
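To illustrate the iter-dict style in plain Python: the hypothetical class below processes each query as a dict and yields results lazily from a generator, which is the shape that transform_iter() is described as returning. It deliberately does not import PyTerrier; the class name and the "qid"/"query" keys are illustrative assumptions, not a copy of PyTerrier's API.

```python
from typing import Any, Dict, Iterable, Iterator


class LowercaseQueries:
    """Hypothetical transformer that lowercases the 'query' field of each row."""

    def transform_iter(self, inp: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        for row in inp:
            # copy the row so the caller's dict is not mutated,
            # then overwrite only the 'query' field
            out = dict(row)
            out["query"] = row["query"].lower()
            yield out


topics = [{"qid": "1", "query": "Chemical Reactions"}, {"qid": "2", "query": "BM25 Tuning"}]
results = list(LowercaseQueries().transform_iter(topics))
print(results)  # [{'qid': '1', 'query': 'chemical reactions'}, {'qid': '2', 'query': 'bm25 tuning'}]
```

Because rows are yielded one at a time, a consumer can start reading output before the whole input has been processed; in this sketch, a DataFrame would only be needed at the boundaries of a pipeline.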
Major:
- API change: Transformer.transform_iter() returns an iter-dict generator by @cmacdonald in #481
- Transformer.compile() improvements by @seanmacavaney in #480
- deprecate caching operator ~ by @cmacdonald in #483
- more extensions with integrated documentation by @seanmacavaney in #503
Minor:
- addresses #423 IRDS warnings on pt.list_datasets() by @cmacdonald in #485
- bump maven plugin versions by @cmacdonald in #495
- integrated extension documentation by @seanmacavaney in #497
- move to pyproject.toml by @cmacdonald in #490
- Auto-generated Citations from DBLP by @seanmacavaney in #498
- don't consume arbitrary unused kwargs in TerrierIndexer, update overridden properties by @cmacdonald in #500
- More types by @cmacdonald in #484
- Add Apple Silicon GHAs by @cmacdonald in #478
Full Changelog: 0.11.0...0.12.0
0.11.0
What's Changed
A significant update that refactors much of the PyTerrier source code and renames many classes as we progress towards a PyTerrier 1.0 release.
The most significant changes are:
- pt.init() is no longer required 😃. If necessary, pt.java methods can be used to change Java initialisation
- pt.BatchRetrieve is now pt.terrier.Retriever, and similar changes for other Terrier indexers and retrievers
- pt.AnseriniBatchRetrieve is now in its own separate project, PyTerrier-Anserini, with various improvements
All changes are backwards compatible in this release - deprecation warnings will guide you on how to update your code.
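Renames like these are commonly kept backwards compatible with a thin alias that emits a DeprecationWarning. The sketch below shows one generic way to do this in plain Python; OldRetriever and NewRetriever are hypothetical names, and this is not PyTerrier's own shim code.

```python
import warnings


class NewRetriever:
    """Stand-in for a class under its new name (hypothetical)."""

    def __init__(self, wmodel: str = "BM25"):
        self.wmodel = wmodel


class OldRetriever(NewRetriever):
    """Deprecated alias: behaves like NewRetriever but warns on construction."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "OldRetriever is deprecated; use NewRetriever instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)


# DeprecationWarning is ignored by default unless triggered from __main__,
# so opt in explicitly to make sure it is recorded here
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    r = OldRetriever(wmodel="PL2")

print(r.wmodel, caught[0].category.__name__)  # PL2 DeprecationWarning
```

Subclassing keeps isinstance(obj, NewRetriever) true for objects created via the old name, so existing type checks keep working while the warning points users to the new name.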
More details below:
Improvements
- Move all Java/JNIUS code into pt.java, move all Terrier code into pt.terrier; remove pt.init() by @seanmacavaney in #447
- dynamic module loading by @seanmacavaney in #461
- Incorporate Retrieval Scores into RM3 by @mam10eks in #453
- pt.apply for making an indexer by @cmacdonald in #467
- query_toks support for terrier.Retriever by @cmacdonald in #466
- add save_mode='warn' and save_mode='error' to pt.Experiment (warn as default) by @cmacdonald in #408
Refactoring
- Deprecate DFIndexer by @cmacdonald in #457
- pt.terrier.rewrite revisions - remove Axiomatic, remove terrier-prf by @seanmacavaney in #472
- shims for deprecated modules by @seanmacavaney in #476
- text_loader abstraction for pt.text.get_text by @seanmacavaney in #469
- move Anserini to a separate project by @seanmacavaney in #473
Documentation
- Add RankVicuna and RankZephyr Plugins by @kaustubhdhole in #441
- Update tuning.rst by @albertoueda in #446
- Add PyTerrier_ChatNoir to the plugin section by @mam10eks in #452
- Remove nptyping dependency to ensure numpy 2 compatibility by @cmacdonald in #445
Minor
- change all tests to use new terrier retriever names, but check old names too by @cmacdonald in #458
- Parallel fixes by @seanmacavaney in #462
- fix logger error by @seanmacavaney in #464
- Add comments to requirements.txt by @cmacdonald in #465
- failing anserini tests due to version 0.36.0, disabling for now by @seanmacavaney in #468
- remove the writing of a default terrier.properties file by @cmacdonald in #470
- fix test_maven by @seanmacavaney in #471
- Python 3.12 in GHA by @cmacdonald in #459
- Bump the newest JDK version tested in GHA to 21 by @cmacdonald in #475
- Update pt.terrier.Retriever str and repr in #474
New Contributors
- @kaustubhdhole made their first contribution in #441
- @mam10eks made their first contribution in #452
Full Changelog: 0.10.1...0.11.0
0.10.1
Minor release with minor improvements and bug fixes.
What's Changed
- Bugfix: Delete baseline pvalue from correction method input by @JorgeGabin in #440
- Fix: fix msmarco location by @cmacdonald in #435
- Feature: added corpus_iter for Terrier index by @cmacdonald in #426
- remove sklearn as required dependency by @cmacdonald in #410
- Add troubleshooting for installation and certificate errors by @Krissy510 in #411
- fix parsing of trecxml topics by @lukaszett in #414
- paired t-tost by @seanmacavaney in #420
- read_results optimization by @seanmacavaney in #421
- pickling QE pipelines to allow parallelised QE gridsearch by @cmacdonald in #430
- Require Python 3.8 minimum by @cmacdonald in #431
- Bump logback from 1.2.0 to 1.2.13 in /terrier-python-helper by @dependabot
- improved error message pt.apply.query - from #433 by @cmacdonald in #434
- Improved testing of FeaturesBatchRetrieve by @cmacdonald in #437
New Contributors
- @Krissy510 made their first contribution in #411
- @JorgeGabin made their first contribution in #440
Full Changelog: 0.10.0...0.10.1
0.10.0
What's Changed
New Features
- Transformer.__call__ now supports both dataframes and iter-dicts by @cmacdonald in #381
- Terrier: Custom stopwords by @cmacdonald in #372
- Terrier: Access the stemmer of Terrier from PyTerrier by @cmacdonald in #382
- Terrier: Improved API for loading Terrier indices into memory by @cmacdonald in #386
Improvements
- added tokenizer as arg for pt.text.sliding by @mihirs16 in #387
- addresses #367 - include qid in pt.apply Exception by @cmacdonald in #370
- addresses #377: pt.apply.query() raises exception if the query column does not exist by @cmacdonald in #380
- let pt.tqdm exist without pt.init() by @cmacdonald in #399
- deprecate pt.Utils by @cmacdonald in #384
- removes two warnings by @cmacdonald in #385
- work on test failure by @cmacdonald in #401
- Test pyterrier with newer Python versions by @cmacdonald in #400
- bump supported Anserini version by @cmacdonald in #406, addresses #404
- Terrier: allow putting a term and LexiconEntry into a tuple by @cmacdonald in #369
Bugs:
- stringify properties and controls, addresses #357 by @cmacdonald in #358
- fix bug in metadata size warning by @seanmacavaney in #362
Documentation
- Update pipeline_examples.md by @gurcankavakci in #359
- Fixed typo by @hermlon in #364
- Update ltr.rst by @Hermi-Mire in #371
- Update transformer.rst by @albertoueda in #383
- clarify docstring for indexing with regards to metadata by @lukaszett in #394
- Query Rewriting & Expansion by @cakiki in #402, #403
New Contributors
- @gurcankavakci made their first contribution in #359
- @hermlon made their first contribution in #364
- @Hermi-Mire made their first contribution in #371
- @lukaszett made their first contribution in #394
- @cakiki made their first contribution in #402
- @mihirs16 made their first contribution in #387
Full Changelog: 0.9.2...0.10.0
0.9.2
Minor release with minor improvements and bug fixes.
What's Changed
- add sbert example notebook by @cmacdonald in #344
- Update the requirement from the deprecated sklearn package to scikit-learn, as sklearn was sometimes causing build errors.
- adding batching operations to apply.generic() and apply.by_query() by @cmacdonald in #351 - thanks to Xun Zhou, University of Michigan, via #350
- improve error messages for invalid indexing configurations by @cmacdonald in #349 - thanks to @maxhenze in #348
- Various empty dataframe fixes by @cmacdonald in #353 -- thanks to report by Prithvijit Dasgupta, University of Michigan in #352
- improved error message for add_ranks by @cmacdonald in #354
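The batching idea can be sketched without PyTerrier: rows are grouped into fixed-size batches and a function is called once per batch rather than once per row. The helper names batched and apply_in_batches and the batch_size parameter here are illustrative assumptions, not the signature of apply.generic() or apply.by_query().

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def apply_in_batches(fn: Callable[[List[T]], List[R]], items: Iterable[T], batch_size: int) -> List[R]:
    """Call fn once per batch and concatenate the per-batch results in order."""
    out: List[R] = []
    for batch in batched(items, batch_size):
        out.extend(fn(batch))
    return out


calls: List[int] = []


def score_batch(docs: List[str]) -> List[int]:
    calls.append(len(docs))  # record batch sizes to show how the input was split
    return [len(d) for d in docs]


scores = apply_in_batches(score_batch, ["a", "bb", "ccc", "dddd", "eeeee"], batch_size=2)
print(scores, calls)  # [1, 2, 3, 4, 5] [2, 2, 1]
```

The helper is written out by hand because itertools.batched is only available from Python 3.12.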
Full Changelog: 0.9.1...0.9.2
0.9.1
Bugfix release addressing a problem with pretokenised indices on Windows
What's Changed
- Nofifo pretok indexing fixes by @cmacdonald in #343
Full Changelog: 0.9.0...0.9.1
0.9.0
Significant update: refactoring of the public API (e.g. pt.transformer.TransformerBase -> pt.Transformer) and support in the Terrier backend for making indices from pre-tokenised documents. Python 3.10 is now supported.
What's Changed
- fix error in IRDSDataset when a query field is named "query" by @seanmacavaney in #303
- Fix type annotation by @heinrichreimer in #313
- addresses #315 IRDS corpus_iter are not subscriptable by @cmacdonald in #316
- Missing comma in bm25_qe example by @JohnGiorgi in #319
- Argument meta should be supplied as dictionary by @JohnGiorgi in #320
- use Jnius 1.4 by @cmacdonald in #249
- Python 3.10 support by @cmacdonald in #322
- Lz4 support for pt.io.autoopen() by @cmacdonald in #323
- addresses #326 faster version of add_ranks for single queries by @cmacdonald in #327
- addresses #321 pt.apply.doc_score batching by @cmacdonald in #325
- IterDictIndexer can index pre-tokenised documents by @cmacdonald in #328
- Bump logback-core from 1.2.0 to 1.2.9 in /terrier-python-helper by @dependabot in #336
- documenting BM25F controls and tuning by @cmacdonald in #296, addresses #294
- 0.9refactor by @cmacdonald in #314, #339, addresses #271
- pt.Experiment() alters the input measures list to drop "mrt" #301
- Expose Termpipelines in Terrier index backend by @cmacdonald in #338
- pt.rewrite.tokenise() impl by @cmacdonald in #340, addresses #252, #253
- upgraded GitHub actions by @cmacdonald in #341, #342
- fix LTR groupby for xgboost & lightgbm by @cmacdonald in #284
New Contributors
- @heinrichreimer made their first contribution in #313
- @JohnGiorgi made their first contribution in #319
Full Changelog: 0.8.1...0.9.0
0.8.1
Minor release with minor improvements and bug fixes.
What's Changed
- fixed bug with is_transformer by @seanmacavaney in #274
- addresses #275: issue with k in kmaxavg, improved testing by @cmacdonald in #276
- defer loading ir_datasets by @seanmacavaney in #280
- Set meta and meta_lengths in constructor by @MWschutte in #282
- Anserini fixes by @cmacdonald in #279, reported by @Azouu
- prevent use of nptyping v2 by @cmacdonald in #291, reported by @tabonnet
- SourceTransformer pass through extra columns, addresses #287 by @cmacdonald in #288, reported by @Xiao0728
- more transformers with repr by @cmacdonald in #289
New Contributors
- @MWschutte made their first contribution in #282
Full Changelog: 0.8.0...0.8.1