@percevalw
Member

Changelog

Added

  • New unified edspdf.data API (PDF files, pandas, parquet) and LazyCollection object
    to efficiently read and write data from and to different formats and sources. This API
    was heavily inspired by the edsnlp.data API.
  • New unified processing API to select the execution backend via data.set_processing(...)
    to replace the old accelerators API (which is now deprecated, but still available).
  • eds.huggingface-embedding now supports quantization and other AutoModel.from_pretrained kwargs
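
To illustrate the idea behind a lazy collection with a selectable execution backend, here is a minimal, self-contained sketch. It is not edspdf's implementation: `ToyLazyCollection`, its `map` method, the `"simple"` and `"threads"` backend names, and the `workers` parameter are all hypothetical names chosen for this example. Only the method name `set_processing` comes from the changelog above, and its signature here is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator, List, Optional


class ToyLazyCollection:
    """Hypothetical stand-in for a lazy collection (not the edspdf class).

    Operations are recorded when declared and only executed on iteration.
    """

    def __init__(
        self,
        source: Iterable[Any],
        ops: Optional[List[Callable[[Any], Any]]] = None,
        backend: str = "simple",
        workers: int = 1,
    ) -> None:
        self.source = source
        self.ops = list(ops or [])
        self.backend = backend
        self.workers = workers

    def map(self, fn: Callable[[Any], Any]) -> "ToyLazyCollection":
        # Record the operation and return a new collection; nothing runs yet.
        return ToyLazyCollection(
            self.source, self.ops + [fn], self.backend, self.workers
        )

    def set_processing(
        self, backend: str = "simple", workers: int = 1
    ) -> "ToyLazyCollection":
        # Assumed signature: choose how the recorded operations will be run.
        if backend not in ("simple", "threads"):
            raise ValueError(f"unknown backend: {backend!r}")
        return ToyLazyCollection(self.source, self.ops, backend, workers)

    def _apply(self, item: Any) -> Any:
        # Run every recorded operation, in order, on a single item.
        for fn in self.ops:
            item = fn(item)
        return item

    def __iter__(self) -> Iterator[Any]:
        if self.backend == "threads":
            # Executor.map preserves the input order of the results.
            with ThreadPoolExecutor(max_workers=self.workers) as pool:
                yield from pool.map(self._apply, self.source)
        else:
            for item in self.source:
                yield self._apply(item)


seen = []


def double(x: int) -> int:
    seen.append(x)  # side effect used to show when the work actually happens
    return x * 2


docs = ToyLazyCollection([1, 2, 3]).map(double)
nothing_ran_yet = list(seen)  # still empty: declaring the map is lazy
simple_result = list(docs)
threaded_result = list(docs.set_processing(backend="threads", workers=2))
```

The point of the design is that the pipeline description (what to read, which steps to apply) stays the same while the backend is swapped in a single call, which is the role the changelog gives to set_processing as a replacement for the accelerators API.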

Fixed

  • eds.huggingface-embedding now resizes bbox features for large PDFs, instead of making the model crash

@percevalw percevalw linked an issue Feb 9, 2024 that may be closed by this pull request
@percevalw percevalw force-pushed the api-update branch 4 times, most recently from f098dc0 to 7bacbfc Compare February 9, 2024 02:18
@codecov

codecov bot commented Feb 9, 2024

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (06c527b) 98.39% compared to head (85b02eb) 98.60%.
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| edspdf/processing/multiprocessing.py | 98.83% | 4 Missing ⚠️ |
| edspdf/data/files.py | 97.97% | 2 Missing ⚠️ |
| edspdf/data/parquet.py | 98.03% | 2 Missing ⚠️ |
| edspdf/processing/simple.py | 96.49% | 2 Missing ⚠️ |
| edspdf/trainable_pipe.py | 98.24% | 2 Missing ⚠️ |
| edspdf/data/pandas.py | 97.77% | 1 Missing ⚠️ |
| edspdf/pipeline.py | 99.10% | 1 Missing ⚠️ |
| edspdf/utils/lazy_module.py | 96.77% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
+ Coverage   98.39%   98.60%   +0.20%     
==========================================
  Files          36       46      +10     
  Lines        2370     3012     +642     
==========================================
+ Hits         2332     2970     +638     
- Misses         38       42       +4     


@percevalw percevalw force-pushed the api-update branch 8 times, most recently from 04780b1 to 72b6688 Compare February 16, 2024 12:58
@percevalw percevalw merged commit 1c76a5a into main Feb 24, 2024
Development

Successfully merging this pull request may close these issues.

Feature request: deprecate accelerators and follow edsnlp.data-like API

2 participants