-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] *2Vec SaveLoad improvements #2939
Conversation
@gojomo I checked the pickle sizes and they seem as expected. Do you have a reproducing example? One potential pitfall is that |
When I last ran the exact same |
OK, I'll double check tmr. |
Yes, I can replicate on 3.8.327034172 bytes in pickle protocol 2 (after 4.0.068765086 bytes in pickle protocol 2 (after I'll investigate tomorrow. The timings are for #2887; trained with |
I'd especially suspect the backward-compatibility I also just noticed that the |
This is the core of it: In [88]: type(model.wv.index_to_key[0])
Out[88]: numpy.str_ I'll check where "numpy strings" (fixed width) creep in. |
OK I think I found it, fixed. Have to run now, will clean up the |
5509f9e
to
e5057c1
Compare
e5057c1
to
dc9c3fc
Compare
Previous discussion of #1777 changed some, but not all, uses of The
But #1777 & other contemporaneous work starting using 'entity' as the generic term in lots of places, over the objections of myself & other reviewers. ('Entity' is incredibly generic, and doesn't match the class name: it's The incomplete & ill-considered shift toward Using |
To be clear, I have no problem with Checking "previous discussion" now. Edit: aha, you said we're already using both Seems I'm consistent with my objections over time, at least. |
I'd also note that where it's practical & the underlying structure remains unchanged (an ordered list of lookup keys), the latest changes retain aliases for backward-compatible access: Retaining The biggest pain to any advanced API users is likely to be the removal of the |
I actually changed this in recent commits, following your:
They hard-fail now. I was thinking whether to keep those aliases "alive" or not… but with so many breaking changes, I also figured it is best to "rip off the bandaid" rather than introduce new deprecations (again).
Can you put the rationale for reworking |
If |
If you've added hard-fails for those aliases, which I think is alright, that's in some still-pending PR. I'll add an overview of the rationale on the wiki page. I was unfamiliar with that limitation on |
Yes,
I added them to this PR: |
@mpenkov ping – please review & merge, so we can move on. |
Happy to see the more-specific and -tangible My inline comments aren't things that should hold this up, just related concerns to discuss/address at some point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed this together with @piskvorky earlier today.
In general, I think the changes are good to merge, but there are several things that could be better.
I think we should document expandos
in code for the benefit of future developers and maintainers. So, add a multi-line code comment explaining what it is, what it does, and why it does things this way.
The same thing goes for specials
.
It looks like the above information is available from comments in this PR (and others), but it's highly unlikely that anyone wanting to learn more about expandos
and specials
will know to look there.
My notes from our call with @mpenkov ; FYI @gojomo : Decisions
Actions
|
@mpenkov @gojomo I improved this in 782f7ff . I didn't want to start a new PR for that, so it's a part of #2954 , where I'm fixing docs anyway. |
It's a bit awkward to discuss release todos/criteria deep in a PR that's already been merged/closed. I'd suggest creating a specific 'rollup' issue that's for meta-discussion, and in/out decisions, around the "4.0.0" release. My main hope would be to get more eyes on the big changes via one or more "not yet for everyone" test releases, as further described in new issue #2963. |
Continued from #2892 (subsumes #2892):
save()
/load()
into_save_specials()
/_load_specials()
, to better handle cases when oneSaveLoad
contains another.This PR does not include additional changes around serialization:
These will come in separate PRs later. In particular, we have to decide what to do with compatibility for models trained and stored pre-3.8.3.