Refactor convert_observed_data #7299
Good news is that all tests except Flaky tests are annoying because we can't XFAIL them (I forgot).
@lhelleckes you can rebase now :)
Branch updated from b939c22 to c2f7bec, then from c2f7bec to b97a5b0.
```python
if isgenerator(data):
    return floatX(generator(data))
```
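For context, assuming `isgenerator` here is Python's `inspect.isgenerator`, this branch is taken only for generator objects (including generator expressions), not for generator functions or ordinary iterables such as lists. A standard-library-only check, where `batches` is a made-up example generator:

```python
from inspect import isgenerator


def batches():
    # A generator function: calling it returns a generator object.
    yield [1.0, 2.0]
    yield [3.0, 4.0]


checks = {
    "generator object": isgenerator(batches()),
    "generator function": isgenerator(batches),
    "list": isgenerator([[1.0, 2.0], [3.0, 4.0]]),
    "generator expression": isgenerator(row for row in [[1.0, 2.0]]),
}
print(checks)
# {'generator object': True, 'generator function': False, 'list': False, 'generator expression': True}
```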
I think this was on purpose, for something fancy in VI. @ferrine can you confirm whether it's true and whether we still need it?
Yes, the refactor only moves the code, but if we can delete it that'd be even better!
Doesn't wrapping the generator in floatX consume it immediately?
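Whether this particular call consumes the generator depends on what `floatX(generator(data))` does internally, which this thread does not spell out. As a general illustration only (plain Python and NumPy, with a made-up `minibatches` generator): any eager conversion that materializes a generator exhausts it, so a second pass sees nothing.

```python
import numpy as np


def minibatches():
    yield [1, 2]
    yield [3, 4]


gen = minibatches()

# Eager conversion: list(...) pulls every item out of the generator.
eager = np.array(list(gen), dtype=np.float64)

# The generator is now exhausted; iterating again yields nothing.
leftover = list(gen)

print(eager.shape)  # (2, 2)
print(leftover)     # []
```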
nvm it was also done before
User-provided `observed` data comes through this, so I'm pretty certain that this generator branch is still relevant for VI.
One next step after this PR could be to split the function into two overloads: one that applies to generator-type `data`, and another for everything else.
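A minimal sketch of that split using `typing.overload`, assuming the generator case should return a lazily converted iterator and everything else a NumPy array. The name `convert_observed_sketch` is hypothetical and only mirrors the shape of the idea; it is not PyMC's API.

```python
from collections.abc import Generator, Iterator
from inspect import isgenerator
from typing import Any, Union, overload

import numpy as np
import numpy.typing as npt


@overload
def convert_observed_sketch(data: Generator[Any, None, None]) -> Iterator[np.ndarray]: ...


@overload
def convert_observed_sketch(data: npt.ArrayLike) -> np.ndarray: ...


def convert_observed_sketch(
    data: Union[Generator[Any, None, None], npt.ArrayLike],
) -> Union[Iterator[np.ndarray], np.ndarray]:
    if isgenerator(data):
        # Generator input: convert each item lazily instead of materializing it.
        return (np.asarray(item) for item in data)
    # Everything else is treated as array-like.
    return np.asarray(data)
```

With overloads like these, a type checker can report an iterator for generator input and an array for everything else, instead of a single return type that is wrong for one of the two cases.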
The more important next step should be the introduction of a `dtype` kwarg so we can get #7277 fixed.
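A hedged sketch of what such a keyword could look like on the non-generator path. The function name `convert_array_sketch` and its `dtype` parameter are hypothetical, and the default shown (keep the input's dtype when `dtype` is `None`) is an assumption for illustration, not a decision made in this PR.

```python
from typing import Optional

import numpy as np
import numpy.typing as npt


def convert_array_sketch(data: npt.ArrayLike, dtype: Optional[npt.DTypeLike] = None) -> np.ndarray:
    """Convert array-like observed data, casting only when a dtype is requested."""
    if dtype is None:
        # No dtype requested: keep whatever dtype NumPy infers (integer counts stay integer).
        return np.asarray(data)
    # Explicit dtype requested by the caller.
    return np.asarray(data, dtype=dtype)


counts = convert_array_sketch([0, 3, 1])
as_float = convert_array_sketch([0, 3, 1], dtype="float32")
print(counts.dtype.kind, as_float.dtype)  # i float32
```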
Just a guess, but that might be for stochastic optimization? If so, that generator could even be infinite.
Why not `(floatX(value) for value in generator)`?
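The suggested generator expression casts each item only when it is requested, so it would also work for an infinite minibatch stream. A minimal sketch: `float_x` is a hypothetical stand-in for `floatX` that simply casts to `float64` (the real `floatX` follows the configured float type), and `minibatch_stream` is an illustrative infinite generator.

```python
import itertools

import numpy as np


def float_x(value):
    # Hypothetical stand-in for floatX: cast to a fixed float dtype.
    return np.asarray(value, dtype=np.float64)


def minibatch_stream(data, batch_size):
    # Infinite generator that cycles through fixed-size slices of `data`.
    start = 0
    while True:
        yield data[start:start + batch_size]
        start = (start + batch_size) % len(data)


raw = np.arange(6)  # integer data
lazy = (float_x(value) for value in minibatch_stream(raw, batch_size=2))

# Only the requested batches are produced and cast; the stream is never materialized.
first_three = list(itertools.islice(lazy, 3))
print([batch.tolist() for batch in first_three])  # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
```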
If the output can be a generator, the type signature is wrong
I see some black magic.
Couldn't the observed value be meant to be an integer?
Yes, in that case it will result in a NumPy array too:
I meant that it might be an array (PyTensor or NumPy) with an integer dtype. Unless I'm missing some context, we can't just convert that to a float type.
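One concrete reason a blanket int-to-float cast can matter: if the float type is configured as `float32` (this depends on the user's PyTensor `floatX` setting), integers above 2**24 are no longer represented exactly. A NumPy-only illustration:

```python
import numpy as np

counts = np.array([2**24, 2**24 + 1], dtype=np.int64)
as_float32 = counts.astype(np.float32)

# float32 has a 24-bit significand, so 2**24 + 1 rounds to 2**24.
print(as_float32.astype(np.int64).tolist())  # [16777216, 16777216]
print(np.array_equal(counts, as_float32))    # False
```

With `float64` the same loss of exactness only starts above 2**53.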
If you follow the branching, you'll find that we were already doing that conversion all the time. In my opinion, we should merge this and continue adding a The whole "preparing generator data for VI" should be refactored. I would probably even give it its own If y'all agree, I can take the first step towards putting
Sorry, I don't know what you mean. Can you point me to an example? I don't think we automatically convert data that a user specified as an int type to a float type, do we?
On the main branch, lines 119 to 133 in fd11cf0:
When … Whether we should do that is a separate, VI-specific question, which IMO is best dealt with by separating the generator case out.
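Separating the generator case out amounts to a guard clause: handle generators first and return early, so the remaining code only has to deal with array-like inputs. A simplified, hypothetical sketch, not the actual `convert_observed_data`: the name `convert_sketch`, the lazy per-item conversion, and the masked-array branch (standing in for whatever array-specific handling follows) are all illustrative assumptions.

```python
from inspect import isgenerator

import numpy as np


def convert_sketch(data, dtype=None):
    # Guard clause: generator input is handled separately and returns early.
    if isgenerator(data):
        return (np.asarray(item) for item in data)

    # From here on `data` is known not to be a generator, so array-specific
    # handling (and a dtype argument) only has to cover array-like inputs.
    if isinstance(data, np.ma.MaskedArray):
        # Keep masked arrays masked; only the dtype is (optionally) changed.
        return np.ma.asarray(data, dtype=dtype)
    return np.asarray(data, dtype=dtype)
```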
Description
To improve the type hints in the `convert_observed_data` function, and ultimately to resolve issue #7277, the generator branch was moved into its own `if` statement with an early `return`. This will make it easier to apply dtypes to the other data structures in the next step.
Related Issue
Checklist
Type of change
📚 Documentation preview 📚: https://pymc--7299.org.readthedocs.build/en/7299/