- 
                Notifications
    
You must be signed in to change notification settings  - Fork 6
 
          Fix unsupported dtype from PyMC sampler warnings with ClickHouseBackend
          #75
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5080fcd    to
    5ead2f6      
    Compare
  
    5ead2f6    to
    16400d9      
    Compare
  
    
          Codecov ReportBase: 95.46% // Head: 95.65% // Increases project coverage by  
 Additional details and impacted files@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
+ Coverage   95.46%   95.65%   +0.19%     
==========================================
  Files           9        9              
  Lines         617      644      +27     
==========================================
+ Hits          589      616      +27     
  Misses         28       28              
 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov.  | 
    
7be3a9f    to
    13ec55e      
    Compare
  
    In earlier versions of ClickHouse, the `UInt8` type had to be used, but now this is causing trouble with NumPy bool arrays.
13ec55e    to
    eb639ca      
    Compare
  
    | 
           @lhelleckes @qacwnfq this was a somewhat tricky PR. If you have a chance to review that's cool, but don't feel obliged to do so. Knowing that you two are short on time and not familiar with the codebase, I challenged myself to do the changes such that each commit passes the tests. The  The   | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
| list(self.var_shapes[name]), | ||
| dims=list(self.model.RV_dims[name]) if name in self.model.RV_dims else [], | ||
| dims=list(self.model.named_vars_to_dims[name]) | ||
| if name in self.model.named_vars_to_dims | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a little unorthodox to use a ternary operator to compute the default arg. Couldn't it be just None and if it is None then it is computed? Just a nitpick though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did this because it's in a list comprehension, and there's no way to conditionally NOT pass the kwarg
| # This 👇 is needed until PyMC provides shapes ahead of time. | ||
| undefined_ndim=True, | ||
| ) | ||
| if statname == "warning": | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning that if we would want to record warnings with hopsy we need to give the stat the name "warning"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for now this is only for stats named "warning", which also get special treatment by PyMC.
One could extend this to more arbitrary objects, but generally I think we should try to stick to standard data types and avoid objects, so I didn't want to add more flexibility than needed.
| tune=3, | ||
| draws=5, | ||
| tune=30, | ||
| draws=50, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the rng set somewhere to guarantee a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
within these steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, but with this few steps the NUTS is pretty much guaranteed to emit warnings.
And even if it occasionally does not, that should not break things.
This MR fixes some incompatibilities with
NUTSsampling on recent PyMC versions.Background
Since PyMC v4.3.0 or more specifically since pymc-devs/pymc#6192, the
SamplerWarnings emitted by some step methods --- most importantlyNUTS--- are no longer collected on the PyMCBaseTraceobject, but rather as any other sampler stat.This created an incompatibility with McBackend, where the
ClickHouseBackendcould not store theseobject-dtyped things into the database.Solution
Instead of filtering/ignoring the
"warning"stat, I went all the way to actually support storing it by pickling it, base64-encoding the bytes and storing the base64 encoded data as an ASCII string.For this, the following changes were needed:
strdtypes in theNumPyBackendandClickHouseBackend.mcbackend.adapters.pymc.TraceBackendto encode/decode the"warning"stat such that themcbackend.Chainonly seesstrdata coming in/out.NUTSto provoke needing to storeSamplerWarning.Further changes along the way:
"localhost", viaCLICKHOUSE_HOST,CLICKHOUSE_PORTandCLICKHOUSE_PASSenvironment variables.ClickHouseChainbatch inserting assumes that all calls to.append()pass similar dicts as the first call #74 was added.booldata as the ClickHouseBooltype, which was not supported with earlier versions.