Hi there guys
I'd like to build a model once and fit it many times to different datasets for cross-validation.
One of my columns of input data is categorical, so I use it to index a vector of RVs depending on which category is presented in each sample of data. Something like this:
cat_mvs = [pymc3.Normal(c, mu = 0, sd = 0.05) for c in unique_categories]
cat_mv_vector = pymc3.math.stack(cat_mvs)
cat_data = pymc3.Data('Categorical Input Data', category_codes)
sample_cat_mv = cat_mv_vector[cat_data]
Note that my category_codes data is a numpy array of integers.
The last line of the code above raises an error. Here's the relevant part of the traceback:
~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/var.py in __getitem__(self, args)
568 TensorVariable, TensorConstant,
569 theano.tensor.sharedvar.TensorSharedVariable))):
--> 570 return self.take(args[axis], axis)
571 else:
572 return theano.tensor.subtensor.advanced_subtensor(self, *args)
~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/var.py in take(self, indices, axis, mode)
612
613 def take(self, indices, axis=None, mode='raise'):
--> 614 return theano.tensor.subtensor.take(self, indices, axis, mode)
615
616 # COPYING
~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/subtensor.py in take(a, indices, axis, mode)
2448 return advanced_subtensor1(a.flatten(), indices)
2449 elif axis == 0:
-> 2450 return advanced_subtensor1(a, indices)
2451 else:
2452 if axis < 0:
~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
613 """
614 return_list = kwargs.pop('return_list', False)
--> 615 node = self.make_node(*inputs, **kwargs)
616
617 if config.compute_test_value != 'off':
~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/subtensor.py in make_node(self, x, ilist)
1701 ilist_ = theano.tensor.as_tensor_variable(ilist)
1702 if ilist_.type.dtype not in theano.tensor.integer_dtypes:
-> 1703 raise TypeError('index must be integers')
1704 if ilist_.type.ndim != 1:
1705 raise TypeError('index must be vector')
TypeError: index must be integers
It seems that within pymc3.Data(), my category_codes data is being coerced to float64, which is not a valid indexing type.
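To illustrate why a float64 index is a problem, here is a minimal numpy-only sketch. It is an analogy, not the pymc3/theano code path: numpy, like theano in the traceback above, refuses to index with a float array even when every value is a whole number. The array contents and variable names are made up for the example.

```python
import numpy as np

# Made-up stand-ins for the category codes and the per-category values.
category_codes = np.array([0, 2, 1, 2, 0])
values = np.array([0.01, -0.03, 0.02])

# Simulate the coercion described above: integer codes turned into float64.
as_float = np.asarray(category_codes, dtype="float64")

# Indexing with a float array fails, even though the values are whole numbers.
try:
    values[as_float]
    float_index_ok = True
except IndexError:
    float_index_ok = False

# Casting back to an integer dtype makes the same codes usable as an index.
picked = values[as_float.astype("int64")]
```

So the codes themselves are fine; only the dtype they end up with prevents them from being used as an index.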
Looking at the source for pymc3.Data(), I think the problem is ultimately in the function it calls, pymc3.model.pandas_to_array, which converts its input data to float on its last line; see https://github.com/pymc-devs/pymc3/blob/master/pymc3/model.py#L1495
Can pymc3.Data() and/or pymc3.model.pandas_to_array be changed to preserve the input data type?
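As a rough sketch of the behaviour I have in mind (this is not the actual pymc3 code, and to_array_preserving_ints is a hypothetical name I made up), the conversion could leave integer input alone and cast everything else to float:

```python
import numpy as np


def to_array_preserving_ints(data):
    """Hypothetical helper, not part of pymc3: keep integer dtypes as they are."""
    arr = np.asarray(data)
    # Integer input (e.g. category codes) is returned with its dtype unchanged,
    # so it can still be used for indexing later.
    if np.issubdtype(arr.dtype, np.integer):
        return arr
    # Everything else is converted to float64.
    return arr.astype("float64")


codes = to_array_preserving_ints([0, 2, 1, 2, 0])
measurements = to_array_preserving_ints([1.5, 2.0, 3.25])
```

With something like this, codes would keep an integer dtype while measurements would still come out as float64.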
Thanks!