pymc3.Data converts input data to float64 type - so int data cannot later be used as an index #3813

@JonAnCla

Hi there guys

I'd like to build a model that I can fit many times to different datasets for cross-validation.

One of my input columns is categorical, so I use it to index a vector of RVs according to the category of each sample. Something like this:

cat_mvs = [pymc3.Normal(c, mu=0, sd=0.05) for c in unique_categories]

cat_mv_vector = pymc3.math.stack(cat_mvs)
cat_data = pymc3.Data('Categorical Input Data', category_codes)
sample_cat_mv = cat_mv_vector[cat_data]

Note that my category_codes data is a NumPy array of integers.

The last line of the code above raises an error. Here is the traceback (the frames within pymc3's dependency theano):

~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/var.py in __getitem__(self, args)
    568                             TensorVariable, TensorConstant,
    569                             theano.tensor.sharedvar.TensorSharedVariable))):
--> 570                 return self.take(args[axis], axis)
    571             else:
    572                 return theano.tensor.subtensor.advanced_subtensor(self, *args)

~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/var.py in take(self, indices, axis, mode)
    612 
    613     def take(self, indices, axis=None, mode='raise'):
--> 614         return theano.tensor.subtensor.take(self, indices, axis, mode)
    615 
    616     # COPYING

~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/subtensor.py in take(a, indices, axis, mode)
   2448             return advanced_subtensor1(a.flatten(), indices)
   2449         elif axis == 0:
-> 2450             return advanced_subtensor1(a, indices)
   2451         else:
   2452             if axis < 0:

~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

~/.pyenv/versions/3.7.2/envs/jupyterlab-3.7.2/lib/python3.7/site-packages/theano/tensor/subtensor.py in make_node(self, x, ilist)
   1701         ilist_ = theano.tensor.as_tensor_variable(ilist)
   1702         if ilist_.type.dtype not in theano.tensor.integer_dtypes:
-> 1703             raise TypeError('index must be integers')
   1704         if ilist_.type.ndim != 1:
   1705             raise TypeError('index must be vector')

TypeError: index must be integers

It seems that pymc3.Data() coerces my category_codes data to float64, which is not a valid dtype for indexing.
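To illustrate why the coercion matters, here is a minimal NumPy-only analogue of the failure. The values and the names `cat_values`, `float_codes`, and `restored` are made up for illustration, and NumPy stands in for theano here; it rejects float indices in a similar way.

```python
import numpy as np

# Made-up stand-ins for the stacked RV vector and the integer category codes.
cat_values = np.array([0.01, -0.02, 0.03])
category_codes = np.array([0, 2, 1, 0])

# Integer codes index the vector without trouble.
picked = cat_values[category_codes]

# Coercing the codes to float64 (what appears to happen inside pymc3.Data)
# makes them unusable as an index.
float_codes = category_codes.astype("float64")
try:
    cat_values[float_codes]
    float_index_failed = False
except IndexError:
    float_index_failed = True

# Casting back to an integer dtype restores the lookup.
restored = cat_values[float_codes.astype("int64")]
```

The same idea (casting the float data back to an integer dtype before indexing) may serve as a workaround on the pymc3 side, though I have not verified that here.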

Looking at the source for pymc3.Data(), I think the problem is ultimately in the function it calls, pymc3.model.pandas_to_array, which converts its input data to float on its last line. See https://github.com/pymc-devs/pymc3/blob/master/pymc3/model.py#L1495

Can pymc3.Data() and/or pymc3.model.pandas_to_array be changed to preserve the input data type?
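A rough sketch of the kind of behaviour I have in mind, written against NumPy only. The helper name `to_array_preserving_ints` is hypothetical and this is not pymc3's actual code; it just shows integer input passing through untouched while everything else is still converted to float.

```python
import numpy as np


def to_array_preserving_ints(data):
    """Hypothetical helper: convert to an ndarray, keeping integer dtypes intact."""
    # pandas objects expose .to_numpy(); anything else goes through np.asarray.
    arr = data.to_numpy() if hasattr(data, "to_numpy") else np.asarray(data)
    if np.issubdtype(arr.dtype, np.integer):
        return arr  # keep ints so the result can still be used as an index
    return arr.astype("float64")


int_result = to_array_preserving_ints(np.array([0, 2, 1, 0]))
float_result = to_array_preserving_ints([0.5, 1.0])
```

With something like this, integer category codes would keep an integer dtype and could still be used to index the RV vector, while float data would be handled as before.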

Thanks!
