IndexingError: Unalignable boolean #446

xiuwei1026 · 2021-05-25T18:15:20Z

I get IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match) after adding one more constraint, a GreaterThan constraint.

unique_notes_constraint = UniqueCombinations(
    columns=['ADMISSION_TYPE', 'DIAGNOSIS'],
    handling_strategy='reject_sampling')

def cal_age(data):
    data['adm_time'] = pd.to_datetime(data.ADMITTIME).dt.date
    data['dob'] = pd.to_datetime(data.DOB).dt.date
    return data.apply(lambda e: (e['adm_time'] - e['dob']).days / 365, axis=1)

age_constraint = ColumnFormula(
    column='age',
    formula=cal_age,
    handling_strategy='transform')

adm_time_constraint = GreaterThan(
    low='DOB',
    high='ADMITTIME',
    handling_strategy='reject_sampling')

constraints = [unique_notes_constraint, age_constraint, adm_time_constraint]

Without adm_time_constraint, the error does not occur. The warning and full traceback are below. Any thoughts?

new_data = model_CopulaGAN.sample(1000)
C:\Users\xwei\Anaconda3\lib\site-packages\sdv\constraints\base.py:189: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  return table_data[valid]
Traceback (most recent call last):

  File "<ipython-input-179-bd7e5014d8df>", line 1, in <module>
    new_data= model_CopulaGAN.sample(1000)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\tabular\base.py", line 378, in sample
    return self._sample_batch(num_rows, max_retries, max_rows_multiplier)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\tabular\base.py", line 293, in _sample_batch
    sampled, num_valid = self._sample_rows(

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\tabular\base.py", line 218, in _sample_rows
    sampled = self._metadata.filter_valid(sampled)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\metadata\table.py", line 583, in filter_valid
    data = constraint.filter_valid(data)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\constraints\base.py", line 189, in filter_valid
    return table_data[valid]

  File "C:\Users\xwei\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2893, in __getitem__
    return self._getitem_bool_array(key)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2945, in _getitem_bool_array
    key = check_bool_indexer(self.index, key)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2184, in check_bool_indexer
    raise IndexingError(

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
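
For reference, the pandas behavior behind this traceback can be reproduced without SDV. The sketch below is a hypothetical illustration, not SDV's code: the frame, the mask, and the `filter_rows` helper are invented, and it only assumes that the boolean Series handed to `table_data[valid]` is missing at least one index label of the table. It does not establish why the indexes diverge in this particular case.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for a batch of sampled rows.
table = pd.DataFrame(
    {
        "DOB": ["1950-01-01", "1960-06-15", "1970-03-20"],
        "ADMITTIME": [
            "2010-05-01 10:00:00",
            "2012-07-04 08:30:00",
            "2015-09-09 23:15:00",
        ],
    },
    index=[0, 1, 2],
)

# Hypothetical validity mask whose index lacks label 2.
valid = pd.Series([True, False], index=[0, 1])


def filter_rows(table_data, mask):
    """Plain boolean indexing, like the `table_data[valid]` line in the traceback."""
    return table_data[mask]


error_name = None
with warnings.catch_warnings():
    # Silence "Boolean Series key will be reindexed to match DataFrame index."
    warnings.simplefilter("ignore", UserWarning)
    try:
        filter_rows(table, valid)
    except Exception as exc:  # IndexingError's import path differs across pandas versions
        error_name = type(exc).__name__

# Aligning the mask to the table's index first avoids the error.
# Rows without a mask entry are treated as invalid here (an assumption).
aligned = valid.reindex(table.index, fill_value=False)
kept = filter_rows(table, aligned)

print(error_name, list(kept.index))
```

The same pair of symptoms appears as in the report: first the reindexing warning, then the IndexingError once a label of the table cannot be found in the mask.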


fealho · 2021-05-26T04:29:14Z

Hi @xiuwei1026, I was not able to reproduce this error. Could you provide some more information? Namely,

1. The data types of the DOB and ADMITTIME columns;
2. Whether there are None values in the dataset;
3. The rest of the code you ran (specifically, how you are initializing model_CopulaGAN);
4. A small sample of your dataset (a few rows should suffice).
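
The first two items can be checked with a few pandas calls. This is a minimal sketch on invented rows; the `data` frame here is a hypothetical stand-in for the real dataset, and only the column names DOB and ADMITTIME come from the report.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
data = pd.DataFrame(
    {
        "DOB": ["1950-01-01", "1960-06-15"],
        "ADMITTIME": ["2010-05-01 10:00:00", "2012-07-04 08:30:00"],
    }
)

date_columns = ["DOB", "ADMITTIME"]

# Data type of each column of interest.
column_dtypes = data[date_columns].dtypes

# Number of missing values (None/NaN) per column.
missing_counts = data[date_columns].isna().sum()

print(column_dtypes)
print(missing_counts)
print(data.head())  # a few rows to share as a sample
```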

xiuwei1026 · 2021-05-26T13:46:55Z

Hi @fealho, thanks for taking care of this.

1. The dtype of both DOB and ADMITTIME is object; DOB holds dates only, while ADMITTIME also contains a time.
2. There are no None values in the dataset.
3. The full code is below.
4. You can find the sample data here.

```python
# Algorithm: Copula GAN
import pandas as pd
from sdv.constraints import ColumnFormula, GreaterThan, UniqueCombinations
from sdv.tabular import CopulaGAN

# Load test data
data = pd.read_csv('c:/XP/Projects/data creation/test.csv', index_col=False)

# Define constraints
unique_notes_constraint = UniqueCombinations(
    columns=['ADMISSION_TYPE', 'DIAGNOSIS'],
    handling_strategy='reject_sampling')


def cal_age(data):
    data['adm_time'] = pd.to_datetime(data.ADMITTIME).dt.date
    data['dob'] = pd.to_datetime(data.DOB).dt.date
    return data.apply(lambda e: (e['adm_time'] - e['dob']).days / 365, axis=1)


age_constraint = ColumnFormula(
    column='age',
    formula=cal_age,
    handling_strategy='transform')

adm_time_constraint = GreaterThan(
    low='DOB',
    high='ADMITTIME',
    handling_strategy='reject_sampling')

constraints = [unique_notes_constraint, age_constraint, adm_time_constraint]

# Define model
model_CopulaGAN = CopulaGAN(constraints=constraints)

# Fit model
model_CopulaGAN.fit(data)

# Generate synthetic data
new_data_CopulaGAN = model_CopulaGAN.sample(1000)
```
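
As a side note on the script above, `cal_age` adds `adm_time` and `dob` columns to the table it receives. Whether that plays any part in this error is not established in this thread. A variant that leaves its input untouched could look like the sketch below; `cal_age_no_side_effects` is a hypothetical name and the sample rows are invented.

```python
import pandas as pd


def cal_age_no_side_effects(data):
    """Age in years at admission, computed without adding columns to `data`."""
    # normalize() resets the time of day to midnight, mirroring `.dt.date`.
    adm_date = pd.to_datetime(data["ADMITTIME"]).dt.normalize()
    dob = pd.to_datetime(data["DOB"]).dt.normalize()
    return (adm_date - dob).dt.days / 365


# Hypothetical rows standing in for the real dataset.
sample = pd.DataFrame(
    {
        "DOB": ["1950-01-01", "1960-06-15"],
        "ADMITTIME": ["2010-05-01 10:00:00", "2012-07-04 08:30:00"],
    }
)

columns_before = list(sample.columns)
ages = cal_age_no_side_effects(sample)

print(ages)
print(list(sample.columns) == columns_before)  # the input frame keeps its columns
```

The computation is vectorized, so it also avoids the row-by-row `apply` of the original function.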

xiuwei1026 closed this as completed May 26, 2021

xiuwei1026 reopened this May 26, 2021

fealho mentioned this issue May 28, 2021

Fix Issue IndexingError Issue #455

Closed

fealho mentioned this issue Aug 26, 2021

Fix IndexingError Issue #572

Merged

fealho closed this as completed in #572 Aug 27, 2021

katxiao added this to the 0.12.1 milestone Oct 7, 2021

katxiao assigned fealho Oct 7, 2021
