IndexingError: Unalignable boolean #446

Closed
xiuwei1026 opened this issue May 25, 2021 · 2 comments · Fixed by #572

xiuwei1026 commented May 25, 2021

I got an IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match) after adding one more GreaterThan constraint to my existing constraints.

unique_notes_constraint = UniqueCombinations(
    columns=['ADMISSION_TYPE', 'DIAGNOSIS'],
    handling_strategy='reject_sampling')

def cal_age(data):
    data['adm_time'] = pd.to_datetime(data.ADMITTIME).dt.date
    data['dob'] = pd.to_datetime(data.DOB).dt.date
    return data.apply(lambda e: (e['adm_time'] - e['dob']).days/365, axis=1)

age_constraint = ColumnFormula(
    column='age',
    formula=cal_age,
    handling_strategy='transform')

adm_time_constraint = GreaterThan(
    low='DOB',
    high='ADMITTIME',
    handling_strategy='reject_sampling')

constraints = [unique_notes_constraint, age_constraint, adm_time_constraint]

Without adm_time_constraint, the error does not occur. The full warning and traceback are below. Any thoughts?

new_data = model_CopulaGAN.sample(1000)
C:\Users\xwei\Anaconda3\lib\site-packages\sdv\constraints\base.py:189: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  return table_data[valid]
Traceback (most recent call last):

  File "<ipython-input-179-bd7e5014d8df>", line 1, in <module>
    new_data= model_CopulaGAN.sample(1000)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\tabular\base.py", line 378, in sample
    return self._sample_batch(num_rows, max_retries, max_rows_multiplier)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\tabular\base.py", line 293, in _sample_batch
    sampled, num_valid = self._sample_rows(

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\tabular\base.py", line 218, in _sample_rows
    sampled = self._metadata.filter_valid(sampled)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\metadata\table.py", line 583, in filter_valid
    data = constraint.filter_valid(data)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\sdv\constraints\base.py", line 189, in filter_valid
    return table_data[valid]

  File "C:\Users\xwei\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2893, in __getitem__
    return self._getitem_bool_array(key)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2945, in _getitem_bool_array
    key = check_bool_indexer(self.index, key)

  File "C:\Users\xwei\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2184, in check_bool_indexer
    raise IndexingError(

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
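
The last frames of the traceback can be reproduced with plain pandas, independent of SDV. The sketch below is a minimal illustration, assuming the mask passed to table_data[valid] was computed on a frame whose index lacks some of the labels in table_data; nothing in this thread confirms that this is the actual root cause. The toy columns low and high and the helper try_filter are made up for the example.

```python
import warnings

import pandas as pd

# Toy table standing in for the sampled rows (values are made up).
df = pd.DataFrame({"low": [1, 5, 2, 9], "high": [3, 4, 6, 8]})

# A mask computed on a filtered copy only carries the labels 0, 2 and 3.
filtered = df.drop(index=1)
mask = filtered["low"] < filtered["high"]


def try_filter(frame, key):
    """Index `frame` with `key`; return (result, name of the exception raised)."""
    with warnings.catch_warnings():
        # Hide the "Boolean Series key will be reindexed" UserWarning.
        warnings.simplefilter("ignore", UserWarning)
        try:
            return frame[key], None
        except Exception as exc:
            return None, type(exc).__name__


# Label 1 exists in df but not in mask, so pandas cannot align the two.
result, error_name = try_filter(df, mask)
print(error_name)

# Reindexing the mask onto df's index (missing labels count as invalid)
# makes the boolean lookup well defined again.
aligned = df[mask.reindex(df.index, fill_value=False)]
print(aligned)
```

Reindexing the mask is only one way to make the lookup well defined; whether it resembles the change merged in #572 is not stated in this thread.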

fealho (Member) commented May 26, 2021

Hi @xiuwei1026, I was not able to reproduce this error. Could you provide some more information? Namely,

  1. The data types of the DOB and ADMITTIME columns;
  2. Whether there are None values in the dataset;
  3. The rest of the code you ran (specifically how you are initializing model_CopulaGAN);
  4. A small sample of your dataset (a few rows should suffice).
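
The first, second, and fourth items in the list above can be collected in one pass with pandas. This is a rough sketch, not an SDV utility: the helper name summarize_for_bug_report is hypothetical, and the toy values merely stand in for the real dataset.

```python
import pandas as pd


def summarize_for_bug_report(data, columns, n_rows=3):
    """Collect dtypes, missing-value counts, and a few sample rows."""
    subset = data[columns]
    return {
        "dtypes": subset.dtypes.astype(str).to_dict(),
        "null_counts": subset.isna().sum().to_dict(),
        "sample": subset.head(n_rows),
    }


# Made-up rows; replace `toy` with the DataFrame that was passed to fit().
toy = pd.DataFrame({
    "DOB": ["1950-03-01", "1987-11-23"],
    "ADMITTIME": ["2010-10-20 19:08:00", "2013-06-28 11:36:00"],
})

report = summarize_for_bug_report(toy, ["DOB", "ADMITTIME"])
print(report["dtypes"])
print(report["null_counts"])
print(report["sample"])
```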

xiuwei1026 (Author) commented May 26, 2021

Hi @fealho, thanks for taking care of this.

  1. Both DOB and ADMITTIME have dtype object; the former holds only a date, while the latter also contains a time.
  2. There are no None values in the dataset.
  3. The full code is below.
  4. You can find the sample data here
```python
# Algorithm: Copula GAN
import pandas as pd
from sdv.constraints import ColumnFormula, GreaterThan, UniqueCombinations
from sdv.tabular import CopulaGAN

# Load test data
data = pd.read_csv('c:/XP/Projects/data creation/test.csv', index_col=False)

# Define constraints
unique_notes_constraint = UniqueCombinations(
    columns=['ADMISSION_TYPE', 'DIAGNOSIS'],
    handling_strategy='reject_sampling')


def cal_age(data):
    # Adds the helper columns adm_time and dob to the table,
    # then returns the age in years at admission.
    data['adm_time'] = pd.to_datetime(data.ADMITTIME).dt.date
    data['dob'] = pd.to_datetime(data.DOB).dt.date
    return data.apply(lambda e: (e['adm_time'] - e['dob']).days/365, axis=1)


age_constraint = ColumnFormula(
    column='age',
    formula=cal_age,
    handling_strategy='transform')

adm_time_constraint = GreaterThan(
    low='DOB',
    high='ADMITTIME',
    handling_strategy='reject_sampling')

constraints = [unique_notes_constraint, age_constraint, adm_time_constraint]

# Define model
model_CopulaGAN = CopulaGAN(constraints=constraints)

# Fit model
model_CopulaGAN.fit(data)

# Generate synthetic data
new_data_CopulaGAN = model_CopulaGAN.sample(1000)
```
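
Since both columns are stored as object, it may help to parse them explicitly before checking the DOB-before-ADMITTIME relationship or computing the age. The sketch below assumes the strings are in a format pd.to_datetime can parse; cal_age_vectorized is a hypothetical alternative to cal_age that returns the same quantity without adding columns to the input table, and the toy rows are made up.

```python
import pandas as pd


def cal_age_vectorized(data):
    """Age in years at admission, computed without modifying `data`."""
    # normalize() drops the time of day, mirroring the .dt.date step in cal_age.
    adm_day = pd.to_datetime(data["ADMITTIME"]).dt.normalize()
    dob_day = pd.to_datetime(data["DOB"]).dt.normalize()
    return (adm_day - dob_day).dt.days / 365


# Made-up rows in the shape described above: DOB is a date only,
# ADMITTIME also carries a time.
toy = pd.DataFrame({
    "DOB": ["2000-01-01", "1990-06-15"],
    "ADMITTIME": ["2020-01-01 10:30:00", "1991-06-15 23:59:00"],
})

# Row-wise check of the relationship the GreaterThan constraint describes.
dob_before_admission = pd.to_datetime(toy["DOB"]) < pd.to_datetime(toy["ADMITTIME"])
ages = cal_age_vectorized(toy)

print(dob_before_admission.all())
print(ages)
```

Running the same check on the real training data would show whether every row already satisfies the relationship before any constraint is applied.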
