First of all, I'd like to say this lib looks really good: it's simple, easy to integrate, and has a very usable API. I've been meaning to integrate it into my workflow for a while, and I finally got a chance to today. However, it's been an unsuccessful experience.
I'm on Windows, using Python 3.6.
I tried following the docs example for the has_dtypes decorator. In short, it seems you really can't use the usual Python types in the dtype schema. Instead of int, you have to use np.int64, and instead of str, you have to use np.object_.
I'd imagine this would be very confusing for newcomers, when they get assertion errors by just following the examples.
Here's a simple reprex:
import engarde.decorators as ed
import pandas as pd
import numpy as np

sample_df = pd.DataFrame([
    dict(a=1, b='test 1'),
    dict(a=2, b='test 2'),
])

expected_schema = dict(
    a=int,
    b=str
)

# I expect this to work, following the doc example. However, it fails
@ed.has_dtypes(items=expected_schema)
def expected_process(df):
    return df

# Fails with AssertionError: a has the wrong dtype (<class 'int'>)
# comment it out to see the working example below
expected_process(sample_df)

working_schema = dict(
    a=np.int64,
    b=np.object_
)

# fixed the schema
@ed.has_dtypes(items=working_schema)
def working_process(df):
    return df

# this works
working_process(sample_df)
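To narrow down where the mismatch comes from, here is a minimal sketch that doesn't use engarde at all. It compares each column's actual dtype with what NumPy resolves the schema entry to. The helper name `dtype_mismatches` is hypothetical (it is not part of engarde), and this is only my guess at the kind of comparison happening under the hood. As far as I know, `np.dtype(int)` resolves to a 32-bit integer on Windows with NumPy versions before 2.0, while pandas stores these values as int64, and `np.dtype(str)` is a unicode dtype rather than object.

```python
import numpy as np
import pandas as pd


def dtype_mismatches(df, schema):
    """Return {column: (actual, expected)} for columns whose dtype differs.

    Hypothetical helper for diagnosis only; not part of engarde.
    """
    mismatches = {}
    for column, expected in schema.items():
        actual = df[column].dtype
        wanted = np.dtype(expected)  # what NumPy resolves the schema entry to
        if actual != wanted:
            mismatches[column] = (actual, wanted)
    return mismatches


sample_df = pd.DataFrame([
    dict(a=1, b='test 1'),
    dict(a=2, b='test 2'),
])

# What pandas actually inferred for each column
print(sample_df.dtypes)

# Plain Python types: the result depends on platform and NumPy version
print(dtype_mismatches(sample_df, dict(a=int, b=str)))

# Explicit NumPy types
print(dtype_mismatches(sample_df, dict(a=np.int64, b=np.object_)))
```

If the first call reports `a` on your machine but not on Linux or macOS, that would suggest the failure is platform-specific rather than a problem with the docs example itself.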