Easy solution for restoring original dtypes #26
Comments
Thanks for the suggestion @aldolamberti! Yes, as I just mentioned in @oregonpillow's PR, we'd be happy to accept a contribution about this! Just one comment: I think the dtype conversion could be done more efficiently if you use The solution would then be as simple as storing the original dtypes:
self.dtypes = train_data.dtypes
And then restoring them back in the last line of
sampled = self.transformer.inverse_transform(data, None)
return sampled.astype(self.dtypes)
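As a rough illustration of the suggestion above, the following sketch stores the dtypes of a training DataFrame and re-applies them to float-only output. The column names and values are made up for the example, and `sampled` merely stands in for whatever the sampler returns. It assumes the sampled data is a pandas DataFrame with the same column names as the training data.

```python
import pandas as pd

# Hypothetical training data: one integer column and one float column.
train_data = pd.DataFrame({"age": [31, 45, 27], "income": [1200.5, 3100.0, 870.25]})

# Store the original dtypes before any transformation takes place.
dtypes = train_data.dtypes

# Stand-in for sampler output, where every numerical column is a float.
sampled = pd.DataFrame({"age": [29.0, 40.0], "income": [1500.0, 990.5]})

# DataFrame.astype accepts a column -> dtype mapping, so the stored
# Series of dtypes can be passed directly.
restored = sampled.astype(dtypes)

print(restored.dtypes)
```

Keep in mind that casting floats to integers with `astype` truncates toward zero, so rounding the integer columns first may be preferable when the sampled values are not whole numbers.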
@csala thanks for the tip. However, when I tried presumably because it's a numpy array and not a pandas DataFrame, right? What I did:
but now I get this error when I try to sample after fitting:
@oregonpillow I think that is because you are extracting the dtypes too late inside the Then, you should be able to get
Thanks @csala, I tested the code manually in Colab first: Google Colab and it works now! However, when implementing this in my local fork and running the tests, I get the following error during
which makes no sense to me, since it runs fine within Colab. Any suggestions?
Sorry for the confusion @oregonpillow! You are totally right: I overlooked the fact that the DataFrame vs numpy issue is being taken care of by the transformer, not the synthesizer. So the dtypes conversion should also be done there, inside The way to go would be to capture the dtypes inside the
self.dtypes = data.dtypes
And then restore the dtypes in the current line 172, right after the
output = np.column_stack(output).astype(self.dtypes)
Would you mind trying it this way?
@csala I reverted the synthesizer back to its default code in 0.2.1 and only changed the transformer.py as you suggested. It passed Can you explain why yesterday when I tried the dtypes in the synthesizer in Colab it worked great, yet the same code inside my local environment and using
You're right again @oregonpillow. I had not tried it before myself, and it turns out that the dtype assignment only works when working with a DataFrame, as numpy has a single And I just spotted another related problem as well, which is that if the data is passed as a numpy array from the beginning, the dtypes are not properly taken from it. I would make the following changes:
This time I did my homework and tried it myself, so I'm quite sure this works now ;-) Can you give it a try?
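The approach discussed here (always wrapping the input in a DataFrame so that per-column dtypes can be captured and restored) might look roughly like the sketch below. `TinyTransformer` and its attributes are hypothetical names used only for illustration, not the project's actual transformer, and the real change may differ in its details.

```python
import numpy as np
import pandas as pd


class TinyTransformer:
    """Hypothetical transformer, for illustration only."""

    def fit(self, data):
        # Remember which type to hand back on the way out.
        self.dataframe = isinstance(data, pd.DataFrame)
        if not self.dataframe:
            # Always work on a DataFrame so per-column dtypes exist.
            data = pd.DataFrame(data)
        self.dtypes = data.dtypes
        self.columns = data.columns

    def inverse_transform(self, columns):
        # column_stack yields one array with a single dtype (floats here).
        output = np.column_stack(columns)
        # Restore the per-column dtypes through a DataFrame.
        output = pd.DataFrame(output, columns=self.columns).astype(self.dtypes)
        if not self.dataframe:
            # The caller passed a numpy array, so return one as well.
            output = output.to_numpy()
        return output


transformer = TinyTransformer()
transformer.fit(pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]}))
restored = transformer.inverse_transform([np.array([1.0, 2.0]), np.array([0.5, 1.5])])
print(restored.dtypes)
```

When the input is a numpy array, the same code path runs, but the result is converted back to an array at the end, so the caller gets the type that was passed in.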
That works! :) Good idea @csala! I was not aware of the To clarify my understanding of your 2nd point: If we built a DF only if Therefore your solution is that, by always creating a DataFrame, we can always restore the dtypes, and if the input was a numpy array ( Is my understanding correct?
That's correct! There is also one additional thing to consider, which is that the numpy array in most cases will ignore the dtypes, and simply take the broadest one (i.e. So, in other words: If a numpy array is given as input, even if the individual columns have different dtypes, the array dtype will be
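A small sketch of the behaviour described above: a numpy array carries a single dtype, so converting a DataFrame with differently typed columns collapses them into one common dtype. The column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Integer and float columns collapse into a single float dtype.
numeric = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})
numeric_array = numeric.to_numpy()
print(numeric_array.dtype)  # float64

# Adding a string column forces the whole array to the generic object dtype.
mixed = pd.DataFrame({"count": [1, 2], "city": ["Rome", "Oslo"]})
mixed_array = mixed.to_numpy()
print(mixed_array.dtype)  # object
```

In both cases the per-column dtype information is no longer visible on the array itself, which is why the dtypes have to be captured from a DataFrame.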
Closed via #33. Thanks again @oregonpillow!
Description
After having sampled a dataset, we (@oregonpillow and I) encountered the fact that all numerical columns are converted to floats. However, we can simply restore the original dtype after sampling.
What I Did
Question
Is this something we could consider implementing?