writing SPSS sav file with long strings changes the column names and values #260
And here is another variation. My aim was to reproduce #241. If I put an international character at the end of the string, the result is the same as reported before: with a length shorter than 756 everything looks fine, including the international character, but with 757 I get the split shown before, again with the international character appearing correctly. However, if I replace the numbers with NaNs, the file is written without complaint, but converting it to csv with readstat then produces this error:
Actually, an international character is not needed at all: it also happens with plain characters, as shown below, so the NaNs are what causes it. I see the same effect from Python (no international character is needed to trigger the error): there too I need 757 characters to trigger it, and with 756 it is fine. In Python, with the international character, 756 characters were enough to trigger the issue, I guess because the international character takes two bytes. Program:
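A quick check of the two-byte guess above, separate from the reporter's program. This is only a sketch: it assumes the string is stored as UTF-8 and uses "é" as a stand-in, since the thread does not say which international character was used.

```python
# Hypothetical stand-in strings; the actual character used is not stated.
plain_text = "a" * 756
accented_text = "a" * 755 + "é"  # assumption: a 2-byte UTF-8 character

for label, text in (("plain", plain_text), ("accented", accented_text)):
    # Character count versus encoded byte count
    print(label, len(text), len(text.encode("utf-8")))
```

Under that assumption both strings have 756 characters, but the accented one occupies 757 bytes, which would line up with the 757 threshold seen with plain characters.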
Thanks for the report. The best way to confirm a bug is to create a test case in test_list.h (https://github.com/WizardMac/ReadStat/blob/master/src/test/test_list.h). This will perform a round trip on the data and confirm whether the same values are read and written. (You can run the test suite with "make check".) I will dig further into it when I get a chance.
While doing some experiments I found the following strange thing:
I compile the program shown below, which writes three variables: the first is a string and the last two are doubles. I then convert the resulting file to csv with the readstat binary and visually inspect the csv. If the length of the string is 756 or less (it does not matter whether the last character is the null character, as here, or a normal character), the csv looks as expected, i.e.:
But if the length of the string is 757 (again, regardless of whether the last character is null), I get this:
Here there are two new variables, AAAAA2 and AAAAA1, the two numeric variables have disappeared, and the values are all "a"s. The length of every value seems to be 255, except the last one, which is 246.
Maybe I did something wrong in the program; sorry if that is the case.
I am also not sure whether this is of any help; if not, you can ignore it and close the issue. My thinking is that this may be related to #236 and #241.
Here is the program:
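The 756/757 boundary may not be arbitrary. The sketch below assumes the layout described in PSPP's notes on the system file format, where a string wider than 255 bytes is stored as several segments carrying 252 bytes of data each. Nothing in this thread confirms that ReadStat follows this layout, and the names here are illustrative only, not ReadStat identifiers.

```python
# Assumed constants, taken from PSPP's description of the SAV format.
SEGMENT_DATA_BYTES = 252  # data bytes carried by each segment
MAX_PLAIN_WIDTH = 255     # widest string stored without segmentation


def segment_count(width):
    """Number of segments needed for a string of the given byte width."""
    if width <= MAX_PLAIN_WIDTH:
        return 1
    # Ceiling division without importing math
    return -(-width // SEGMENT_DATA_BYTES)


for width in (255, 756, 757):
    print(width, segment_count(width))
```

Under that assumption 756 bytes fill exactly three segments and 757 bytes need a fourth, which is where the output starts to go wrong. It does not by itself explain the 255 and 246 lengths reported above.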