cannot read correctly variable name #268
Here is another file with a similar issue. This file was apparently created using SPSS (not the DLLs, as in the previous example). The variable whose name has been truncated is XC0DAB1_1 (truncated to XC0DAB1); it is the variable in position 85 (counting from 1). Again, PSPP reads the variable correctly.
Yes - I opened it in SPSS and saved it again.
I am experiencing an error that seems related to this. I am sorry, but I cannot share the (customer's) data file, and I have not been able to (or had the time to) generate a synthesized example file that triggers the bug. However, I have been able to narrow down the issue a little:
So, trying to see if the error is caused by
./extract_metadata file_ok.sav file_ok-metadata.json
./readstat file_ok.sav file_ok.csv
Converted 489 variables and 88013 rows in 4.49 seconds
Error processing file_ok.sav: Unable to convert string to the requested encoding (invalid byte sequence)
./readstat file_broken.sav file_broken.csv
sed 's/"FORN_1"/"forn_1"/g' file_broken.csv > file_ok.csv
./readstat file_ok.csv file_ok-metadata.json output.sav
./extract_metadata output.sav output-metadata.json
When I wrote this, I was surprised by the error during my first attempt at converting data from .sav to .csv. I guess I will inspect the data file around row 88013. I am sorry that I cannot provide a reproducible error report, but I thought this might shed some light on where to look for the cause of this bug.
When reading the attached file, there should be a variable named "BRANDAA_SUN_1", but I get "BRANDAA" instead. PSPP reads the variable name correctly. I think the file was created using the IBM SPSS DLL files instead of the full application. If the file is opened in SPSS and saved again, it is read correctly. I have tested with a simple C program that the issue is indeed coming from ReadStat:
test.SAV.zip
original report: Roche/pyreadstat#165
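For anyone comparing the output of two readers, here is a minimal sketch of how the truncated names could be spotted automatically. It assumes you have already exported the variable names from both tools (for example PSPP and ReadStat) into two Python lists in file order; the function name `find_truncated_names` and the sample lists are hypothetical, and only the two truncated names are taken from this thread.

```python
def find_truncated_names(expected, observed):
    """Compare two name lists position by position and report truncations.

    Returns a list of (position, expected_name, observed_name) tuples,
    where position counts from 1.
    """
    truncated = []
    for position, (want, got) in enumerate(zip(expected, observed), start=1):
        # A truncated name differs from the expected one but is a prefix of it.
        if want != got and want.startswith(got):
            truncated.append((position, want, got))
    return truncated


if __name__ == "__main__":
    # Hypothetical name lists; the real files contain many more variables.
    expected = ["ID", "BRANDAA_SUN_1", "XC0DAB1_1"]  # e.g. names as read by PSPP
    observed = ["ID", "BRANDAA", "XC0DAB1"]          # e.g. names as read by ReadStat
    for position, want, got in find_truncated_names(expected, observed):
        print(f"position {position}: expected {want!r}, got {got!r}")
```

Names that differ only in case (such as "FORN_1" versus "forn_1") are not reported, since neither is a prefix of the other.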