readstat not converting encoding of sas7bcat labels

Labels in sas7bcat files that contain special characters are not correctly converted to UTF-8.

As an example, taken from this [R-Haven issue](https://github.com/tidyverse/haven/issues/312): when reading the file "formats.sas7bcat" from [here](https://github.com/antuki/reprex/tree/master/tidyverse_haven_312), I get labels like "modalit\xe9 \xe01", which are not valid UTF-8 but are valid windows-1252 or latin1. readstat correctly sets file.encoding to windows-1252, so the string should already be valid UTF-8 by the time my readstat_value_label_handler function receives it. I see this in pyreadstat, in R-Haven, and when debugging readstat with gdb. A user reported a [similar issue](https://github.com/Roche/pyreadstat/issues/4) in pyreadstat with another file.
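To make the symptom concrete, here is a minimal sketch of a structural UTF-8 check applied to the label bytes quoted above. The function name `is_valid_utf8` is hypothetical and not part of readstat; the check is simplified (it validates lead and continuation bytes but does not reject every overlong or surrogate form).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of readstat.
 * Returns 1 if the NUL-terminated string s is structurally well-formed UTF-8,
 * 0 otherwise. Simplified: checks lead/continuation byte patterns only. */
int is_valid_utf8(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    size_t len = strlen(s);
    size_t i = 0;
    while (i < len) {
        unsigned char c = p[i];
        size_t extra;                         /* continuation bytes expected */
        if (c < 0x80)                    extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return 0;                        /* stray continuation or invalid lead */
        if (extra > len - i - 1) return 0;    /* sequence truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((p[i + k] & 0xC0) != 0x80) return 0;
        }
        i += extra + 1;
    }
    return 1;
}
```

Called on the raw label, written in C as `"modalit\xe9 \xe0" "1"` (the literal is split so that the hex escape does not swallow the trailing `1`), the check fails because 0xE9 is a three-byte lead followed by a space. The same text encoded as UTF-8, `"modalit\xc3\xa9 \xc3\xa0" "1"`, passes.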

Looking at readstat_sas7bcat_read.c, in the function sas7bcat_parse_value_labels, it seems to me that the variable label never gets converted. I inserted the following after line 91, and that cures the problem:

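```
        const char *label = &lbp2[10]; // this is line 91
        // added 2018-10-11: convert the label from the file encoding to UTF-8
        // char array (not an array of pointers); UTF-8 output can be longer
        // than the input, so 4 bytes per input byte plus a terminator is
        // used here as an assumed generous bound
        char label2[4 * label_len + 1];
        retval = readstat_convert(label2, sizeof(label2),
                    label, label_len, ctx->converter);
        if (retval != READSTAT_OK)
                goto cleanup;
        // label2, not label, must then be passed on to the value label handler
```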
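Conversion to UTF-8 can produce more bytes than it consumes, so the destination buffer needs to be larger than the source label. The sketch below illustrates this with a hypothetical Latin-1-only converter, `latin1_to_utf8`. It is a stand-in for illustration, not readstat's `readstat_convert`, and it does not cover the windows-1252 characters in the 0x80 to 0x9F range.

```c
#include <stddef.h>

/* Hypothetical stand-in for a charset converter; handles ISO-8859-1 only.
 * Writes a NUL-terminated UTF-8 copy of src into dst.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if dst is too small. */
long latin1_to_utf8(char *dst, size_t dst_len, const char *src, size_t src_len) {
    size_t out = 0;
    for (size_t i = 0; i < src_len; i++) {
        unsigned char c = (unsigned char)src[i];
        size_t need = (c < 0x80) ? 1 : 2;      /* non-ASCII expands to 2 bytes */
        if (out + need + 1 > dst_len) return -1; /* keep room for the NUL */
        if (c < 0x80) {
            dst[out++] = (char)c;
        } else {
            dst[out++] = (char)(0xC0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        }
    }
    if (out + 1 > dst_len) return -1;          /* covers the empty-input case */
    dst[out] = '\0';
    return (long)out;
}
```

For the 11-byte label from the example file, an 11-byte destination is too small (the function returns -1), while a destination of `2 * 11 + 1` bytes is enough and receives 13 bytes of UTF-8. Encodings other than Latin-1 may need a more generous bound.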

As my understanding of readstat and iconv is still limited (I hope to improve it!), I am not sure this is the proper solution, so I have not sent a PR, but I can do so after your suggestions.

Another smaller but still confusing thing: if I set the encoding manually with readstat_set_file_character_encoding to, say, LATIN1, and later retrieve the file encoding with readstat_get_file_encoding, I still get WINDOWS-1252. I think the reason is that in readstat_sas7bcat_read.c, line 371:

```
.file_encoding = hinfo->encoding
```
should be: 
```
.file_encoding = ctx->input_encoding
```

as it is in readstat_sas7bdat_read.c line 594, to reflect that the user set the encoding manually.
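The difference between the two assignments can be sketched with simplified mock structures. All type and function names below are hypothetical and are not readstat's real definitions. The sketch also assumes that a NULL `input_encoding` means "no user override", in which case the header value is used.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not readstat's real structures. */
typedef struct { const char *encoding; } mock_header_info;
typedef struct { const char *input_encoding; } mock_ctx;
typedef struct { const char *file_encoding; } mock_metadata;

/* Assumption: with no user override, fall back to the header's encoding. */
void resolve_encoding(mock_ctx *ctx, const mock_header_info *hinfo) {
    if (ctx->input_encoding == NULL)
        ctx->input_encoding = hinfo->encoding;
}

/* Mirrors the reported behaviour: always report the header's encoding. */
mock_metadata metadata_from_header(const mock_ctx *ctx,
                                   const mock_header_info *hinfo) {
    (void)ctx;
    mock_metadata m = { .file_encoding = hinfo->encoding };
    return m;
}

/* Mirrors the proposed change: report the encoding actually in effect. */
mock_metadata metadata_from_ctx(const mock_ctx *ctx,
                                const mock_header_info *hinfo) {
    (void)hinfo;
    mock_metadata m = { .file_encoding = ctx->input_encoding };
    return m;
}
```

With a header encoding of WINDOWS-1252 and a user override of LATIN1, `metadata_from_header` reports WINDOWS-1252 while `metadata_from_ctx` reports LATIN1. Without an override, both report WINDOWS-1252 once `resolve_encoding` has run.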
