Encoding Error ISO-8859-1 SimpleForm #85

Open
blackode opened this issue Mar 17, 2021 · 3 comments

@blackode

XML String

data = "<?xml version='1.0' encoding='ISO-8859-1'?><OTA_HotelAvailRS xmlns=\"http://parsec.es/hotelapi/OTA2014Compact\" TimeStamp=\"2021-03-17T08:56:29Z\" PrimaryLangID=\"en-GB\" Id=\"11,33667649,72545\"><Hotels HotelCount=\"0\"><DateRange Start=\"2021-04-20\" End=\"2021-04-21\" /><RoomCandidates><RoomCandidate RPH=\"0\"><Guests><Guest AgeCode=\"A\" Count=\"2\" /></Guests></RoomCandidate></RoomCandidates></Hotels></OTA_HotelAvailRS>"

Here, the encoding value is ISO-8859-1.

When I try to parse it with Saxy.SimpleForm, I get the following error:

Error

iex(12)> Saxy.SimpleForm.parse_string(data)
{:error,
 %Saxy.ParseError{
   binary: "<?xml version='1.0' encoding='ISO-8859-1'?><OTA_HotelAvailRS xmlns=\"http://parsec.es/hotelapi/OTA2014Compact\" TimeStamp=\"2021-03-17T08:56:29Z\" PrimaryLangID=\"en-GB\" Id=\"11,33667649,72545\"><Hotels HotelCount=\"0\"><DateRange Start=\"2021-04-20\" End=\"2021-04-21\" /><RoomCandidates><RoomCandidate RPH=\"0\"><Guests><Guest AgeCode=\"A\" Count=\"2\" /></Guests></RoomCandidate></RoomCandidates></Hotels></OTA_HotelAvailRS>",
   position: 30,
   reason: {:invalid_encoding, "ISO-8859-1"}
 }}

When I replace ISO-8859-1 with UTF-8 in the XML declaration, parsing works fine.

Is there a way to parse ISO-8859-1-encoded XML?

@qcam
Owner

qcam commented Mar 17, 2021

Currently, Saxy only expects UTF-8 input: the parser stops when the XML declaration explicitly names a different encoding. Saxy will very likely not support other encodings.

If you are sure that the input string is actually UTF-8, a short-term solution is to remove the encoding information before calling Saxy. In the longer term, we could perhaps provide an option to ignore the declared encoding or override it to UTF-8, so that the parser continues even when an unsupported encoding is declared.
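
For illustration, the short-term workaround could look roughly like the sketch below. It is not part of Saxy: the module name XmlEncodingWorkaround and both function names are hypothetical. relabel_as_utf8/1 rewrites only the declaration, which is appropriate when the bytes are already valid UTF-8. latin1_to_utf8/1 goes one step beyond what is suggested above and assumes the bytes really are ISO-8859-1, so it transcodes them with Erlang's :unicode.characters_to_binary/3 before relabelling.

```elixir
defmodule XmlEncodingWorkaround do
  # Hypothetical helper module for illustration; not part of Saxy.

  # Rewrites the first ISO-8859-1 encoding declaration to UTF-8 without
  # touching any other bytes. Use this only when the content is already
  # valid UTF-8 and merely mislabelled.
  def relabel_as_utf8(xml) when is_binary(xml) do
    String.replace(
      xml,
      ~r/encoding=(["'])ISO-8859-1\1/i,
      "encoding=\\1UTF-8\\1",
      global: false
    )
  end

  # Assumes the bytes really are ISO-8859-1 (Latin-1): transcodes them to
  # UTF-8 first, then relabels the declaration so the two agree.
  def latin1_to_utf8(xml) when is_binary(xml) do
    xml
    |> :unicode.characters_to_binary(:latin1, :utf8)
    |> relabel_as_utf8()
  end
end

# Possible usage with the data string from the report (requires Saxy):
#
#   data
#   |> XmlEncodingWorkaround.latin1_to_utf8()
#   |> Saxy.SimpleForm.parse_string()
```

Which function applies depends on what the sender actually emits. Transcoding bytes that are already UTF-8 would corrupt any non-ASCII characters, so it is worth checking the real byte content before choosing.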

@blackode
Author

@qcam Thanks for the update and quick solution

@jhonathas

Is it currently supported or not yet?
