Description
I discovered that the Windows platform version uses UTF-16le as its internal database encoding, while the other platform versions use UTF-8. The results of applying the HEX function to TEXT string values indicate that Android/iOS WebKit Web SQL uses UTF-8 as well. I found the following official descriptions:
- http://www.sqlite.org/pragma.html#pragma_encoding
- "Support for UTF-8 and UTF-16" section of https://www.sqlite.org/version3.html
Those pages and others make it very clear that the necessary conversions are done automatically, so there should be no difference between UTF-8 and UTF-16 database encoding at the API level. However, I discovered some hidden gotchas:
- The result of applying the SQLite HEX function to a string value differs depending on which internal database encoding is used.
- According to http://www.sqlite.org/pragma.html#pragma_encoding:
  - It is not possible to change the internal encoding of an existing SQLite database.
  - There is no way to ATTACH a database that was created with a different encoding.
- According to https://www.sqlite.org/version3.html it should be possible for a developer to store and retrieve TEXT strings containing ISO-8859 (Latin-1) encoded characters when the database encoding is UTF-8.
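The HEX difference in the first point can be reproduced outside the plugin. The following is only a minimal sketch using Python's standard `sqlite3` module rather than the plugin itself; the helper name `hex_of_text` and the table `t` are made up for illustration.

```python
import sqlite3

ALLOWED_ENCODINGS = {"UTF-8", "UTF-16le", "UTF-16be"}


def hex_of_text(encoding: str, text: str) -> str:
    """Store `text` in a fresh in-memory database that uses `encoding`
    and return what SQLite's HEX() reports for the stored value."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    conn = sqlite3.connect(":memory:")
    try:
        # The encoding must be chosen before anything is written.
        conn.execute(f'PRAGMA encoding = "{encoding}"')
        conn.execute("CREATE TABLE t (s TEXT)")
        conn.execute("INSERT INTO t VALUES (?)", (text,))
        return conn.execute("SELECT HEX(s) FROM t").fetchone()[0]
    finally:
        conn.close()


utf8_hex = hex_of_text("UTF-8", "A")
utf16_hex = hex_of_text("UTF-16le", "A")
print("UTF-8   :", utf8_hex)
print("UTF-16le:", utf16_hex)
```

The same string comes back unchanged through the normal API in both cases; only HEX exposes the underlying bytes, which is why the two results differ.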
From some research, I found that it is generally more efficient to store the data in UTF-8 format:
- http://www.mimec.org/node/297
- http://sqlite.1065341.n5.nabble.com/UTF-16-API-a-second-class-citizen-td46048.html
For the reasons above, I think it would be beneficial to change the Windows version to use the UTF-8 encoding by default. (The easy way is to issue PRAGMA encoding right after opening the database.) A user who needs a different internal database encoding can still set it with PRAGMA encoding before writing any data. (See http://www.sqlite.org/pragma.html#pragma_encoding.)
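To illustrate the proposal, here is a rough sketch in Python's standard `sqlite3` module (not the plugin's actual Windows code). The function names `open_with_encoding` and `current_encoding` are hypothetical. It sets the encoding immediately after opening, and then shows that a later PRAGMA encoding is ignored once the database contains data.

```python
import sqlite3

ALLOWED_ENCODINGS = {"UTF-8", "UTF-16le", "UTF-16be"}


def open_with_encoding(path: str, encoding: str = "UTF-8") -> sqlite3.Connection:
    """Open a database and request `encoding` before anything else runs.
    On a database that already has content the pragma has no effect."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    conn = sqlite3.connect(path)
    conn.execute(f'PRAGMA encoding = "{encoding}"')
    return conn


def current_encoding(conn: sqlite3.Connection) -> str:
    """Return the encoding reported by PRAGMA encoding."""
    return conn.execute("PRAGMA encoding").fetchone()[0]


# A new database picks up the requested default.
fresh = open_with_encoding(":memory:")
fresh.execute("CREATE TABLE t (s TEXT)")
fresh_encoding = current_encoding(fresh)
fresh.close()

# Once data exists, a later attempt to switch the encoding is ignored.
existing = open_with_encoding(":memory:", "UTF-16le")
existing.execute("CREATE TABLE t (s TEXT)")
existing.execute("INSERT INTO t VALUES ('abc')")
existing.execute('PRAGMA encoding = "UTF-8"')
existing_encoding = current_encoding(existing)
existing.close()

print(fresh_encoding, existing_encoding)
```

Because the pragma does nothing on a populated database, databases already created as UTF-16le would keep that encoding under this approach; only newly created databases would get the UTF-8 default.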
ADDITIONAL IMPORTANT READING: http://sqlite.1065341.n5.nabble.com/UTF-16-API-a-second-class-citizen-td46048.html links to https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/, which looks like essential reading for all serious SQLite users.