Introduction
As I understand it, zookeeper stores data as an array of bytes internally, which should make it encoding-agnostic. However, when I use a zk client library, I expect the data I store in zookeeper to keep a consistent encoding all the time.
The problem
UTF-8 encoded strings stored in zookeeper with zk.create come back from zk.get as ASCII-8BIT encoded strings (displaying the raw UTF-8 bytes). This is true for characters in the ASCII range (where it is not too bad) and also for characters outside the ASCII range (where it is more problematic, because those strings cannot be encoded back to UTF-8 without raising Encoding::UndefinedConversionError).
Workaround: in application code, force the encoding after retrieving the data (e.g. data.force_encoding('UTF-8')).
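The difference between encode and force_encoding can be seen without a running zookeeper. The snippet below is only a sketch: simulate_zk_roundtrip is a hypothetical stand-in for zk.create followed by zk.get, and it assumes the client hands back the same bytes tagged as ASCII-8BIT, as described above.

```ruby
# -*- encoding: utf-8 -*-

# Hypothetical stand-in for zk.create + zk.get (NOT part of the zk gem):
# returns the same bytes, but tagged as ASCII-8BIT (binary).
def simulate_zk_roundtrip(value)
  value.dup.force_encoding(Encoding::ASCII_8BIT)
end

original  = "\u00e9" # 'é', a UTF-8 string
retrieved = simulate_zk_roundtrip(original)

puts "retrieved encoding: #{retrieved.encoding.name}"
puts "same bytes as original: #{retrieved.bytes == original.bytes}"

# encode tries to transcode binary -> UTF-8 and fails on bytes >= 0x80.
begin
  retrieved.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  puts "encode raised: #{e.class}"
end

# force_encoding only relabels the bytes; nothing is transcoded.
restored = retrieved.dup.force_encoding('UTF-8')
puts "restored equals original: #{restored == original}"
```

force_encoding mutates its receiver, hence the dup when the raw value is still needed elsewhere.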
PS: Using Ruby 1.9.3 and 2.1 on Ubuntu 14.04 LTS (amd64).
How to reproduce
Save and run the following script (zk must be installed or part of the current bundle):
#!/usr/bin/env ruby1.9.1
# -*- encoding: utf-8 -*-

require 'zk'

# Stores val at path, reads it back, and shows how the encoding changed.
def encoding_bug(zk, val, path = '/testme-encoding')
  puts "* we would expect the original value and its copy retrieved from zk to be the same"
  puts "* however the retrieved value has lost its original encoding and must be force-encoded"
  puts "* to be usable with eg. JSON.encode() which casts non-UTF-8 strings to UTF-8."
  puts "original value #{val.inspect}, with encoding: " + val.encoding.inspect
  zk.create(path, val)
  begin
    val2 = zk.get(path).first
    puts "retrieved value #{val2.inspect}, with encoding: " + val2.encoding.inspect

    print "attempting to encode val2 to UTF-8, its real encoding => "
    begin
      val2.encode('UTF-8')
      raise "should be failing with Encoding::UndefinedConversionError!"
    rescue Encoding::UndefinedConversionError => e
      puts "as expected, raises " + e.inspect
    end

    print "attempting to force encoding to 'UTF-8', its original encoding => "
    begin
      val2.force_encoding('UTF-8')
      puts "succeeds, val2 is now " + val2.inspect
    rescue => e
      puts "encountered an unexpected exception: " + e.inspect
    end
  ensure
    # Always clean up the test node.
    zk.delete(path)
  end
end

uri = ARGV.first || 'localhost:2181'
encoding_bug(ZK.new(uri), 'é')
"I expect the data I store in zookeeper to have consistent encoding
all the time."
Apparently it's ASCII-8BIT :)
All kidding aside, I never stored anything but coordination data in
zookeeper (and IMHO if you're using zk as a database, you're doing it
wrong, but that's immaterial) so I've never paid close attention to
encoding issues.
zk cannot make assumptions about the type of data people are storing.
One place was serializing structs into binary and storing that. How
would ZK handle that use case if it mandated UTF8 data?
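One way to reconcile both use cases is to leave the client returning raw bytes and let the application decide what is text. The following is a minimal sketch, assuming a hypothetical read_as_text helper in application code (it is not part of zk): it relabels the data as UTF-8 only when the bytes are valid UTF-8, and otherwise returns the binary string untouched.

```ruby
# Hypothetical application-level helper (NOT part of the zk gem).
# Returns a UTF-8 tagged copy when the bytes form valid UTF-8,
# otherwise returns the original binary string unchanged.
def read_as_text(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : raw
end

# Bytes of a UTF-8 'é' (0xC3 0xA9), tagged as binary like data read from zk.
utf8_bytes = [0xC3, 0xA9].pack('C*')
# Arbitrary binary data that is not valid UTF-8, e.g. a serialized struct.
binary_blob = [0xFF, 0xFE, 0x00, 0x01].pack('C*')

puts read_as_text(utf8_bytes).encoding.name
puts read_as_text(binary_blob).encoding.name
```

Note that some binary payloads happen to be valid UTF-8 as well, so a check like this is a heuristic; an application that knows what it stored can simply call force_encoding directly.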
Actually, we use zk as an index to store the location of mirrors for accessing the content of a resource (which may change all the time). The problem on our side was that we didn't validate our input data enough and ended up with some of the URIs containing characters that should have been escaped.
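For what it's worth, a check along these lines before calling zk.create would have caught the bad input. This is only a sketch: storable_uri? is a hypothetical helper name, and it assumes that a properly escaped URI contains ASCII characters only.

```ruby
require 'uri'

# Hypothetical validation helper (NOT part of the zk gem).
# Accepts only URIs that are pure ASCII and parse cleanly.
def storable_uri?(candidate)
  return false unless candidate.ascii_only?
  URI.parse(candidate)
  true
rescue URI::InvalidURIError
  false
end

escaped   = 'http://example.com/caf%C3%A9' # 'é' percent-escaped
unescaped = "http://example.com/caf\u00e9"  # raw 'é' in the path

puts "escaped ok:   #{storable_uri?(escaped)}"
puts "unescaped ok: #{storable_uri?(unescaped)}"
```

With only ASCII stored, the ASCII-8BIT strings coming back from zk.get are harmless, since they encode to UTF-8 without error.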