Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ruby] #to_h and #to_json with Hash/Array/Struct etc returns {"descriptor":[{}...{}]} when active_support/core_ext/object/json loaded #17037

Open
jufemaiz opened this issue Jun 5, 2024 · 8 comments
Assignees
Labels

Comments

@jufemaiz
Copy link
Contributor

jufemaiz commented Jun 5, 2024

What version of protobuf and what language are you using?

Version: main/v3.27.0+ etc.
Language: Ruby

What operating system (Linux, Windows, ...) and version?

MacOS 14.5

What runtime / compiler are you using (e.g., python version or gcc version) Ruby v3.2

What did you do?

message Thing {
  string whatsit = 1;
}

Build a library with it (or anything else). Then include it and build something containing a protobuf message. For example:

require 'active_support'
require 'active_support/core_ext/object/json'

...

thing_proto = Thing.new(whatsit: "Something random")

ThingInAThing = Struct.new(thing:)

thing_struct = ThingInAThing.new(thing: thing_proto)
thing_hash = { thing: thing_proto }
thing_array = [ thing_proto ]

A key thing to note is this quote from Matz a decade ago:

Ruby often has two conversion methods for an object, e.g. #to_s and #to_str, #to_i and #to_int, #to_h and #to_hash.
The former is used for explicit conversion, the latter is used for implicit conversion (from an object with identical method signature, for example proxy).
Ref: https://bugs.ruby-lang.org/issues/10180

ActiveSupport rightfully makes use of the :to_hash method for implicitly transforming an Object to JSON format.
Ref: https://github.com/rails/rails/blame/v7.1.3.4/activesupport/lib/active_support/core_ext/object/json.rb#L58-L66

Unfortunately attempting to call #to_hash as opposed to #to_h fails on protobuf messages. Aliasing #to_h for #to_hash calls seems a sensible option here in the absence of shifting the #to_json method to also make use of #to_hash as Matz implies is correct.

What did you expect to see

First, when calling #to_h expect to see the marshalled hash even when nested. That is:

thing_struct.to_h => { thing: { whatsit: "Something random" } }
thing_hash.to_h => { thing: { whatsit: "Something random" } }

Second, when calling #to_json expect to see the marshalled json, even when nested. That is:

thing_struct.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_hash.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_array.to_json => "[{\"whatsit\":\"Something random\"}]"

What did you see instead?

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

When calling #to_h:

thing_struct.to_h => { thing: <Thing: whatsit: "Something Random"> }
thing_hash.to_h => { thing: <Thing: whatsit: "Something Random"> }

When calling #to_json:

thing_struct.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_hash.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_array.to_json => "[{\"descriptor\":[{}]}]"

Anything else we should know about your project / environment

Possibly related to #9500

@jufemaiz jufemaiz added the untriaged auto added to all issues by default when created. label Jun 5, 2024
@jufemaiz jufemaiz changed the title [Ruby] .to_h and .to_json with Hash/Array/Struct etc returns {"descriptor":[{}{}{}...{}]} [Ruby] .to_h and .to_json with Hash/Array/Struct etc returns {"descriptor":[{}...{}]} Jun 5, 2024
@jufemaiz jufemaiz changed the title [Ruby] .to_h and .to_json with Hash/Array/Struct etc returns {"descriptor":[{}...{}]} [Ruby] .to_h and .to_json with Hash/Array/Struct etc returns {"descriptor":[{}...{}]} when active_support/core_ext/object/json loaded Jun 5, 2024
@jufemaiz jufemaiz changed the title [Ruby] .to_h and .to_json with Hash/Array/Struct etc returns {"descriptor":[{}...{}]} when active_support/core_ext/object/json loaded [Ruby] #to_h and #to_json with Hash/Array/Struct etc returns {"descriptor":[{}...{}]} when active_support/core_ext/object/json loaded Jun 5, 2024
@acozzette acozzette added ruby and removed untriaged auto added to all issues by default when created. labels Jun 7, 2024
@haberman
Copy link
Member

Hi @jufemaiz, can you clarify exactly what the proposed resolution is here?

Would that change break any existing use cases?

@jufemaiz
Copy link
Contributor Author

There's two options, with one that I would lean towards due to Matz's language design.

  1. (preferred): protobuf to add #to_hash as the default internally consistent method, but also expose #to_h for direct use (whether this differs is a discussion more broadly if there is need for additional fields visible internalls).
  2. (alternative): protobuf adds #to_hash as an alias of #to_h.

Digging into this further,

def to_h_internal(msg, message_descriptor)
return nil if msg.nil? or msg.null?
hash = {}
iter = ::FFI::MemoryPointer.new(:size_t, 1)
iter.write(:size_t, Google::Protobuf::FFI::Upb_Message_Begin)
message_value = Google::Protobuf::FFI::MessageValue.new
field_def_ptr = ::FFI::MemoryPointer.new :pointer
while Google::Protobuf::FFI::message_next(msg, message_descriptor, nil, field_def_ptr, message_value, iter) do
field_descriptor = FieldDescriptor.from_native field_def_ptr.get_pointer(0)
if field_descriptor.map?
hash_entry = map_create_hash(message_value[:map_val], field_descriptor)
elsif field_descriptor.repeated?
hash_entry = repeated_field_create_array(message_value[:array_val], field_descriptor, field_descriptor.type)
else
hash_entry = scalar_create_hash(message_value, field_descriptor.type, field_descriptor: field_descriptor)
end
hash[field_descriptor.name.to_sym] = hash_entry
end
there's already a non-private #to_h implementation (that #to_h sometimes (?) directly calls), however the conventions of ruby are not being adopted, causing issues elsewhere there is an expectation of convention.

@haberman
Copy link
Member

Please note that the ffi/ directory is an experimental implementation that is not currently the default. The default implementation is written in C:

/*
* call-seq:
* Message.to_h => {}
*
* Returns the message as a Ruby Hash object, with keys as symbols.
*/
static VALUE Message_to_h(VALUE _self) {
Message* self = ruby_to_Message(_self);
return Message_CreateHash(self->msg, self->msgdef);
}

I don't quite understand how options (1) and (2) are different. What's an example of how we might want #to_hash and #to_h to differ?

@jufemaiz
Copy link
Contributor Author

Ah fair enough on the ffi/ directory.

I am not sure why, but there may be reasons that the core team want a, by convention, internal function to differ from the external function. If there's no reason, aliasing one way or another would resolve the issue – and given the lack of implementation of #to_hash I cannot imagine there would be a major issue in doing so.

@haberman
Copy link
Member

Introducing #to_hash as an alias of #to_h seems reasonable to me.

@jufemaiz
Copy link
Contributor Author

How would you like to proceed on this? Changes to the c code?

Copy link

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

@github-actions github-actions bot added the inactive Denotes the issue/PR has not seen activity in the last 90 days. label Sep 24, 2024
@haberman
Copy link
Member

haberman commented Oct 7, 2024

Ideally this could be done at the Ruby level, with a method alias like some of the ones we have here: https://github.com/protocolbuffers/protobuf/blob/main/ruby/lib/google/protobuf/message_exts.rb Does that seem viable?

That would save us from having to duplicate this between the different backends we currently have (C, FFI, JRuby).

@github-actions github-actions bot removed the inactive Denotes the issue/PR has not seen activity in the last 90 days. label Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants