Skip to content

GH-45908: [C++][Docs] Expose basic {Array,...}FromJSON helpers as public APIs #46180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

amoeba
Copy link
Member

@amoeba amoeba commented Apr 18, 2025

Rationale for this change

These functions are generally useful and stable so it would be a good idea to clearly include them in the public API. I'm starting here with just the basic ones in order to make the PR small. See #45908 for more information.

What changes are included in this PR?

  • Moves ArrayFromJSON, ChunkedArrayFromJSON, DictArrayFromJSON, ScalarFromJSON, DictScalarFromJSON from arrow::ipc::internal namespace to arrow::util so it's clearer they're part of the public API
  • Renames json_simple.{h,cc} to from_json.{h,cc} which is clearer IMO
  • Adds User Guide and adds a listing in the API docs for moved functions

Are these changes tested?

Yes.

Are there any user-facing changes?

This expands the scope of our public API but does not break any existing public APIs though it's possible users

This comment was marked as resolved.

amoeba added 2 commits April 20, 2025 18:00
This moves the following functions from the IPC namespace to util to make it clear these are useful outside of their use in Arrow IPC.

- ArrayFromJSON
- ChunkedArrayFromJSON
- DictArrayFromJSON
- ScalarFromJSON
- DictScalarFromJSON
@amoeba amoeba force-pushed the feature/GH-45908--expose-from-json branch from 3b6015e to cf808f9 Compare April 21, 2025 01:00
@amoeba amoeba changed the title GH-45908: [C++] Expose {Array,...}FromJSON as public APIs GH-45908: [C++] Expose basic {Array,...}FromJSON helpers as public APIs Apr 21, 2025
@amoeba amoeba marked this pull request as ready for review April 21, 2025 02:36
@amoeba amoeba requested review from bkietz, zanmato1984 and pitrou April 21, 2025 02:37
@amoeba
Copy link
Member Author

amoeba commented Apr 21, 2025

@github-actions crossbow submit preview-docs

@amoeba amoeba changed the title GH-45908: [C++] Expose basic {Array,...}FromJSON helpers as public APIs GH-45908: [C++][Docs] Expose basic {Array,...}FromJSON helpers as public APIs Apr 21, 2025
Copy link

Revision: cf808f9

Submitted crossbow builds: ursacomputing/crossbow @ actions-7a9d15701e

Task Status
preview-docs GitHub Actions

@amoeba
Copy link
Member Author

amoeba commented Apr 21, 2025

Docs previews:

Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bringing this up! I have some minor comments.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 21, 2025
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 22, 2025
Co-authored-by: Enrico Minack <github@enrico.minack.dev>
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Apr 22, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 22, 2025
@amoeba
Copy link
Member Author

amoeba commented Apr 22, 2025

Thanks for the review @pitrou

Some general comments:

  • Moves ArrayFromJSON, ChunkedArrayFromJSON, DictArrayFromJSON, ScalarFromJSON, DictScalarFromJSON from arrow::ipc::internal namespace to arrow::util so it's clearer they're part of the public API
  • Renames json_simple.{h,cc} to from_json.{h,cc} which is clearer IMO

Perhaps we can find something more specific than arrow/util/from_json.h?

I'm not sure what a good name can be, but those are basically parsing Arrow data from a single JSON object or row (instead of a JSONL stream of objects), so the name should reflect that. For example arrow/from_json_object.h (probably too wordy?).

Adding an extra word wouldn't make the files longer than other files we already have so I think that's fine. Since "object" has a specific meaning in the context of JSON that differs from what these helpers are ingesting, maybe from_json_single or from_json_value? Maybe from_json_simple like it was before?

  • Includes from_json.{h,cc} in api.h

I don't think we should inflate api.h. It's enough that it includes the most fundamental Arrow APIs (and it may already be including too much, but of course we can't undo that :)).

Makes sense. I reverted the api.h change and updated the PR description.

@amoeba
Copy link
Member Author

amoeba commented Apr 22, 2025

@github-actions crossbow submit preview-docs

Copy link

Revision: 11da4d4

Submitted crossbow builds: ursacomputing/crossbow @ actions-bfb9efe751

Task Status
preview-docs GitHub Actions

@amoeba
Copy link
Member Author

amoeba commented Apr 22, 2025

Latest docs preview:

I think the only remaining review item on this PR is wrapping up what's being discussed in #46180 (comment).

@pitrou
Copy link
Member

pitrou commented Apr 23, 2025

Since "object" has a specific meaning in the context of JSON that differs from what these helpers are ingesting, maybe from_json_single or from_json_value? Maybe from_json_simple like it was before?

I like from_json_single and from_json_value. What do others think? @kou @bkietz @felipecrv

@kou
Copy link
Member

kou commented Apr 23, 2025

It focus on filename not function names (such as ArrayFromJSON()), right?

I like from_json.{cc,h} because functions in the file use XXXFromJSON().

(I think that developers don't think that the API accepts JSONL by from_json.{cc,h} because the filename doesn't use JSONL explicitly.)

@EnricoMi
Copy link
Contributor

I find single confusing and value redundant. Creating arrow from a simple JSON value should be from_json, and creating from something more complex JSON should be from_json_XXX.

As @kou says, it mimics the method suffix of all methods.

@pitrou
Copy link
Member

pitrou commented Apr 23, 2025

Our JSONL reader lives in arrow::json and can be included from arrow/json/reader.h. So we need to disambiguate somehow.

@pitrou
Copy link
Member

pitrou commented Apr 23, 2025

(I think that developers don't think that the API accepts JSONL by from_json.{cc,h} because the filename doesn't use JSONL explicitly.)

No, but developers will not understand the difference between arrow/from_json.h and arrow/json/reader.h, for example. Since arrow/json/reader.h is already a public API, we can't really change it.

@pitrou
Copy link
Member

pitrou commented Apr 23, 2025

And, by the way, we could also rename the functions since they are not public yet ;)

@bkietz
Copy link
Member

bkietz commented Apr 23, 2025

I like from_json_single and from_json_value

from_json_value seems fine, and I'll toss from_json_document into the mix as well

developers will not understand the difference between arrow/from_json.h and arrow/json/reader.h

Agreed. The most correct solution seems to be renaming to arrow/jsonl/reader.h maybe with a namespace alias that we can put through a deprecation cycle, which unfortunately looks like the path with maximum work

@kou
Copy link
Member

kou commented Apr 23, 2025

The most correct solution seems to be renaming to arrow/jsonl/reader.h maybe with a namespace alias that we can put through a deprecation cycle

I like this.

@amoeba
Copy link
Member Author

amoeba commented Apr 24, 2025

I'm not sure the benefit of putting these helpers in the public API outweighs the downsides of renaming arrow/json for users and the work required for us. @pitrou's point about avoiding confusion is important though. I also don't think arrow/jsonl is really more correct than arrow/json since JSONL (the format) falls under the umbrella of JSON (the concept/standard/grammar).

I should also point out that there are other *FromJSON helpers we should consider moving into the public API in future PRs (notably RecordBatchFromJSON/TableFromJSON but probably others) and what we do here may interact with that work. The ones I'm thinking of currently span arrow/util, arrow/compute, arrow/dataset, and could even include arrow/engine/substrait. I was thinking we'd have a few from_json.{h,cc} files in various sub-trees of the repo but maybe this is even more confusing. Thoughts?

@kou
Copy link
Member

kou commented Apr 24, 2025

If we think that JSON means JSONL too, I think that we don't need to distinguish *FromJSON and arrow/json/.

If we think that JSON doesn't mean JSONL, I think that renaming arrow/json/ to arrow/jsonl/ (or renaming only arrow::json::TableReader to arrow::json::JSONLReader and arrow::json::StreamingReader to arrow::json::JSONLStreamingReader or something) is better than renaming *FromJSON to something like *FromJSON{Simple,Value,...}. (BTW, is it really difficult deprecating arrow/json/ with keeping backward compatibility?)

@amoeba
Copy link
Member Author

amoeba commented Apr 24, 2025

(BTW, is it really difficult deprecating arrow/json/ with keeping backward compatibility?)

I'll admit it's entirely subjective. If most of those who have commented here are fine making the arrow/json to arrow/jsonl change, that's fine.

@pitrou
Copy link
Member

pitrou commented Apr 24, 2025

I also don't think arrow/jsonl is really more correct than arrow/json since JSONL (the format) falls under the umbrella of JSON (the concept/standard/grammar).

I can agree with this. In the end we must find an organization that is easy to understand for the user. It can involve renaming files or not, renaming APIs or not...

@EnricoMi
Copy link
Contributor

JSONL is just lines of JSON, which feels more like a property of the reader than being a flavour of JSON. In my opinion, it does not feel contrary to JSON.

Assuming the JSONL code stays in arrow/json/reader.h (reading files), the "new" methods could live in arrow/json/from_string.h, given they all parse strings. What kind of JSON type is method specific.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants