
ARROW-77: [C++] Conform bitmap interpretation to ARROW-62; 1 for nulls, 0 for non-nulls #35


Closed
wants to merge 2 commits into from


wesm
Member

@wesm wesm commented Mar 23, 2016

No description provided.

const uint8_t* other_data = other.raw_data_;

for (int i = 0; i < length_; ++i) {
if (!IsNull(i) && memcmp(this_data, other_data, value_size_)) {
Member Author

@danrobinson here you see, in action, the problem with uninitialized slots that you brought up on the ML. This obviously can (and should) be special-cased for 1, 2, 4, and 8 bytes using integer comparisons. I am not sure it is going to be safe in general to assume that Arrow writers have zeroed out the uninitialized slots =|
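To illustrate the comment above, here is a minimal sketch of an equality check that skips null slots and special-cases 1, 2, 4, and 8 byte widths with integer comparisons, falling back to `memcmp` otherwise. It is not the code from this PR: `SlotEquals`, `FixedWidthEquals`, and the one-byte-per-slot `is_null` vector are hypothetical names chosen for the example, and it assumes both arrays share the same null mask.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: compare a single slot of `width` bytes.
// Widths 1, 2, 4 and 8 are loaded into integers (memcpy avoids unaligned
// reads); any other width falls back to memcmp.
static bool SlotEquals(const uint8_t* a, const uint8_t* b, int width) {
  switch (width) {
    case 1:
      return *a == *b;
    case 2: {
      uint16_t x, y;
      std::memcpy(&x, a, 2);
      std::memcpy(&y, b, 2);
      return x == y;
    }
    case 4: {
      uint32_t x, y;
      std::memcpy(&x, a, 4);
      std::memcpy(&y, b, 4);
      return x == y;
    }
    case 8: {
      uint64_t x, y;
      std::memcpy(&x, a, 8);
      std::memcpy(&y, b, 8);
      return x == y;
    }
    default:
      return std::memcmp(a, b, width) == 0;
  }
}

// Hypothetical helper: compare two fixed-width value buffers slot by slot.
// Assumption: is_null[i] != 0 marks slot i as null in BOTH arrays, so the
// (possibly uninitialized) bytes of null slots are never inspected.
bool FixedWidthEquals(const uint8_t* left, const uint8_t* right,
                      const std::vector<uint8_t>& is_null, int length,
                      int width) {
  for (int i = 0; i < length; ++i) {
    if (is_null[i]) continue;  // skip slots whose contents are undefined
    if (!SlotEquals(left + i * width, right + i * width, width)) return false;
  }
  return true;
}
```

Skipping null slots explicitly is what makes the comparison independent of whether the writer zeroed them out.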

@wesm wesm changed the title from "ARROW-77: Conform bitmap interpretation to ARROW-62; 1 for nulls, 0 for non-nulls" to "ARROW-77: [C++] Conform bitmap interpretation to ARROW-62; 1 for nulls, 0 for non-nulls" Mar 23, 2016
@@ -176,9 +184,10 @@ class PrimitiveBuilder : public ArrayBuilder {
void AppendNulls(const uint8_t* null_bytes, int32_t length) {
Contributor

Change null_bytes here to valid_bytes? Also update the comments above and below the method signature?

@emkornfield
Contributor

Might be worth looking at: grep -r nulls src/* and converting most of the parameters from null to valid for consistency's sake.

@wesm
Member Author

wesm commented Mar 23, 2016

Good point, let me clean some of these bits up here.

@wesm
Member Author

wesm commented Mar 23, 2016

Did a round of refactoring to change nulls to null_bitmap (when it refers to a buffer) and null_bytes to valid_bytes where it refers to a vector<uint8_t>.
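As a rough illustration of the two representations named above, the sketch below packs a `vector<uint8_t>` of per-slot flags into a bitmap buffer. The polarity (a nonzero byte and a set bit both mean "valid") and the least-significant-bit-first ordering are assumptions made for the example, not something this thread confirms, and `PackValidBytes` and `BitIsSet` are hypothetical names rather than functions from the Arrow codebase.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack one-byte-per-slot flags into a bitmap.
// Assumption: valid_bytes[i] != 0 means slot i is valid, and a set bit in
// the result means the same; bits are numbered least-significant first.
std::vector<uint8_t> PackValidBytes(const std::vector<uint8_t>& valid_bytes) {
  std::vector<uint8_t> bitmap((valid_bytes.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < valid_bytes.size(); ++i) {
    if (valid_bytes[i]) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}

// Hypothetical helper: read bit i back out of the packed bitmap.
bool BitIsSet(const std::vector<uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}
```

If the opposite polarity is wanted, only the condition inside the loop needs to be inverted.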

@emkornfield
Contributor

Thanks, looks good.

@wesm
Member Author

wesm commented Mar 24, 2016

+1, thank you

@asfgit asfgit closed this in fbbee3d Mar 24, 2016
@wesm wesm deleted the ARROW-77 branch March 24, 2016 16:32
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 2, 2018
Author: Nong Li <nongli@gmail.com>

Closes apache#35 from nongli/parquet-503 and squashes the following commits:

cb2a4e1 [Nong Li] PARQUET-503: Reenable parquet 2.0 encoding implementations.
kou pushed a commit that referenced this pull request May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
FelixYBW pushed a commit to FelixYBW/arrow that referenced this pull request Nov 3, 2021
…ta from file format (apache#35)

* Dataset: Add API to ignore both filter and project after scanning data from file format

* Fixup

* Fixup
jayhomn-bitquill referenced this pull request in Bit-Quill/arrow Aug 10, 2022
Implement toString method for JdbcArray class
paddyroddy referenced this pull request in rok/arrow Jul 19, 2025
* chore: restart

* update ruff config

* build: add extra dependencies

* update mypy config

* feat: add util.pyi

* feat: add types.pyi

* feat: impl lib.pyi

* update

* feat: add acero.pyi

* feat: add compute.pyi

* add benchmark.pyi

* add cffi

* feat: add csv.pyi

* disable isort single line

* reformat

* update compute.pyi

* add _auzurefs.pyi

* add _cuda.pyi

* add _dataset.pyi

* rename _stub_typing.pyi -> _stubs_typing.pyi

* add _dataset_orc.pyi

* add pyarrow-stubs/_dataset_parquet_encryption.pyi

* add _dataset_parquet.pyi

* add _feather.pyi

* feat: add _flight.pyi

* add _fs.pyi

* add _gcsfs.pyi

* add _hdfs.pyi

* add _json.pyi

* add _orc.pyi

* add _parquet_encryption.pyi

* add _parquet.pyi

* update

* add _parquet.pyi

* add _s3fs.pyi

* add _substrait.pyi

* update

* update

* add parquet/core.pyi

* add parquet/encryption.pyi

* add BufferProtocol

* impl _filesystemdataset_write

* add dataset.pyi

* add feather.pyi

* add flight.pyi

* add fs.pyi

* add gandiva.pyi

* add json.pyi

* add orc.pyi

* add pandas_compat.pyi

* add substrait.pyi

* update util.pyi

* add interchange

* add __lib_pxi

* update __lib_pxi

* update

* update

* add types.pyi

* feat: add scalar.pyi

* update types.pyi

* update types.pyi

* update scalar.pyi

* update

* update

* update

* update

* update

* update

* feat: impl array

* feat: add builder.pyi

* add scipy

* add tensor.pyi

* feat: impl NativeFile

* update io.pyi

* complete io.pyi

* add ipc.pyi

* mv benchmark.pyi into __lib_pxi

* add table.pyi

* do re-export in lib.pyi

* fix io.pyi

* update

* optimize scalar.pyi

* optimize indices

* complete ipc.pyi

* update

* fix NullableIterable

* fix string array

* ignore overload-overlap error

* fix _Tabular.__getitem__

* remove additional_dependencies

2 participants