Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed null type #1033

Merged
merged 3 commits into from
Apr 2, 2024
Merged

Conversation

norberttech
Copy link
Member

Change Log

Added

Fixed

Changed

Removed

  • Removed null type

Deprecated

Security


Description

Null as a EntryType was always problematic, mostly due to simple fact that in most data sources/sinks it simply does not exists as a standalone type. For example in databases, we can have nullable columns but not null fields.
Same with Parquet and Avro, we can have nullable strings/integeres/maps/etc but we don't have null column type.

That was creating a very inconsistent behavior when reading from those strongly typed data srouces.
Reading a nullable boolean column from parquet where all values are null we would get NullEntry in Flow Schema, instead of getting BoolEntry that is nullable. That was similarly problematic when we wanted to write into parquet a nullable column with only null values in a given row group.

Another indication that this change was necessary is how well it worked with Entry Type while creating entries.

Copy link
Contributor

github-actions bot commented Mar 31, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.282mb +0.01%  | 838.205ms +0.96% | ±1.63% -39.84%  |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 5.005mb +0.04%   | 341.535ms -0.23% | ±0.52% +688.28% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.156mb +0.11%   | 1.066s +2.13%    | ±0.63% +140.39% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.831mb -0.00% | 905.395ms +1.11% | ±0.91% -8.00%   |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.913mb +0.09%   | 35.967ms +1.51%  | ±1.12% +96.31%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.919mb +0.09%   | 433.765ms -1.68% | ±0.40% -61.67%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev        |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.619mb +5.07% | 60.784ms -7.74% | ±0.74% -5.95% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 95.663mb +1.06%  | 457.220ms -3.61% | ±0.54% -45.75%  |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.144mb +1.85%  | 68.352ms -6.42%  | ±0.41% -45.81%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.570mb +0.95% | 50.106ms -6.15%  | ±1.21% +55.96%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 224.394mb +1.16% | 1.397s -3.24%    | ±0.42% -23.12%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.960mb +0.03%  | 39.183ms -5.39%  | ±1.10% +825.95% |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.686mb +13.52% | 3.219ms -5.30%   | ±1.43% +445.59% |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.413mb +6.47%  | 187.699ms +0.22% | ±1.15% +217.23% |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.938mb +13.92% | 18.926ms -1.80%  | ±0.69% -24.00%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 77.926mb +13.30% | 1.654ms -10.28%  | ±0.87% -76.41%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 77.926mb +13.30% | 1.639ms -17.40%  | ±0.71% -69.47%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 75.038mb +13.81% | 2.464ms -10.41%  | ±2.81% +383.25% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.567mb +13.72% | 16.100ms +7.24%  | ±0.87% +206.49% |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.567mb +13.72% | 16.326ms +6.85%  | ±0.71% -48.81%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.471mb +14.11% | 1.706μs -5.22%   | ±2.72% +0.00%   |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.471mb +14.11% | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 87.025mb +7.08%  | 12.717ms -2.75%  | ±0.80% +340.88% |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 116.386mb +5.30% | 60.666ms -10.05% | ±0.66% -24.34%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 76.086mb +13.62% | 1.193ms -19.06%  | ±1.03% -62.47%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 79.433mb +13.05% | 62.706ms +4.58%  | ±2.21% +925.90% |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 78.188mb +13.26% | 3.792ms -7.16%   | ±0.48% -42.91%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.549mb +14.09% | 40.137ms -0.46%  | ±0.98% +221.08% |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.550mb +14.09% | 40.048ms -1.66%  | ±0.65% +43.45%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.549mb +14.09% | 40.227ms -1.84%  | ±0.48% -23.46%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.912mb +13.65% | 7.372ms -3.54%   | ±0.88% +164.01% |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.471mb +14.11% | 28.914ms -1.31%  | ±1.46% -1.05%   |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.471mb +14.11% | 13.482μs -1.47%  | ±1.06% -32.58%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.471mb +14.11% | 16.212μs +0.82%  | ±1.63% +55.38%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.479mb +6.40%  | 190.532ms -0.37% | ±0.90% +22.64%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.718mb +0.01% | 507.385ms +4.74% | ±0.22% -69.62%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 60.196mb +0.02%  | 255.960ms +3.95% | ±0.29% -43.63%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 15.130mb +0.06%  | 54.362ms +3.22%  | ±0.94% -24.98%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.961mb -0.00%  | 428.140ms -1.43% | ±0.34% +46.03%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.500mb -0.00%  | 85.088ms -2.54%  | ±0.56% +563.11% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@github-actions github-actions bot added the docs label Mar 31, 2024
@norberttech norberttech merged commit ef10a6d into flow-php:1.x Apr 2, 2024
17 checks passed
@norberttech norberttech deleted the feature/remove-null-type branch May 9, 2024 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant