Closed
74 commits
cd6550b
Creating new kernel for cumulative sum
JabariBooker Feb 17, 2022
067f846
Now made a scalar function; Created struct to hold exec and kernel si…
JabariBooker Feb 25, 2022
a747cb4
Changed handle chunked arrays and arrays with null indicies
JabariBooker Mar 2, 2022
30984e1
Changed to vector function, starting creation corresponding test, inc…
JabariBooker Mar 4, 2022
59fad23
Starting value is now part of options; working more adding new test
JabariBooker Mar 9, 2022
93644f8
Minor changes to cumulative sum kernel
JabariBooker Mar 9, 2022
dd45c3d
Added option for skipping nulls
JabariBooker Mar 18, 2022
236d56b
Fixed some errors; reworking addition for cumulative sum
JabariBooker Mar 22, 2022
19a9d05
Creating separate header for addition kernels
JabariBooker Mar 23, 2022
2c3722f
Creation of CumulativeMeta, simplification of CumulativeSum, and movi…
JabariBooker Mar 24, 2022
73f7a32
Removed CumulativeSum completely; now defined by use of Add operator
JabariBooker Mar 24, 2022
9ff405e
minor changes and initial tests
edponce Mar 24, 2022
4b42c8f
Refactoring, added optionstype to CumulativeGeneric template, and han…
JabariBooker Mar 25, 2022
e318bec
Linting
JabariBooker Mar 27, 2022
940c0c2
Adding Tests for floating and temporal types
JabariBooker Mar 30, 2022
5802d01
Simplified CumulateSum tests
JabariBooker Mar 31, 2022
a4a7d37
Added "Checked" version for overflow detection
JabariBooker Apr 1, 2022
4501d7f
Creating new kernel for cumulative sum
JabariBooker Feb 17, 2022
634273e
Now made a scalar function; Created struct to hold exec and kernel si…
JabariBooker Feb 25, 2022
fc388d7
Changed handle chunked arrays and arrays with null indicies
JabariBooker Mar 2, 2022
df3e6ae
Changed to vector function, starting creation corresponding test, inc…
JabariBooker Mar 4, 2022
6f42a0f
Starting value is now part of options; working more adding new test
JabariBooker Mar 9, 2022
56f2d78
Minor changes to cumulative sum kernel
JabariBooker Mar 9, 2022
1ceebce
Added option for skipping nulls
JabariBooker Mar 18, 2022
0a9e1c9
Fixed some errors; reworking addition for cumulative sum
JabariBooker Mar 22, 2022
5d7d2ea
Creating separate header for addition kernels
JabariBooker Mar 23, 2022
671659a
Creation of CumulativeMeta, simplification of CumulativeSum, and movi…
JabariBooker Mar 24, 2022
a07d7a1
Removed CumulativeSum completely; now defined by use of Add operator
JabariBooker Mar 24, 2022
1086c0b
minor changes and initial tests
edponce Mar 24, 2022
9615188
Refactoring, added optionstype to CumulativeGeneric template, and han…
JabariBooker Mar 25, 2022
05c799d
Linting
JabariBooker Mar 27, 2022
aa27d96
Adding Tests for floating and temporal types
JabariBooker Mar 30, 2022
f3218af
Simplified CumulateSum tests
JabariBooker Mar 31, 2022
cfe8fc4
Added "Checked" version for overflow detection
JabariBooker Apr 1, 2022
bd9f1b7
IWYU
edponce Apr 2, 2022
2589478
general reorganization/fixes
edponce Apr 4, 2022
55ae02a
cast start option
edponce Apr 4, 2022
3d3261c
split Call based on skip nulls, minor changes and notes
edponce Apr 4, 2022
bd265d2
update docs with cumulative function/options
edponce Apr 4, 2022
c53eede
Merge branch 'ARROW-13530' into ARROW-13530-cumulative-sum
JabariBooker Apr 4, 2022
cf4c652
Merge pull request #2 from edponce/ARROW-13530-cumulative-sum
JabariBooker Apr 4, 2022
0997113
Adding chunked arrays to tests
JabariBooker Apr 4, 2022
fcd4272
Merge branch 'ARROW-13530' of https://github.com/JabariBooker/arrow i…
JabariBooker Apr 4, 2022
eaf5f3e
Wrote tests for chunked arrays and fixed some bugs when computing the…
JabariBooker Apr 6, 2022
1148f1e
Making changes suggested to PR
JabariBooker Apr 6, 2022
2f3714b
Simplifying write to null bitmap
JabariBooker Apr 7, 2022
c90d535
Minor changes to funtion docs, comments, and variables
JabariBooker Apr 8, 2022
6bba2fc
add CumulativeOptionsWrapper
edponce Apr 11, 2022
58b36d5
initialize with floating-point zero
edponce Apr 11, 2022
1933da7
set default start to float64, add Python tests
edponce Apr 12, 2022
3667981
Merge pull request #3 from edponce/ARROW-13530-cumulative-sum-with-op…
JabariBooker Apr 12, 2022
ae25c71
Moved basic arithmetic kernels
JabariBooker Apr 12, 2022
b061ab4
Change from Type-Parameterized to Value-Parameterized Tests
JabariBooker Apr 15, 2022
84bdcd7
Including minor suggestions
JabariBooker Apr 15, 2022
bba5fcf
Merge branch 'master' into ARROW-13530
JabariBooker Apr 19, 2022
4e58857
Cumulative operations now trivially processes Scalar inputs
JabariBooker Apr 20, 2022
441a812
Merge branch 'ARROW-13530' of https://github.com/JabariBooker/arrow i…
JabariBooker Apr 20, 2022
6509bef
Fixing missed linting issue and handling
JabariBooker Apr 20, 2022
c596d17
Correcting logic for handling chunked arrays
JabariBooker Apr 21, 2022
16c4489
Adding C++ documentation for Cumulative Functions
JabariBooker Apr 21, 2022
e8be46a
Editting wording of C++ documentation
JabariBooker Apr 25, 2022
f825130
Updating wording of function doc
JabariBooker Apr 26, 2022
9d1f44f
Removed period at end of function doc summary
JabariBooker Apr 26, 2022
1f2df7c
Using non-parameterized tests
JabariBooker Apr 29, 2022
b04f898
Reserving memory up front for CumulativeGeneric and correcting Scalar…
JabariBooker May 3, 2022
4e3e1af
Adding more test inputs to tests and using CheckVectorUnary
JabariBooker May 4, 2022
bbf7ff4
Added testing for "cumulative_sum_checked" and added checks for overflow
JabariBooker May 5, 2022
9ac5c36
Merge branch 'master' into ARROW-13530
JabariBooker May 6, 2022
e7203a0
Change MakeVectorCumulativeFunction to reflect changes to VectorFunction
JabariBooker May 6, 2022
2fe3e3a
Handling Statuses returned from NumericBuilder
JabariBooker May 6, 2022
02a54be
Added arrays to overflow check
JabariBooker May 12, 2022
3673dec
Handling builder.Append() Status output
JabariBooker May 15, 2022
2e20590
Using templated functions instead of macros for IntegerOverflow test
JabariBooker May 16, 2022
3f36557
Added test for scalar null inputs; Minor change to python tests
JabariBooker May 17, 2022
1 change: 1 addition & 0 deletions cpp/src/arrow/CMakeLists.txt
@@ -438,6 +438,7 @@ if(ARROW_COMPUTE)
compute/kernels/scalar_validity.cc
compute/kernels/util_internal.cc
compute/kernels/vector_array_sort.cc
compute/kernels/vector_cumulative_ops.cc
compute/kernels/vector_hash.cc
compute/kernels/vector_nested.cc
compute/kernels/vector_replace.cc
26 changes: 26 additions & 0 deletions cpp/src/arrow/compute/api_vector.cc
@@ -135,6 +135,10 @@ static auto kPartitionNthOptionsType = GetFunctionOptionsType<PartitionNthOption
static auto kSelectKOptionsType = GetFunctionOptionsType<SelectKOptions>(
DataMember("k", &SelectKOptions::k),
DataMember("sort_keys", &SelectKOptions::sort_keys));
static auto kCumulativeSumOptionsType = GetFunctionOptionsType<CumulativeSumOptions>(
DataMember("start", &CumulativeSumOptions::start),
DataMember("skip_nulls", &CumulativeSumOptions::skip_nulls),
DataMember("check_overflow", &CumulativeSumOptions::check_overflow));
} // namespace
} // namespace internal

@@ -176,6 +180,18 @@ SelectKOptions::SelectKOptions(int64_t k, std::vector<SortKey> sort_keys)
sort_keys(std::move(sort_keys)) {}
constexpr char SelectKOptions::kTypeName[];

CumulativeSumOptions::CumulativeSumOptions(double start, bool skip_nulls,
bool check_overflow)
: CumulativeSumOptions(std::make_shared<DoubleScalar>(start), skip_nulls,
check_overflow) {}
CumulativeSumOptions::CumulativeSumOptions(std::shared_ptr<Scalar> start, bool skip_nulls,
bool check_overflow)
: FunctionOptions(internal::kCumulativeSumOptionsType),
start(std::move(start)),
skip_nulls(skip_nulls),
check_overflow(check_overflow) {}
constexpr char CumulativeSumOptions::kTypeName[];

namespace internal {
void RegisterVectorOptions(FunctionRegistry* registry) {
DCHECK_OK(registry->AddFunctionOptionsType(kFilterOptionsType));
@@ -185,6 +201,7 @@ void RegisterVectorOptions(FunctionRegistry* registry) {
DCHECK_OK(registry->AddFunctionOptionsType(kSortOptionsType));
DCHECK_OK(registry->AddFunctionOptionsType(kPartitionNthOptionsType));
DCHECK_OK(registry->AddFunctionOptionsType(kSelectKOptionsType));
DCHECK_OK(registry->AddFunctionOptionsType(kCumulativeSumOptionsType));
}
} // namespace internal

@@ -325,6 +342,15 @@ Result<std::shared_ptr<Array>> DropNull(const Array& values, ExecContext* ctx) {
return out.make_array();
}

// ----------------------------------------------------------------------
// Cumulative functions

Result<Datum> CumulativeSum(const Datum& values, const CumulativeSumOptions& options,
ExecContext* ctx) {
auto func_name = (options.check_overflow) ? "cumulative_sum_checked" : "cumulative_sum";
return CallFunction(func_name, {Datum(values)}, &options, ctx);
}

// ----------------------------------------------------------------------
// Deprecated functions

27 changes: 27 additions & 0 deletions cpp/src/arrow/compute/api_vector.h
@@ -188,6 +188,27 @@ class ARROW_EXPORT PartitionNthOptions : public FunctionOptions {
NullPlacement null_placement;
};

/// \brief Options for cumulative sum function
class ARROW_EXPORT CumulativeSumOptions : public FunctionOptions {
public:
explicit CumulativeSumOptions(double start = 0, bool skip_nulls = false,
bool check_overflow = false);
explicit CumulativeSumOptions(std::shared_ptr<Scalar> start, bool skip_nulls = false,
bool check_overflow = false);
static constexpr char const kTypeName[] = "CumulativeSumOptions";
static CumulativeSumOptions Defaults() { return CumulativeSumOptions(); }

/// Optional starting value for cumulative operation computation
std::shared_ptr<Scalar> start;

/// If true, nulls in the input are ignored and produce a corresponding null output.
/// When false, the first null encountered is propagated through the remaining output.
bool skip_nulls = false;
Member

The naming here is not very good because nulls are never skipped. That said, Pandas uses a similar naming and I don't have a better suggestion. @jorisvandenbossche Any opinion?

Contributor Author

This is where I took the naming from actually.
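
For reference, the `skip_nulls` behaviour described in the doc comment can be modelled outside Arrow. The following is a minimal sketch, assuming nullable values are represented with `std::optional`; `CumulativeSumSketch` is a hypothetical name for illustration and is not the kernel added in this PR.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the documented semantics (not Arrow's implementation).
// A null input always yields a null output. With skip_nulls == false, every
// output after the first null is also null.
std::vector<std::optional<int64_t>> CumulativeSumSketch(
    const std::vector<std::optional<int64_t>>& values, int64_t start,
    bool skip_nulls) {
  std::vector<std::optional<int64_t>> out;
  out.reserve(values.size());
  int64_t sum = start;
  bool seen_null = false;
  for (const auto& v : values) {
    if (!v.has_value()) {
      // The null slot itself is never filled in, in either mode.
      seen_null = true;
      out.push_back(std::nullopt);
      continue;
    }
    if (seen_null && !skip_nulls) {
      // Propagate the first null through the remaining output.
      out.push_back(std::nullopt);
      continue;
    }
    sum += *v;
    out.push_back(sum);
  }
  return out;
}
```

In this model, the input `[1, null, 3]` with `start = 0` gives `[1, null, 4]` when `skip_nulls` is true and `[1, null, null]` when it is false, which is why "skip" is arguably a loose name: the null position stays null in both modes.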


/// When true, returns an Invalid Status when overflow is detected
bool check_overflow = false;
Comment on lines +208 to +209
Member

Since there are two different functions ("cumulative_sum" and "cumulative_sum_checked"), I don't think it makes sense to also have an option for this. Also, it seems actually ignored...

Contributor Author

This is in line with the existing scalar arithmetic functions, which use a check_overflow option in a similar way. This can be changed, but then I wonder whether the arithmetic functions should also be changed in a separate issue.
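
To make the distinction between the two variants concrete, here is a minimal sketch of a checked versus an unchecked signed addition. It assumes the unchecked variant wraps around on overflow, which is an assumption about behaviour and not something this diff shows; `AddChecked` and `AddWrapping` are hypothetical names and are not the operators used by the kernel.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical checked add: returns std::nullopt where a "_checked" kernel
// would report an error, without ever performing an overflowing signed add.
std::optional<int64_t> AddChecked(int64_t a, int64_t b) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b)) {
    return std::nullopt;
  }
  return a + b;
}

// Hypothetical unchecked add: the sum is computed in unsigned arithmetic, where
// wraparound is well defined, then converted back. The conversion is modular
// in C++20 and implementation-defined (modular in practice) before that.
int64_t AddWrapping(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) +
                              static_cast<uint64_t>(b));
}
```

Under this model, choosing between the two behaviours is a single decision made once per call, which is the point raised above: either the function name or the option can carry that decision, but having both is redundant.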

};

/// @}

/// \brief Filter with a boolean selection filter
@@ -522,6 +543,12 @@ Result<Datum> DictionaryEncode(
const DictionaryEncodeOptions& options = DictionaryEncodeOptions::Defaults(),
ExecContext* ctx = NULLPTR);

ARROW_EXPORT
Result<Datum> CumulativeSum(
const Datum& values,
const CumulativeSumOptions& options = CumulativeSumOptions::Defaults(),
ExecContext* ctx = NULLPTR);

// ----------------------------------------------------------------------
// Deprecated functions

1 change: 1 addition & 0 deletions cpp/src/arrow/compute/kernels/CMakeLists.txt
@@ -48,6 +48,7 @@ add_arrow_benchmark(scalar_temporal_benchmark PREFIX "arrow-compute")

add_arrow_compute_test(vector_test
SOURCES
vector_cumulative_ops_test.cc
vector_hash_test.cc
vector_nested_test.cc
vector_replace_test.cc