Merge branch 'branch-0.11' into libcudf++/datetime_ops
trxcllnt authored Nov 20, 2019
2 parents 86a7973 + 51af53d commit e2537d5
Showing 94 changed files with 7,752 additions and 580 deletions.
27 changes: 25 additions & 2 deletions CHANGELOG.md
@@ -1,7 +1,7 @@
# cuDF 0.11.0 (Date TBD)

## New Features

- PR #2905 Added `Series.median()` and null support for `Series.quantile()`
- PR #2930 JSON Reader: Support ARROW_RANDOM_FILE input
- PR #2956 Add `cudf::stack` and `cudf::tile`
- PR #2980 Added nvtext is_vowel/is_consonant functions
@@ -26,12 +26,19 @@
- PR #3278 Add `to_host` utility to copy `column_view` to host
- PR #3087 Add new cudf::experimental bool8 wrapper
- PR #3219 Construct column from column_view
- PR #3229 Define and implement new search APIs
- PR #3308 java add API for memory usage callbacks
- PR #2691 Row-wise reduction and scan operations via CuPy
- PR #3291 Add normalize_nans_and_zeros
- PR #3344 java split API
- PR #2791 Add `groupby.std()`
- PR #3368 Enable dropna argument in dask_cudf groupby
- PR #3298 add null replacement iterator for column_device_view
- PR #3297 Define and implement new groupby API.
- PR #3396 Update device_atomics with new bool8 and timestamp specializations
- PR #3393 Implement df.cov and enable covariance/correlation in dask_cudf
- PR #3401 Add dask_cudf ORC writer (to_orc)
- PR #3331 Add copy_if_else

## Improvements

@@ -81,6 +88,7 @@
- PR #3171 Move deprecated error macros to legacy
- PR #3191 Port NVStrings integer convert ops to cudf column
- PR #3189 Port NVStrings find ops to cudf column
- PR #3352 Port NVStrings convert float functions to cudf strings column
- PR #3193 Add cuPy as a formal dependency
- PR #3195 Support for zero columned `table_view`
- PR #3165 Java device memory size for string category
@@ -91,6 +99,7 @@
- PR #3350 Port NVStrings booleans convert functions
- PR #3231 Add `column::release()` to give up ownership of contents.
- PR #3157 Use enum class rather than enum for mask_allocation_policy
- PR #3232 Port NVStrings datetime conversion to cudf strings column
- PR #3136 Define and implement new transpose API
- PR #3237 Define and implement new transform APIs
- PR #3245 Move binaryop files to legacy
@@ -113,13 +122,19 @@
- PR #3294 Update to arrow-cpp and pyarrow 0.15.1
- PR #3310 Add `row_hasher` and `element_hasher` utilities
- PR #3286 Clean up the starter code on README
- PR #3322 Port NVStrings pad operations to cudf strings column
- PR #3345 Add cache member for number of characters in string_view class
- PR #3299 Define and implement new `is_sorted` APIs
- PR #3328 Partition by stripes in dask_cudf ORC reader
- PR #3243 Use upstream join code in dask_cudf
- PR #3371 Add `select` method to `table_view`
- PR #3309 Add java and JNI bindings for search bounds
- PR #3380 Concatenate columns of strings
- PR #3382 Add fill function for strings column
- PR #3391 Move device_atomics_tests.cu files to legacy
- PR #3387 Strings column gather function
- PR #3389 Move quantiles.hpp + group_quantiles.hpp files to legacy
- PR #3398 Move reshape.hpp files to legacy
- PR #3201 Define and implement new datetime_ops APIs

## Bug Fixes
@@ -159,15 +174,23 @@
- PR #3318 Revert arrow to 0.15.0 temporarily to unblock downstream projects CI
- PR #3317 Fix index-argument bug in dask_cudf parquet reader
- PR #3323 Fix `insert` non-assert test case
- PR #3341 Fix `Series` constructor converting NoneType to "None"
- PR #3326 Fix and test for detail::gather map iterator type inference
- PR #3334 Remove zero-size exception check from make_strings_column factories
- PR #3333 Fix compilation issues with `constexpr` functions not marked `__device__`
- PR #3340 Make all benchmarks use cudf base fixture to initialize RMM pool
- PR #3337 Fix Java to pad validity buffers to 64-byte boundary
- PR #3362 Fix `find_and_replace` upcasting series for python scalars and lists
- PR #3357 Disabling `column_view` iterators for non fixed-width types
- PR #3383 Fix: properly compute null counts for rolling_window.
- PR #3386 Removing external includes from `column_view.hpp`
- PR #3369 Add write_partition to dask_cudf to fix to_parquet bug
- PR #3388 Support getitem with bools when DataFrame has a MultiIndex
- PR #3408 Fix String and Column (De-)Serialization
- PR #3372 Fix dask-distributed scatter_by_map bug
- PR #3419 Fix a bug in parse_into_parts (incomplete input causing walking past the end of string).
- PR #3413 Fix dask_cudf read_csv file-list bug
- PR #3416 Fix memory leak in ColumnVector when pulling strings off the GPU


# cuDF 0.10.0 (16 Oct 2019)
2 changes: 2 additions & 0 deletions conda/recipes/libcudf/meta.yaml
@@ -48,6 +48,7 @@ test:
- test -f $PREFIX/lib/libcudftestutil.a
- test -f $PREFIX/include/cudf/legacy/bitmask.hpp
- test -f $PREFIX/include/cudf/legacy/column.hpp
- test -f $PREFIX/include/cudf/legacy/reshape.hpp
- test -f $PREFIX/include/cudf/legacy/table.hpp
- test -f $PREFIX/include/cudf/utilities/legacy/nvcategory_util.hpp
- test -f $PREFIX/include/cudf/utilities/legacy/type_dispatcher.hpp
@@ -72,6 +73,7 @@ test:
- test -f $PREFIX/include/cudf/legacy/merge.hpp
- test -f $PREFIX/include/cudf/legacy/join.hpp
- test -f $PREFIX/include/cudf/legacy/predicates.hpp
- test -f $PREFIX/include/cudf/legacy/quantiles.hpp
- test -f $PREFIX/include/cudf/legacy/reduction.hpp
- test -f $PREFIX/include/cudf/legacy/replace.hpp
- test -f $PREFIX/include/cudf/legacy/rolling.hpp
42 changes: 26 additions & 16 deletions cpp/CMakeLists.txt
@@ -401,8 +401,8 @@ add_library(cudf
src/datetime/datetime_ops.cu
src/datetime/datetime_util.cpp
src/hash/legacy/hashing.cu
src/quantiles/quantiles.cu
src/quantiles/group_quantiles.cu
src/quantiles/legacy/quantiles.cu
src/quantiles/legacy/group_quantiles.cu
src/reductions/legacy/reductions.cu
src/reductions/legacy/min.cu
src/reductions/legacy/max.cu
@@ -417,8 +417,8 @@ add_library(cudf
src/reductions/legacy/group_std.cu
src/reductions/legacy/scan.cu
src/replace/legacy/replace.cu
src/replace/replace.cu
src/reshape/stack.cu
src/replace/replace.cu
src/reshape/legacy/stack.cu
src/transpose/transpose.cu
src/transpose/legacy/transpose.cu
src/merge/legacy/merge.cu
@@ -465,6 +465,7 @@ add_library(cudf
src/utilities/nvtx/nvtx_utils.cpp
src/utilities/nvtx/legacy/nvtx_utils.cpp
src/copying/copy.cpp
src/copying/copy.cu
src/copying/slice.cpp
src/copying/split.cpp
src/copying/legacy/copy.cpp
@@ -478,6 +479,7 @@ add_library(cudf
src/filling/legacy/repeat.cu
src/filling/legacy/tile.cu
src/search/legacy/search.cu
src/search/search.cu
src/column/column.cu
src/column/column_view.cpp
src/column/column_device_view.cu
@@ -488,23 +490,31 @@ add_library(cudf
src/bitmask/null_mask.cu
src/sort/sort.cu
src/column/legacy/interop.cpp
src/strings/strings_column_factories.cu
src/strings/strings_scalar_factories.cpp
src/strings/strings_column_view.cu
src/strings/utilities.cu
src/strings/attributes.cu
src/strings/copying/copying.cu
src/strings/sorting/sorting.cu
src/strings/substring.cu
src/strings/combine.cu
src/strings/char_types/char_types.cu
src/strings/case.cu
src/strings/find.cu
src/strings/convert/convert_integers.cu
src/strings/char_types/char_types.cu
src/strings/combine.cu
src/strings/convert/convert_booleans.cu
src/strings/convert/convert_datetime.cu
src/strings/convert/convert_floats.cu
src/strings/convert/convert_integers.cu
src/strings/copying/concatenate.cu
src/strings/copying/copying.cu
src/strings/find.cu
src/strings/filling/fill.cu
src/strings/padding.cu
src/strings/sorting/sorting.cu
src/strings/strings_column_factories.cu
src/strings/strings_column_view.cu
src/strings/strings_scalar_factories.cpp
src/strings/substring.cu
src/strings/utilities.cu
src/scalar/scalar.cpp
src/scalar/scalar_factories.cpp)
src/scalar/scalar_factories.cpp
src/groupby/groupby.cu
src/groupby/hash/groupby.cu
src/groupby/sort/groupby.cu
src/aggregation/aggregation.cpp)

# Rename installation to proper names for later finding
set_target_properties(libNVStrings PROPERTIES OUTPUT_NAME "NVStrings")
12 changes: 6 additions & 6 deletions cpp/benchmarks/CMakeLists.txt
@@ -96,15 +96,15 @@ ConfigureBench(TYPE_DISPATCHER_BENCH "${TD_BENCH_SRC}")
###################################################################################################
# - quantiles benchmark ---------------------------------------------------------------------------

set(QUANTILES_BENCH_SRC
"${CMAKE_CURRENT_SOURCE_DIR}/quantiles/group_quantiles_benchmark.cu")
set(LEGACY_QUANTILES_BENCH_SRC
"${CMAKE_CURRENT_SOURCE_DIR}/quantiles/legacy/group_quantiles_benchmark.cu")

ConfigureBench(QUANTILES_BENCH "${QUANTILES_BENCH_SRC}")
ConfigureBench(LEGACY_QUANTILES_BENCH "${LEGACY_QUANTILES_BENCH_SRC}")

###################################################################################################
# - stack benchmark -------------------------------------------------------------------------------

set(STACK_BENCH_SRC
"${CMAKE_CURRENT_SOURCE_DIR}/reshape/stack_benchmark.cu")
set(LEGACY_STACK_BENCH_SRC
"${CMAKE_CURRENT_SOURCE_DIR}/reshape/legacy/stack_benchmark.cu")

ConfigureBench(STACK_BENCH "${STACK_BENCH_SRC}")
ConfigureBench(LEGACY_STACK_BENCH "${LEGACY_STACK_BENCH_SRC}")
@@ -16,7 +16,7 @@

#include <tests/utilities/legacy/column_wrapper.cuh>

#include <cudf/quantiles.hpp>
#include <cudf/legacy/quantiles.hpp>
#include <random>

#include <benchmark/benchmark.h>
@@ -17,7 +17,7 @@
#include <tests/utilities/legacy/column_wrapper.cuh>
#include <tests/utilities/legacy/column_wrapper_factory.hpp>

#include <cudf/reshape.hpp>
#include <cudf/legacy/reshape.hpp>
#include <cudf/types.h>

#include <benchmarks/fixture/benchmark_fixture.hpp>
5 changes: 3 additions & 2 deletions cpp/custrings/strings/datetime.cu
@@ -215,6 +215,9 @@ struct parse_datetime
DTFormatItem item = items[idx];
int slen = (int)item.length;
//printf("%d:%c=%d\n",(int)fmt.ftype,ch,(int)slen);
if( length < slen ){
return 1;
}
if(item.item_type==false)
{
// consume fmt.len bytes from datetime
@@ -223,8 +226,6 @@ struct parse_datetime
length -= slen;
continue;
}
if( length < slen )
return 1;

// special logic for each specifier
switch(item.specifier)
8 changes: 4 additions & 4 deletions cpp/custrings/tests/test_datetime.cu
@@ -13,22 +13,22 @@ TEST_F(TestTimestamp, ToTimestamp)
{
{
std::vector<const char*> hstrs{"1974-02-28T01:23:45Z", "2019-07-17T21:34:37Z",
nullptr, "" };
nullptr, "", "1974" };
NVStrings* strs = NVStrings::create_from_array(hstrs.data(),hstrs.size());
thrust::device_vector<unsigned long> results(hstrs.size(),0);
strs->timestamp2long("%Y-%m-%dT%H:%M:%SZ", NVStrings::seconds, results.data().get());
int expected[] = { 131246625, 1563399277, 0,0 };
int expected[] = { 131246625, 1563399277, 0, 0, 0 };
for( int idx = 0; idx < (int) hstrs.size(); ++idx )
EXPECT_EQ((int)results[idx],expected[idx]);
NVStrings::destroy(strs);
}

{
std::vector<const char*> hstrs{"12.28.1982", "07.17.2019" };
std::vector<const char*> hstrs{"12.28.1982", "07.17.2019", "06" };
NVStrings* strs = NVStrings::create_from_array(hstrs.data(),hstrs.size());
thrust::device_vector<unsigned long> results(hstrs.size(),0);
strs->timestamp2long("%m-%d-%Y", NVStrings::days, results.data().get());
int expected[] = { 4744, 18094 };
int expected[] = { 4744, 18094, 0 };
for( int idx = 0; idx < (int) hstrs.size(); ++idx )
EXPECT_EQ((int)results[idx],expected[idx]);
NVStrings::destroy(strs);
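
The `parse_datetime` change above moves the `length < slen` check so that it runs before literal format items are consumed as well as before specifiers. The following is a minimal host-side sketch of that idea, not the NVStrings implementation: `format_item` and `parse_fixed_width` are hypothetical names, every specifier is assumed to be a fixed-width run of ASCII digits, and the function returns `false` where the original returns `1`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed format item:
// either a literal run to skip or a fixed-width numeric field to read.
struct format_item {
  bool is_specifier;  // false: literal text, true: numeric field
  int length;         // number of bytes the item consumes
};

// Returns true if every item fits inside `input`; numeric field values are
// appended to `fields`. Returns false as soon as the input runs out.
bool parse_fixed_width(std::string const& input,
                       std::vector<format_item> const& items,
                       std::vector<int>& fields)
{
  char const* ptr = input.data();
  int remaining   = static_cast<int>(input.size());
  for (format_item const& item : items) {
    // Check before consuming anything, literal or field, so a short input
    // can never advance `ptr` past the end of the string.
    if (remaining < item.length) { return false; }
    if (item.is_specifier) {
      int value = 0;
      for (int i = 0; i < item.length; ++i) {
        value = value * 10 + (ptr[i] - '0');  // assumes ASCII digits
      }
      fields.push_back(value);
    }
    ptr += item.length;
    remaining -= item.length;
  }
  return true;
}
```

With items describing `%Y-%m-%d`, an input such as `"1974"` is rejected at the first `-` literal instead of reading beyond the buffer, which is the kind of truncated input the new test cases add.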
68 changes: 68 additions & 0 deletions cpp/include/cudf/aggregation.hpp
@@ -0,0 +1,68 @@
/*
* Copyright (c) 2019, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cudf/types.hpp>

#include <memory>
#include <vector>

/**
* @file aggregation.hpp
* @brief Representation for specifying desired aggregations from
* aggregation-based APIs, e.g., groupby, reductions, rolling, etc.
*
* @note Not all aggregation APIs support all aggregation operations. See
* individual function documentation to see what aggregations are supported.
*
*/

namespace cudf {
namespace experimental {
/**
* @brief Base class for abstract representation of an aggregation.
*/
class aggregation;

/// Factory to create a SUM aggregation
std::unique_ptr<aggregation> make_sum_aggregation();

/// Factory to create a MIN aggregation
std::unique_ptr<aggregation> make_min_aggregation();

/// Factory to create a MAX aggregation
std::unique_ptr<aggregation> make_max_aggregation();

/// Factory to create a COUNT aggregation
std::unique_ptr<aggregation> make_count_aggregation();

/// Factory to create a MEAN aggregation
std::unique_ptr<aggregation> make_mean_aggregation();

/// Factory to create a MEDIAN aggregation
std::unique_ptr<aggregation> make_median_aggregation();

/**
* @brief Factory to create a QUANTILE aggregation
*
* @param q The desired quantiles
* @param i The desired interpolation
*/
std::unique_ptr<aggregation> make_quantile_aggregation(
std::vector<double> const& q, interpolation i);
} // namespace experimental
} // namespace cudf
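
The header above only forward-declares `aggregation` and exposes factory functions, so callers never see the concrete types. As a rough illustration of how such factories could be backed, here is a self-contained sketch. Everything in the `sketch` namespace is an assumption made for illustration (the `Kind` enum, the `quantile_aggregation` type, and the `interpolation` enumerators); the actual libcudf definitions are not shown in this diff.

```cpp
#include <memory>
#include <vector>

namespace sketch {
// Hypothetical stand-in for the interpolation type taken by the factory.
enum class interpolation { LINEAR, LOWER, HIGHER, MIDPOINT, NEAREST };

// Hypothetical base class: records which aggregation was requested.
class aggregation {
 public:
  enum Kind { SUM, MIN, MAX, COUNT, MEAN, MEDIAN, QUANTILE };
  explicit aggregation(Kind k) : kind{k} {}
  virtual ~aggregation() = default;
  Kind kind;
};

// Derived type carrying the extra parameters a QUANTILE aggregation needs.
struct quantile_aggregation : aggregation {
  quantile_aggregation(std::vector<double> const& q, interpolation i)
    : aggregation{QUANTILE}, quantiles{q}, interp{i} {}
  std::vector<double> quantiles;
  interpolation interp;
};

// Parameterless aggregations only need the kind tag.
std::unique_ptr<aggregation> make_sum_aggregation()
{
  return std::make_unique<aggregation>(aggregation::SUM);
}

// Parameterized aggregations return the derived type through the base pointer.
std::unique_ptr<aggregation> make_quantile_aggregation(std::vector<double> const& q,
                                                       interpolation i)
{
  return std::make_unique<quantile_aggregation>(q, i);
}
}  // namespace sketch
```

Returning `std::unique_ptr<aggregation>` keeps ownership explicit and lets an API accept a heterogeneous list of requested aggregations without exposing the derived types in the public header.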
22 changes: 22 additions & 0 deletions cpp/include/cudf/copying.hpp
@@ -173,5 +173,27 @@ std::vector<column_view> slice(column_view const& input,
std::vector<column_view> split(column_view const& input,
std::vector<size_type> const& splits);

/**
* @brief Returns a new column, where each element is selected from either @p lhs or
* @p rhs based on the value of the corresponding element in @p boolean_mask
*
* Selects each element i in the output column from either @p rhs or @p lhs using the following rule:
* output[i] = (boolean_mask[i]) ? lhs[i] : rhs[i]
*
* @throws cudf::logic_error if lhs and rhs are not of the same type
* @throws cudf::logic_error if lhs and rhs are not of the same length
* @throws cudf::logic_error if boolean_mask contains nulls
* @throws cudf::logic_error if boolean mask is not of type bool8
* @throws cudf::logic_error if boolean mask is not of the same length as lhs and rhs
* @param[in] lhs left-hand column_view
* @param[in] rhs right-hand column_view
* @param[in] boolean_mask Non-nullable column of `BOOL8` elements that control selection from `lhs` or `rhs`
* @param[in] mr resource for allocating device memory
*
* @returns new column with the selected elements
*/
std::unique_ptr<column> copy_if_else(column_view const& lhs, column_view const& rhs, column_view const& boolean_mask,
rmm::mr::device_memory_resource *mr = rmm::mr::get_default_resource());

} // namespace experimental
} // namespace cudf
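
The selection rule documented for `copy_if_else`, `output[i] = (boolean_mask[i]) ? lhs[i] : rhs[i]`, can be illustrated on the host with plain vectors. This is only a sketch of the semantics: `copy_if_else_host` is a hypothetical name, it uses `std::vector` rather than `column_view`, it does not model nulls or device memory, and it throws `std::invalid_argument` where the real API is documented to throw `cudf::logic_error`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Host-side illustration only; `copy_if_else_host` is a hypothetical helper,
// not part of libcudf.
template <typename T>
std::vector<T> copy_if_else_host(std::vector<T> const& lhs,
                                 std::vector<T> const& rhs,
                                 std::vector<bool> const& boolean_mask)
{
  // Mirror the documented preconditions: all three inputs share one length.
  if (lhs.size() != rhs.size() || lhs.size() != boolean_mask.size()) {
    throw std::invalid_argument("lhs, rhs, and boolean_mask must have the same length");
  }
  std::vector<T> output(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    // output[i] = (boolean_mask[i]) ? lhs[i] : rhs[i]
    output[i] = boolean_mask[i] ? lhs[i] : rhs[i];
  }
  return output;
}
```

For example, with `lhs = {1, 2, 3, 4}`, `rhs = {10, 20, 30, 40}`, and a mask of `{true, false, false, true}`, the result takes the first and last elements from `lhs` and the middle two from `rhs`.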