Merge branch 'branch-0.6' into bld-update-version
kkraus14 authored Mar 13, 2019
2 parents 536f63f + 80fbb29 commit 5f9e73f
Showing 123 changed files with 6,568 additions and 1,832 deletions.
8 changes: 8 additions & 0 deletions .gitmodules
@@ -10,3 +10,11 @@
path = thirdparty/rmm
url = https://github.com/rapidsai/rmm.git
branch = branch-0.6
[submodule "thirdparty/dlpack"]
path = thirdparty/dlpack
url = https://github.com/rapidsai/dlpack.git
branch = cudf
[submodule "thirdparty/jitify"]
path = thirdparty/jitify
url = https://github.com/rapidsai/jitify.git
branch = cudf
37 changes: 36 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,18 @@
# cuDF 0.7.0 (Date TBD)

## New Features

...

## Improvements

...

## Bug Fixes

...


# cuDF 0.6.0 (Date TBD)

## New Features
@@ -16,6 +31,7 @@
- PR #898 Add DataFrame.groupby(level=0) support
- PR #920 Add feather, JSON, HDF5 readers / writers from PyArrow / Pandas
- PR #888 CSV Reader: Add prefix parameter for column names, used when parsing without a header
- PR #913 Add DLPack support: convert between cuDF DataFrame and DLTensor
- PR #939 Add ORC reader from PyArrow
- PR #918 Add Series.groupby(level=0) support
- PR #906 Add binary and comparison ops to DataFrame
@@ -40,9 +56,16 @@
- PR #1052 Add left/right_index and left/right_on keywords to merge
- PR #1091 Add `indicator=` and `suffixes=` keywords to merge
- PR #1107 Add unsupported keywords to Series.fillna
- PR #1136 Removed `gdf_concat`
- PR #1153 Added function for getting the padded allocation size for valid bitmask
- PR #1148 Add cudf.sqrt for dataframes and Series
- PR #1159 Add Python bindings for libcudf dlpack functions
- PR #1155 Add __array_ufunc__ for DataFrame and Series for sqrt
- PR #1168 to_frame for series accepts a name argument

## Improvements

- PR #892 Add support for heterogeneous types in binary ops with JIT
- PR #730 Improve performance of `gdf_table` constructor
- PR #561 Add Doxygen style comments to Join CUDA functions
- PR #813 unified libcudf API functions by replacing gpu_ with gdf_
@@ -59,7 +82,6 @@
- PR #909 CSV Reader: Avoid host->device->host copy for header row data
- PR #916 Improved unit testing and error checking for `gdf_column_concat`
- PR #941 Replace `numpy` call in `Series.hash_encode` with `numba`
- PR #942 Added increment/decrement operators for wrapper types
- PR #943 Updated `count_nonzero_mask` to return `num_rows` when the mask is null
- PR #952 Added trait to map C++ type to `gdf_dtype`
@@ -72,11 +94,15 @@
- PR #1047 Adding gdf_dtype_extra_info to gdf_column_view_augmented
- PR #1054 Added default ctor to SerialTrieNode to overcome Thrust issue in CentOS7 + CUDA10
- PR #1024 CSV Reader: Add support for hexadecimal integers in integral-type columns
- PR #1033 Update `fillna()` to use libcudf function `gdf_replace_nulls`
- PR #1066 Added inplace assignment for columns and select_dtypes for dataframes
- PR #1026 CSV Reader: Change the meaning and type of the quoting parameter to match Pandas
- PR #1100 Adds `CUDF_EXPECTS` error-checking macro
- PR #1092 Fix select_dtype docstring
- PR #1111 Added cudf::table
- PR #1108 Sorting for datetime columns
- PR #1120 Return a `Series` (not a `Column`) from `Series.cat.set_categories()`
- PR #1128 CSV Reader: The last data row does not need to be line terminated

## Bug Fixes

@@ -115,9 +141,15 @@
- PR #1058 Added support for `DataFrame.loc[scalar]`
- PR #1060 Fix column creation with all valid nan values
- PR #1073 CSV Reader: Fix an issue where a column name includes the return character
- PR #1090 Updating Doxygen Comments
- PR #1080 Fix dtypes returned from loc / iloc because of lists
- PR #1102 CSV Reader: Minor fixes and memory usage improvements
- PR #1174: Fix release script typo
- PR #1137 Add prebuild script for CI
- PR #1118 Enhanced the `DataFrame.from_records()` feature
- PR #1129 Fix join performance with index parameter from using numpy array
- PR #1145 Issue with .agg call on multi-column dataframes
- PR #1167 Fix issue with null_count not being set after inplace fillna()


# cuDF 0.5.1 (05 Feb 2019)
@@ -165,8 +197,11 @@
- PR #521 Add `assert_eq` function for testing
- PR #502 Simplify Dockerfile for local dev, eliminate old conda/pip envs
- PR #549 Adds `-rdynamic` compiler flag to nvcc for Debug builds
- PR #472 RMM: Created centralized rmm::device_vector alias and rmm::exec_policy
- PR #577 Added external C++ API for scatter/gather functions
- PR #500 Improved the concurrent hash map class to support partitioned (multi-pass) hash table building
- PR #583 Updated `gdf_size_type` to `int`
- PR #617 Added .dockerignore file. Prevents adding stale cmake cache files to the docker container
- PR #658 Reduced `JOIN_TEST` time by isolating overflow test of hash table size computation
- PR #664 Added Debugging instructions to README
2 changes: 1 addition & 1 deletion ci/cpu/cudf/upload-anaconda.sh
@@ -43,5 +43,5 @@ if [ "$BUILD_CUDF" == "1" ]; then

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u rapidsai ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
fi
2 changes: 1 addition & 1 deletion ci/cpu/libcudf/upload-anaconda.sh
@@ -40,4 +40,4 @@ fi

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u rapidsai ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
11 changes: 11 additions & 0 deletions ci/cpu/prebuild.sh
@@ -0,0 +1,11 @@
#!/usr/bin/env bash

export BUILD_ABI=1
export BUILD_CUDF=1
export BUILD_CFFI=1

if [[ "$PYTHON" == "3.6" ]]; then
export BUILD_LIBCUDF=1
else
export BUILD_LIBCUDF=0
fi
45 changes: 42 additions & 3 deletions cpp/CMakeLists.txt
@@ -141,8 +141,10 @@ include_directories("${ARROW_INCLUDE_DIR}"
"${CMAKE_SOURCE_DIR}/include"
"${CMAKE_SOURCE_DIR}/src"
"${CMAKE_SOURCE_DIR}/thirdparty/cub"
"${CMAKE_SOURCE_DIR}/thirdparty/jitify"
"${CMAKE_SOURCE_DIR}/thirdparty/moderngpu/src"
"${CMAKE_SOURCE_DIR}/thirdparty/rmm/include"
"${CMAKE_SOURCE_DIR}/thirdparty/dlpack/include"
"${ZLIB_INCLUDE_DIRS}")

###################################################################################################
@@ -189,6 +191,13 @@ add_library(cudf SHARED
src/groupby/groupby.cu
src/groupby/new_groupby.cu
src/binary/binary_ops.cu
src/binary/jit/code/kernel.cpp
src/binary/jit/code/operation.cpp
src/binary/jit/code/traits.cpp
src/binary/jit/core/binop.cpp
src/binary/jit/core/launcher.cpp
src/binary/jit/util/operator.cpp
src/binary/jit/util/type.cpp
src/bitmask/bitmask_ops.cu
src/bitmask/valid_ops.cu
src/compaction/stream_compaction_ops.cu
@@ -203,16 +212,45 @@
src/unary/unary_ops.cu
# src/windowed/windowed_ops.cu ... this is broken
src/io/convert/csr/cudf_to_csr.cu
src/io/convert/dlpack/cudf_dlpack.cpp
src/io/csv/csv_reader.cu
src/io/comp/uncomp.cpp
src/io/comp/cpu_unbz2.cpp
src/utilities/cuda_utils.cu
src/utilities/error_utils.cpp
src/utilities/nvtx/nvtx_utils.cpp)
src/utilities/nvtx/nvtx_utils.cpp
src/copying/gather.cu
src/copying/scatter.cu
src/bitmask/legacy_bitmask.cpp)

#Override RPATH for cudf
SET_TARGET_PROPERTIES(cudf PROPERTIES BUILD_RPATH "\$ORIGIN")

###################################################################################################
# - jitify ----------------------------------------------------------------------------------------

# Creates executable stringify and uses it to convert types.h to c-str for use in JIT code
add_executable(stringify "${CMAKE_SOURCE_DIR}/thirdparty/jitify/stringify.cpp")
execute_process(WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_BINARY_DIR}/include)

add_custom_command(OUTPUT ${CMAKE_BINARY_DIR}/include/types.h.jit
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include
COMMAND ${CMAKE_BINARY_DIR}/stringify cudf/types.h > ${CMAKE_BINARY_DIR}/include/types.h.jit
COMMENT "Run stringify on header types.h to convert it to c-str for use in JIT compiled code"
DEPENDS stringify
MAIN_DEPENDENCY ${CMAKE_CURRENT_SOURCE_DIR}/include/cudf/types.h)

add_custom_target(stringify_run DEPENDS ${CMAKE_BINARY_DIR}/include/types.h.jit)

add_dependencies(cudf stringify_run)

option(JITIFY_PROCESS_CACHE "Use a process level (instead of thread level) cache for JIT compiled kernels" ON)
if(JITIFY_PROCESS_CACHE)
message(STATUS "Using process level cache for JIT compiled kernels")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --define-macro JITIFY_THREAD_SAFE")
endif(JITIFY_PROCESS_CACHE)
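
For context on how the stringified header and the JIT cache fit together, below is a minimal, hypothetical Jitify sketch (not code from this commit, and not cuDF's actual binary-op launcher); the `types_h_jit` stand-in, the kernel source, and the `launch_add` wrapper are illustrative assumptions.

```cpp
// Sketch only: illustrates the Jitify pattern the new CMake section enables.
#include <cuda_runtime.h>
#include <jitify.hpp>

// Hypothetical stand-in for the stringified cudf/types.h; in a real build the
// generated types.h.jit supplies this. Jitify treats the first line of a
// header string as its include name.
const char* const types_h_jit =
    "cudf/types.h\n"
    "typedef int gdf_size_type;\n";

const char* const program_source =
    "binop_program\n"              // first line: program name
    "#include <cudf/types.h>\n"    // resolved from the header string above
    "template <typename T>\n"
    "__global__ void add_kernel(T* out, const T* lhs, const T* rhs,\n"
    "                           gdf_size_type n) {\n"
    "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "  if (i < n) out[i] = lhs[i] + rhs[i];\n"
    "}\n";

void launch_add(double* d_out, const double* d_lhs, const double* d_rhs, int n) {
  // A single static cache reuses compiled kernels across calls, which is the
  // intent behind the JITIFY_PROCESS_CACHE option above.
  static jitify::JitCache kernel_cache;
  jitify::Program program = kernel_cache.program(
      program_source,
      {types_h_jit},               // headers made available to NVRTC
      {"--std=c++11"});
  program.kernel("add_kernel")
      .instantiate(jitify::reflection::Type<double>())
      .configure(dim3((n + 255) / 256), dim3(256))
      .launch(d_out, d_lhs, d_rhs, n);
}
```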

###################################################################################################
# - build options ---------------------------------------------------------------------------------

@@ -234,7 +272,8 @@ endif(HT_LEGACY_ALLOCATOR)
###################################################################################################
# - link libraries --------------------------------------------------------------------------------

target_link_libraries(cudf rmm "${ARROW_LIB}" ${ZLIB_LIBRARIES} NVStrings)
# TODO: better nvrtc linking with optional variables
target_link_libraries(cudf rmm "${ARROW_LIB}" ${ZLIB_LIBRARIES} NVStrings nvrtc)

###################################################################################################
# - python cffi bindings --------------------------------------------------------------------------
@@ -266,7 +305,7 @@ add_custom_command(OUTPUT INSTALL_PYTHON_CFFI

add_custom_target(install_python DEPENDS cudf rmm_install_python PYTHON_CFFI INSTALL_PYTHON_CFFI)

###################################################################################################
###################################################################################################
# - make documentation ----------------------------------------------------------------------------

add_custom_command(OUTPUT CUDF_DOXYGEN
93 changes: 93 additions & 0 deletions cpp/include/copying.hpp
@@ -0,0 +1,93 @@
/*
* Copyright (c) 2018-2019, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#ifndef COPYING_HPP
#define COPYING_HPP

#include "cudf.h"
#include "types.hpp"

namespace cudf {
/**
* @brief Scatters the rows (including null values) of a set of source columns
* into a set of destination columns.
*
* The two sets of columns must have equal numbers of columns.
*
* Scatters the rows of the source columns into the destination columns
* according to a scatter map such that row "i" from the source columns will be
* scattered to row "scatter_map[i]" in the destination columns.
*
* Corresponding columns in the source and destination sets must have the
* same data type.
*
* The number of elements in the scatter_map must equal the number of rows in
* the source columns.
*
* If any index in scatter_map is outside the range of [0, num rows in
* destination_columns), the result is undefined.
*
* If the same index appears more than once in scatter_map, the result is
* undefined.
*
* @param[in] source_table The columns whose rows will be scattered
* @param[in] scatter_map An array that maps rows in the input columns
* to rows in the output columns.
* @param[out] destination_table A preallocated set of columns with enough
* rows to cover the maximum index contained in scatter_map
*/
void scatter(table const* source_table, gdf_index_type const scatter_map[],
table* destination_table);

/**
* @brief Gathers the rows (including null values) of a set of source columns
* into a set of destination columns.
*
* The two sets of columns must have equal numbers of columns.
*
* Gathers the rows of the source columns into the destination columns according
* to a gather map such that row "i" in the destination columns will contain
* row "gather_map[i]" from the source columns.
*
* Corresponding columns in the source and destination sets must have the
* same data type.
*
* The number of elements in the gather_map must equal the number of rows in the
* destination columns.
*
* If any index in the gather_map is outside the range [0, num rows in
* source_columns), the result is undefined.
*
* If the same index appears more than once in gather_map, the result is
* undefined.
*
* @param[in] source_table The input columns whose rows will be gathered
* @param[in] gather_map An array of indices that maps the rows in the source
* columns to rows in the destination columns.
* @param[out] destination_table A preallocated set of columns with a number
* of rows equal in size to the number of elements in the gather_map that will
* contain the rearrangement of the source columns based on the mapping. Can be
* the same as `source_table` (in-place gather).
*/
void gather(table const* source_table, gdf_index_type const gather_map[],
table* destination_table);
} // namespace cudf

#endif // COPYING_HPP
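
To make the index conventions above concrete, here is a small host-side sketch in plain C++ (not libcudf code; it ignores columns, data types, and null bitmasks) of the scatter and gather mappings this header documents:

```cpp
#include <cassert>
#include <vector>

// Host-side illustration of the mappings documented above.
// scatter: destination[scatter_map[i]] = source[i]
// gather:  destination[i] = source[gather_map[i]]
std::vector<int> scatter(const std::vector<int>& source,
                         const std::vector<int>& scatter_map,
                         std::size_t destination_rows) {
  assert(scatter_map.size() == source.size());
  std::vector<int> destination(destination_rows);
  for (std::size_t i = 0; i < source.size(); ++i)
    destination[scatter_map[i]] = source[i];
  return destination;
}

std::vector<int> gather(const std::vector<int>& source,
                        const std::vector<int>& gather_map) {
  std::vector<int> destination(gather_map.size());
  for (std::size_t i = 0; i < gather_map.size(); ++i)
    destination[i] = source[gather_map[i]];
  return destination;
}

int main() {
  std::vector<int> src = {10, 20, 30};
  // Row 0 -> 2, row 1 -> 0, row 2 -> 1 in the destination.
  std::vector<int> out = scatter(src, {2, 0, 1}, 3);  // {20, 30, 10}
  std::vector<int> back = gather(out, {2, 0, 1});     // {10, 20, 30}
  return (out[0] == 20 && back[0] == 10) ? 0 : 1;
}
```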