Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
86e7e5b
run pyarrow tests in manylinux CI build
kszucs Jan 25, 2019
97fe94b
fix requirements-test.txt path
kszucs Jan 25, 2019
00d98e0
try to bundle bytecode files
kszucs Jan 28, 2019
dbf5b1c
embed precompiled bitcode as char array; load precompiled IR from string
kszucs Feb 15, 2019
f2205d0
gandiva jni
kszucs Feb 15, 2019
19200c3
don't run wheel tests twice; cmake format
kszucs Feb 15, 2019
cb69625
silent maven download msgs
kszucs Feb 15, 2019
169f43a
build gandiva in cpp docker image
kszucs Feb 15, 2019
b0b1117
conda llvmdev
kszucs Feb 15, 2019
d480c83
use const string ref for now
kszucs Feb 15, 2019
42391b1
don't pass precompiled bitcode all around the constructors
kszucs Feb 18, 2019
5841dcd
remove commented code
kszucs Feb 18, 2019
fa19529
properly construct llvm:StringRef
kszucs Feb 19, 2019
7deb359
fix REGEX; remove byteCodeFilePath from java configuration object
kszucs Feb 19, 2019
a88cd37
cmake format
kszucs Feb 19, 2019
af78be2
don't bundle irhelpers in the jar
kszucs Feb 19, 2019
79abc0e
mark .cc file as generated
kszucs Feb 19, 2019
09d829a
build wheels for a single python distribution at a time; adjust travi…
kszucs Feb 19, 2019
5d75adb
initialize jni loader
kszucs Feb 19, 2019
0d29b31
remove not existing resources from gandiva's pom
kszucs Feb 19, 2019
d3cb058
fix wheel building script
kszucs Feb 19, 2019
54fc653
don't export
kszucs Feb 19, 2019
0bb07a7
better bash split
kszucs Feb 19, 2019
18c5488
install wheel from dist dir
kszucs Feb 19, 2019
7c88d61
only build python 3.6 wheel
kszucs Feb 19, 2019
3e4ec2a
unterminated llvm:MemoryBuffer; fix check_import.py path
kszucs Feb 19, 2019
2372f3d
fix requirements path; disable other CI tests
kszucs Feb 19, 2019
71233c7
test 2.7,16 wheel
kszucs Feb 19, 2019
b399496
test py27mu and py36m wheels
kszucs Feb 20, 2019
1aa19f1
reenable travis builds
kszucs Feb 20, 2019
d5531d9
test imports inside the wheel container
kszucs Feb 20, 2019
fd5e3fe
use latest docker image tag
kszucs Feb 20, 2019
c573a56
use env variables insude the container
kszucs Feb 20, 2019
3b1da30
use sudo
kszucs Feb 21, 2019
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,8 @@ matrix:
- $TRAVIS_BUILD_DIR/ci/travis_script_python.sh 3.6
- name: "[manylinux1] Python"
language: cpp
env:
- PYTHON_VERSIONS="2.7,32 3.6,16"
before_script:
- if [ $ARROW_CI_PYTHON_AFFECTED == "1" ]; then docker pull quay.io/xhochy/arrow_manylinux1_x86_64_base:llvm-7-manylinux1; fi
script:
Expand Down
2 changes: 2 additions & 0 deletions ci/docker_build_cpp.sh
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ cmake -GNinja \
-DARROW_PARQUET=${ARROW_PARQUET:-ON} \
-DARROW_HDFS=${ARROW_HDFS:-OFF} \
-DARROW_PYTHON=${ARROW_PYTHON:-OFF} \
-DARROW_GANDIVA=${ARROW_GANDIVA:-OFF} \
-DARROW_GANDIVA_JAVA=${ARROW_GANDIVA_JAVA:-OFF} \
-DARROW_BUILD_TESTS=${ARROW_BUILD_TESTS:-OFF} \
-DARROW_BUILD_UTILITIES=${ARROW_BUILD_UTILITIES:-ON} \
-DARROW_INSTALL_NAME_RPATH=${ARROW_INSTALL_NAME_RPATH:-ON} \
Expand Down
2 changes: 2 additions & 0 deletions ci/travis_script_gandiva_java.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@ set -e
source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh
JAVA_DIR=${TRAVIS_BUILD_DIR}/java

export MAVEN_OPTS="-Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn"

pushd $JAVA_DIR

# build with gandiva profile
Expand Down
57 changes: 46 additions & 11 deletions ci/travis_script_manylinux.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,23 +20,58 @@

set -ex

pushd python/manylinux1
docker run --shm-size=2g --rm -e PYARROW_PARALLEL=3 -v $PWD:/io -v $PWD/../../:/arrow quay.io/xhochy/arrow_manylinux1_x86_64_base:llvm-7-manylinux1 /io/build_arrow.sh

# Testing for https://issues.apache.org/jira/browse/ARROW-2657
# These tests cannot be run inside of the docker container, since TensorFlow
# does not run on manylinux1

source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh

source $TRAVIS_BUILD_DIR/ci/travis_install_conda.sh

PYTHON_VERSION=3.6
CONDA_ENV_DIR=$TRAVIS_BUILD_DIR/pyarrow-test-$PYTHON_VERSION
pushd python/manylinux1

cat << EOF > check_imports.py
import sys
import pyarrow
import pyarrow.orc
import pyarrow.parquet
import pyarrow.plasma
import tensorflow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we crashy when Tensorflow is loaded in the same process?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is supposed to test that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not crash, there is some special safeaguarding code for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it doesn't always work, see https://issues.apache.org/jira/browse/ARROW-4272

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So shouldn't we remove this import?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should We? If We want to test that both arrow and tensorflow can live in the same interperter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you prefer... but it will probably fail at one point or the other, because Tensorflow publishes rogue wheels.


if sys.version_info.major > 2:
import pyarrow.gandiva
EOF

for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
IFS="," read PYTHON_VERSION UNICODE_WIDTH <<< $PYTHON_TUPLE

# cleanup the artifact directory, docker writes it as root
sudo rm -rf dist

# build the wheels
docker run --shm-size=2g --rm \
-e PYARROW_PARALLEL=3 \
-e PYTHON_VERSION=$PYTHON_VERSION \
-e UNICODE_WIDTH=$UNICODE_WIDTH \
-v $PWD:/io \
-v $PWD/../../:/arrow \
quay.io/xhochy/arrow_manylinux1_x86_64_base:latest \
/io/build_arrow.sh

# create a testing conda environment
CONDA_ENV_DIR=$TRAVIS_BUILD_DIR/pyarrow-test-$PYTHON_VERSION
conda create -y -q -p $CONDA_ENV_DIR python=$PYTHON_VERSION
conda activate $CONDA_ENV_DIR

# install the produced wheels
pip install tensorflow
pip install dist/*.whl

# Test optional dependencies and the presence of tensorflow
python check_imports.py

conda create -y -q -p $CONDA_ENV_DIR python=$PYTHON_VERSION
conda activate $CONDA_ENV_DIR
# Install test dependencies and run pyarrow tests
pip install -r ../requirements-test.txt
pytest --pyargs pyarrow
done

pip install -q tensorflow
pip install "dist/`ls dist/ | grep cp36`"
python -c "import pyarrow; import tensorflow"
popd
1 change: 1 addition & 0 deletions cpp/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ RUN arrow/ci/docker_install_conda.sh && \

ENV CC=gcc \
CXX=g++ \
ARROW_GANDIVA=OFF \
ARROW_BUILD_TESTS=ON \
ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX \
ARROW_HOME=$CONDA_PREFIX \
Expand Down
19 changes: 8 additions & 11 deletions cpp/src/gandiva/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -27,17 +27,14 @@ add_dependencies(gandiva-all gandiva gandiva-tests gandiva-benchmarks)

find_package(LLVM)

# Set the path where the byte-code files will be installed.
set(GANDIVA_BC_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}/gandiva)
# Set the path where the bitcode file generated, see precompiled/CMakeLists.txt
set(GANDIVA_PRECOMPILED_BC_PATH "${CMAKE_CURRENT_BINARY_DIR}/irhelpers.bc")
set(GANDIVA_PRECOMPILED_CC_PATH "${CMAKE_CURRENT_BINARY_DIR}/precompiled_bitcode.cc")
set(GANDIVA_PRECOMPILED_CC_IN_PATH
"${CMAKE_CURRENT_SOURCE_DIR}/precompiled_bitcode.cc.in")

set(GANDIVA_BC_FILE_NAME irhelpers.bc)
set(GANDIVA_BC_INSTALL_PATH "${GANDIVA_BC_INSTALL_DIR}/${GANDIVA_BC_FILE_NAME}")
set(GANDIVA_BC_OUTPUT_PATH "${CMAKE_CURRENT_BINARY_DIR}/${GANDIVA_BC_FILE_NAME}")
install(FILES ${GANDIVA_BC_OUTPUT_PATH} DESTINATION ${GANDIVA_BC_INSTALL_DIR})

set(BC_FILE_PATH_CC "${CMAKE_CURRENT_BINARY_DIR}/bc_file_path.cc")
configure_file(bc_file_path.cc.in ${BC_FILE_PATH_CC})
add_definitions(-DGANDIVA_BYTE_COMPILE_FILE_PATH="${GANDIVA_BC_OUTPUT_PATH}")
# add_arrow_lib will look for this not yet existing file, so flag as generated
set_source_files_properties(${GANDIVA_PRECOMPILED_CC_PATH} PROPERTIES GENERATED TRUE)

set(SRC_FILES
annotator.cc
Expand Down Expand Up @@ -73,7 +70,7 @@ set(SRC_FILES
selection_vector.cc
tree_expr_builder.cc
to_date_holder.cc
${BC_FILE_PATH_CC})
${GANDIVA_PRECOMPILED_CC_PATH})

set(GANDIVA_SHARED_PRIVATE_LINK_LIBS arrow_shared LLVM::LLVM_INTERFACE ${RE2_LIBRARY})

Expand Down
5 changes: 2 additions & 3 deletions cpp/src/gandiva/configuration.cc
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,11 @@ const std::shared_ptr<Configuration> ConfigurationBuilder::default_configuration
InitDefaultConfig();

std::size_t Configuration::Hash() const {
boost::hash<std::string> string_hash;
return string_hash(byte_code_file_path_);
return 0; // dummy for now, no configuration properties yet
}

bool Configuration::operator==(const Configuration& other) const {
return other.byte_code_file_path() == byte_code_file_path();
return true; // always true, no configuration properties yet
}

bool Configuration::operator!=(const Configuration& other) const {
Expand Down
24 changes: 2 additions & 22 deletions cpp/src/gandiva/configuration.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,6 @@

namespace gandiva {

GANDIVA_EXPORT
extern const char kByteCodeFilePath[];

class ConfigurationBuilder;
/// \brief runtime config for gandiva
///
Expand All @@ -38,17 +35,9 @@ class GANDIVA_EXPORT Configuration {
public:
friend class ConfigurationBuilder;

const std::string& byte_code_file_path() const { return byte_code_file_path_; }

std::size_t Hash() const;
bool operator==(const Configuration& other) const;
bool operator!=(const Configuration& other) const;

private:
explicit Configuration(const std::string& byte_code_file_path)
: byte_code_file_path_(byte_code_file_path) {}

const std::string byte_code_file_path_;
};

/// \brief configuration builder for gandiva
Expand All @@ -57,15 +46,8 @@ class GANDIVA_EXPORT Configuration {
/// to override specific values and build a custom instance
class GANDIVA_EXPORT ConfigurationBuilder {
public:
ConfigurationBuilder() : byte_code_file_path_(kByteCodeFilePath) {}

ConfigurationBuilder& set_byte_code_file_path(const std::string& byte_code_file_path) {
byte_code_file_path_ = byte_code_file_path;
return *this;
}

std::shared_ptr<Configuration> build() {
std::shared_ptr<Configuration> configuration(new Configuration(byte_code_file_path_));
std::shared_ptr<Configuration> configuration(new Configuration());
return configuration;
}

Expand All @@ -74,10 +56,8 @@ class GANDIVA_EXPORT ConfigurationBuilder {
}

private:
std::string byte_code_file_path_;

static std::shared_ptr<Configuration> InitDefaultConfig() {
std::shared_ptr<Configuration> configuration(new Configuration(kByteCodeFilePath));
std::shared_ptr<Configuration> configuration(new Configuration());
return configuration;
}

Expand Down
19 changes: 12 additions & 7 deletions cpp/src/gandiva/engine.cc
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,9 @@

namespace gandiva {

extern const char kPrecompiledBitcode[];
extern const size_t kPrecompiledBitcodeSize;

std::once_flag init_once_flag;

bool Engine::init_once_done_ = false;
Expand Down Expand Up @@ -114,7 +117,7 @@ Status Engine::Make(std::shared_ptr<Configuration> config,
// Add mappings for functions that can be accessed from LLVM/IR module.
engine_obj->AddGlobalMappings();

auto status = engine_obj->LoadPreCompiledIRFiles(config->byte_code_file_path());
auto status = engine_obj->LoadPreCompiledIR();
ARROW_RETURN_NOT_OK(status);

// Add decimal functions
Expand All @@ -126,14 +129,16 @@ Status Engine::Make(std::shared_ptr<Configuration> config,
}

// Handling for pre-compiled IR libraries.
Status Engine::LoadPreCompiledIRFiles(const std::string& byte_code_file_path) {
Status Engine::LoadPreCompiledIR() {
auto bitcode = llvm::StringRef(kPrecompiledBitcode, kPrecompiledBitcodeSize);

/// Read from file into memory buffer.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> buffer_or_error =
llvm::MemoryBuffer::getFile(byte_code_file_path);
ARROW_RETURN_IF(
!buffer_or_error,
Status::CodeGenError("Could not load module from IR ", byte_code_file_path, ": ",
buffer_or_error.getError().message()));
llvm::MemoryBuffer::getMemBuffer(bitcode, "precompiled", false);

ARROW_RETURN_IF(!buffer_or_error,
Status::CodeGenError("Could not load module from IR: ",
buffer_or_error.getError().message()));

std::unique_ptr<llvm::MemoryBuffer> buffer = move(buffer_or_error.get());

Expand Down
5 changes: 3 additions & 2 deletions cpp/src/gandiva/engine.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,8 +79,9 @@ class GANDIVA_EXPORT Engine {

llvm::ExecutionEngine& execution_engine() { return *execution_engine_.get(); }

/// load pre-compiled IR modules and merge them into the main module.
Status LoadPreCompiledIRFiles(const std::string& byte_code_file_path);
/// load pre-compiled IR modules from precompiled_bitcode.cc and merge them into
/// the main module.
Status LoadPreCompiledIR();

// Create and add mappings for cpp functions that can be accessed from LLVM.
void AddGlobalMappings();
Expand Down
8 changes: 0 additions & 8 deletions cpp/src/gandiva/jni/config_builder.cc
Original file line number Diff line number Diff line change
Expand Up @@ -34,16 +34,8 @@ using gandiva::ConfigurationBuilder;
JNIEXPORT jlong JNICALL
Java_org_apache_arrow_gandiva_evaluator_ConfigurationBuilder_buildConfigInstance(
JNIEnv* env, jobject configuration) {
jstring byte_code_file_path =
(jstring)env->CallObjectMethod(configuration, byte_code_accessor_method_id_, 0);
ConfigurationBuilder configuration_builder;
if (byte_code_file_path != nullptr) {
const char* byte_code_file_path_cpp = env->GetStringUTFChars(byte_code_file_path, 0);
configuration_builder.set_byte_code_file_path(byte_code_file_path_cpp);
env->ReleaseStringUTFChars(byte_code_file_path, byte_code_file_path_cpp);
}
std::shared_ptr<Configuration> config = configuration_builder.build();
env->DeleteLocalRef(byte_code_file_path);
return ConfigHolder::MapInsert(config);
}

Expand Down
3 changes: 0 additions & 3 deletions cpp/src/gandiva/jni/env_helper.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,4 @@
// class references
extern jclass configuration_builder_class_;

// method references
extern jmethodID byte_code_accessor_method_id_;

#endif // ENV_HELPER_H
6 changes: 0 additions & 6 deletions cpp/src/gandiva/jni/jni_common.cc
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,6 @@ static jint JNI_VERSION = JNI_VERSION_1_6;

// extern refs - initialized for other modules.
jclass configuration_builder_class_;
jmethodID byte_code_accessor_method_id_;

// refs for self.
static jclass gandiva_exception_;
Expand All @@ -93,11 +92,6 @@ jint JNI_OnLoad(JavaVM* vm, void* reserved) {
gandiva_exception_ = (jclass)env->NewGlobalRef(localExceptionClass);
env->DeleteLocalRef(localExceptionClass);

const char method_name[] = "getByteCodeFilePath";
const char return_type[] = "()Ljava/lang/String;";
byte_code_accessor_method_id_ =
env->GetMethodID(configuration_builder_class_, method_name, return_type);

env->ExceptionDescribe();

return JNI_VERSION;
Expand Down
26 changes: 23 additions & 3 deletions cpp/src/gandiva/precompiled/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -67,12 +67,32 @@ foreach(SRC_FILE ${PRECOMPILED_SRCS})
endforeach()

# link all of the bitcode files into a single bitcode file.
add_custom_command(OUTPUT ${GANDIVA_BC_OUTPUT_PATH}
COMMAND ${LLVM_LINK_EXECUTABLE} -o ${GANDIVA_BC_OUTPUT_PATH}
add_custom_command(OUTPUT ${GANDIVA_PRECOMPILED_BC_PATH}
COMMAND ${LLVM_LINK_EXECUTABLE} -o ${GANDIVA_PRECOMPILED_BC_PATH}
${BC_FILES}
DEPENDS ${BC_FILES})

add_custom_target(precompiled ALL DEPENDS ${GANDIVA_BC_OUTPUT_PATH})
# write a cmake script to replace precompiled bitcode file's content into a .cc file
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/WritePrecompiledCC.cmake" "\
file(READ \"${GANDIVA_PRECOMPILED_BC_PATH}\" HEXCONTENT HEX)
string(REGEX REPLACE
\"([0-9a-f][0-9a-f])\"
\"'\\\\\\\\x\\\\1',\"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't even mention :)

GANDIVA_PRECOMPILED_BITCODE_CHARS
\"\${HEXCONTENT}\")
configure_file(${GANDIVA_PRECOMPILED_CC_IN_PATH}
${GANDIVA_PRECOMPILED_CC_PATH})
")

# add the previous command to the execution chain
add_custom_command(OUTPUT ${GANDIVA_PRECOMPILED_CC_PATH}
COMMAND ${CMAKE_COMMAND} -P
"${CMAKE_CURRENT_BINARY_DIR}/WritePrecompiledCC.cmake"
DEPENDS ${GANDIVA_PRECOMPILED_CC_IN_PATH}
${GANDIVA_PRECOMPILED_BC_PATH})

add_custom_target(precompiled ALL
DEPENDS ${GANDIVA_PRECOMPILED_BC_PATH} ${GANDIVA_PRECOMPILED_CC_PATH})

function(add_precompiled_unit_test REL_TEST_NAME)
get_filename_component(TEST_NAME ${REL_TEST_NAME} NAME_WE)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,12 @@
// specific language governing permissions and limitations
// under the License.

#include <string>

namespace gandiva {

// Path to the byte-code file.
extern const char kByteCodeFilePath[] = "${GANDIVA_BC_INSTALL_PATH}";
// Content of precompiled bitcode file.
extern const char kPrecompiledBitcode[] = { ${GANDIVA_PRECOMPILED_BITCODE_CHARS} };
extern const size_t kPrecompiledBitcodeSize = sizeof(kPrecompiledBitcode);

} // namespace gandiva
15 changes: 0 additions & 15 deletions cpp/src/gandiva/tests/projector_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -84,21 +84,6 @@ TEST_F(TestProjector, TestProjectCache) {
&cached_projector);
ASSERT_OK(status);
EXPECT_EQ(cached_projector, projector);

// if configuration is different, should return a new projector.

// build a new path by replacing the first '/' with '//'
std::string alt_path(GANDIVA_BYTE_COMPILE_FILE_PATH);
auto pos = alt_path.find('/', 0);
EXPECT_NE(pos, std::string::npos);
alt_path.replace(pos, 1, "//");
auto other_configuration =
ConfigurationBuilder().set_byte_code_file_path(alt_path).build();
std::shared_ptr<Projector> should_be_new_projector2;
status = Projector::Make(schema, {sum_expr, sub_expr}, other_configuration,
&should_be_new_projector2);
ASSERT_OK(status);
EXPECT_NE(projector, should_be_new_projector2);
}

TEST_F(TestProjector, TestProjectCacheFieldNames) {
Expand Down
3 changes: 1 addition & 2 deletions cpp/src/gandiva/tests/test_util.h
Original file line number Diff line number Diff line change
Expand Up @@ -89,8 +89,7 @@ static ArrayPtr MakeArrowTypeArray(const std::shared_ptr<arrow::DataType>& type,

std::shared_ptr<Configuration> TestConfiguration() {
auto builder = ConfigurationBuilder();
builder.set_byte_code_file_path(GANDIVA_BYTE_COMPILE_FILE_PATH);
return builder.build();
return builder.DefaultConfiguration();
}

} // namespace gandiva
Expand Down
Loading