Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update apache arrow to include TensorFlow fix #2345

Merged
merged 9 commits into from
Jul 6, 2018

Conversation

pcmoritz
Copy link
Contributor

@pcmoritz pcmoritz commented Jul 4, 2018

This is intended to fix #2159.

@robertnishihara
Copy link
Collaborator

There may be some plasma hashing related issues. We'll see when the tests run.

@robertnishihara
Copy link
Collaborator

#2319 also updates the Arrow commit. Can you take a look?

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6462/
Test PASSed.

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jul 4, 2018

Yeah, the changes in the two PRs are mostly orthogonal (except updating the hash of course); this one is mainly incorporating the plasma API changes and the other one doesn't include that commit.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6463/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6471/
Test PASSed.

@guoyuhong
Copy link
Contributor

Thanks, @pcmoritz for including the scope enum change in Arrow. It looks like this commit does not contains my latest PR of adding deleting a list of objects. That's fine I will do the change later.

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jul 5, 2018

@guoyuhong Yeah, I started the PR before yours got merged; I'll probably update it before this PR gets merged, but before that happens, I first need to fix the non-determinism problem.

@@ -88,7 +88,8 @@ if [[ ! -d $TP_DIR/../python/ray/pyarrow_files/pyarrow || \
-DARROW_PYTHON=on \
-DARROW_PLASMA=on \
-DPLASMA_PYTHON=on \
-DARROW_JEMALLOC=off \
-DARROW_JEMALLOC=on \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably won't change anything, but we should test to see if this has any performance implications.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6484/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6490/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6503/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6505/
Test FAILed.

git checkout ce23c06469de9cf0c3e38e35cdb8d135f341b964
git checkout d5d39f770047d671e4879369dd680c69afc370c3

git apply $TP_DIR/scripts/arrow-zero-fill.patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone want to update the arrow package, will this patch break the building process?

@guoyuhong
Copy link
Contributor

FYI. My PR which Robert mentioned is used to solve the parquet building error and make it friendly when we update the Arrow package.

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jul 6, 2018

@guoyuhong The patch is just a stop gap solution we are using until apache/arrow#2216 is fixed (hopefully soon, but it's hard to tell and we would like to get a release candidate for 0.5 really soon so we can start the testing), then this patch will go away again. It should be independent of your PR.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6509/
Test FAILed.

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jul 6, 2018

retest this please

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jul 6, 2018

Performance implications:

Without this PR:

 In [13]: d = 2000 * [{"hello": 100*["world"], "this": 100*[4234], "array": np.zeros(100)}]

In [14]: %timeit ray.put(d)
10 loops, best of 3: 96.8 ms per loop

With this PR:

In [7]: %timeit ray.put(d)
10 loops, best of 3: 101 ms per loop

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6515/
Test PASSed.

@robertnishihara
Copy link
Collaborator

Test failures seem unrelated (due to a Travis rebasing issue I think).

@robertnishihara robertnishihara merged commit fbde8ca into ray-project:master Jul 6, 2018
@robertnishihara robertnishihara deleted the update-arrow-2 branch July 6, 2018 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault when importing TensorFlow after Ray.
4 participants