Revise CONTRIBUTING.md #10644
Conversation
## Automated Build in Docker Container
I deleted the sections about using gpuCI locally and building Docker containers, because I think both are out of date. The Docker builds were removed in #10069. It is my understanding that the gpuCI scripts have not been an accurate representation of the CI workflow for some time. @rapidsai/ops-codeowners if you have any guidance or input on this, please feel free to weigh in!
- Create the conda development environment `cudf_dev`:
This makes it seem like conda is required. Many users frequently complain about this, and we've gone to great lengths to make it possible to build without conda. We should have different sections for "here's how to build with just CMake" and "here's how to build with a conda environment".
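A rough sketch of what those two sections might cover (the environment file path and `build.sh` targets here are assumptions based on typical RAPIDS layouts, not verified against this repo):

```shell
# Option 1: conda-based build (environment file name is an assumption;
# check the repository for the actual path).
conda env create --name cudf_dev --file conda/environments/cudf_dev.yml
conda activate cudf_dev
./build.sh libcudf cudf

# Option 2: plain CMake build of libcudf, assuming all C++ dependencies
# (CUDA toolkit, Arrow, etc.) are already installed system-wide.
cmake -S cpp -B cpp/build -DCMAKE_BUILD_TYPE=Release
cmake --build cpp/build --parallel
```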
Honestly, I have only built libcudf/cudf outside of rapids-compose once in the last year. I don't think I'm qualified to write/update these docs but I agree with the idea.
@jrhemstad Could we defer this to a later PR? I do think this would be a significant improvement (and may illuminate some challenges with installing from source as mentioned in those issues), but I think it's out of scope for this PR.
Minimum I'd like to see for this PR:
- Try building w/o conda and see what happens. If it works, great! Document that this is possible.
- If it doesn't:
  - Add some words to this document stating that a conda environment is currently required to satisfy build dependencies (these could be separated based on building libcudf vs. Cython/cudf Python)
  - List what those dependencies are
  - File an issue to capture what work is needed to avoid requiring a conda environment for building libcudf
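One way to carry out the first checklist item might be to attempt a conda-free configure and collect the failures, something like this sketch (paths assumed, not taken from the repo):

```shell
# Attempt a configure outside any conda environment; missing dependencies
# typically surface as CMake "Could NOT find ..." errors, which can then
# be listed in the docs or in a tracking issue.
cmake -S cpp -B cpp/build 2>&1 | tee configure.log
grep -i "could not find" configure.log
```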
Oops, I put this information in an awkward place. At first I couldn't decide how to organize this information, and then I forgot to change it once I made a decision. I am going to put these dependencies between "Create the build environment" and "Build cuDF from source." The instructions will be the same, calling `build.sh`, so we just need to list the dependencies above that.
Do we need to include a special note about `--cmake-args="-DCUDF_ENABLE_ARROW_S3=OFF"`? (Or disable that option by default, like the other disabled-by-default I/O features?)

So we have a couple of options:
- We can enable Arrow S3 by also installing libcurl4-openssl-dev and updating the `get_arrow.cmake` script to pass `-DAWSSDK_SOURCE=AUTO`
- Or we can document why we are disabling S3 (fewer requirements?)

I am hesitant to change the default value of `CUDF_ENABLE_ARROW_S3`, as that would be a breaking change for anyone building cudf who presumes it is enabled by default (users of `build.sh` or people directly calling `cmake`).

As for why we need to pass `AWSSDK_SOURCE=AUTO`: that is required for Arrow's build to work correctly when you don't have Arrow already installed. It tells the build that you are okay with building the dependency from source when it can't find it locally.
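The first option above might look roughly like this sketch (the exact wiring inside `get_arrow.cmake` is an assumption; the flags come from the discussion above):

```shell
# Install the libcurl headers the AWS SDK build needs (Debian/Ubuntu).
sudo apt-get install libcurl4-openssl-dev

# Then allow Arrow's build to fetch the AWS SDK from source when it is
# not found locally, e.g. by forwarding the flag through build.sh:
./build.sh libcudf --cmake-args="-DCUDF_ENABLE_ARROW_S3=ON -DAWSSDK_SOURCE=AUTO"
```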
Yep, this makes more sense. Thanks!
@jrhemstad Any final comments on this before we merge?
```bash
make test
```
It would be useful if there were an example of how to run just one single test, for C++ and Python. I rarely work in the C++ layer and always have to poke around Slack to figure out how to run just one C++ test when needed.
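For reference, the kind of examples being asked for might look like this (test names and the gtest binary layout are placeholders, not taken from the repo):

```shell
# C++: run one gtest binary directly, filtering to a single test case.
./cpp/build/gtests/COLUMN_TEST --gtest_filter=ColumnTest.SomeCase

# C++: or select tests by regex via ctest.
ctest --test-dir cpp/build -R COLUMN_TEST

# Python: run a single pytest test by file and keyword.
pytest python/cudf/cudf/tests/test_dataframe.py -k "test_some_case"
```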
Yes! I started this PR specifically to address concerns like:
- running a single test
- running a single benchmark
- running all benchmarks
- building docs
However, I realized that all of these topics would fit better in the C++ Developer Guide, Benchmarking Guide, Documentation Guide, and Testing Guide. Would it solve the problem if we cross-linked one or more of those documents here and did corresponding updates to those docs? If so, I'd like to focus this PR specifically on the Contributing Guide since it's so outdated, then follow up with revisions to the other documents.
See also #9251 which has similar ideas for cross-linking.
As long as it makes it in somewhere I'm happy.
I'm generally for leaving it here for better visibility, but it doesn't have to include every detail. An example usage followed by a cross-link to a detail section might be a way to go. I also agree it doesn't need to be tackled in this PR.
I am going to leave this conversation open since it contains action items for future PRs. However, it would be nice to get a PR approval if you think the rest is good to go.
@jrhemstad @brandon-b-miller I tagged you both for review since you provided the most comments in prior iterations - I'd like to merge this as-is and defer follow-up work to later PRs. This PR gets the formatting to be consistent, which will make the diffs of future changes easier to handle. Thanks!
@gpucibot merge
This PR is a breaking change that disables Arrow S3 support by default. Enabling this feature by default has caused build issues for many downstream consumers, all of whom (to my knowledge) manually disable support for this feature. Most commonly, that build error appears as `fatal error: aws/core/Aws.h: No such file or directory`. In my understanding, several downstream consumers of cudf no longer rely on Arrow S3 support from this library and instead get S3 access via fsspec. I am not aware of any users of libcudf who rely on this being enabled by default (or enabled at all). See related issues and discussions: #8617, #11333, #8867, #10644 (comment), NVIDIA/spark-rapids#2827. Build errors caused by this default behavior have also been reported internally.

cc: @rjzamora @beckernick @jdye64 @randerzander @robertmaynard @jlowe @quasiben if you have comments following our previous discussion.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Vyas Ramasubramani (https://github.com/vyasr)
- AJ Schmidt (https://github.com/ajschmidt8)

URL: #11470
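For users who do rely on Arrow S3, re-enabling it after this change should just be a matter of flipping the flag back on; a sketch using the option discussed above:

```shell
# Re-enable Arrow S3 support explicitly via build.sh:
./build.sh libcudf --cmake-args="-DCUDF_ENABLE_ARROW_S3=ON"

# Or when invoking CMake directly:
cmake -S cpp -B cpp/build -DCUDF_ENABLE_ARROW_S3=ON
```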
I have revised the `CONTRIBUTING.md` file to address several pieces that are out of date. I also revised a good portion of the text and updated external references. Finally, I wrapped the lines at 100 characters to align with other Markdown files in the C++ docs. I would prefer to adopt a convention of one sentence per line if reviewers agree, but went with the 100-character wrapping for now to be consistent with other docs.