SPARK-1637: Clean up examples for 1.0#571
Conversation
techaddict
commented
Apr 27, 2014
- Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- Move Python examples into examples/src/main/python
- Update docs to reflect these changes
….apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
There was a problem hiding this comment.
I'm not sure about this, needed by RawNetworkGrep and KafkaWordCount. There has to be better way.
There was a problem hiding this comment.
We shouldn't use any private methods in examples. The usage of RawTextHelper in KafkaWordCount is very simple. warmUp is used in RawNetworkGrep, is it necessary? @tdas
There was a problem hiding this comment.
Well, the RawTextHelper was essentially the way to provide data for running benchmarks. However, I havent quite maintained it, so that example code has been languishing in unknown state. I am inclined to keep it there as it is, as long as it hidden behind private[*]. Though I dont see the logic behind changing it from private[streaming] to private[spark]
|
Can one of the admins verify this patch? |
|
Jenkins, add to whitelist and test this please |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
@techaddict FYI, I moved main methods to |
|
One question: is |
|
@mengxr ya exactly, thats why i commented on the private[spark] change. I think better way would be to make the examples use there own methods, in case of private methods. |
|
If Apache allows to use a name space not starting with |
|
@mengxr I'm pretty sure we are only allowed to release classes in the org.apache namespace (unfortunately). |
|
I know this is a little weird, but either |
|
@mengxr IMO we should re-implement the necessary private functions in the examples, they are not too many just 2-3 methods. It will make using/modifying the examples easier for users. |
|
I have a general question about what should file and package organization of the examples be. For example, for Spark Streaming, is |
|
@tdas I'm a little confused. Which one do you prefer, |
|
Oops, sorry for the confusion. I meant "latter" instead of "former". That is, I prefer
|
|
Bringing @marmbrus into this discussion as he had added the Spark SQL examples in |
|
No strong opinions here other than we should be consistent. |
|
Okay. As I said earlier, I find it more consistent to keep it as "org.apache.spark.streaming.examples.*" |
|
@mengxr so what changes should i make other than streaming one suggested by tdas. |
|
@techaddict I don't see anything necessary except the one using |
|
@mengxr what do you suggest how should we resolve the private[spark] problem ? I'm short on idea. |
|
Build triggered. |
|
Build started. |
|
I would say using |
|
@mengxr @47ef86c392badc58052a0414115e49c2970b31eb looks good ? |
|
Build finished. All automated tests passed. |
|
All automated tests passed. |
|
@techaddict It looks good to me. |
|
@mateiz ready for merging |
|
No, I have strong feelings against "org.apache.spark.examples.streaming.*" This is entirely inconsistent with rest of the API. |
|
@tdas @techaddict let's not move them IMO. There are two reasons: First, when you browse the folder tree, you want to get into spark/examples and immediately see subfolders, instead of wondering "should I go into spark/examples or spark/streaming or spark/mllib or whatever". Second, it makes it less likely for us to use private[streaming] APIs in there. |
|
BTW the other point of this was to make them consistent across packages, and MLlib and all the Python examples are under examples/ already. |
|
@mateiz So this is good to merge. ?
|
|
@mateiz I dont think that decision is that big a deal. Its fairly obvious if one sees a streaming folder inside spark/ that those are streaming examples. It feels more inconsistent to have |
|
Talked with TD offline and he was okay with changing the package name, so I've merged this in. Thanks Sandeep! |
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - [x] Move Python examples into examples/src/main/python - [x] Update docs to reflect these changes Author: Sandeep <sandeep@techaddict.me> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <matei@databricks.com> Closes #571 from techaddict/SPARK-1637 and squashes the following commits: 47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples 8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples 5f96121 [Sandeep] Move Python examples into examples/src/main/python 0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - [x] Move Python examples into examples/src/main/python - [x] Update docs to reflect these changes Author: Sandeep <sandeep@techaddict.me> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <matei@databricks.com> Closes #571 from techaddict/SPARK-1637 and squashes the following commits: 47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples 8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples 5f96121 [Sandeep] Move Python examples into examples/src/main/python 0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) (cherry picked from commit a000b5c) Signed-off-by: Matei Zaharia <matei@databricks.com>
…e PR apache#571 was last updated. This resulted in Compilation Errors.
…e PR #571 was last updated. This resulted in Compilation Errors. cc @mateiz project not compiling currently. Author: Sandeep <sandeep@techaddict.me> Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits: b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors.
…e PR #571 was last updated. This resulted in Compilation Errors. cc @mateiz project not compiling currently. Author: Sandeep <sandeep@techaddict.me> Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits: b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors. (cherry picked from commit fdae095) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
SPARK-1072 Use binary search when needed in RangePartioner Author: Holden Karau <holden@pigscanfly.ca> Closes apache#571 and squashes the following commits: f31a2e1 [Holden Karau] Swith to using CollectionsUtils in Partitioner 4c7a0c3 [Holden Karau] Add CollectionsUtil as suggested by aarondav 7099962 [Holden Karau] Add the binary search to only init once 1bef01d [Holden Karau] CR feedback a21e097 [Holden Karau] Use binary search if we have more than 1000 elements inside of RangePartitioner
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - [x] Move Python examples into examples/src/main/python - [x] Update docs to reflect these changes Author: Sandeep <sandeep@techaddict.me> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <matei@databricks.com> Closes apache#571 from techaddict/SPARK-1637 and squashes the following commits: 47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples 8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples 5f96121 [Sandeep] Move Python examples into examples/src/main/python 0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
…e PR apache#571 was last updated. This resulted in Compilation Errors. cc @mateiz project not compiling currently. Author: Sandeep <sandeep@techaddict.me> Closes apache#673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits: b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR apache#571 was last updated. This resulted in Compilation Errors.
Surfacing decoders on KafkaInputDStream
The new role "install-docker" install docker-ce by using get.docker.com script, it support multiple OS and arch, and keep consistent docker-ce version, mark role "install-docker-ce" as deprecated, and apply new role in all of docker-machine jobs. Update tag of docker-machine job from latest-release to 18.06 in order to avoid confusions. Related-Bug: theopenlab/openlab#308
### What changes were proposed in this pull request? This pr aims to remove the unused private object `RawTextHelper` from `streaming` module. It was introduced in ae61eba and is no longer used after the integration of SPARK-1637 | #571 . ### Why are the changes needed? Code cleanup ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass Github Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #53635 from LuciferYang/SPARK-54862. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com>