[SPARK-1134] Fix and document passing of arguments to IPython #294

Closed
mateiz wants to merge 1 commit from mateiz/spark-1134

Conversation

@mateiz (Contributor) commented Apr 2, 2014

This is based on @dianacarroll's previous pull request #227, and @JoshRosen's comments on #38. Since we do want to allow passing arguments to IPython, this does the following:

  • It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will handle PYTHONSTARTUP properly and enable this; see ipython/ipython#5226, "Don't run PYTHONSTARTUP file if a file or code is passed", but no released version has that fix.)
  • If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments through to IPython, as sketched below. This way you can do things like `IPYTHON=1 bin/pyspark notebook`.
  • The old `IPYTHON_OPTS` remains (in case people follow an old tutorial that uses it), but I've removed it from the documentation.
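
A minimal sketch of the launch logic this implies, assuming the rough shape of `bin/pyspark` at the time (the control flow is simplified; `PYSPARK_PYTHON` is the interpreter variable the script uses for the non-IPython path):

```bash
# Simplified sketch, not the verbatim script: pick the interpreter, forwarding
# the wrapper's own arguments ("$@") to IPython so that, for example,
# `IPYTHON=1 bin/pyspark notebook` launches the IPython Notebook.
if [[ "$IPYTHON" = "1" ]]; then
  exec ipython $IPYTHON_OPTS "$@"
else
  exec "$PYSPARK_PYTHON" "$@"
fi
```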

This is not a perfect solution, and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using `IPYTHON_OPTS`), and only doing the doc change. With this change, though, when IPython fixes ipython/ipython#5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython.

@JoshRosen you should probably take the final call on this.

…only call ipython if no command line arguments were supplied
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13660/

@dianacarroll commented Apr 2, 2014

Well, my perspective is always that of a new, easily confused user. (That's my target audience.) I myself got tired of typing "IPYTHON=1" every time I started pyspark, so I did what I'm guessing most people will do, which is to set it as an environment variable in my profile. That was fine until the first time I tried running a pyspark script.

Your doc change explicitly recommends that I not do that, but... well, really? It just makes learning pyspark that much more confusing. The "pyspark" command is going to be most new users' main entry point into this new technology.

In the new Spark class I'm working on, which uses mainly Python, I had planned all along to set IPYTHON for the students automatically, because working in IPython is so much easier than the vanilla shell, and it is so tedious to type the variable repeatedly (or reach for command history every time).

I think what's going to happen is that users will ignore the admonishment to explicitly type the variable setting every time they start the shell and will set it in their environment... and then they will end up scratching their heads trying to figure out why their scripts aren't working. (The error that results is quite non-intuitive for a Spark newbie.)

I can live with it as is (with the doc change), but it isn't a very user-friendly thing.
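
To make the failure mode concrete, a hypothetical session (`myscript.py` is a placeholder):

```bash
export IPYTHON=1          # set once in ~/.bash_profile for convenience
bin/pyspark               # works: an interactive IPython shell with sc defined
bin/pyspark myscript.py   # fails confusingly: pre-2.0 IPython never runs the
                          # PYTHONSTARTUP file that creates the SparkContext,
                          # so the script can't find sc
```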


@mateiz (Contributor, Author) commented Apr 2, 2014

I see; in that case, I think we can do the following:

  • Leave in `IPYTHON_OPTS` as a way to pass options to IPython. Otherwise the IPython Notebook won't work, and neither will the Pylab flags and the like.
  • Add back the argument-count check you added, so IPython is only launched when no command-line arguments are supplied (see the sketch after this list).
  • In later versions of IPython that fix the startup bug, we can remove that check and let you run a script through IPython too. I guess you could also do `IPYTHON_OPTS="myscript.py" bin/pyspark`.
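
A sketch of that guard, using the same simplified `bin/pyspark` shape as above (the merged fix, commit 747bb13 below, takes this approach):

```bash
# Only hand control to IPython when no command-line arguments were supplied;
# with arguments present, fall back to the plain Python interpreter so that
# standalone scripts keep working.
if [[ "$IPYTHON" = "1" && $# = 0 ]]; then
  exec ipython $IPYTHON_OPTS
else
  exec "$PYSPARK_PYTHON" "$@"
fi
```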

@mateiz (Contributor, Author) commented Apr 2, 2014

BTW, I've updated it now to be just your initial commit, but without attempting to remove `IPYTHON_OPTS`. I think this is the best solution.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@bouk (Contributor) commented Apr 2, 2014

Note that the new version of IPython released today has my fix in it, so doing `exec ipython $@` should be fine with version 2.0.0.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13700/

@mateiz (Contributor, Author) commented Apr 2, 2014

@bouk that's great, thanks, but we probably can't make it the default for a while, until more people have updated their IPython.
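
If it helps later, one hypothetical way to gate the default on the installed version (not part of this PR; assumes `ipython --version` prints something like "2.0.0"):

```bash
# Forward arguments to IPython only when it is new enough (>= 2.0.0) to honor
# PYTHONSTARTUP alongside a script or extra arguments.
ipython_major=$(ipython --version 2>/dev/null | cut -d. -f1)
if [[ "$IPYTHON" = "1" && "${ipython_major:-0}" -ge 2 ]]; then
  exec ipython $IPYTHON_OPTS "$@"
fi
```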

@mateiz (Contributor, Author) commented Apr 3, 2014

@dianacarroll I've merged this in now, using just your original commit (mateiz@747bb13). I think that's the best solution for now. Thanks for the feedback!

asfgit pushed a commit that referenced this pull request Apr 3, 2014

Author: Diana Carroll <dcarroll@cloudera.com>

Closes #294 from mateiz/spark-1134 and squashes the following commits:

747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied

(cherry picked from commit a599e43)
Signed-off-by: Matei Zaharia <matei@databricks.com>
@asfgit asfgit closed this in a599e43 Apr 3, 2014