[SPARK-1134] Fix and document passing of arguments to IPython #294

Closed
mateiz wants to merge 1 commit from mateiz/spark-1134

Conversation

@mateiz (Contributor) commented Apr 2, 2014

This is based on @dianacarroll's previous pull request #227, and @JoshRosen's comments on #38. Since we do want to allow passing arguments to IPython, this does the following:

  • It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will handle PYTHONSTARTUP properly and enable this; see ipython/ipython#5226, "Don't run PYTHONSTARTUP file if a file or code is passed", but no released version has that fix.)
  • If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments through to IPython, as sketched below. This way you can do things like `IPYTHON=1 bin/pyspark notebook`.
  • The old `IPYTHON_OPTS` remains (in case people follow an old tutorial that uses it), but I've removed it from the documentation.
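
A minimal sketch of the launch logic this implies, assuming the rough shape of `bin/pyspark` at the time (the control flow is simplified; `PYSPARK_PYTHON` is the interpreter variable the script uses for the non-IPython path):

```bash
# Simplified sketch, not the verbatim script: pick the interpreter, forwarding
# the wrapper's own arguments ("$@") to IPython so that, for example,
# `IPYTHON=1 bin/pyspark notebook` launches the IPython Notebook.
if [[ "$IPYTHON" = "1" ]]; then
  exec ipython $IPYTHON_OPTS "$@"
else
  exec "$PYSPARK_PYTHON" "$@"
fi
```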

This is not a perfect solution, and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using `IPYTHON_OPTS`), and only doing the doc change. With this change, though, when IPython fixes ipython/ipython#5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython.

@JoshRosen you should probably take the final call on this.

…only call ipython if no command line arguments were supplied
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13660/

@dianacarroll commented Apr 2, 2014

Well, my perspective is always that of a new, easily confused user. (That's my target audience.) I myself got tired of typing "IPYTHON=1" every time I started pyspark, so I did what I'm guessing most people will do, which is to set it as an environment variable in my profile. That was fine until the first time I tried running a pyspark script.

Your doc change explicitly recommends that I not do that, but... well, really? It just makes learning pyspark that much more confusing. The "pyspark" command is going to be most new users' main entry point into this new technology.

In the new Spark class I'm working on, which uses mainly Python, I had planned all along to set IPYTHON for the students automatically, because working in IPython is so much easier than the vanilla shell, and it is so tedious to type the variable repeatedly (or reach for command history every time).

I think what's going to happen is that users will ignore the admonishment to explicitly type the variable setting every time they start the shell and will set it in their environment... and then they will end up scratching their heads trying to figure out why their scripts aren't working. (The error that results is quite non-intuitive for a Spark newbie.)

I can live with it as is (with the doc change), but it isn't a very user-friendly thing.
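
To make the failure mode concrete, a hypothetical session (`myscript.py` is a placeholder):

```bash
export IPYTHON=1          # set once in ~/.bash_profile for convenience
bin/pyspark               # works: an interactive IPython shell with sc defined
bin/pyspark myscript.py   # fails confusingly: pre-2.0 IPython never runs the
                          # PYTHONSTARTUP file that creates the SparkContext,
                          # so the script can't find sc
```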


@mateiz (Contributor, Author) commented Apr 2, 2014

I see; in that case, I think we can do the following:

  • Leave in `IPYTHON_OPTS` as a way to pass options to IPython. Otherwise the IPython Notebook won't work, and neither will the Pylab flags and the like.
  • Add back the argument-count check you added, so IPython is only launched when no command-line arguments are supplied (see the sketch after this list).
  • In later versions of IPython that fix the startup bug, we can remove that check and let you run a script through IPython too. I guess you could also do `IPYTHON_OPTS="myscript.py" bin/pyspark`.
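
A sketch of that guard, using the same simplified `bin/pyspark` shape as above (the merged fix, commit 747bb13 below, takes this approach):

```bash
# Only hand control to IPython when no command-line arguments were supplied;
# with arguments present, fall back to the plain Python interpreter so that
# standalone scripts keep working.
if [[ "$IPYTHON" = "1" && $# = 0 ]]; then
  exec ipython $IPYTHON_OPTS
else
  exec "$PYSPARK_PYTHON" "$@"
fi
```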

@mateiz (Contributor, Author) commented Apr 2, 2014

BTW, I've updated it now to be just your initial commit, but without attempting to remove `IPYTHON_OPTS`. I think this is the best solution.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@bouk (Contributor) commented Apr 2, 2014

Note that the new version of IPython released today has my fix in it, so doing `exec ipython $@` should be fine with version 2.0.0.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13700/

@mateiz (Contributor, Author) commented Apr 2, 2014

@bouk that's great, thanks, but we probably can't make it the default for a while, until more people have updated their IPython.
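
If it helps later, one hypothetical way to gate the default on the installed version (not part of this PR; assumes `ipython --version` prints something like "2.0.0"):

```bash
# Forward arguments to IPython only when it is new enough (>= 2.0.0) to honor
# PYTHONSTARTUP alongside a script or extra arguments.
ipython_major=$(ipython --version 2>/dev/null | cut -d. -f1)
if [[ "$IPYTHON" = "1" && "${ipython_major:-0}" -ge 2 ]]; then
  exec ipython $IPYTHON_OPTS "$@"
fi
```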

@mateiz (Contributor, Author) commented Apr 3, 2014

@dianacarroll I've merged this in now, using just your original commit (mateiz@747bb13). I think that's the best solution for now. Thanks for the feedback!

asfgit pushed a commit that referenced this pull request Apr 3, 2014

Author: Diana Carroll <dcarroll@cloudera.com>

Closes #294 from mateiz/spark-1134 and squashes the following commits:

747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied

(cherry picked from commit a599e43)
Signed-off-by: Matei Zaharia <matei@databricks.com>
@asfgit asfgit closed this in a599e43 Apr 3, 2014