
[SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx #2292


Closed
wants to merge 7 commits

Conversation

davies
Contributor

@davies commented Sep 5, 2014

Using Sphinx to generate API docs for PySpark.

requirement: Sphinx

$ cd python/docs/
$ make html

The generated API docs will be located at python/docs/_build/html/index.html

It can co-exist with the docs generated by Epydoc.

This is the first working version; after it's merged in, we can continue to improve it and eventually replace Epydoc.

@JoshRosen
Contributor

When I tried this, I got a lot of warnings saying that Sphinx couldn't import the PySpark modules:

[joshrosen python (9081ead...)]$ make html
sphinx-build -b html -d _build/doctrees   . _build/html
Running Sphinx v1.2.2
loading pickled environment... done
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 3 changed, 0 removed
reading sources... [100%] pyspark.sql
/Users/joshrosen/Documents/spark/docs/api/python/pyspark.rst:16: WARNING: autodoc: failed to import module u'pyspark'; the following exception was raised:
Traceback (most recent call last):
  File "/Users/joshrosen/anaconda/lib/python2.7/site-packages/sphinx/ext/autodoc.py", line 335, in import_object
    __import__(self.modname)
ImportError: No module named pyspark
/Users/joshrosen/Documents/spark/docs/api/python/pyspark.mllib.rst:10: WARNING: autodoc: failed to import module u'pyspark.mllib.classification'; the following exception was raised:
Traceback (most recent call last):
  File "/Users/joshrosen/anaconda/lib/python2.7/site-packages/sphinx/ext/autodoc.py", line 335, in import_object
    __import__(self.modname)
ImportError: No module named pyspark.mllib.classification

I guess this is because $SPARK_HOME/python/ isn't on the default PYTHONPATH.

When I do

export PYTHONPATH=../../../python/

it picks up the imports and runs.

@davies
Contributor Author

davies commented Sep 5, 2014

Thanks, I have updated the PR description.

@JoshRosen
Contributor

Maybe we can just stick that command inside the Makefile...
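A rough sketch of that idea, based on the sphinx-build invocation shown in the log above (the `..` path is an assumption; it depends on where the docs directory sits relative to `$SPARK_HOME/python`):

```make
# Hypothetical Makefile tweak: prepend the Spark python/ directory to
# PYTHONPATH for the sphinx-build invocation, so contributors don't
# have to export it manually before running `make html`.
html:
	PYTHONPATH=..:$$PYTHONPATH sphinx-build -b html -d _build/doctrees . _build/html
```

Note the `$$PYTHONPATH` escaping, which make requires to pass a literal `$PYTHONPATH` through to the shell.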

@JoshRosen
Contributor

For those unfamiliar with Sphinx, here's a screenshot of the new docs (which look great!):

[screenshot]

@JoshRosen
Contributor

It looks like there's some markup, like C{(a, b)}, that doesn't render properly because it's a holdover from epydoc:

[screenshot]

@JoshRosen
Contributor

There are also a few cases of docstrings that didn't get rendered properly:

[screenshot]

This looks really good overall. One wishlist item: can we add links from the homepage / index.html to the core classes (RDD, SparkContext, etc.)?

@davies
Contributor Author

davies commented Sep 5, 2014

Yes, I tried to convert most of the markup (but not all of it). What does this mean in epydoc?

@JoshRosen
Contributor

The markup like C{}, I{}, etc. is epydoc's inline markup. In Sphinx's inline markup, I think you can just use backticks for denoting variables, etc.
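For illustration, a conversion like the one this PR performs could be sketched as a few regex rewrites (this is not the PR's actual script; the mapping below covers only the common epydoc forms and is an assumption about how they'd be translated):

```python
import re

# Map a few common epydoc inline-markup forms to reST equivalents
# that Sphinx renders: C{code}, I{italic}, B{bold}, L{crossref}.
EPYDOC_TO_REST = [
    (re.compile(r'C\{([^}]*)\}'), r'``\1``'),       # C{code}  -> ``code``
    (re.compile(r'I\{([^}]*)\}'), r'*\1*'),         # I{text}  -> *text*
    (re.compile(r'B\{([^}]*)\}'), r'**\1**'),       # B{text}  -> **text**
    (re.compile(r'L\{([^}]*)\}'), r':class:`\1`'),  # L{name}  -> :class:`name`
]

def epydoc_to_rest(docstring):
    """Rewrite epydoc inline markup in a docstring into reST markup."""
    for pattern, replacement in EPYDOC_TO_REST:
        docstring = pattern.sub(replacement, docstring)
    return docstring
```

For example, `epydoc_to_rest("Returns C{(a, b)} pairs")` yields ``Returns ``(a, b)`` pairs``, which Sphinx renders as an inline literal.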

@nchammas
Contributor

nchammas commented Sep 5, 2014

There are also a few cases of docstrings that didn't get rendered properly:

FYI: those are broken in epydoc too, so for top() and others broken in that fashion I doubt it's a Sphinx issue.

@JoshRosen
Contributor

I guess we have some choices for the markup language dialect that we use for our docstrings. I tend to prefer the Google style to ReStructuredText / Javadoc, since it's a little less cluttered to edit / read. We can revisit this later, though; I don't think it's important to rewrite all of the docstrings as part of this patch.
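To illustrate the choice (neither style is mandated by this PR; the method and parameter names here are made up for the example), the same method documented in reST field style and in Google style:

```python
def take_rest(self, num):
    """Take the first num elements of the RDD.

    :param num: number of elements to return
    :returns: a list of the first ``num`` elements
    """

def take_google(self, num):
    """Take the first num elements of the RDD.

    Args:
        num: Number of elements to return.

    Returns:
        A list of the first num elements.
    """
```

The Google style reads more naturally in plain source, while the reST field style is what Sphinx understands natively (Google style requires an extension such as napoleon).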

@davies
Contributor Author

davies commented Sep 5, 2014

@JoshRosen @nchammas I have addressed all the comments; please take a look again.

@nchammas
Contributor

nchammas commented Sep 5, 2014

Oh, sorry, I didn't mean that the badly formatted docstrings should be fixed in this PR, at least not the ones that are also bad in epydoc. That should probably be left for another PR.

There are many docstrings that need fixing:

Etc...

@JoshRosen
Contributor

Hmm, the Makefile PYTHONPATH didn't seem to work. Maybe we should add those directories to sys.path in conf.py, which has a section for this:

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath('.'))
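Concretely, the conf.py approach might look like the sketch below (the relative path is an assumption; it depends on where conf.py ends up sitting relative to `$SPARK_HOME/python`):

```python
# Sketch for the Sphinx conf.py: put the directory containing the
# pyspark package on sys.path so autodoc can import the modules.
import os
import sys

# Assumes conf.py lives one level below $SPARK_HOME/python (e.g. python/docs/).
sys.path.insert(0, os.path.abspath('..'))
```

With this in place, `sphinx-build` should resolve `import pyspark` without any PYTHONPATH export.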

@davies
Contributor Author

davies commented Sep 5, 2014

@JoshRosen The Makefile has been fixed.

@davies
Contributor Author

davies commented Sep 6, 2014

Jenkins, test this please.

@JoshRosen
Contributor

Jenkins, retest this please.

@marmbrus
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Sep 10, 2014

QA tests have started for PR 2292 at commit 9468ab0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 10, 2014

QA tests have finished for PR 2292 at commit 9468ab0.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Weird, I've never seen Hive fail to initialize like this before.

@marmbrus
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Sep 10, 2014

QA tests have started for PR 2292 at commit 9468ab0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 10, 2014

QA tests have finished for PR 2292 at commit 9468ab0.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 11, 2014

QA tests have started for PR 2292 at commit 9468ab0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2292 at commit 9468ab0.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rnowling
Contributor

I actually fixed a number of the Python docstrings previously:
#1808

The changes just didn't show up in the docs until the 1.1.0 release.

I added a new PR to fix .top():
#2370

Since this PR includes the functionality of my new PR, should I close the new PR?

@SparkQA

SparkQA commented Sep 13, 2014

QA tests have started for PR 2292 at commit 1573298.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 13, 2014

QA tests have finished for PR 2292 at commit 1573298.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception
    • class Dummy(object):

@davies
Contributor Author

davies commented Sep 13, 2014

@JoshRosen I have moved the docs to python/docs/ and rebased with master. I think it's ready to merge; please take another look, thanks.

@SparkQA

SparkQA commented Sep 13, 2014

QA tests have started for PR 2292 at commit 425a3b1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 13, 2014

Tests timed out after a configured wait of 120m.

@SparkQA

SparkQA commented Sep 13, 2014

QA tests have started for PR 2292 at commit 425a3b1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 13, 2014

Tests timed out after a configured wait of 120m.

@SparkQA

SparkQA commented Sep 13, 2014

QA tests have started for PR 2292 at commit 425a3b1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 13, 2014

QA tests have finished for PR 2292 at commit 425a3b1.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JavaSparkContext(val sc: SparkContext)
    • class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception
    • class Dummy(object):
    • class JavaStreamingContext(val ssc: StreamingContext) extends Closeable

@SparkQA

SparkQA commented Sep 14, 2014

QA tests have started for PR 2292 at commit 425a3b1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 14, 2014

QA tests have finished for PR 2292 at commit 425a3b1.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JavaSparkContext(val sc: SparkContext)
    • throw new IllegalStateException("The main method in the given main class must be static")
    • class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception
    • class Dummy(object):
    • class RatingDeserializer(FramedSerializer):
    • case class CreateTableAsSelect(
    • class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]
    • class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]
    • class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]
    • class Encoder extends compression.Encoder[IntegerType.type]
    • class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])
    • class Encoder extends compression.Encoder[LongType.type]
    • class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])
    • case class CreateTableAsSelect(
    • class JavaStreamingContext(val ssc: StreamingContext) extends Closeable

@JoshRosen
Contributor

This looks good to me; I'm going to merge this into master but leave the JIRA open so that we remember to eventually remove the epydocs / etc.

@asfgit asfgit closed this in ec1adec Sep 16, 2014