Merge branch 'jsonrpc-split'
dangra committed Aug 16, 2014
2 parents 8fece4b + a9292cf commit 94d00b2
Showing 15 changed files with 18 additions and 818 deletions.
3 changes: 0 additions & 3 deletions docs/intro/overview.rst
@@ -236,9 +236,6 @@ scraping easy and efficient, such as:
* A :ref:`System service <topics-scrapyd>` designed to ease the deployment and
run of your spiders in production.

* A built-in :ref:`Web service <topics-webservice>` for monitoring and
controlling your bot

* A :ref:`Telnet console <topics-telnetconsole>` for hooking into a Python
console running inside your Scrapy process, to introspect and debug your
crawler
13 changes: 0 additions & 13 deletions docs/topics/extensions.rst
@@ -36,7 +36,6 @@ by a string: the full Python path to the extension's class name. For example::

EXTENSIONS = {
'scrapy.contrib.corestats.CoreStats': 500,
'scrapy.webservice.WebService': 500,
'scrapy.telnet.TelnetConsole': 500,
}

@@ -178,18 +177,6 @@ Core Stats extension
Enable the collection of core statistics, provided the stats collection is
enabled (see :ref:`topics-stats`).

.. _topics-extensions-ref-webservice:

Web service extension
~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.webservice
:synopsis: Web service

.. class:: scrapy.webservice.WebService

See :ref:`topics-webservice`.

.. _topics-extensions-ref-telnetconsole:

Telnet console extension
1 change: 0 additions & 1 deletion docs/topics/settings.rst
@@ -463,7 +463,6 @@ Default::

{
'scrapy.contrib.corestats.CoreStats': 0,
'scrapy.webservice.WebService': 0,
'scrapy.telnet.TelnetConsole': 0,
'scrapy.contrib.memusage.MemoryUsage': 0,
'scrapy.contrib.memdebug.MemoryDebugger': 0,
229 changes: 3 additions & 226 deletions docs/topics/webservice.rst
@@ -4,231 +4,8 @@
Web Service
===========

Scrapy comes with a built-in web service for monitoring and controlling a
running crawler. The service exposes most resources using the `JSON-RPC 2.0`_
protocol, but there are also other (read-only) resources which just output JSON
data.
The web service has been moved into a separate project.

Provides an extensible web service for managing a Scrapy process. It's enabled
by the :setting:`WEBSERVICE_ENABLED` setting. The web server listens on the
port specified in :setting:`WEBSERVICE_PORT` and logs to the file
specified in :setting:`WEBSERVICE_LOGFILE`.

The web service is a :ref:`built-in Scrapy extension <topics-extensions-ref>`
that is enabled by default, but you can disable it if you're running
low on memory.

.. _topics-webservice-resources:

Web service resources
=====================

The web service contains several resources, defined in the
:setting:`WEBSERVICE_RESOURCES` setting. Each resource provides a different
functionality. See :ref:`topics-webservice-resources-ref` for a list of
resources available by default.

Although you can implement your own resources using any protocol, there are
two kinds of resources bundled with Scrapy:

* Simple JSON resources - which are read-only and just output JSON data
* JSON-RPC resources - which provide direct access to certain Scrapy objects
using the `JSON-RPC 2.0`_ protocol
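
As a rough illustration of what a client would send to a JSON-RPC resource, the
sketch below builds a JSON-RPC 2.0 request body with the standard library only.
The helper name ``build_jsonrpc_request`` and the method name ``get_value`` are
assumptions made for this example; they are not taken from the text above.

```python
import json


def build_jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as a JSON string.

    Hypothetical helper: it only assembles the members defined by the
    JSON-RPC 2.0 specification ("jsonrpc", "method", "params", "id").
    """
    payload = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        # "params" may be omitted entirely when the call takes no arguments.
        payload["params"] = params
    return json.dumps(payload)


# "get_value" is an assumed method name on the wrapped object.
body = build_jsonrpc_request("get_value", ["item_scraped_count"])
```

The resulting string would be sent as the body of an HTTP POST to the
resource URL, for example the stats resource listed below.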

.. module:: scrapy.contrib.webservice
:synopsis: Built-in web service resources

.. _topics-webservice-resources-ref:

Available JSON-RPC resources
----------------------------

These are the JSON-RPC resources available by default in Scrapy:

.. _topics-webservice-crawler:

Crawler JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.webservice.crawler
:synopsis: Crawler JSON-RPC resource

.. class:: CrawlerResource

Provides access to the main Crawler object that controls the Scrapy
process.

Available by default at: http://localhost:6080/crawler

Stats Collector JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.webservice.stats
:synopsis: Stats JSON-RPC resource

.. class:: StatsResource

Provides access to the Stats Collector used by the crawler.

Available by default at: http://localhost:6080/stats

Spider Manager JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can access the spider manager JSON-RPC resource through the
:ref:`topics-webservice-crawler` at: http://localhost:6080/crawler/spiders

Extension Manager JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can access the extension manager JSON-RPC resource through the
:ref:`topics-webservice-crawler` at: http://localhost:6080/crawler/extensions

Available JSON resources
------------------------

These are the JSON resources available by default:

Engine status JSON resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.webservice.enginestatus
:synopsis: Engine Status JSON resource

.. class:: EngineStatusResource

Provides access to engine status metrics.

Available by default at: http://localhost:6080/enginestatus

Web service settings
====================

These are the settings that control the web service behaviour:

.. setting:: WEBSERVICE_ENABLED

WEBSERVICE_ENABLED
------------------

Default: ``True``

A boolean which specifies if the web service will be enabled (provided its
extension is also enabled).

.. setting:: WEBSERVICE_LOGFILE

WEBSERVICE_LOGFILE
------------------

Default: ``None``

A file to use for logging HTTP requests made to the web service. If unset,
the log is sent to the standard Scrapy log.

.. setting:: WEBSERVICE_PORT

WEBSERVICE_PORT
---------------

Default: ``[6080, 7030]``

The port range to use for the web service. If set to ``None`` or ``0``, a
dynamically assigned port is used.
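
One way to read the port setting is as a list of candidate ports to try in
order. The sketch below is a plain-Python illustration, not Scrapy's own code:
``candidate_ports`` is a hypothetical helper, and treating the range as
inclusive at both ends is an assumption.

```python
def candidate_ports(port_setting):
    """Expand a port setting into the list of ports to try, in order.

    Hypothetical helper. Port 0 asks the operating system to pick a free
    port, which matches the "dynamically assigned" case.
    """
    if not port_setting:
        # None, 0, or an empty list: let the OS choose.
        return [0]
    if isinstance(port_setting, int):
        return [port_setting]
    if len(port_setting) == 1:
        return [port_setting[0]]
    first, last = port_setting
    # Assumption: both ends of the range are included.
    return list(range(first, last + 1))


ports = candidate_ports([6080, 7030])
```

A server would then attempt to bind to each candidate in turn and stop at the
first port that is free.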

.. setting:: WEBSERVICE_HOST

WEBSERVICE_HOST
---------------

Default: ``'127.0.0.1'``

The interface the web service should listen on.

WEBSERVICE_RESOURCES
--------------------

Default: ``{}``

The list of web service resources enabled for your project. See
:ref:`topics-webservice-resources`. These are added to the ones available by
default in Scrapy, defined in the :setting:`WEBSERVICE_RESOURCES_BASE` setting.

WEBSERVICE_RESOURCES_BASE
-------------------------

Default::

{
'scrapy.contrib.webservice.crawler.CrawlerResource': 1,
'scrapy.contrib.webservice.enginestatus.EngineStatusResource': 1,
'scrapy.contrib.webservice.stats.StatsResource': 1,
}

The list of web service resources available by default in Scrapy. You shouldn't
change this setting in your project; change :setting:`WEBSERVICE_RESOURCES`
instead. To disable a resource, set its value to ``None`` in
:setting:`WEBSERVICE_RESOURCES`.
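
The relationship between the two settings can be sketched as a dictionary
merge in which project entries override the base ones and ``None`` removes a
resource. This is an illustration under that assumption, not Scrapy's actual
loading code; ``enabled_resources`` and ``myproject.resources.PingResource``
are hypothetical names.

```python
def enabled_resources(base, project):
    """Return the class paths that remain enabled after merging settings.

    Hypothetical helper: project values override base values, and a value
    of None disables the resource.
    """
    merged = dict(base)
    merged.update(project)
    return sorted(path for path, value in merged.items() if value is not None)


WEBSERVICE_RESOURCES_BASE = {
    'scrapy.contrib.webservice.crawler.CrawlerResource': 1,
    'scrapy.contrib.webservice.enginestatus.EngineStatusResource': 1,
    'scrapy.contrib.webservice.stats.StatsResource': 1,
}

# Project-level setting: disable the stats resource, add a custom one.
WEBSERVICE_RESOURCES = {
    'scrapy.contrib.webservice.stats.StatsResource': None,
    'myproject.resources.PingResource': 1,  # hypothetical resource
}

active = enabled_resources(WEBSERVICE_RESOURCES_BASE, WEBSERVICE_RESOURCES)
```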

Writing a web service resource
==============================

Web service resources are implemented using the Twisted Web API. See this
`Twisted Web guide`_ for more information on Twisted web and Twisted web
resources.

To write a web service resource, subclass :class:`JsonResource` or
:class:`JsonRpcResource` and implement the :meth:`render_GET` method.

.. class:: scrapy.webservice.JsonResource

A subclass of `twisted.web.resource.Resource`_ that implements a JSON web
service resource. See

.. attribute:: ws_name

The name by which the Scrapy web service will know this resource, and
also the path where this resource will listen. For example, assuming the
Scrapy web service is listening on http://localhost:6080/ and the
``ws_name`` is ``'resource1'``, the URL for that resource will be:

http://localhost:6080/resource1/

.. class:: scrapy.webservice.JsonRpcResource(crawler, target=None)

This is a subclass of :class:`JsonResource` for implementing JSON-RPC
resources. JSON-RPC resources wrap a JSON-RPC API around Python (Scrapy)
objects. The wrapped object must be returned by the
:meth:`get_target` method, which returns the target passed to the
constructor by default.

.. method:: get_target()

Return the object wrapped by this JSON-RPC resource. By default, it
returns the object passed to the constructor.
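
The wrapping idea can be sketched without Twisted or Scrapy. The classes below
are hypothetical stand-ins, not the real :class:`JsonRpcResource`: they only
show how a ``get_target()`` hook lets a JSON-RPC call be dispatched to a
method of the wrapped object.

```python
import json


class TargetWrapper:
    """Hypothetical stand-in for a resource that wraps a Python object."""

    def __init__(self, target=None):
        self._target = target

    def get_target(self):
        # By default, return the object passed to the constructor.
        return self._target

    def handle(self, raw_request):
        """Dispatch a JSON-RPC 2.0 request to a method of the target."""
        request = json.loads(raw_request)
        method = getattr(self.get_target(), request["method"])
        params = request.get("params", [])
        if isinstance(params, dict):
            result = method(**params)
        else:
            result = method(*params)
        return json.dumps(
            {"jsonrpc": "2.0", "result": result, "id": request.get("id")}
        )


class FakeStats:
    """Hypothetical object to wrap; stands in for a stats collector."""

    def __init__(self):
        self._stats = {"item_scraped_count": 3}

    def get_value(self, key, default=None):
        return self._stats.get(key, default)


wrapper = TargetWrapper(FakeStats())
response = wrapper.handle(json.dumps(
    {"jsonrpc": "2.0", "method": "get_value",
     "params": ["item_scraped_count"], "id": 1}
))
```

A subclass could override ``get_target()`` to look the object up lazily
instead of receiving it in the constructor. Error handling (unknown methods,
malformed requests) is left out of this sketch.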

Examples of web service resources
=================================

StatsResource (JSON-RPC resource)
---------------------------------

.. literalinclude:: ../../scrapy/contrib/webservice/stats.py

EngineStatusResource (JSON resource)
-------------------------------------

.. literalinclude:: ../../scrapy/contrib/webservice/enginestatus.py

Example of web service client
=============================

scrapy-ws.py script
-------------------

.. literalinclude:: ../../extras/scrapy-ws.py

.. _Twisted Web guide: http://jcalderone.livejournal.com/50562.html
.. _JSON-RPC 2.0: http://www.jsonrpc.org/
.. _twisted.web.resource.Resource: http://twistedmatrix.com/documents/10.0.0/api/twisted.web.resource.Resource.html
It is now hosted at:

https://github.com/scrapy/scrapy-jsonrpc
Empty file.
8 changes: 0 additions & 8 deletions scrapy/contrib/webservice/crawler.py

This file was deleted.

22 changes: 0 additions & 22 deletions scrapy/contrib/webservice/enginestatus.py

This file was deleted.

8 changes: 0 additions & 8 deletions scrapy/contrib/webservice/stats.py

This file was deleted.

12 changes: 0 additions & 12 deletions scrapy/settings/default_settings.py
@@ -109,7 +109,6 @@

EXTENSIONS_BASE = {
'scrapy.contrib.corestats.CoreStats': 0,
'scrapy.webservice.WebService': 0,
'scrapy.telnet.TelnetConsole': 0,
'scrapy.contrib.memusage.MemoryUsage': 0,
'scrapy.contrib.memdebug.MemoryDebugger': 0,
@@ -239,17 +238,6 @@
TELNETCONSOLE_PORT = [6023, 6073]
TELNETCONSOLE_HOST = '127.0.0.1'

WEBSERVICE_ENABLED = True
WEBSERVICE_LOGFILE = None
WEBSERVICE_PORT = [6080, 7030]
WEBSERVICE_HOST = '127.0.0.1'
WEBSERVICE_RESOURCES = {}
WEBSERVICE_RESOURCES_BASE = {
'scrapy.contrib.webservice.crawler.CrawlerResource': 1,
'scrapy.contrib.webservice.enginestatus.EngineStatusResource': 1,
'scrapy.contrib.webservice.stats.StatsResource': 1,
}

SPIDER_CONTRACTS = {}
SPIDER_CONTRACTS_BASE = {
'scrapy.contracts.default.UrlContract': 1,