Initial implementation #1

Draft pull request; wants to merge 2 commits into base: main
6 changes: 6 additions & 0 deletions .codecov.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
comment:
  layout: "header, diff, tree"

coverage:
  status:
    project: false
23 changes: 23 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,23 @@
name: Publish
on:
  push:
    tags:
      - 'v[0-9]+.[0-9]+.[0-9]+'
jobs:
  publish:
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/${{ github.event.repository.name }}
    permissions:
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.13
      - run: |
          python -m pip install --upgrade build
          python -m build
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
49 changes: 49 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,49 @@
name: test
on:
  push:
    branches: [ main ]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          - python-version: '3.13'
            toxenv: pre-commit
            no-coverage: true
          - python-version: '3.13'
            toxenv: docs
            no-coverage: true
          - python-version: '3.13'
            toxenv: mypy
            no-coverage: true
          - python-version: '3.13'
            toxenv: twine
            no-coverage: true
          - python-version: '3.9'
            toxenv: min
          - python-version: '3.9'
          - python-version: '3.10'
          - python-version: '3.11'
          - python-version: '3.12'
          - python-version: '3.13'
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install tox
      - name: tox
        run: |
          tox -e ${{ matrix.toxenv || 'py' }}
      - name: coverage
        if: ${{ success() && !matrix.no-coverage }}
        uses: codecov/codecov-action@v4.0.1
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
.coverage
.mypy_cache/
.tox/
dist/
htmlcov/
coverage.xml
docs/_build
*.egg-info/
__pycache__/
coverage-html/
build/
.idea/
venv/
13 changes: 13 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,13 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.11.10
    hooks:
      - id: ruff
        args: [ --fix ]
      - id: ruff-format
  - repo: https://github.com/adamchainz/blacken-docs
    rev: 1.19.1
    hooks:
      - id: blacken-docs
        additional_dependencies:
          - black==25.1.0
12 changes: 12 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,12 @@
version: 2
formats: all
sphinx:
  configuration: docs/conf.py
build:
  os: ubuntu-24.04
  tools:
    python: "3.13"  # Keep in sync with .github/workflows/test.yml
python:
  install:
    - requirements: docs/requirements.txt
    - path: .
8 changes: 8 additions & 0 deletions CHANGES.rst
@@ -0,0 +1,8 @@
=======
Changes
=======

0.0.1 (unreleased)
==================

Initial version.
45 changes: 22 additions & 23 deletions LICENSE
@@ -1,28 +1,27 @@
-BSD 3-Clause License
+Copyright (c) Zyte Group Ltd
+All rights reserved.

-Copyright (c) 2025, Scrapy Plugins
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:

-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.

-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.

-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
+3. Neither the name of Zyte nor the names of its contributors may be used
+to endorse or promote products derived from this software without
+specific prior written permission.

-3. Neither the name of the copyright holder nor the names of its
-contributors may be used to endorse or promote products derived from
-this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
33 changes: 33 additions & 0 deletions README.rst
@@ -0,0 +1,33 @@
=================
scrapy-crawl-maps
=================

.. image:: https://img.shields.io/pypi/v/scrapy-crawl-maps.svg
   :target: https://pypi.python.org/pypi/scrapy-crawl-maps
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/scrapy-crawl-maps.svg
   :target: https://pypi.python.org/pypi/scrapy-crawl-maps
   :alt: Supported Python Versions

.. image:: https://github.com/scrapy-plugins/scrapy-crawl-maps/actions/workflows/test.yml/badge.svg
   :target: https://github.com/scrapy-plugins/scrapy-crawl-maps/actions/workflows/test.yml
   :alt: Automated tests

.. image:: https://codecov.io/github/scrapy-plugins/scrapy-crawl-maps/coverage.svg?branch=main
   :target: https://codecov.io/gh/scrapy-plugins/scrapy-crawl-maps
   :alt: Coverage report

.. description-start

**scrapy-crawl-maps** is a Scrapy_ plugin that lets you define the logic of a
spider as a `directed graph`_ written in JSON_ format.

.. _directed graph: https://en.wikipedia.org/wiki/Directed_graph
.. _JSON: https://www.json.org/json-en.html
.. _Scrapy: https://scrapy.org/

.. description-end

* Documentation: https://scrapy-crawl-maps.readthedocs.io/en/latest/
* License: BSD 3-clause
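
To make the idea of a directed graph in JSON concrete, here is a minimal sketch in plain Python. It is only an illustration of the general concept: the ``nodes`` and ``edges`` keys, the node names, and the helper functions are all hypothetical and are not the plugin's actual crawl map schema or API, which this README does not show.

```python
import json
from collections import deque

# Hypothetical JSON document describing a directed graph.
# The "nodes"/"edges" layout and the node names are assumptions made for
# illustration; they are NOT the real scrapy-crawl-maps format.
GRAPH_JSON = """
{
  "nodes": ["start", "fetch", "parse", "items"],
  "edges": [
    ["start", "fetch"],
    ["fetch", "parse"],
    ["parse", "items"],
    ["parse", "fetch"]
  ]
}
"""


def load_graph(text):
    """Parse the JSON text into an adjacency list: node -> list of targets."""
    data = json.loads(text)
    adjacency = {name: [] for name in data["nodes"]}
    for source, target in data["edges"]:
        adjacency[source].append(target)
    return adjacency


def reachable(adjacency, start):
    """Return the nodes reachable from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in adjacency[node]:
            if target not in seen:  # the parse -> fetch cycle is visited once
                seen.append(target)
                queue.append(target)
    return seen


adjacency = load_graph(GRAPH_JSON)
order = reachable(adjacency, "start")
print(order)
```

Edges are directed, so data flows from ``start`` towards ``items``, and the ``parse`` to ``fetch`` edge shows that such a graph may contain cycles, for example when parsed links are fetched again.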
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
25 changes: 25 additions & 0 deletions docs/_ext/__init__.py
@@ -0,0 +1,25 @@
def setup(app):
    # https://stackoverflow.com/a/13663325
    #
    # Copied from Scrapy:
    # https://github.com/scrapy/scrapy/blob/dba37674e6eaa6c2030c8eb35ebf8127cd488062/docs/_ext/scrapydocs.py#L90C16-L110C6
    app.add_crossref_type(
        directivename="setting",
        rolename="setting",
        indextemplate="pair: %s; setting",
    )
    app.add_crossref_type(
        directivename="signal",
        rolename="signal",
        indextemplate="pair: %s; signal",
    )
    app.add_crossref_type(
        directivename="command",
        rolename="command",
        indextemplate="pair: %s; command",
    )
    app.add_crossref_type(
        directivename="reqmeta",
        rolename="reqmeta",
        indextemplate="pair: %s; reqmeta",
    )
106 changes: 106 additions & 0 deletions docs/api.rst
@@ -0,0 +1,106 @@
.. _reference:

==========
Python API
==========

Nodes
=====

.. _builtin-node-types:

Built-in node types
-------------------

.. autoclass:: scrapy_crawl_maps.FetchNode()
   :show-inheritance:
   :members: type, spec

.. autoclass:: scrapy_crawl_maps.ItemFollowNode()
   :show-inheritance:
   :members: type, spec

.. autopydantic_model:: scrapy_crawl_maps.ItemFollowNodeParams
   :inherited-members: BaseModel

.. autoclass:: scrapy_crawl_maps.ItemsNode()
   :show-inheritance:
   :members: type, spec

.. autopydantic_model:: scrapy_crawl_maps.ItemsNodeParams
   :inherited-members: BaseModel

.. autoclass:: scrapy_crawl_maps.SelectorParserNode()
   :show-inheritance:
   :members: type, spec

.. autopydantic_model:: scrapy_crawl_maps.SelectorParserNodeParams
   :inherited-members: BaseModel

.. autoclass:: scrapy_crawl_maps.UrlsNode()
   :show-inheritance:
   :members: type, spec

.. autopydantic_model:: scrapy_crawl_maps.UrlsNodeParams
   :inherited-members: BaseModel

.. autoclass:: scrapy_crawl_maps.UrlsFileNode()
   :show-inheritance:
   :members: type, spec

.. autopydantic_model:: scrapy_crawl_maps.UrlsFileNodeParams
   :inherited-members: BaseModel


Node base classes
-----------------

.. autoclass:: scrapy_crawl_maps.ProcessorNode
   :members: type, spec, process, process_request

.. autoclass:: scrapy_crawl_maps.SpiderNode
   :members: type, spec, process_input, process_output

.. autoclass:: scrapy_crawl_maps.NodeArgs()
   :members: args

.. _port-types:

Port types
==========



.. _spiders:

Spiders
=======

Crawl map spider
----------------

.. autoclass:: scrapy_crawl_maps.CrawlMapSpider()
   :show-inheritance:

.. autopydantic_model:: scrapy_crawl_maps.CrawlMapSpiderParams
   :inherited-members: BaseModel

.. autoclass:: scrapy_crawl_maps.CrawlMapSpiderCrawlMap
   :show-inheritance:


Base spider
-----------

.. autoclass:: scrapy_crawl_maps.CrawlMapBaseSpider()


Crawl map
=========

.. autoclass:: scrapy_crawl_maps.CrawlMap()
   :members:

.. autoclass:: scrapy_crawl_maps.ResponseData
   :members:
1 change: 1 addition & 0 deletions docs/changes.rst
@@ -0,0 +1 @@
.. include:: ../CHANGES.rst