
@guillermoap
Contributor

Hello, I've been using frontera for the last couple of months and have found that the docs are out of date in some places, in this case the setup-cluster docs.

If I try to run the dbworker as specified on line 130 of the setup-cluster doc, by running:
python -m frontera.worker.db --config src.config.db_worker --no-scoring --no-incoming --partitions 0,1

I get the following output:

dbworker_batch_1        |              [--partitions [PARTITIONS [PARTITIONS ...]]] --config CONFIG
dbworker_batch_1        |              [--log-level LOG_LEVEL] [--port PORT]
dbworker_batch_1        | db.py: error: argument --partitions: invalid int value: '0,1'

By trial and error I found that the correct way to start the dbworker on specific partitions is to pass them as space-separated values:
python -m frontera.worker.db --config src.config.db_worker --no-scoring --no-incoming --partitions 0 1
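The usage line and error above are consistent with an argparse option declared with an integer type and a variable number of values. The following is a minimal sketch of that behaviour, not frontera's actual source: the `build_parser` and `parse_partitions` helpers are hypothetical, and the `type=int, nargs="*"` declaration is an assumption inferred from the usage output.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the dbworker's parser; only --partitions is modelled.
    parser = argparse.ArgumentParser(prog="db.py")
    # Assumption: each value is converted with int(), and zero or more values are accepted.
    parser.add_argument("--partitions", type=int, nargs="*")
    return parser


def parse_partitions(argv):
    # Returns the parsed list of partition ids, or None if the flag is absent.
    return build_parser().parse_args(argv).partitions


if __name__ == "__main__":
    # Space-separated values are converted one by one.
    print(parse_partitions(["--partitions", "0", "1"]))

    # A comma-separated string is a single token, and int("0,1") fails,
    # so argparse reports an error and exits.
    try:
        parse_partitions(["--partitions", "0,1"])
    except SystemExit as exc:
        print("rejected with exit code", exc.code)
```

With a declaration like this, `0,1` reaches the parser as one token that cannot be converted to an integer, which would explain why only the space-separated form is accepted.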

Also, the CRAWLING_STRATEGY config var specified on line 91 of the doc has no effect: if you set it, frontera does not take the specified crawling strategy into account. So I looked into the default_settings file, on line 77, to see how to set it correctly, and there the var is named STRATEGY. When I made that change, the strategy started working as expected.
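As a rough illustration of the rename, a settings module might look like the sketch below. The class path `myproject.strategy.MyCrawlingStrategy` is a hypothetical placeholder, not a real frontera class. Only the variable name `STRATEGY` comes from the default_settings file mentioned above.

```python
# Illustrative settings fragment; the dotted path is a hypothetical placeholder.

# Not picked up, per the behaviour described above:
# CRAWLING_STRATEGY = "myproject.strategy.MyCrawlingStrategy"

# Name used in frontera's default_settings:
STRATEGY = "myproject.strategy.MyCrawlingStrategy"
```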

So, to sum everything up, I've updated the docs to reflect these changes.

@jpbalarini

👍

@sibiryakov sibiryakov merged commit e1a4ca9 into scrapinghub:master Nov 15, 2018
@sibiryakov
Member

thank you very much!
