Historical crawl tasks #789

Merged
Andrei-Dolgolev merged 14 commits into main from historical-crawl-tasks
May 31, 2023

Conversation

@Andrei-Dolgolev Andrei-Dolgolev commented May 23, 2023

Changes

Add automation for the historical crawl of each new subscription.

Depends on a Spire client update.

Task tags schema: https://drive.google.com/file/d/1z_jsWnqtxKSU-kHD6RNnaRdMoR9_PVRU/view?usp=sharing

General logic

For each blockchain, create two separate worker instances: one for events and one for transactions. Both workers read the 'moonworm-jobs' journal.

Workers collect the ABI data for addresses in newly added subscriptions and generate a list of tasks, each task being an event or a function to crawl. The resulting entries should carry the tags ['moonworm_task_pickedup:True','historical_crawl_status:pending'].

The worker processes will pick up these jobs and update their status to 'historical_crawl_status:in_progress'.
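The tag transition described above can be sketched in Python as follows. This is only an illustration: the helper name `set_crawl_status` and the constant `STATUS_PREFIX` are hypothetical and not taken from this PR, and the sketch assumes tags are plain strings of the form `key:value`.

```python
from typing import List

# Assumed prefix, based on the tag names quoted in the description.
STATUS_PREFIX = "historical_crawl_status:"


def set_crawl_status(tags: List[str], new_status: str) -> List[str]:
    """Return a copy of `tags` with the crawl status tag replaced by `new_status`."""
    # Drop any existing status tag, keep everything else untouched.
    kept = [tag for tag in tags if not tag.startswith(STATUS_PREFIX)]
    return kept + [STATUS_PREFIX + new_status]


entry_tags = ["moonworm_task_pickedup:True", "historical_crawl_status:pending"]
picked_up = set_crawl_status(entry_tags, "in_progress")
```

Replacing the status tag rather than appending a new one keeps exactly one status per entry, which makes it straightforward to search the journal by status.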

For every iteration over a batch of blocks, the worker process will update the status of each entry to indicate the progress of the crawl. The progress is represented as status:x.yy, where x.yy is the percentage of the crawl that has been completed for a specific address.

The progress percentage is calculated using the formula:

(latest_block - current_block)/(latest_block - deployment_block) * 100

latest_block - fetched once at the start of the crawl; it is the current block on the node at that moment.
current_block - the block the crawl has reached, somewhere between latest_block and deployment_block.

This gives the percentage completion of the crawl for the given address.

Once the progress percentage is 100 or more, the worker process will set the entry status to 'historical_crawl_status:finished' indicating that the crawl job for that particular address has been completed.
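A minimal Python sketch of the progress calculation and the finishing rule, assuming the crawl walks backwards from latest_block toward deployment_block. The function names are hypothetical and the zero-range guard is an added assumption, not something stated in this PR.

```python
def crawl_progress(latest_block: int, current_block: int, deployment_block: int) -> float:
    """Percentage of the historical crawl completed for one address."""
    total = latest_block - deployment_block
    if total <= 0:
        # Assumption: a contract deployed at the starting block has nothing left to crawl.
        return 100.0
    return (latest_block - current_block) / total * 100


def progress_tag(progress: float) -> str:
    """Format the progress as the status:x.yy tag described above."""
    return f"status:{progress:.2f}"


def crawl_status_tag(progress: float) -> str:
    """Pick the crawl status tag: finished once progress reaches 100 or more."""
    if progress >= 100:
        return "historical_crawl_status:finished"
    return "historical_crawl_status:in_progress"
```

For example, with latest_block = 1000, deployment_block = 0 and current_block = 750, the crawl has covered 250 of 1000 blocks, so the progress is 25%.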

Run command:

moonworm-crawler --access-id $NB_CONTROLLER_ACCESS_ID historical-crawl --blockchain-type polygon --find-deployed-blocks --end 0 --tasks-journal --only-events

New flags:
--only-functions - crawl only method calls.
--tasks-journal - get all tasks from the moonworm journal.
--find-deployed-blocks - use moonworm's deployment block lookup.

How to test these changes?

Related issues


# Historical crawler status config

HISTORICAL_CRAWLER_STATUSES = {

I suggest using Enums, but let's try a dictionary :)
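For reference, the Enum alternative could look roughly like the sketch below. It is not the code in this PR: the class name `HistoricalCrawlStatus` and the `as_tag` helper are hypothetical, and the three values are taken only from the statuses named in the description (pending, in_progress, finished).

```python
from enum import Enum


class HistoricalCrawlStatus(str, Enum):
    """Hypothetical Enum covering the statuses named in the PR description."""

    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    FINISHED = "finished"

    def as_tag(self) -> str:
        # Build the journal tag string for this status.
        return f"historical_crawl_status:{self.value}"


finished_tag = HistoricalCrawlStatus.FINISHED.as_tag()
```

Compared with a dictionary, an Enum rejects unknown statuses at lookup time (`HistoricalCrawlStatus("done")` raises `ValueError`), at the cost of a little more ceremony.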

Comment on lines 233 to 235
if block is None:
    logger.warning(
        f"Failed to find deployment block for {address}, code: {code}"

I suggest throwing an exception here.
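A sketch of what raising instead of logging might look like. The exception class `DeploymentBlockNotFound` and the wrapper `require_deployment_block` are hypothetical names introduced for illustration; only the message text comes from the reviewed lines.

```python
from typing import Optional


class DeploymentBlockNotFound(Exception):
    """Hypothetical error for an address whose deployment block cannot be found."""


def require_deployment_block(address: str, block: Optional[int], code: str) -> int:
    """Return `block`, or raise if the deployment block lookup came back empty."""
    if block is None:
        raise DeploymentBlockNotFound(
            f"Failed to find deployment block for {address}, code: {code}"
        )
    return block
```

Raising makes the failure visible to the caller, which can then decide whether to skip the address or abort the task, instead of continuing with a missing block.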

@kompotkot kompotkot left a comment


lg

@Andrei-Dolgolev Andrei-Dolgolev merged commit 72ccc6c into main May 31, 2023
@Andrei-Dolgolev Andrei-Dolgolev deleted the historical-crawl-tasks branch May 31, 2023 14:09