
Improve tracker stats importer performance #569

Open
@josecelano


There is a background task that imports torrent stats from the tracker using the tracker REST API. The Index imports the number of seeders and leechers for every torrent in the Index (all torrents in the Index are also in the Tracker, but not all torrents in the Tracker are in the Index).

Old solution

loop
  start timer
  for all torrents in Index
    import stats
  end for
  check timer: if less than 1 hour has passed since the timer started, wait for the rest of that hour
end loop
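
As a rough illustration of the loop above, here is a minimal Rust sketch of one full pass. The names `remaining_wait`, `run_full_pass`, and the `import_stats` callback are hypothetical and are not taken from the Index code base; the real importer calls the tracker REST API, which is not modeled here.

```rust
use std::time::{Duration, Instant};

/// Time still to wait so that full passes start at most once per `interval`.
/// Returns zero when the pass itself took longer than the interval.
fn remaining_wait(elapsed: Duration, interval: Duration) -> Duration {
    interval.saturating_sub(elapsed)
}

/// One iteration of the old loop: import every torrent, then report how long
/// the caller should sleep before starting the next pass.
/// `import_stats` is a hypothetical stand-in for the tracker API call.
fn run_full_pass<F: FnMut(u64)>(
    torrent_ids: &[u64],
    interval: Duration,
    mut import_stats: F,
) -> Duration {
    let start = Instant::now();
    for &id in torrent_ids {
        import_stats(id);
    }
    remaining_wait(start.elapsed(), interval)
}
```

The progress of a pass lives only in memory, which is why an interruption forces the next run to start again from the first torrent.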

Pros:

  • It's the fastest way to import all torrents. The process does not stop until it imports all torrents.

Cons:

  • If the process is interrupted, it starts again from the beginning, so some torrents may never be imported.

Current solution

I changed the old solution by adding an updated_at field to the stats records.

loop
  import up to 50 torrents whose stats have not been imported for more than one hour
  wait 2 seconds
end loop
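
The batch selection that the updated_at field makes possible could look roughly like the sketch below. It works on an in-memory slice instead of the database, and `StatsRecord`, `pending_batch`, and the "stalest first" ordering are assumptions made for illustration, not the actual schema or query.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical in-memory stand-in for a stats record in the Index database.
#[derive(Debug, Clone, PartialEq)]
struct StatsRecord {
    torrent_id: u64,
    /// `None` means the stats have never been imported.
    updated_at: Option<SystemTime>,
}

/// Return up to `limit` torrent ids whose stats are older than `max_age`,
/// never-imported torrents first, then the stalest ones.
fn pending_batch(
    records: &[StatsRecord],
    now: SystemTime,
    max_age: Duration,
    limit: usize,
) -> Vec<u64> {
    let mut stale: Vec<&StatsRecord> = records
        .iter()
        .filter(|r| match r.updated_at {
            None => true,
            // A timestamp in the future is treated as fresh.
            Some(t) => now.duration_since(t).map(|age| age > max_age).unwrap_or(false),
        })
        .collect();
    // `None` sorts before any `Some`, so never-imported torrents come first.
    stale.sort_by_key(|r| r.updated_at);
    stale.into_iter().take(limit).map(|r| r.torrent_id).collect()
}
```

Because the selection depends only on the persisted timestamps, a restarted process picks up exactly the torrents that are still stale.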

Pros:

  • If the process (or the server) is restarted, it resumes with the torrents that have not been imported in the last hour. It's more robust but slower.
  • The tracker does not receive too many requests.

Cons:

  • We are limited to at most 50 torrents every 2 seconds (about 25 torrents per second), even when many torrents are pending.

NOTICE: we wait 2 seconds between importations because otherwise, when there is nothing to import, the process would constantly run queries against the database to get the updated list of torrents pending import.

Newly proposed solution

loop
  while there are torrents pending to import
    import up to 50 torrents whose stats have not been imported for more than one hour
  end while
  no more torrents pending to update -> wait 2 seconds
end loop

NOTICE: we wait only when there is nothing to import. Instead of continuously checking whether there is more work to do, we only check every 2 seconds (config value). However, while there are torrents to update, we keep updating them without pausing.
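
A minimal Rust sketch of the proposed loop, assuming a hypothetical `StatsImporter` trait that hides both the database query and the tracker API call. `InMemoryImporter` exists only to make the example self-contained; none of these names come from the Index code base.

```rust
use std::time::Duration;

/// Hypothetical abstraction over "select stale torrents and import their stats".
trait StatsImporter {
    /// Import up to `batch_size` stale torrents and return how many were imported.
    fn import_batch(&mut self, batch_size: usize) -> usize;
}

/// Inner loop: keep importing batches until nothing is pending.
fn drain_pending<I: StatsImporter>(importer: &mut I, batch_size: usize) -> usize {
    let mut total = 0;
    loop {
        let imported = importer.import_batch(batch_size);
        if imported == 0 {
            break; // nothing left to import
        }
        total += imported;
    }
    total
}

/// Outer loop: sleep only when the inner loop found nothing more to do.
fn run_forever<I: StatsImporter>(importer: &mut I, batch_size: usize, idle_wait: Duration) -> ! {
    loop {
        drain_pending(&mut *importer, batch_size);
        std::thread::sleep(idle_wait);
    }
}

/// In-memory stand-in used only for illustration.
struct InMemoryImporter {
    pending: usize,
    calls: usize,
}

impl StatsImporter for InMemoryImporter {
    fn import_batch(&mut self, batch_size: usize) -> usize {
        self.calls += 1;
        let imported = batch_size.min(self.pending);
        self.pending -= imported;
        imported
    }
}
```

With 120 pending torrents and a batch size of 50, `drain_pending` performs four batch calls (50, 50, 20, and a final empty one) before the outer loop sleeps.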

Pros:

  • Faster importation
  • Avoid polling the DB too many times

Cons:

  • The tracker could receive too many requests. That cannot be controlled with a config option unless we also add a limit for the inner loop in this solution.
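
One possible shape for such an inner-loop limit, sketched in Rust. The `max_batches` parameter and the `drain_with_limit` function are hypothetical; they only illustrate how a config value could cap the number of batches imported before the next wait.

```rust
/// Import batches until nothing is pending or `max_batches` batches have run.
/// `import_batch` is a hypothetical callback that imports up to `batch_size`
/// stale torrents and returns how many it imported.
fn drain_with_limit<F>(mut import_batch: F, batch_size: usize, max_batches: usize) -> usize
where
    F: FnMut(usize) -> usize,
{
    let mut total = 0;
    for _ in 0..max_batches {
        let imported = import_batch(batch_size);
        if imported == 0 {
            break; // nothing left to import
        }
        total += imported;
    }
    total
}
```

With `max_batches` set to 1 this behaves like the current solution, and with a very large value it approaches the unbounded proposal, so a single option would cover both ends.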

cc @da2ce7
