Skip to content

Conversation

mriccobene
Copy link
Member

@mriccobene mriccobene commented Jan 21, 2022

This PR fixes some downloader issues which in some cases stalled the downloader.

add dump funcs for chain bundles & orphaned links
@mriccobene mriccobene marked this pull request as ready for review January 26, 2022 17:03
@yperbasis yperbasis requested a review from canepat January 27, 2022 09:18
@codecov
Copy link

codecov bot commented Jan 27, 2022

Codecov Report

Merging #545 (6269141) into master (0bbb3f6) will increase coverage by 0.35%.
The diff coverage is 66.99%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #545      +/-   ##
==========================================
+ Coverage   81.45%   81.81%   +0.35%     
==========================================
  Files         154      153       -1     
  Lines       12857    12993     +136     
==========================================
+ Hits        10473    10630     +157     
+ Misses       2384     2363      -21     
Impacted Files Coverage Δ
core/silkworm/common/as_range.hpp 100.00% <ø> (ø)
node/silkworm/common/stopwatch.cpp 90.47% <0.00%> (-3.59%) ⬇️
node/silkworm/common/stopwatch.hpp 100.00% <ø> (ø)
...ilkworm/downloader/internals/header_only_state.cpp 2.50% <ø> (ø)
...lkworm/downloader/internals/preverified_hashes.hpp 100.00% <ø> (ø)
...de/silkworm/downloader/internals/working_chain.cpp 69.31% <64.10%> (+4.41%) ⬆️
.../silkworm/downloader/internals/persisted_chain.cpp 68.18% <71.42%> (+2.35%) ⬆️
core/silkworm/common/lru_cache.hpp 68.08% <100.00%> (+5.58%) ⬆️
...e/silkworm/downloader/internals/chain_elements.hpp 85.89% <100.00%> (+22.00%) ⬆️
...e/silkworm/downloader/internals/priority_queue.hpp 93.33% <100.00%> (+17.14%) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bbb3f6...6269141. Read the comment docs.

@mriccobene mriccobene requested a review from canepat February 8, 2022 11:00
log::Settings settings;
settings.log_threads = true;
log::init(settings);
log::set_verbosity(log::Level::kInfo);
log::tee_file(std::filesystem::path("downloader.log"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to to set the file name into settings directly and let log::init(settings) do the rest.

log::Settings settings;
settings.log_file = "downloader.log";
log::init(settings);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW ... you don't set a CLI argument to set verbosity level : as a result, being defaulted to Info, all traces are not printed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just introduced CLI argument for log verbosity level, as you suggested

@AndreaLanfranchi
Copy link
Contributor

AndreaLanfranchi commented Feb 8, 2022

In thread_safe_queue I find a bit of a nonsense in here

    template <typename Duration>
    bool timed_wait_and_pop(T& popped_value, Duration const& wait_duration) {
        std::unique_lock lock(mutex_);
        if (!condition_variable_.wait_for(lock, wait_duration, [this] { return !queue_.empty(); })) {
            return false;
        }
        popped_value = queue_.front();
        queue_.pop_front();
        return true;
    }

As this is a "timed wait" I interpret it as "wait until some other thread pushes something in the queue". But this cannot happen as the very moment we enter here we lock the mutex hence any push() cannot happen as we're under the same mutex. Therefore the stop predicate in the function does not make any sense to me: basically if we enter here and the queue is empty the only possibility to exit is either the queue is not empty, or the cv to timeout. It cannot happen that something gets into the queue while waiting for the cv.

I'd have created a second mutex for the cv awaiter and rewrite as

    template <typename Duration>
    bool timed_wait_and_pop(T& popped_value, Duration const& wait_duration) {
        while(!try_pop(popped_value)) {
            std::unique_lock lock(cvmutex_);
            if(condition_variable_.template wait_for(lock, wait_duration) == std::cv_status::timeout) {
                return false;
            }
        }
        return true;
    }

This allows other threads to push items and the cv to notify. We also save a lambda which capture the whole this

Same speech for wait_and_pop which btw does not seem to be used.

@AndreaLanfranchi
Copy link
Contributor

In HeaderDownloader::execution_loop() why the early continue ?

    while (!is_stopping() && !sentry_.is_stopping()) {
        // pop a message from the queue
        std::shared_ptr<Message> message;
        bool present = messages_.timed_wait_and_pop(message, 1000ms);
        if (!present) continue;  // timeout, needed to check exiting_

        // process the message (command pattern)
        message->execute();
    }

I find this simpler and immediate to understand

    while (!is_stopping() && !sentry_.is_stopping()) {
        std::shared_ptr<Message> message;
        if(messages_.timed_wait_and_pop(message, 1s)) {
            message->execute(); // Process the message from the queue (if any)
        }
    }

@canepat
Copy link
Member

canepat commented Feb 8, 2022

In thread_safe_queue I find a bit of a nonsense in here

    template <typename Duration>
    bool timed_wait_and_pop(T& popped_value, Duration const& wait_duration) {
        std::unique_lock lock(mutex_);
        if (!condition_variable_.wait_for(lock, wait_duration, [this] { return !queue_.empty(); })) {
            return false;
        }
        popped_value = queue_.front();
        queue_.pop_front();
        return true;
    }

As this is a "timed wait" I interpret it as "wait until some other thread pushes something in the queue". But this cannot happen as the very moment we enter here we lock the mutex hence any push() cannot happen as we're under the same mutex.

AFAIK the code seems correct as it is because the std::condition_variable::wait_for method performs three actions atomically: releases lock, blocks the current executing thread and adds it to the the waiting threads. When the current thread is unblocked (either because of a notify or timeout expired), it will reacquire the lock before exiting the call

@AndreaLanfranchi
Copy link
Contributor

You're right sorry ... forgot the unlock by the cv and the fact waiting with predicate returns the result of the predicate, not of the waiting. My fault : big mistake.

@mriccobene mriccobene merged commit 8d93dde into master Feb 10, 2022
@mriccobene mriccobene deleted the fix_downloader_issues branch February 10, 2022 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants