Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit valid YAML for cachemgr:server_list #1735

Open
wants to merge 140 commits into
base: master
Choose a base branch
from

Conversation

kinkie
Copy link
Contributor

@kinkie kinkie commented Mar 13, 2024

Use PackableStream for CachePeer::reportStatistics and
update output format to well-formed YAML

@kinkie
Copy link
Contributor Author

kinkie commented Mar 13, 2024

Runtime test:

master:

Sibling    : localhost
Host       : localhost/13128/0
Flags      : default connection-auth=auto tls-disable
Address[0] : 127.0.0.1
Status     : Down
FETCHES    : 0
OPEN CONNS : 0
AVG RTT    : 0 msec
LAST QUERY : 1710344842 seconds ago
LAST REPLY : none received
PINGS SENT :        0
PINGS ACKED:        0   0%
IGNORED    :        0   0%
Histogram of PINGS ACKED:
Last failed connect() at: 13/Mar/2024:15:47:16 +0000
keep-alive ratio: 0%

PR:

Sibling    : localhost
Host       : localhost/13128/0
Flags      : default connection-auth=auto tls-disable
Address[0] : 127.0.0.1
Status     : Down
FETCHES    :        0
OPEN CONNS :        0
AVG RTT    :        0 msec
LAST QUERY : none sent
LAST REPLY : none received
PINGS SENT :        0
PINGS ACKED:        0   0%
IGNORED    :        0   0%
Histogram of PINGS ACKED:
Last failed connect() at: 13/Mar/2024:15:54:31 +0000
keep-alive ratio: 0%

Delta:

--- /tmp/before	2024-03-13 22:54:54
+++ /tmp/after	2024-03-13 22:54:53
@@ -3,14 +3,14 @@
 Flags      : default connection-auth=auto tls-disable
 Address[0] : 127.0.0.1
 Status     : Down
-FETCHES    : 0
-OPEN CONNS : 0
-AVG RTT    : 0 msec
-LAST QUERY : 1710344842 seconds ago
+FETCHES    :        0
+OPEN CONNS :        0
+AVG RTT    :        0 msec
+LAST QUERY : none sent
 LAST REPLY : none received
 PINGS SENT :        0
 PINGS ACKED:        0   0%
 IGNORED    :        0   0%
 Histogram of PINGS ACKED:
-Last failed connect() at: 13/Mar/2024:15:47:16 +0000
+Last failed connect() at: 13/Mar/2024:15:54:31 +0000
 keep-alive ratio: 0%

yadij

This comment was marked as resolved.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for posting sample output and the diff!

src/neighbors.cc Outdated Show resolved Hide resolved
src/neighbors.cc Outdated Show resolved Hide resolved
src/neighbors.cc Outdated Show resolved Hide resolved
src/neighbors.cc Outdated Show resolved Hide resolved
src/neighbors.cc Outdated Show resolved Hide resolved
src/neighbors.cc Outdated Show resolved Hide resolved
@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Mar 13, 2024
rousskov

This comment was marked as outdated.

@kinkie

This comment was marked as resolved.

@kinkie
Copy link
Contributor Author

kinkie commented Mar 15, 2024

Current sample output:

Name       : 127.0.0.1
Type       : Sibling
Addresses  : 1
Host       : 127.0.0.1/13129/0
Flags      : default connection-auth=auto tls-disable
Address[0] : 127.0.0.1
Status     : Down
Fetches    : 0
Open conns : 0
Avg rtt    : 0 msec
Pings sent : 0
Pings acked: 0 0%
Ignored    : 0 0%
Histogram of pings acked:
Last failed connect() at: 15/Mar/2024:05:27:58 +0000
keep-alive ratio: 0%

@kinkie kinkie requested review from rousskov and yadij March 15, 2024 05:34
@kinkie kinkie added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Mar 15, 2024
src/CachePeer.cc Outdated Show resolved Hide resolved
src/CachePeer.cc Outdated Show resolved Hide resolved
src/CachePeer.cc Outdated Show resolved Hide resolved
src/CachePeer.cc Outdated Show resolved Hide resolved
src/CachePeer.cc Outdated Show resolved Hide resolved
src/CachePeer.cc Outdated Show resolved Hide resolved
kinkie and others added 30 commits June 22, 2024 23:34
…cache#1797)

Detected by Coverity. CID 1529581: Unnecessary object copies can affect
performance (COPY_INSTEAD_OF_MOVE).
…-cache#1802)

Detected by Coverity. CID 1529599: Unnecessary object copies can affect
performance (COPY_INSTEAD_OF_MOVE).
…-cache#1803)

Detected by Coverity. CID 1529569: Unnecessary object copies can affect
performance (COPY_INSTEAD_OF_MOVE).
Use nproc to detect the actual number of CPU cores available instead of
going by an old "2 cores" value from GitHub Actions runners docs.
…#1801)

Detected by Coverity. CID 1529543 and CID 1529594: Unnecessary object
copies can affect performance (COPY_INSTEAD_OF_MOVE).
…d-cache#1764)

... as well as response_delay_pool and send_hit directives.

    auto check = clientAclChecklistCreate(...); // sets check->reply
    check->reply = reply; // self-assignment does nothing
    HTTPMSGLOCK(check->reply); // an unwanted/extra lock

When ACLFilledChecklist::reply is already set to X, resetting it to X
should not change HttpReply lock count, but some manual locking code did
not check that "already set" precondition and over-locked reply objects
set to ClientHttpRequest::al::reply by clientAclChecklistFill().

Current known leaks were probably introduced in 2021 commit e227da8 that
started supplying HttpReply to ACLChecklist in clientAclChecklistFill().
The added code locked the reply object correctly, but it was
incompatible with unconditional manual locks in three existing indirect
clientAclChecklistFill() callers (calling clientAclChecklistCreate()).

This change removes all known similar leaks and improves
ACLFilledChecklist API to prevent future similar leaks.
…-cache#1807)

Currently, the insert() method calls implicitly convert raw "this"
pointers to RefCount pointers. That self-registration code raises red
flags and may eventually be refactored. If it is refactored, insert()
calls are likely to start using RefCount pointers as parameters. This
change allows those future calls to avoid RefCount pointer copies. This
change does not affect current calls performance.

This change was triggered by Coverity CID 1554665: Unnecessary object
copies can affect performance (COPY_INSTEAD_OF_MOVE). However, the
insert() method is still calling RefCount copy assignment operator
rather than its (cheaper) move assignment operator. Safely removing that
overhead is only possible by reintroducing future parameter copying
overhead described above _and_ using an explicit std::move() call that
we are trying to avoid in general code. It is arguably not worth it.
…d-cache#1787)

Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests
Add "compiler" dimension to build-tests matrix.

Do not test layer-04-noauth-everything to partially compensate for the
new matrix dimension repeating other layer tests 3 times.

Change build-tests artifacts name pattern to include compiler dimension
to avoid name clashes that would result in logs being overwritten and
lost. Also use target environment instead of runner environment in
anticipation of adding container-based tests where the two differ.
For example, ACLChecklist::resumeNonBlockingCheck() missed a
markFinished() call when prepNonBlocking() returned false after a
reconfiguration. Missing markFinished() calls result in the last
evaluated ACL name being unset in nonBlockingCheck() callbacks.
    src/fs/rock/RockRebuild.cc:356:17: error: variable length arrays
    in C++ are a Clang extension [-Werror,-Wvla-cxx-extension]
        char hdrBuf[SwapDir::HeaderSize];
    note: initializer of 'HeaderSize' is unknown
…d-cache#1818)

The three ACLs were documented as matching any username when configured
with a parameter spelled "REQUIRED". Neither actually treated that
parameter specially -- all interpreted it as an ordinary regex.

This dangerous documentation bug was introduced in 2000 commit 145cf92
that added ident_regex and proxy_auth_regex support. It was then
duplicated in 2003 commit abb929f that added ext_user_regex support.

This minimal documentation fix does not imply that these ACLs should not
treat REQUIRED values specially. Enabling such special treatment
requires significant code changes, especially if we want to do that well
and without duplicating the corresponding code.
Do not hand-roll tests for exception-throwing code
Do not hand-roll tests for exception-throwing code
Resolve a long-standing TODO and stop hand-rolling HTTP range unit tests
…d-cache#1820)

    WARNING: markAsTunneled ACL is used in context without an HTTP
    request. Assuming mismatch.

Our annotate_client and annotate_transaction ACLs are documented as
"always matching", and some existing Squid configurations rely on that
invariant. Both ACLs did not match when they lacked access to the
current transaction information because their requiresRequest() method
returned true; annotate_client ACL also did not match after the
client-to-Squid connection was gone and when the transaction was not
associated with a client-to-Squid connection.

This change makes ACL code conform to documentation. Squid still warns
the admin if an ACL cannot annotate. Such warnings may indicate s Squid
bug or misconfiguration, but mismatching in those cases causes more harm
because it makes it impossible for the admin to rely on the primary
matching invariant. "One reliable invariant plus one unreliable side
effect" is the lesser evil than "two unreliable effects".

    # the following must deny even if it cannot annotate
    http_access deny markAsDenied

    # the following might not log denied traffic (with a prior warning)
    access_log syslog:daemon.err markedAsDenied

Also fixed annotate_client to annotate the current transaction even
after ConnStateData destruction. Such annotations may happen when, for
example, Squid continues a large download after the HTTP client is gone.
Ident protocol (RFC 931 obsoleted by RFC 1413) has been considered
seriously insecure and broken since at least 2009 when SANS issued an
update recommending its removal from all networks. Squid Ident
implementation suffered from assertions (since 2021 commit e227da8) and
memory leaks (since 2015 commit fbbea66). Ident implementation design
increased ACLChecklist::goAsync() design complexity.

An external ACL helper can be written to perform Ident transactions.

Configurations using ident/ident_regex ACLs, %ui logformat codes, %IDENT
external_acl_type format code, or ident_lookup_access/ident_timeout
directives are now rejected, leading to fatal startup failures:

    FATAL: Invalid ACL type 'ident_regex'
    FATAL: Invalid ACL type 'ident'
    ERROR: configuration failure: Unsupported %code: '%ui'
    ERROR: configuration failure: Unsupported %code: '%IDENT'
    squid.conf(6): unrecognized: 'ident_lookup_access'
    squid.conf(7): unrecognized: 'ident_timeout'

To avoid inconveniencing admins that do _not_ use Ident features, access
logs with "common" and "combined" logformats now always receive a dash
in the position of what used to be a %ui record field.

Co-authored-by: Amos Jeffries <squid3@treenet.co.nz>
TrieNode::add() incorrectly computed an offset of an internal data
structure, resulting in out-of-bounds memory accesses that could cause
corruption or crashes.

This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/esi-underflow.html
where it was filed as "Buffer Underflow in ESI".
In triage, it is often useful to know that ErrorState was created _when_
it was created rather than waiting for errorAppendEntry() or errorSend()
debugging, especially if those error handling stages are not reached.

Also report HTTP status code of the error response (when available).
…#1832)

All obsolete directives except for log_access and log_icap do not result
in Squid instance death today. Treating those two directives specially
is not necessary but does complicate reconfiguration improvements a bit.

Future refactoring might add support for treating (some or all) obsolete
directives as fatal configuration errors, but we can simplify for now.
There is no need for dynamic allocation risks and performance overheads
in these simple "immediate fastCheck()" cases.

This change converts all dynamically allocated checklists that use only
fastCheck() calls except one: Security::PeerConnector::initialize() has
to allocate its checklist dynamically to pass it to SSL_set_ex_data().
Fix a bug with mkrelease.sh which would
incorrectly abort unless the minor release is
only a single digit number
When Squid does not recognize a directive name, it reports the problem
and proceeds to parse the remaining configuration lines. When parsing is
complete, Squid quits if it has encountered such directives. This change
preserves this overall behavior while fixing related reporting problems.
Fixed problems depend on whether Squid discovered an unrecognized
directive at startup or during reconfiguration.

Startup configuration case was missing both an ERROR classification and
a FATAL message, resulting in a somewhat mysterious death:

    Processing Configuration File: squid.conf (depth 0)
    squid.conf(7): unrecognized: 'http_accesses'
    Starting Authentication on port [::]:4443
    Disabling Authentication on port [::]:4443 (interception enabled)

If other non-fatal ERRORs were present, that startup death reporting
became more problematic because admins would naturally (but incorrectly)
assume that Squid died due to those irrelevant ERRORs:

    Processing Configuration File: squid.conf (depth 0)
    squid.conf(7): unrecognized: 'http_accesses'
    ERROR: Unsupported TLS option SINGLE_DH_USE

Reconfiguration case was worse because Squid clearly attributed failure
to the wrong (innocent and default) directive that was not even present
in the configuration file (among other problems like saying "null"):

    Reconfiguring Squid Cache (version 7.0.0-VCS)...
    squid.conf(7): unrecognized: 'http_accesses'
    ERROR: Unsupported TLS option SINGLE_DH_USE
    FATAL: Bungled (null) line 3: sslproxy_cert_sign signTrusted all
    Squid Cache (Version 7.0.0-VCS): Terminated abnormally.

Squid now reports unrecognized directives using the standardized ERROR
prefix and prints correct FATAL message in both cases:

    Processing Configuration File: squid.conf (depth 0)
    ERROR: unrecognized directive near 'http_accesses'
        directive location: squid.conf(7)
    FATAL: reconfiguration failure: Found 1 unrecognized directive(s)
        exception location: cache_cf.cc(610) Parse
        advice: Run 'squid -k parse' and check for ERRORs.
    Squid Cache (Version 7.0.0-VCS): Terminated abnormally.

Also reduced the number of nested try/catch blocks, simplifying
configuration code. However, this change is just a small step forward.
More fatal (re)configuration error handling improvements are needed,
including removing unwanted and noisy fatal() side effects and possibly
recognizing other kinds of errors that should not immediately terminate
configuration parsing.
…he#1844)

    ERROR: Squid BUG: cannot aggregate mgr:userhash:
        check failed: cmd->profile != nullptr
        exception location: cache_manager.cc(102) createRequestedAction

Since 2010 commit 7de94c8, only worker processes made RegisterAction()
calls associated with the affected cache manager reports. Coordinator
process needs these registrations as well because it uses Mgr::Action
code while iterating report-generating kids[^1].

When fixed Coordinator asks a disker process for an Action report, that
process must recognize the action as well (even when disker has no
information to supply) to avoid triggering similar Action lookup BUGs.

[^1]: Coordinator mostly needs Mgr::Action for stats aggregation, but
even non-aggregating actions (like the fixed three) need registered
Mgr::ActionProfile objects for Coordinator to know whether to aggregate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-ignored-by-merge-bots https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-labels S-waiting-for-author author action is expected (and usually required) S-waiting-for-PR Closure of other PR(s), current or future, is expected (and usually required) S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants