Emit valid YAML for cachemgr:server_list #1735

kinkie · 2024-03-13T15:56:03Z

Use PackableStream for CachePeer::reportStatistics and
update output format to well-formed YAML

kinkie · 2024-03-13T15:57:23Z

Runtime test:

master:

Sibling    : localhost
Host       : localhost/13128/0
Flags      : default connection-auth=auto tls-disable
Address[0] : 127.0.0.1
Status     : Down
FETCHES    : 0
OPEN CONNS : 0
AVG RTT    : 0 msec
LAST QUERY : 1710344842 seconds ago
LAST REPLY : none received
PINGS SENT :        0
PINGS ACKED:        0   0%
IGNORED    :        0   0%
Histogram of PINGS ACKED:
Last failed connect() at: 13/Mar/2024:15:47:16 +0000
keep-alive ratio: 0%

PR:

Sibling    : localhost
Host       : localhost/13128/0
Flags      : default connection-auth=auto tls-disable
Address[0] : 127.0.0.1
Status     : Down
FETCHES    :        0
OPEN CONNS :        0
AVG RTT    :        0 msec
LAST QUERY : none sent
LAST REPLY : none received
PINGS SENT :        0
PINGS ACKED:        0   0%
IGNORED    :        0   0%
Histogram of PINGS ACKED:
Last failed connect() at: 13/Mar/2024:15:54:31 +0000
keep-alive ratio: 0%

Delta:

--- /tmp/before	2024-03-13 22:54:54
+++ /tmp/after	2024-03-13 22:54:53
@@ -3,14 +3,14 @@
 Flags      : default connection-auth=auto tls-disable
 Address[0] : 127.0.0.1
 Status     : Down
-FETCHES    : 0
-OPEN CONNS : 0
-AVG RTT    : 0 msec
-LAST QUERY : 1710344842 seconds ago
+FETCHES    :        0
+OPEN CONNS :        0
+AVG RTT    :        0 msec
+LAST QUERY : none sent
 LAST REPLY : none received
 PINGS SENT :        0
 PINGS ACKED:        0   0%
 IGNORED    :        0   0%
 Histogram of PINGS ACKED:
-Last failed connect() at: 13/Mar/2024:15:47:16 +0000
+Last failed connect() at: 13/Mar/2024:15:54:31 +0000
 keep-alive ratio: 0%

rousskov

Thank you for posting sample output and the diff!

src/neighbors.cc

Co-authored-by: Alex Rousskov <rousskov@measurement-factory.com>

kinkie · 2024-03-15T05:31:53Z

Current sample output:

Name       : 127.0.0.1
Type       : Sibling
Addresses  : 1
Host       : 127.0.0.1/13129/0
Flags      : default connection-auth=auto tls-disable
Address[0] : 127.0.0.1
Status     : Down
Fetches    : 0
Open conns : 0
Avg rtt    : 0 msec
Pings sent : 0
Pings acked: 0 0%
Ignored    : 0 0%
Histogram of pings acked:
Last failed connect() at: 15/Mar/2024:05:27:58 +0000
keep-alive ratio: 0%

src/CachePeer.cc

…cache#1797) Detected by Coverity. CID 1529581: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

…-cache#1802) Detected by Coverity. CID 1529599: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

…-cache#1803) Detected by Coverity. CID 1529569: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

Use nproc to detect the actual number of CPU cores available instead of going by an old "2 cores" value from GitHub Actions runners docs.

…#1801) Detected by Coverity. CID 1529543 and CID 1529594: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

…d-cache#1764) ... as well as response_delay_pool and send_hit directives. auto check = clientAclChecklistCreate(...); // sets check->reply check->reply = reply; // self-assignment does nothing HTTPMSGLOCK(check->reply); // an unwanted/extra lock When ACLFilledChecklist::reply is already set to X, resetting it to X should not change HttpReply lock count, but some manual locking code did not check that "already set" precondition and over-locked reply objects set to ClientHttpRequest::al::reply by clientAclChecklistFill(). Current known leaks were probably introduced in 2021 commit e227da8 that started supplying HttpReply to ACLChecklist in clientAclChecklistFill(). The added code locked the reply object correctly, but it was incompatible with unconditional manual locks in three existing indirect clientAclChecklistFill() callers (calling clientAclChecklistCreate()). This change removes all known similar leaks and improves ACLFilledChecklist API to prevent future similar leaks.

…-cache#1807) Currently, the insert() method calls implicitly convert raw "this" pointers to RefCount pointers. That self-registration code raises red flags and may eventually be refactored. If it is refactored, insert() calls are likely to start using RefCount pointers as parameters. This change allows those future calls to avoid RefCount pointer copies. This change does not affect current calls performance. This change was triggered by Coverity CID 1554665: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE). However, the insert() method is still calling RefCount copy assignment operator rather than its (cheaper) move assignment operator. Safely removing that overhead is only possible by reintroducing future parameter copying overhead described above _and_ using an explicit std::move() call that we are trying to avoid in general code. It is arguably not worth it.

…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

Add "compiler" dimension to build-tests matrix. Do not test layer-04-noauth-everything to partially compensate for the new matrix dimension repeating other layer tests 3 times. Change build-tests artifacts name pattern to include compiler dimension to avoid name clashes that would result in logs being overwritten and lost. Also use target environment instead of runner environment in anticipation of adding container-based tests where the two differ.

For example, ACLChecklist::resumeNonBlockingCheck() missed a markFinished() call when prepNonBlocking() returned false after a reconfiguration. Missing markFinished() calls result in the last evaluated ACL name being unset in nonBlockingCheck() callbacks.

src/fs/rock/RockRebuild.cc:356:17: error: variable length arrays in C++ are a Clang extension [-Werror,-Wvla-cxx-extension] char hdrBuf[SwapDir::HeaderSize]; note: initializer of 'HeaderSize' is unknown

…d-cache#1818) The three ACLs were documented as matching any username when configured with a parameter spelled "REQUIRED". Neither actually treated that parameter specially -- all interpreted it as an ordinary regex. This dangerous documentation bug was introduced in 2000 commit 145cf92 that added ident_regex and proxy_auth_regex support. It was then duplicated in 2003 commit abb929f that added ext_user_regex support. This minimal documentation fix does not imply that these ACLs should not treat REQUIRED values specially. Enabling such special treatment requires significant code changes, especially if we want to do that well and without duplicating the corresponding code.

Do not hand-roll tests for exception-throwing code

Resolve a long-standing TODO and stop hand-rolling HTTP range unit tests

Remove struct addrinfo as we do.

…-cache#1824)

…d-cache#1820) WARNING: markAsTunneled ACL is used in context without an HTTP request. Assuming mismatch. Our annotate_client and annotate_transaction ACLs are documented as "always matching", and some existing Squid configurations rely on that invariant. Both ACLs did not match when they lacked access to the current transaction information because their requiresRequest() method returned true; annotate_client ACL also did not match after the client-to-Squid connection was gone and when the transaction was not associated with a client-to-Squid connection. This change makes ACL code conform to documentation. Squid still warns the admin if an ACL cannot annotate. Such warnings may indicate s Squid bug or misconfiguration, but mismatching in those cases causes more harm because it makes it impossible for the admin to rely on the primary matching invariant. "One reliable invariant plus one unreliable side effect" is the lesser evil than "two unreliable effects". # the following must deny even if it cannot annotate http_access deny markAsDenied # the following might not log denied traffic (with a prior warning) access_log syslog:daemon.err markedAsDenied Also fixed annotate_client to annotate the current transaction even after ConnStateData destruction. Such annotations may happen when, for example, Squid continues a large download after the HTTP client is gone.

Removing another addrinfo as we do.

Ident protocol (RFC 931 obsoleted by RFC 1413) has been considered seriously insecure and broken since at least 2009 when SANS issued an update recommending its removal from all networks. Squid Ident implementation suffered from assertions (since 2021 commit e227da8) and memory leaks (since 2015 commit fbbea66). Ident implementation design increased ACLChecklist::goAsync() design complexity. An external ACL helper can be written to perform Ident transactions. Configurations using ident/ident_regex ACLs, %ui logformat codes, %IDENT external_acl_type format code, or ident_lookup_access/ident_timeout directives are now rejected, leading to fatal startup failures: FATAL: Invalid ACL type 'ident_regex' FATAL: Invalid ACL type 'ident' ERROR: configuration failure: Unsupported %code: '%ui' ERROR: configuration failure: Unsupported %code: '%IDENT' squid.conf(6): unrecognized: 'ident_lookup_access' squid.conf(7): unrecognized: 'ident_timeout' To avoid inconveniencing admins that do _not_ use Ident features, access logs with "common" and "combined" logformats now always receive a dash in the position of what used to be a %ui record field. Co-authored-by: Amos Jeffries <squid3@treenet.co.nz>

TrieNode::add() incorrectly computed an offset of an internal data structure, resulting in out-of-bounds memory accesses that could cause corruption or crashes. This bug was discovered and detailed by Joshua Rogers at https://megamansec.github.io/Squid-Security-Audit/esi-underflow.html where it was filed as "Buffer Underflow in ESI".

In triage, it is often useful to know that ErrorState was created _when_ it was created rather than waiting for errorAppendEntry() or errorSend() debugging, especially if those error handling stages are not reached. Also report HTTP status code of the error response (when available).

…#1832) All obsolete directives except for log_access and log_icap do not result in Squid instance death today. Treating those two directives specially is not necessary but does complicate reconfiguration improvements a bit. Future refactoring might add support for treating (some or all) obsolete directives as fatal configuration errors, but we can simplify for now.

There is no need for dynamic allocation risks and performance overheads in these simple "immediate fastCheck()" cases. This change converts all dynamically allocated checklists that use only fastCheck() calls except one: Security::PeerConnector::initialize() has to allocate its checklist dynamically to pass it to SSL_set_ex_data().

Fix a bug with mkrelease.sh which would incorrectly abort unless the minor release is only a single digit number

When Squid does not recognize a directive name, it reports the problem and proceeds to parse the remaining configuration lines. When parsing is complete, Squid quits if it has encountered such directives. This change preserves this overall behavior while fixing related reporting problems. Fixed problems depend on whether Squid discovered an unrecognized directive at startup or during reconfiguration. Startup configuration case was missing both an ERROR classification and a FATAL message, resulting in a somewhat mysterious death: Processing Configuration File: squid.conf (depth 0) squid.conf(7): unrecognized: 'http_accesses' Starting Authentication on port [::]:4443 Disabling Authentication on port [::]:4443 (interception enabled) If other non-fatal ERRORs were present, that startup death reporting became more problematic because admins would naturally (but incorrectly) assume that Squid died due to those irrelevant ERRORs: Processing Configuration File: squid.conf (depth 0) squid.conf(7): unrecognized: 'http_accesses' ERROR: Unsupported TLS option SINGLE_DH_USE Reconfiguration case was worse because Squid clearly attributed failure to the wrong (innocent and default) directive that was not even present in the configuration file (among other problems like saying "null"): Reconfiguring Squid Cache (version 7.0.0-VCS)... squid.conf(7): unrecognized: 'http_accesses' ERROR: Unsupported TLS option SINGLE_DH_USE FATAL: Bungled (null) line 3: sslproxy_cert_sign signTrusted all Squid Cache (Version 7.0.0-VCS): Terminated abnormally. Squid now reports unrecognized directives using the standardized ERROR prefix and prints correct FATAL message in both cases: Processing Configuration File: squid.conf (depth 0) ERROR: unrecognized directive near 'http_accesses' directive location: squid.conf(7) FATAL: reconfiguration failure: Found 1 unrecognized directive(s) exception location: cache_cf.cc(610) Parse advice: Run 'squid -k parse' and check for ERRORs. Squid Cache (Version 7.0.0-VCS): Terminated abnormally. Also reduced the number of nested try/catch blocks, simplifying configuration code. However, this change is just a small step forward. More fatal (re)configuration error handling improvements are needed, including removing unwanted and noisy fatal() side effects and possibly recognizing other kinds of errors that should not immediately terminate configuration parsing.

…he#1844) ERROR: Squid BUG: cannot aggregate mgr:userhash: check failed: cmd->profile != nullptr exception location: cache_manager.cc(102) createRequestedAction Since 2010 commit 7de94c8, only worker processes made RegisterAction() calls associated with the affected cache manager reports. Coordinator process needs these registrations as well because it uses Mgr::Action code while iterating report-generating kids[^1]. When fixed Coordinator asks a disker process for an Action report, that process must recognize the action as well (even when disker has no information to supply) to avoid triggering similar Action lookup BUGs. [^1]: Coordinator mostly needs Mgr::Action for stats aggregation, but even non-aggregating actions (like the fixed three) need registered Mgr::ActionProfile objects for Coordinator to know whether to aggregate.

…mp_peers

kinkie added 2 commits March 13, 2024 15:44

Use PackableStream for dump_peers

71ae5e0

Fix formatting

57d511e

This comment was marked as resolved.

Sign in to view

rousskov requested changes Mar 13, 2024

View reviewed changes

rousskov added the S-waiting-for-author author action is expected (and usually required) label Mar 13, 2024

kinkie added 4 commits March 14, 2024 08:00

Source maintenance

8441c91

dump_peer_options -> CachePeer::dumpOptions

b411cee

neighborTypeStr -> CachePeer::typeString

5a46e40

dump_peers internals -> CachePeer::reportStatistics

cd395c9

This comment was marked as outdated.

Sign in to view

Source maintenance

0533486

This comment was marked as resolved.

Sign in to view

kinkie and others added 10 commits March 14, 2024 14:26

fix typo

d99bc9e

Move newline at end of CachePeer::dumpOptions to callers

d0e4b1b

move opening newline in reportStatistics()

321a664

Reposition newline

40d7295

Apply suggestions from code review

aec3b06

Co-authored-by: Alex Rousskov <rousskov@measurement-factory.com>

fix const

f1e5848

Do not consider current configuration to output stats

50959d5

Reorder CachePeer::typeString position

5486566

Steamline output

cc0587d

Move away from all caps

58759f6

kinkie requested review from rousskov and yadij March 15, 2024 05:34

kinkie added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Mar 15, 2024

Simplified htcp options output

0513961

rousskov requested changes Mar 15, 2024

View reviewed changes

kinkie and others added 30 commits June 22, 2024 23:34

Optimization: clientBeginRequest() call copied streamData ptr (squid-…

bb7b7a7

…cache#1797) Detected by Coverity. CID 1529581: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

Optimization: Fs::Ufs::RebuildState constructor copied dir ptr (squid…

9f52635

…-cache#1802) Detected by Coverity. CID 1529599: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

Optimization: esiChoose constructor call copied parent pointer (squid…

0ee8e5a

…-cache#1803) Detected by Coverity. CID 1529569: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

CI: better parallelism for functionality-test build (squid-cache#1805)

7c91d7c

Use nproc to detect the actual number of CPU cores available instead of going by an old "2 cores" value from GitHub Actions runners docs.

Optimization: clientStreamInit() copied ClientStreamData (squid-cache…

a9a75f9

…#1801) Detected by Coverity. CID 1529543 and CID 1529594: Unnecessary object copies can affect performance (COPY_INSTEAD_OF_MOVE).

CI: Build layers independently to speed up GitHub Actions tests (squi…

f409314

…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

Fix build with clang v18 [-Wvla-cxx-extension] (squid-cache#1813)

6366db1

src/fs/rock/RockRebuild.cc:356:17: error: variable length arrays in C++ are a Clang extension [-Werror,-Wvla-cxx-extension] char hdrBuf[SwapDir::HeaderSize]; note: initializer of 'HeaderSize' is unknown

testRandomUuid: use cppunit exception tests (squid-cache#1814)

5c0e9cb

Do not hand-roll tests for exception-throwing code

testCacheManager: use cppunit exception tests (squid-cache#1811)

838fe55

Do not hand-roll tests for exception-throwing code

testHttpRange: use cppunit framework (squid-cache#1816)

586deea

Resolve a long-standing TODO and stop hand-rolling HTTP range unit tests

MinGW: fix error: cannot convert 'size_t*' to 'int*' (squid-cache#1822)

a442dd2

Remove struct addrinfo as we do.

MinGW: Fix error: cannot convert 'const int*' to 'const char*' (squid…

3aa043e

…-cache#1824)

MinGW: Fix error: cannot convert 'size_t*' to 'int*' (squid-cache#1825)

0250a81

Removing another addrinfo as we do.

Maintenance: Remove debug_log wrapper (squid-cache#1833)

1d7b5e1

mkrelease: allow two digits for minor release numbers (squid-cache#1837)

10045b2

Fix a bug with mkrelease.sh which would incorrectly abort unless the minor release is only a single digit number

fix merge issues

a869ae2

Merge remote-tracking branch 'upstream/master' into packablestream-du…

449f63c

…mp_peers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emit valid YAML for cachemgr:server_list #1735

Emit valid YAML for cachemgr:server_list #1735

kinkie commented Mar 13, 2024 •

edited

Loading

kinkie commented Mar 13, 2024

This comment was marked as resolved.

rousskov left a comment

This comment was marked as outdated.

This comment was marked as resolved.

kinkie commented Mar 15, 2024

Emit valid YAML for cachemgr:server_list #1735

Are you sure you want to change the base?

Emit valid YAML for cachemgr:server_list #1735

Conversation

kinkie commented Mar 13, 2024 • edited Loading

kinkie commented Mar 13, 2024

This comment was marked as resolved.

rousskov left a comment

Choose a reason for hiding this comment

This comment was marked as outdated.

This comment was marked as resolved.

kinkie commented Mar 15, 2024

kinkie commented Mar 13, 2024 •

edited

Loading