Author: | Arvid Norberg, arvid@libtorrent.org |
---|
In the 4th quarter of 2020 Mozilla Open Source Support Awards commissioned a security audit of libtorrent, to be performed by include security.
The full report from the audit can be found here.
This document discusses the issues raised by the report as well as describes the changes made to libtorrent in response to it. These changes were included in libtorrent version 1.2.12 and version 2.0.2.
Comments on this document are welcome through any of these means:
- email them to
arvid@libtorrent.org
- email to libtorrent mailing list
- an issue on github.
issues brought up in the report
- F1: Server-Side Request Forgery (SSRF)
- F2: Compile Options Can Remove Assert Security Validation
- F3: Confidential and Security Relevant Information Stored in Logs
- F4: Pseudo Random Number Generator Is Vulnerable to Prediction Attack
- F5: Potential Null Pointer Dereference Issues
- F6: Integer Overflow
- F7: Magnet URIs Allow IDNA Domain Names
- I1: Additional Documentation and Automation
- I2: Automated Fuzzer Generation
- I3: Type Confusion and Integer Overflow Improvements
For background, see OWASP definition of SSRF.
Running a tracker on the local network is an established use case for BitTorrent (here). Filtering all tracker requests to the local network is not feasible. Running a tracker on the loopback device would seem to only make sense for testing.
The SSRF issue is not limited to tracker URLs, but also applies to web seeds. A web seed can be embedded in a .torrent file as well as included in a magnet link.
The report says:
If user-controllable URLs must be requested then sanitizing them in a manner similar to the SafeCurl library is recommended (see the link in the reference section).
The SafeCurl library, as I understand is, sanitizes URLs based on include- and exclude lists of host names, IP addresses, ports, schemes.
Tracker URLs can be arbitrary URLs that libtorrent appends certain query string
parameters to (like &info_hash=
etc.). The path component of a tracker URL
is typically not relevant, and most trackers follow the convention of using
/announce
.
A web seed for a multi-file torrent cannot include any query string arguments and libtorrent will append the path to the file that's being requested. However, the response from the web seed can redirect to any arbitrary URL, including on the local network. A web seed for a single-file torrent can be any arbitrary URL.
Web seed HTTP requests will almost always be a range request (unless the file is so small to fit in one or a few pieces).
What heuristics and restrictions could libtorrent implement to mitigate attacks?
Both trackers and web seeds only use HTTP GET
request, i.e. no POST
for
example. This ought to protect certain APIs that mutate state.
The examples in the OWASP article are:
Cloud services such as AWS provide a REST interface on
http://169.254.169.254/
where important configuration and sometimes even
authentication keys can be extracted
The response from a REST API would have to be compatible with the BitTorrent tracker protocol, which is a bencoded structure with specific keys being mandatory (the protocol is defined here, with amendments here, here and here).
A tracker response that doesn't match this protocol will be ignored by libtorrent. The response will not be published and made available anywhere, including the logs. Therefore it's not likely there would be a way to extract data from a REST API via a tracker request.
NoSQL database such as MongoDB provide REST interfaces on HTTP ports. If the database is expected to only be available to internally, authentication may be disabled and the attacker can extract data
Since libtorrent doesn't make the response from a tracker request available to anybody, especially not if it's not a valid BitTorrent tracker response, it's not likely data can be extracted via such tracker URL. See previous section for details.
libtorrent can definitely hit a REST interface and may affect configuration changes in other software that's installed on the local machine. This is assuming that the software does not use any authentication other than checking the source IP being the localhost.
As mentioned earlier, extracting data from a REST API via a tracker URL is not likely to be possible.
It is established practice to include arbitrary URL query parameters in tracker URLs, and clients amend them with the query parameters required by the tracker protocol. This makes it difficult to sanitize the query string.
One way to mitigate hitting REST APIs on local host is to require that tracker
URLs, for local host specifically, use the request path /announce
. This is
the convention for bittorrent trackers.
Web seeds that resolve to a local network address are not allowed to have query string parameters.
This SSRF mitigation was implemented for trackers in #5303 and for web seeds in #5319.
Web Seeds that resolve to a global address (i.e. not loopback, local network or multicast address) are not allowed to redirect to a non-global IP. This mitigation was implemented in #5846, for libtorrent-2.0.3.
The attacker may be able to read files using <file://>
URIs
libtorrent only supports http
, https
and udp
protocol schemes, and
will reject any other tracker URL. Specifically, libtorrent does not support
the file://
URL scheme.
Additionally, #5346 implements checks for tracker URLs that include query string arguments that are supposed to be added by clients.
The comments have been addressed in #5308. The changes include:
- use
span<char>
to simplify updates of pointer + length - use
span<char const>
for (immutable) write buffers, to improve const correctness and avoid aconst_cast
- introduce additional sanity checks that no buffer lengths are < 0
- introduce additional check to ensure buffer lengths fit in unsigned 16 bit field (in the case where it's stored in one)
- generally reduce signed <-> unsigned casts
The secret keys for protocol encryption are not particularly sensitive, since it's primarily an obfuscation feature. However, I have never had to use these keys for debugging, so they don't have much value in the log anyway.
Addressed in #5299.
These are the places random_bytes()
, random()
and random_shuffle()
are used in libtorrent. The "crypto" column indicates whether the random number
is sensitive and must be hard to predict, i.e. have high entropy.
crypto | Use | Description |
---|---|---|
Yes | PCP nonce | generating a nonce for PCP (Port Control Protocol). The PCP RFC section 11.2 references RFC 4086 Randomness Requirements for Security for the nonce generation. This was fixed. |
Yes | DHT ed25519 keys | used for kademlia mutable put feature. These keys are sensitive an
should use an appropriate entropy source. This is not done as part of
normal libtorrent operations, it's a utility function a client using the mutable
PUT-feature can call. This functionality is exposed in the
This was fixed. |
Maybe | DHT write-token | The DHT maintains a secret 32 bit number which is updated every 5
minutes to a new random number. The secret from the last 5 minute period
is also remembered. In responses to This was changed to use cryptographic random numbers. |
Maybe | DHT transaction ID | Each DHT request that is sent to a node includes a 16 bit transaction ID that must be returned in the response. This is used to map responses to the correct request (required when making multiple requests to the same IP), but also to make it harder for a 3rd party to spoof the source IP and fake a response. Presumably the fact that there are only 65536 different transaction IDs would be a problem before someone guesses the random number. Additionally, a request is only valid for a few tens of seconds, which further mitigates spoofed responses. This has been left using pseudo random numbers. |
Maybe | uTP sequence numbers | When connecting a uTP socket, the initial sequence number is chosen at random. This has been left using pseudo random numbers. |
No | protocol encryption (obfuscation) | both key generation for DH handshake as well as random padding ahead of handshake. The protocol encryption feature is not intended to provide any authentication or confidentiality. |
No | i2p session-id | generation of the session ID, not key generation. All crypto, including key generation is done by the i2p daemon implementing the SAM bridge. |
No | DHT node-id | The node ID does not need to be hard to guess, just uniformly distributed. |
No | DHT node-id fingerprint | Used to identify announces to fake info-hashes. More info here. |
No | DHT peer storage | When returning peers from peer storage, in response to a DHT
get_peers request, we pick n of m random peers. |
No | peer-id | In bittorrent, each peer generates a random peer-id used in interactions with other peers as well as HTTP(S) trackers. The peer-id is not secret and does not need to be hard to guess. In fact, for each peer libtorrent connects to, it generates a different peer-id. Additionally, each torrent has a unique peer-id that's advertised to trackers. Trackers need a consistent peer-id for its book keeping. |
No | ip_voter | The ip_voter maintains a list of possible external IP addresses, based on how many peer interactions we've seen telling us that's our external IP as observed by them. Knowing our external IP is not critical, it's primarily used to generate our DHT node ID according to this. The ip_voter uses |
No | local service discovery | In order to ignore our own service discovery messages sent on a
multi-cast group, we include a "cookie". If we see our own cookie, we
ignore the message. The cookie is generated by random() . |
No | piece picker | The order pieces are picked in is rarest first. Pieces of the same
rarity are picked in random order, using random() . |
No | smart-ban | If a piece fails the hash check, we may not know which peer sent the corrupt data. The smart ban function will record the hashes of all blocks of the failed piece. Once the piece passes, it can compare the passing blocks against the failing one, identifying exactly which peer sent corrupt data. This is a property of how bittorrent checks data at the piece level, but downloads smaller parts (called "blocks") from potentially different peers. In earlier version of libtorrent, the block hash would use CRC32, and a secret salt to prevent trivial exploiting by malicious peers. This is no longer the case, smart-ban uses SHA-1 now, so there is no need for the salt. It was removed in #5295. |
No | peer-list pruning | When the peer list has too many peers in it, random low quality peers are pruned. |
No | peer-list duplicate peer | When receiving a connection from an IP we're already connected to, the connection to keep and which one to disconnect is based on the local and remote port numbers. If the ports are the same, one of the two connections are closed randomly. |
No | UPnP external port | When the external port of a mapping conflicts with an existing map, the port mapping is re-attempted with a random external port. |
No | ut_metadata re-request timeout | When a peer responds to a metadata request with "don't have", we delay randomly between 20 - 70 seconds before re-requesting. |
No | web seeds | Web seeds are shuffled, to attempt connecting to them in random order |
No | trackers | Trackers within the same tier are shuffled, to try them in random order (for load balancing) |
No | resume data peers | When saving resume data and we have more than 100 peers, once "high quality peers" have been saved, pick low quality peers at random to save. |
No | share mode seeds | In share mode, where libtorrent attempts to maximize its upload to download ratio, if we're connected to too many seeds, some random seeds are disconnected. |
No | share mode pick | In share mode, when more than one piece has the lowest availability, one of them is picked at random |
No | http_connection endpoints | After a successful hostname lookup, the endpoints are randomized to try them in an arbitrary order, for load balancing. |
No | super seeding piece picking | In Super seeding mode, the rarest piece is selected for upload. If there's a tie, a piece is chosen at random. |
No | UDP listen socket | When using a proxy, but not connecting peer via the proxy, the local UDP socket, used for uTP and DHT traffic will bind to the listen socket of the first configured listen interface. If there is no listen interface configured, a random port is chosen. |
No | bind outgoing uTP socket | When bind-outgoing-sockets is enabled, uTP sockets are bound to the listen interface matching the target IP. If there is no match, an interface is picked at random to bind the outgoing socket to. |
No | uTP send ID | uTP connections are assigned send ID, to allow multiple connections to the same IP. Similar to port number, but all uTP connections run over a single UDP socket. |
The following issues were addressed:
- the existing
random_bytes()
function was made to unconditionally produce pseudo random bytes. - increase amount of entropy to seed the pseudo random number generator.
- a new function
crypto_random_bytes()
was added which unconditionally use a strong entropy source. - If no specialized API is available for high-entropy random numbers is
available (like
libcrypto
or CryptoAPI on windows) random numbers are pulled from/dev/urandom
. - The PCP nonce was changed to use
crypto_random_bytes()
- The ed25519 key seed function was changed to use
crypto_random_bytes()
Addressed in #5298.
This was fundamentally caused by the boost.pool default allocator using new
(std::nothrow)
, rather than plain (throwing) new
. The code using the pool
added to the confusion by checking for a nullptr
return value, but further
up the call chain that check was not made. The fix was to remove the check for
nullptr
and replace the boost.pool allocator to throw std::bad_alloc
on
memory exhaustion.
Addressed in #5293.
This was a bug in the fuzzer itself, not in the production code (as far as I could find). The parse_int fuzzer used an uninitialized variable.
Addressed in #5292.
My understanding of this attack is that a tracker hostname could be crafted to look like a well known host, but in fact be a different host, by using look-alike unicode characters in the hostname.
For example, the well-known tracker http://bt1.archive.org:6969/announce
could be spoofed by using bt1.archive.org
(the e
at the end is really
U+ff45).
The issue of trusting trackers goes beyond tracker host names in magnet links. Normal .torrent files also contain tracker URLs, and they could also use misleading tracker host names. However, this highlights a more fundamental issue that libtorrent does not provide an API for clients to vet trackers before announcing to them. libtorrent provides an IP filter that will block announcing to trackers, but not the URLs or host names directly.
Having an ability to vet trackers before using them would also mitigate the F1: Server-Side Request Forgery (SSRF).
This issue also goes beyond trackers. Web seeds are also URLs embedded in .torrent files or magnet links which libtorrent will make requests to.
These are the changes I'm making to mitigate this issue:
- enable
validate_https_trackers
by default. #5314. The name of this setting is misleading. It does not only affect trackers, but also web seeds. - Support loading the system certificate store on windows, to authenticate trackers with, #5313.
- add an option to allow IDNA domain names, and disable it by default. This applies to both trackers and web seeds. #5316.
Addressed in:
No effort has been put into generating fuzzers with FuzzGen, but it's an intriguing project I hope to have time to put some effort towards in the future.
Addressed in: