-
Notifications
You must be signed in to change notification settings - Fork 84
Description
Problem
Our analysis of zstd ends up with unexpectedly many unknown pointers due to the following sequence of coincidences.
Thread pool and its queue
In zstd's POOL_create_advanced, the thread pool and its queue are allocated on the heap as follows (simplified):
ctx = (POOL_ctx*)ZSTD_customCalloc(sizeof(POOL_ctx), customMem);
ctx->queueSize = queueSize + 1;
ctx->queue = (POOL_job*)ZSTD_customMalloc(ctx->queueSize * sizeof(POOL_job), customMem);
ctx->queueHead = 0;
// ...Here ctx and both alloc variables are in the local state. Thanks to #686 the alloc variables escape into global state correctly.
When jobs are added to the queue, their function pointers are nicely joined into the queue's alloc variable, which precisely contains known function pointers and no NULL or unknown pointer.
However later in the thread pool threads, they retrieve jobs from the queue using ctx->queue[ctx->queueHead], i.e. *(ctx->queue + ctx->queueHead) after CIL normalization. Confusingly, the function pointer in the returned job struct contains an additional unknown pointer, despite the queue's alloc variable not containing one.
Extensive invalidation
Turns out this is because ctx's alloc variable also has a NULL pointer possible for the queue field. When pointer arithmetic is applied (and queueHead is no longer constant either), the known alloc pointer is nicely kept, but the NULL pointer is replaced by an unknown pointer using this code:
Lines 208 to 214 in 3137936
| let default = function | |
| | Addr.NullPtr when GobOption.exists (BI.equal BI.zero) (ID.to_int n) -> Addr.NullPtr | |
| | _ -> Addr.UnknownPtr | |
| in | |
| match Addr.to_var_offset addr with | |
| | Some (x, o) -> Addr.from_var_offset (x, addToOffset n (Some x.vtype) o) | |
| | None -> default addr |
Analyzing zstd, the unknown function pointer is absolutely disastrous effect:
- An unknown function pointer call is analyzed by
specialto invalidate its arguments. - In zstd, one of the thread pool jobs is passed an argument, which inside it contains a pointer to the same thread pool
ctx. - The resulting invalidation is extensive and invalidates:
ctxalloc variable (so an unknown pointer is added toqueue's address set directly as well),queuealloc variable (soNULLand unknown pointers are added to the job's function pointer and argument sets),- all the argument structs for all the jobs (holding all of the relevant data) are totally invalidated.
Culprit
The culprit of all of this is the NULL pointer in the queue field. If we didn't have that there, none of this invalidation would have happened!
Tracking this down was a two-day job since the cyclic pointing structure means that everything ends up being invalidated and it's impossible to tell from the final results, where this imprecision originated from. Moreover, it's practically impossible to trace Goblint when the problematic analysis takes over 40 minutes and produces way too much tracing output (and not to mention being slowed down by all the I/O).
Solution
Ideally, Goblint should know that queue cannot be a NULL pointer, avoiding all of the above. The source of the NULL pointer in the first place is that ctx is allocated using calloc. But clearly that NULL pointer is overwritten before ctx is seen by any other code, so it should be avoidable.
Although simple on the surface, to be able to strongly update the queue field requires many big pieces to work together:
- Uniqueness analysis for Juliet suite std_thread wrappers #186 is needed to know that we have a uniquely allocated blob, otherwise strong updates wouldn't be sound.
ValueDomain.update_offsetmust use that uniqueness information to replace the weak (joining) update with a strong (replacing) one here:analyzer/src/cdomains/valueDomain.ml
Lines 900 to 904 in 3137936
| `Blob (x,s,orig), _ -> begin let l', o' = shift_one_over l o in let x = zero_init_calloced_memory orig x t in mu (`Blob (join x (do_update_offset ask x offs value exp l' o' v t), s, orig)) - Even that would not be enough, because
calloccreates an abstract array of blobs:Line 2261 in 3137936
set_many ~ctx (Analyses.ask_of_ctx ctx) gs st [(add_null (AD.from_var heap_var), TVoid [], `Array (CArrays.make (IdxDom.of_int (Cilfacade.ptrdiff_ikind ()) BI.one) (`Blob (VD.bot (), eval_int (Analyses.ask_of_ctx ctx) gs st size, false))));
Therefore that array update also needs to be strong, not weak. In zstd's case (and very commonly),callocis used with a count of 1, so we get an array of length 1. Hence the array domain(s) should be specialized for the length 1 case to just do strong updates instead.
(After Fixes/calloc #145 there's probably a bug in thatcalloccode because it always creates an array of length 1, instead of using the count from thecallocargument.) - Even with all of the above, I'm not sure if we'd be precise enough, because in zstd thread pools may be created when Goblint is in multithreaded mode, so it might even be necessary to incorporate Primitive malloc freshness analysis #690 into the uniqueness and privatization to always have a local strongly update-able copy of the blob.