Description
Current situation
Currently, we have multiple format implementations that require some kind of lazily-built cache. For example, ProtoBuf maintains a mapping from the serial descriptor id to the corresponding proto id, Json maintains the cache of all types' unique serialNames and the corresponding alternative names. Other caches might be even more heavy-weight, for example, Json may want to build a trie for serial names in order to support zero-allocation key decoding.
Per-format caches have numerous downsides:
- They force each format to re-invent its own thread-safe data structure and cache convention.
- They are prone to memory leaks and unbounded memory usage in classloading-heavy scenarios: the cache should reference serial descriptors in a weak manner, and addressing this is a non-trivial implementation burden.
- Managing the concurrency of such a cache is a non-trivial task: format instances may become a contention point, especially at application startup.
- They are memory-unfriendly: in our practice, an application may have dozens of format instances that differ in various minor details (pretty printing, polymorphism, etc.). Eventually, each of them will effectively hold its own copy of the cached value.
- The most important downside, which is likely to outweigh the previous ones: computationally heavy caching is an API-unfriendly performance timebomb. Users often consider a format's allocation to be lightweight and do not bother to pre-allocate it (to the extent that we have a dedicated IDEA inspection for that). In such scenarios, the supposedly cached value is re-evaluated each time and can potentially consume more time than the actual serialization process (see Intrinsics for serializer() function #1348 and Implemented serializers caching for lookup #2015).
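To make the last two points concrete, here is a minimal sketch of a per-format cache. FakeDescriptor, FakeFormat and the computations counter are hypothetical stand-ins invented for illustration; they are not the actual ProtoBuf or Json internals.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a serial descriptor (not the real SerialDescriptor interface).
class FakeDescriptor(val serialName: String, val elementNames: List<String>)

// Counts how many times the "expensive" cached value is actually computed.
val computations = AtomicInteger(0)

// Hypothetical format that owns its cache, one cache per format instance.
class FakeFormat(val prettyPrint: Boolean = false) {
    // Strong references to descriptors: the leak-prone part in classloading-heavy scenarios.
    private val nameIndexCache = ConcurrentHashMap<FakeDescriptor, Map<String, Int>>()

    fun nameIndex(descriptor: FakeDescriptor): Map<String, Int> =
        nameIndexCache.computeIfAbsent(descriptor) { d ->
            computations.incrementAndGet()
            d.elementNames.withIndex().associate { (index, name) -> name to index }
        }
}

fun main() {
    val descriptor = FakeDescriptor("User", listOf("id", "name"))

    // A pre-allocated format computes the value once and reuses it.
    val shared = FakeFormat()
    repeat(3) { shared.nameIndex(descriptor) }
    println(computations.get()) // 1

    // A format allocated per call starts with an empty cache every time.
    repeat(3) { FakeFormat(prettyPrint = true).nameIndex(descriptor) }
    println(computations.get()) // 4
}
```

The second loop is the timebomb: every freshly allocated format recomputes the value, and every long-lived format instance keeps its own copy of it.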
Proposed solution
Taking into account all the known limitations, I propose a format-agnostic concept, SerialDescriptorLocal: a ThreadLocal counterpart (or, with some restrictions, a ClassValue one) that shifts the caching responsibility to the core library level and delegates it to the SerialDescriptor, the same way ThreadLocal delegates it to the Thread instance.
The very preliminary API shape might have the following form:
// Format code, in companion
private val myCache = SerialDescriptorLocal { descriptor -> computeFormatSpecificValue(descriptor) }
// Format code, decode*/encode* functions
val cachedValue = myCache.get(currentSerialDescriptor)
// ...
// Core library, SerialDescriptor implementation
fun <T> getOrCompute(key: SerialDescriptorLocal<T>): T {
    // ... implementation shared between all SDs ...
}
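One way the shape above could work, as a rough sketch rather than the intended implementation: the key object carries the initializer and the descriptor owns the storage, so cached values live and die with the descriptor. SketchDescriptor is a hypothetical stand-in for SerialDescriptor, and backing the storage with a ConcurrentHashMap is an assumption made for brevity.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Key object, analogous to ThreadLocal: its identity is the key, the lambda is the initializer.
class SerialDescriptorLocal<T : Any>(val compute: (SketchDescriptor) -> T) {
    fun get(descriptor: SketchDescriptor): T = descriptor.getOrCompute(this)
}

// Hypothetical stand-in for SerialDescriptor (not the real interface).
class SketchDescriptor(val serialName: String, val elementNames: List<String>) {
    // Storage is owned by the descriptor, so it becomes unreachable together with it.
    private val locals = ConcurrentHashMap<SerialDescriptorLocal<*>, Any>()

    // Caveat: computeIfAbsent must not touch the same map recursively, so an
    // initializer should not read another local of the same descriptor.
    @Suppress("UNCHECKED_CAST")
    fun <T : Any> getOrCompute(key: SerialDescriptorLocal<T>): T =
        locals.computeIfAbsent(key) { key.compute(this) } as T
}

// Format code: a single static key shared by all instances of the format.
private val elementIndex = SerialDescriptorLocal<Map<String, Int>> { descriptor ->
    descriptor.elementNames.withIndex().associate { (index, name) -> name to index }
}

fun main() {
    val descriptor = SketchDescriptor("User", listOf("id", "name"))
    val first = elementIndex.get(descriptor)
    val second = elementIndex.get(descriptor)
    println(first)            // {id=0, name=1}
    println(first === second) // true: computed once, then served from the descriptor
}
```

Because the key is static, allocating a new format instance no longer discards the cached value, and no format has to maintain its own concurrent map.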
Things that we have to figure out:
- How to expose it in the SerialDescriptor interface
- Whether we want to support a scenario where SerialDescriptor can opt-out from such behaviour and whether it is allowed to throw an exception
- Whether we can provide support for non-static (e.g. format-dependent) SerialDescriptorLocal instances (e.g. case-insensitive trie) with the help of structural equality of SDL
- Whether it is possible to implement with all the restrictions applied (thread-safety, class unloading friendliness) while keeping the API lightweight
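The structural-equality question can be illustrated with a small sketch. Everything in it is assumed for illustration: DemoDescriptor stands in for SerialDescriptor, DescriptorLocalKey and NameLookupKey are hypothetical names, and a plain map replaces the case-insensitive trie.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Assumed key contract: equals/hashCode decide whether two keys share a cached value.
interface DescriptorLocalKey<T : Any> {
    fun compute(descriptor: DemoDescriptor): T
}

// Hypothetical stand-in for SerialDescriptor (not the real interface).
class DemoDescriptor(val elementNames: List<String>) {
    private val locals = ConcurrentHashMap<Any, Any>()

    @Suppress("UNCHECKED_CAST")
    fun <T : Any> getOrCompute(key: DescriptorLocalKey<T>): T =
        locals.computeIfAbsent(key) { key.compute(this) } as T
}

// A data class provides structural equality: keys with the same flag are interchangeable.
data class NameLookupKey(val ignoreCase: Boolean) : DescriptorLocalKey<Map<String, Int>> {
    override fun compute(descriptor: DemoDescriptor): Map<String, Int> =
        descriptor.elementNames.withIndex().associate { (index, name) ->
            (if (ignoreCase) name.lowercase() else name) to index
        }
}

fun main() {
    val descriptor = DemoDescriptor(listOf("Id", "Name"))
    // Two format instances with the same configuration would create equal keys.
    val a = descriptor.getOrCompute(NameLookupKey(ignoreCase = true))
    val b = descriptor.getOrCompute(NameLookupKey(ignoreCase = true))
    val c = descriptor.getOrCompute(NameLookupKey(ignoreCase = false))
    println(a === b) // true: equal keys share one cached value
    println(a)       // {id=0, name=1}
    println(c)       // {Id=0, Name=1}
}
```

The trade-off in this variant is that every distinct configuration adds one entry per descriptor, so an unbounded variety of keys would bring back the memory concern from the per-format approach.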