Finer-grained InMemoryCache result caching. #5617
Merged
Conversation
Although functional decomposition is generally a good idea, it is neither required nor beneficial in the case of StoreReader#executeField, since we do not need to cache the value of every individual field (as we do for executeSelectionSet and executeSubSelectedArray), and executeField was a private method with only one caller (executeSelectionSet), so we can inline it without duplicating its implementation in multiple places.

By moving the executeField implementation into executeSelectionSet, we can avoid generating an ExecResult object for every field value that we read from the cache, and we have direct access to the StoreObject whose fields we are reading, which will prove convenient in upcoming commits. Given how often executeField was called, even a tiny reduction in the overhead of calling it as a separate method (and handling its results) adds up in a big way.
…entObject. Exposing the entire parent StoreObject to custom read and merge functions was risky, even if we made it read-only, because it permitted unobserved access to sibling fields of the parentObject, which would have prevented effective result caching for read functions that compute their results using sibling fields as input. Passing in a function that takes a field name and returns the value of that field (namely, getFieldValue) enables the same sibling field use cases, without exposing unrelated StoreObject fields. In the future, we can use this function to register dependencies on the fields that were requested, which would not have been possible with just an object.
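The sibling-field pattern described above can be sketched as follows. This is a hypothetical illustration, not Apollo's actual implementation: the makeGetFieldValue helper and the observed set are invented names, showing how routing every sibling-field access through a function call lets the cache see exactly which fields a read function consumed.

```typescript
// Illustrative sketch: instead of handing read functions the whole parent
// StoreObject, pass them a getFieldValue function, so every sibling field
// access goes through a call the cache can observe.
type StoreValue = unknown;
type StoreObject = Record<string, StoreValue>;

function makeGetFieldValue(
  parent: StoreObject,
  observed: Set<string>, // fields the read function actually touched
) {
  return (fieldName: string): StoreValue => {
    observed.add(fieldName); // the cache can now register a field dependency
    return parent[fieldName];
  };
}

// A read function computing fullName from sibling fields:
const person: StoreObject = { firstName: "Ada", lastName: "Lovelace", age: 36 };
const observed = new Set<string>();
const getFieldValue = makeGetFieldValue(person, observed);
const fullName = `${getFieldValue("firstName")} ${getFieldValue("lastName")}`;
// observed now contains exactly firstName and lastName, not age.
```

Because the read function never sees the raw StoreObject, it cannot consult a sibling field without the cache noticing, which is what makes field-level result caching sound.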
Until now, the result caching system registered dependencies on specific entity objects by ID, which meant that changes to any of the fields of an entity in the cache would invalidate every query result that previously involved that entity, even if some of those queries did not use any of the fields that changed. While this system was logically correct, it resulted in extensive and unnecessary recomputation of cached query results. Perhaps most dramatically, any changes to the fields of the ROOT_QUERY object would necessitate the recomputation of every query that involved any root query fields, because those queries depended on the ROOT_QUERY ID rather than the specific fields they consumed.

With this commit, the StoreReader reads from the EntityCache using a new method called getFieldValue, which registers dependencies on the combination of the entity ID and the name of the requested field, so that cached query results will remain valid as long as the specific fields they used have not changed.

Furthermore, the EntityCache#set method has been replaced by a method called merge, which intelligently updates the normalized data using a set of new fields, automatically invalidating dependencies for any fields whose values are modified by the merge. For backwards compatibility, if a consumer chooses to continue using the EntityCache#get(id) method instead of getFieldValue(id, fieldName), ID-based dependencies will be registered, as before.
This was referenced Nov 25, 2019
hwillson approved these changes on Nov 25, 2019
Looks awesome @benjamn! Great idea with getFieldValue - that definitely seems safer.
benjamn added a commit that referenced this pull request on Dec 2, 2019
In PR #5617, we shifted from using just IDs for caching results read from the EntityStore to using IDs plus field names, which made the result caching system significantly more precise and less likely to trigger unnecessary rerenders. However, the field names we used in that PR were storeFieldName strings, instead of the actual field names you'd find in your GraphQL schema. Within the InMemoryCache implementation, a storeFieldName is the full string produced by Policies#getStoreFieldName, which must begin with the original field.name.value, but may include an additional suffix that captures field identity based on arguments (and also, but less frequently, directives).

For the purposes of result caching, it's important to depend on the entity ID plus the actual field name (field.name.value), rather than the ID plus the storeFieldName, because there can be many different storeFieldName strings for a single field.name.value (and we don't want to waste memory on dependency tracking), and because it's safer (more conservative) to dirty all fields with a particular field.name.value when any field with a matching storeFieldName is updated or evicted, since field values with the same field.name.value often overlap in subtle ways (for example, multiple pages of a paginated list), so it seems dangerous to encourage deleting some but not all of them.

Perhaps more importantly, application developers should never have to worry about the concept of a storeFieldName, since it is very much an implementation detail of the InMemoryCache, and the precise format of storeFieldName strings is subject to change.
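The distinction can be illustrated with a small helper that recovers the schema field name from a storeFieldName. The regex is an assumption about the general shape of these strings (a field-name prefix followed by an optional argument suffix), not Apollo's exact parsing logic:

```typescript
// Illustrative sketch: a storeFieldName may carry an argument suffix, e.g.
// 'comments({"first":10})'. For dependency tracking we want only the leading
// schema field name, so many storeFieldName variants collapse to one key.
function fieldNameFromStoreName(storeFieldName: string): string {
  // Assumes field names match the usual GraphQL name grammar.
  const match = storeFieldName.match(/^[_A-Za-z][_0-9A-Za-z]*/);
  return match ? match[0] : storeFieldName;
}

const withArgs = fieldNameFromStoreName('comments({"first":10})'); // "comments"
const plain = fieldNameFromStoreName("name"); // "name"
```

With this collapsing, dirtying any one page of a paginated comments field invalidates results that depended on any other page of it, which is the conservative behavior the commit message argues for.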
benjamn added a commit that referenced this pull request on Dec 2, 2019
The previousResult option was originally a way to ensure referential identity of structurally equivalent cache results, before the result caching system was introduced in #3394. It worked by returning previousResult whenever it was deeply equal to the new result. The result caching system works a bit differently, and in particular never needs to do a deep comparison of results. However, there were still a few (test) cases where previousResult seemed to have a positive effect, and removing it seemed like a breaking change, so we kept it around.

In the meantime, the equality check has continued to waste CPU cycles, and the behavior of previousResult has undermined other improvements, such as freezing cache results (#4514). Even worse, previousResult effectively disabled an optimization that allowed InMemoryCache#broadcastWatches to skip unchanged queries (see comments I removed if curious). This commit restores that optimization.

I realized eliminating previousResult might finally be possible while working on PR #5617, which made the result caching system more precise by depending on IDs+fields rather than just IDs. This additional precision seems to have eliminated the few remaining cases where previousResult had any meaningful benefit, as evidenced by the lack of any test changes in this commit... even among the many direct tests of previousResult in __tests__/diffAgainstStore.ts!

The removal of previousResult is definitely a breaking change (appropriate for Apollo Client 3.0), because you can still contrive cases where some never-before-seen previousResult object just happens to be deeply equal to the new result. Also, it's fair to say that this removal will strongly discourage disabling the result caching system (which is still possible for diagnostic purposes), since we rely on result caching to get the benefits that previousResult provided.
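For illustration, here is roughly what the old previousResult behavior amounted to; deepEqual and withPreviousResult are simplified stand-ins, not the real implementation:

```typescript
// Sketch of the previousResult pattern: return the old object whenever the
// new result is deeply equal, preserving referential identity at the cost
// of a deep comparison on every read.
function deepEqual(a: unknown, b: unknown): boolean {
  // Crude stand-in for a structural comparison (assumes stable key order).
  return JSON.stringify(a) === JSON.stringify(b);
}

function withPreviousResult<T>(newResult: T, previousResult?: T): T {
  return previousResult !== undefined && deepEqual(newResult, previousResult)
    ? previousResult
    : newResult;
}

const prev = { a: 1 };
const next = { a: 1 }; // structurally equal, but a different object
const reused = withPreviousResult(next, prev); // === prev
```

The result caching system makes this comparison unnecessary: when no ID+field dependency has been dirtied, the cache simply returns the identical cached result object, so referential identity comes for free.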
benjamn added a commit that referenced this pull request on Dec 3, 2019
The getFieldValue(fieldName) helper function was introduced in #5617 for reading fields from the current StoreObject during read functions. This commit adds a second parameter to getFieldValue, foreignRef, which is an optional Reference. When foreignRef is provided, getFieldValue will read the specified field from the StoreObject identified by the foreignRef, instead of reading from the current StoreObject. In either case, getFieldValue reads an existing value from the cache, without invoking any read functions, so you cannot use getFieldValue to set up expensive (and potentially cyclic) chains of read functions. With this new ability to read fields from arbitrary Reference objects, read functions can explore the entire reachable cache, without having to call cache.readQuery. The beauty of this system is that every field read operation requires a function call (getFieldValue), which allows the result caching system to know which fields were read from which entities, so future changes to those fields can properly invalidate any cached results that involved the original read function.
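The foreignRef idea can be sketched like this, assuming a simple normalized-data map; the names and signatures here are illustrative rather than Apollo's exact API:

```typescript
// Hypothetical sketch of reading a field from a foreign Reference: by
// default getFieldValue reads from the current StoreObject, but given a
// foreignRef it reads from the object that Reference identifies.
type Reference = { __ref: string };
type StoreObject = Record<string, unknown>;
type NormalizedData = Record<string, StoreObject>;

function makeGetFieldValue(data: NormalizedData, currentId: string) {
  return (fieldName: string, foreignRef?: Reference): unknown => {
    const id = foreignRef ? foreignRef.__ref : currentId;
    const obj = data[id];
    // Reads existing values only; no read functions are invoked here, so
    // chains of reads cannot recurse through other read functions.
    return obj ? obj[fieldName] : undefined;
  };
}

const data: NormalizedData = {
  "Book:1": { title: "1984", author: { __ref: "Author:1" } },
  "Author:1": { name: "George Orwell" },
};
const getFieldValue = makeGetFieldValue(data, "Book:1");
const authorRef = getFieldValue("author") as Reference;
const authorName = getFieldValue("name", authorRef); // "George Orwell"
```

Because every hop goes through the getFieldValue call, the result caching system can record an ID+field dependency for each step of the traversal.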
benjamn added a commit that referenced this pull request on Jan 23, 2020
One of the most important internal changes in Apollo Client 3.0 is that the result caching system now tracks its dependencies at the field level, rather than at the level of whole entity objects: #5617

As proof that this transformation is complete, we can finally remove the ability to read entire entity objects from the EntityStore by ID, so that the only way to read from the store is by providing both an ID and the name of a field. The logic I improved in #5772 is now even simpler, without the need for overloading multiple EntityStore#get signatures.

In the process, I noticed that the EntityStore#has(ID) method was accidentally depending on any changes to the contents of the identified entity, even though store.has only cares about the existence of the ID in the store. This was only a problem if we called store.has during a larger read operation, which currently only happens when InMemoryCache#read is called with options.rootId. The symptoms of this oversight went unnoticed because the caching of readFragment results was just a little more sensitive to changes to the given entity. The results themselves were still logically correct, but their caching was not as effective as it could be. That's the beauty and the curse of caching: you might not notice when it isn't working, because all your tests still pass, and nobody complains that their application is broken.

Fortunately, we can model this dependency with an ID plus a fake field name called "__exists", which is only dirtied when the ID is newly added to or removed from the store.
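The "__exists" trick can be sketched as follows; SketchStore is a toy model (not the real EntityStore) that records which ID+field keys get dirtied:

```typescript
// Illustrative sketch: store.has(id) should depend only on whether the ID
// exists, not on the entity's contents, so existence is modeled as a
// synthetic "__exists" field that is dirtied only when the ID is added to
// or removed from the store.
class SketchStore {
  private data = new Map<string, Record<string, unknown>>();
  public dirtied: string[] = []; // invalidated keys, recorded for illustration

  private dirty(id: string, fieldName: string) {
    this.dirtied.push(`${id}.${fieldName}`);
  }

  merge(id: string, fields: Record<string, unknown>) {
    const existed = this.data.has(id);
    const obj = this.data.get(id) ?? {};
    for (const [name, value] of Object.entries(fields)) {
      if (obj[name] !== value) {
        obj[name] = value;
        this.dirty(id, name); // only changed fields are dirtied
      }
    }
    this.data.set(id, obj);
    if (!existed) this.dirty(id, "__exists"); // ID newly added
  }

  delete(id: string) {
    if (this.data.delete(id)) this.dirty(id, "__exists"); // ID removed
  }

  has(id: string): boolean {
    // Would register a dependency on `${id}.__exists`, never on contents.
    return this.data.has(id);
  }
}

const store = new SketchStore();
store.merge("Person:1", { name: "Ada" });          // dirties name and __exists
store.merge("Person:1", { name: "Ada Lovelace" }); // dirties only name
```

Updating an existing entity therefore never invalidates results that only asked whether the entity exists.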
benjamn added a commit that referenced this pull request on Jun 8, 2020
When an object is evicted from the cache, common intuition says that any dangling references to that object should be proactively removed from elsewhere in the cache. Thankfully, this intuition is misguided, because a much simpler and more efficient approach to handling dangling references is already possible, without requiring any new cache features.

As the tests added in this commit demonstrate, the cleanup of dangling references can be postponed until the next time the affected fields are read from the cache, simply by defining a custom read function that performs any necessary cleanup, in whatever way makes sense for the logic of the particular field. This lazy approach is vastly more efficient than scanning the entire cache for dangling references would be, because it kicks in only for fields you actually care about, the next time you ask for their values. For example, you might have a list of references that should be filtered to exclude the dangling ones, or you might want the dangling references to be nullified in place (without filtering), or you might have a single reference that should default to something else if it becomes invalid. All of these options are matters of application-level logic, so the cache cannot choose the right default strategy in all cases. By default, references are left untouched unless you define custom logic to do something else.

It may actually be unwise/destructive to remove dangling references from the cache, because the evicted data could always be written back into the cache at some later time, restoring the validity of the references. Since eviction is not necessarily final, dangling references represent useful information that should be preserved by default after eviction, but filtered out just in time to keep them from causing problems. Even if you ultimately decide to prune the dangling references, proactively finding and removing them is way more work than letting a read function handle them on-demand.
This system works because the result caching system (#3394, #5617) tracks hierarchical field dependencies in a way that causes read functions to be reinvoked any time the field in question is affected by updates to the cache, even if the changes are nested many layers deep within the field. It also helps that custom read functions are consistently invoked for a given field any time that field is read from the cache, so you don't have to worry about dangling references leaking out by other means.
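A minimal sketch of the lazy cleanup idea, with plain data structures standing in for the cache; canRead and readBooks are illustrative helpers, not the real field-policy API:

```typescript
// Illustrative sketch of lazy dangling-reference cleanup: a custom read
// function filters out references whose targets have been evicted, the
// next time the field is actually read.
type Reference = { __ref: string };
type NormalizedData = Record<string, Record<string, unknown> | undefined>;

const data: NormalizedData = {
  "Book:1": { title: "1984" },
  "Book:2": { title: "Animal Farm" },
};

// Stand-in for checking that a reference still points at cached data.
const canRead = (ref: Reference) => data[ref.__ref] !== undefined;

// A read function for a list-of-references field: keep non-dangling
// references; evicted ones are dropped just in time. Other fields might
// instead nullify in place or substitute a default reference.
function readBooks(existing: Reference[] | undefined): Reference[] {
  return existing ? existing.filter(canRead) : [];
}

let favoriteBooks: Reference[] = [{ __ref: "Book:1" }, { __ref: "Book:2" }];
delete data["Book:2"]; // simulate evicting Book:2 from the cache
favoriteBooks = readBooks(favoriteBooks); // only Book:1 remains
```

Note that the stored list itself is untouched: if Book:2 were written back later, a subsequent read would see it again, which is exactly the "eviction is not necessarily final" behavior described above.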
I encourage reading all the commit messages included here, but the final commit is the capstone:
Use ID+field dependencies for result caching, rather than just ID
Until now, the InMemoryCache result caching system (introduced by #3394) registered dependencies on specific entity objects by ID, which meant changes to any of the fields of an entity in the cache would invalidate every query result that previously involved that entity, even if not all of those queries actually used any of the fields that changed.

While this system was logically correct, it resulted in unnecessary recomputation of cached query results. Perhaps most dramatically, any changes to the fields of the ROOT_QUERY object would necessitate the recomputation of every query that involved any root query fields, because those queries depended on the ROOT_QUERY ID rather than the specific fields they consumed.

With this commit, the StoreReader reads from the EntityCache using the new method getFieldValue(dataId: string, fieldName: string): StoreValue, which registers dependencies on the combination of the entity ID and the name of the requested field, so cached query results will remain valid as long as the specific fields they used have not changed.

Furthermore, the EntityCache#set method has been replaced by a method called merge, which intelligently updates the normalized data to incorporate a set of new fields, automatically invalidating dependencies for any fields whose values are modified by the merge.

For backwards compatibility, if a consumer chooses to continue using the EntityCache#get(id) method instead of getFieldValue(id, fieldName), ID-based dependencies will be registered (and later invalidated), exactly as before.