Enhance @memo with configurability and effects of @benmanes/caffeine #1553

Closed
jurgenvinju opened this issue Dec 15, 2021 · 7 comments

Comments

@jurgenvinju
Member

jurgenvinju commented Dec 15, 2021

Is your feature request related to a problem? Please describe.

  • @memo adds caching as a side effect to otherwise pure Rascal functions. It helps keep the code clean while providing the kind of efficiency normally associated with procedural and object-oriented programming.
  • But to actually get efficiency benefits there are fine lines to walk, for example when trading memory for CPU. Which caching strategy is effective can depend on all kinds of circumstances.
  • Because of the many features of the JVM garbage collector, there are many ways to create caching strategies. benmanes/caffeine offers a lot of this variation off the shelf, with added features such as time-dependent cache clearing.

Describe the solution you'd like

  • Proposal (via @DavyLandman, a few years ago): extend the @memo tag with a mini abstract configuration DSL that can be used to build Caffeine caches with different properties.
  • The caches would still be for the return value of the function, depending on its arguments
  • It would be great to add configurability for arguments to select and arguments to ignore for the cache key
  • It would be great to build in timestamp functionality for loc parameters, i.e. cache entries are cleared when the timestamp of a file is newer than the timestamp stored in the cache
  • It would be great to factor out most of the code needed to implement this from the interpreter/compiler run-time such that it can be reused between the two
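
To make the timestamp idea concrete, here is a minimal sketch in plain Java. It is not the existing @memo implementation and it does not use Caffeine. The class name `StampedMemo` and its constructor parameters are hypothetical, and it assumes that "stale" means the source is newer than the stamp stored with the cached value. The timestamp lookup is passed in as a function, so a real version could resolve loc values through the URIResolverRegistry instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** Hypothetical helper: memoizes a function and recomputes when the key's source is newer. */
final class StampedMemo<K, V> {
    /** A cached value plus the modification stamp observed when it was computed. */
    private static final class Entry<V> {
        final long stamp;
        final V value;

        Entry(long stamp, V value) {
            this.stamp = stamp;
            this.value = value;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final ToLongFunction<K> lastModified;
    private final Function<K, V> compute;

    StampedMemo(ToLongFunction<K> lastModified, Function<K, V> compute) {
        this.lastModified = lastModified;
        this.compute = compute;
    }

    V get(K key) {
        long current = lastModified.applyAsLong(key);
        Entry<V> cached = cache.get(key);
        // Reuse the entry only if the source has not changed since it was stored.
        if (cached != null && cached.stamp >= current) {
            return cached.value;
        }
        V fresh = compute.apply(key);
        cache.put(key, new Entry<>(current, fresh));
        return fresh;
    }
}
```

For ordinary files, the stamp function could be as simple as `p -> p.toFile().lastModified()` on a `java.nio.file.Path` key. Two threads may compute the same value concurrently in this sketch; that is harmless for pure functions but wasteful for expensive ones.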

Describe alternatives you've considered

  • People write all kinds of Rascal code to cache and optimize their code now, including caching on disk. It's a hairy domain.
  • Using @memo now often leads to disappointment: we either cache too many values or too few, or what we cache uses too much memory to be effective. We really need different strategies depending on the application.
  • We have no way to play around with weak or soft keys or values at all; what @memo currently offers is all there is.

Impact

  • The compiler itself would benefit from this feature, since it needs to cache the results of modular compilation.
@jurgenvinju
Member Author

It is also thinkable to compile the configuration DSL directly to Java code and only support this for compiled Rascal code.

@jurgenvinju
Member Author

jurgenvinju commented Dec 15, 2021

Yes, that one! But extended with everything (or almost everything) that Caffeine can do, plus the file timestamp things.

  • {strong,soft,weak,phantom?} references x {strong,soft,weak,phantom?} keys x parameter selections
  • automatic asynchronous pre-loading
  • size-based eviction
  • time-based eviction
  • automatic asynchronous serialization to a file location, and automatic recovery from those

Plus I'd like to see an interface for debugging/optimizing the choice of these features using some kind of statistical reports.
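
As a rough illustration of two items from this wish list, parameter selection for the cache key and statistics for tuning, here is a plain-Java sketch. The class name `SelectiveMemo` and its methods are hypothetical and are not part of Rascal or Caffeine. The key is built only from the argument positions named in the constructor, and hit and miss counts are exposed for a simple report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: memoizes on a chosen subset of the arguments and counts hits and misses. */
final class SelectiveMemo<V> {
    private final Function<Object[], V> compute;
    private final int[] keyPositions;
    private final Map<List<Object>, V> cache = new HashMap<>();
    private long hits;
    private long misses;

    /** keyPositions lists the argument indexes that take part in the cache key. */
    SelectiveMemo(Function<Object[], V> compute, int... keyPositions) {
        this.compute = compute;
        this.keyPositions = keyPositions.clone();
    }

    synchronized V call(Object... arguments) {
        // Arguments at positions not listed in keyPositions are ignored for lookup.
        List<Object> key = new ArrayList<>(keyPositions.length);
        for (int position : keyPositions) {
            key.add(arguments[position]);
        }
        if (cache.containsKey(key)) {
            hits++;
            return cache.get(key);
        }
        misses++;
        V fresh = compute.apply(arguments);
        cache.put(key, fresh);
        return fresh;
    }

    synchronized long hits() {
        return hits;
    }

    synchronized long misses() {
        return misses;
    }

    /** Fraction of calls answered from the cache, or 0 if there were no calls yet. */
    synchronized double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Ignoring an argument is only safe when the result really does not depend on it; in this sketch a later call with a different ignored argument gets the value computed for the first call.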

@jurgenvinju
Member Author

And agreed, maybe @PaulKlint can already take a lot of benefit from what is in https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/util/Memo.rsc now for the compiler.

@jurgenvinju
Member Author

asynchronous pre-loading would be impossible for the interpreter to do.

@DavyLandman
Member

Okay, that's quite a big feature set, and requires more design of the @memo tag. We now have:

  • access based eviction (so time, but only on access, not on store)
  • entry-based eviction
  • soft references to avoid OutOfMemory.
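
For readers unfamiliar with these three mechanisms, the following plain-Java sketch shows how they can combine. It is an illustration only, not the code behind Memo.rsc; the class name `BoundedSoftMemo` and its parameters are hypothetical. It assumes least-recently-used eviction for the entry bound and an injectable clock for the idle timeout.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical helper: entry-bounded, idle-expiring memo with softly referenced values. */
final class BoundedSoftMemo<K, V> {
    /** Holds a softly referenced value and the time it was last read or written. */
    private static final class Slot<V> {
        final SoftReference<V> ref;
        long lastAccess;

        Slot(V value, long now) {
            this.ref = new SoftReference<>(value);
            this.lastAccess = now;
        }
    }

    private final long maxIdleMillis;
    private final LongSupplier clock;
    private final Function<K, V> compute;
    private final Map<K, Slot<V>> slots;

    BoundedSoftMemo(int maxEntries, long maxIdleMillis, LongSupplier clock, Function<K, V> compute) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
        this.compute = compute;
        // Access-ordered map: the eldest entry is the least recently used one.
        this.slots = new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Slot<V> slot = slots.get(key);
        if (slot != null) {
            V value = slot.ref.get();
            // A hit needs a value the collector has not cleared and a recent enough access.
            if (value != null && now - slot.lastAccess <= maxIdleMillis) {
                slot.lastAccess = now;
                return value;
            }
        }
        V fresh = compute.apply(key);
        slots.put(key, new Slot<>(fresh, now));
        return fresh;
    }

    synchronized int size() {
        return slots.size();
    }
}
```

Passing `System::currentTimeMillis` as the clock gives wall-clock behaviour. Expired entries are only replaced when they are requested again, which matches "only on access, not on store" in spirit but is a simplification.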

Some of the features you mentioned are easy to implement. But just to be clear, we are not using Caffeine in this case; the memo tag has some specific features that don't map to Caffeine.

@jurgenvinju
Member Author

ok yes; understood. I'm expecting an "extended subset" of the Caffeine features and I'm not sure where the boundaries are.

In particular the asynchronous backroom serialization to disk; I'm not sure how much of Caffeine we could reuse for that, because of course it has to be integrated with our loc data-type and the URIResolverRegistry.

Asynchronous pre-loading is probably based on a lambda or an interface which we can implement and link to a compiled or interpreted Rascal function. However, for interpretation I foresee lots of issues :-) of course.

Timestamps for loc entries are definitely not in scope of Caffeine, but it seems a natural thing to add to @memo, unless we think a different tag would be better.
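
On the pre-loading point above, here is a small plain-Java sketch of what "a lambda or an interface" could look like. It is a guess at the shape, not Caffeine's API and not anything that exists in Rascal. The class name `PreloadingMemo` is hypothetical, and the loader function stands in for a call into a compiled Rascal function.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: starts computing values in the background before they are requested. */
final class PreloadingMemo<K, V> {
    private final Map<K, CompletableFuture<V>> futures = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    PreloadingMemo(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Starts a background computation for the key unless one was already started. */
    void preload(K key) {
        futures.computeIfAbsent(key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
    }

    /** Returns the value, waiting for a running computation to finish if necessary. */
    V get(K key) {
        preload(key);
        return futures.get(key).join();
    }
}
```

The loader runs on the common fork-join pool here, so it must be safe to call from another thread; that requirement is where the interpreter concerns mentioned in this thread would show up.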

2 participants