Skip to content

Recycler API #1401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 49 commits into from
Nov 14, 2023
Merged

Recycler API #1401

merged 49 commits into from
Nov 14, 2023

Conversation

vy
Copy link
Member

@vy vy commented Mar 29, 2023

This work moves Recycler et al. from log4j-layout-template-json to log4j-api and replaces ThreadLocal usages with Recyclers wherever possible. (It is based on @jvz's draft in #1194.)

Why is this a good idea?

ThreadLocals

  • provide one way of caching things
  • has significant memory tax on threaded applications
  • can cause memory leaks in the presence of multiple class loaders

Recycler abstraction addresses these pain points by

  • abstracting away the caching mechanism – one can decide to disable it, use a queue with a limited capacity, switch back to old behavior by using a ThreadLocal-based implementation, or provide a custom one
  • enabling its configuration from a central place

See JSON Template Layout's documentation on it for details.

What are the change highlights?

  • Added Configuration#getRecyclerStrategy() and LoggingSystem.getRecyclerStrategy()
  • ThreadLocals in appenders and layouts are replaced with recyclers
  • Added ReusableLogEvent interface (implemented by MutableLogEvent and RingBufferLogEvent)
  • Adapted StringBuilderEncoder

What further needs to be done?

  • fix failing tests
  • write/update docs

jvz and others added 30 commits January 9, 2023 12:06
This adds executions during process-sources and process-test-sources to ensure that modified files are formatted properly.
This will allow for reuse of this API in places where ThreadLocal-recycled objects are currently used. This will be beneficial to runtimes where workloads don't correspond to threads such as virtual threads, coroutines, and reactive streams. This adds an optional dependency on JCTools to log4j-api for one of the QueueingRecycler implementations.
Signed-off-by: Matt Sicker <mattsicker@apache.org>
Also updates ThreadLocalRecyclerFactory to support nested calls to acquire() by using an ArrayDeque to track potentially more than one object.

Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
- Split cleaner concerns in RecyclerFactory: eager versus lazy cleaning. This makes the Recycler API work naturally with existing StringBuilder post-usage trimming.
- Uses a simpler method for checking for the presence of JCTools. Given that it's an optional (static) dependency, we can compile direct references to it in classes and see if our classes throw LinkageError-related exceptions at runtime when attempting to link to JCTools.
- Rename QueueSupplier to QueueFactory
- Extract a Queues util for easier access to different JCTools queue implementations when available
- Update implementations of AbstractStringLayout to use Recycler API
  - CsvLogEventLayout
  - CsvParameterLayout
  - GelfLayout
  - HtmlLayout
  - Log4j1SyslogLayout
  - Log4j1XmlLayout
  - PatternLayout
  - Rfc5424Layout
  - SyslogLayout

Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
…actories.java

Co-authored-by: Volkan Yazıcı <volkan@yazi.ci>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
Signed-off-by: Matt Sicker <mattsicker@apache.org>
This simplifies DefaultLogBuilder as reuse is now handled by the recycler. This also adds a place to store a default RecyclerFactory in LoggingSystem.

Signed-off-by: Matt Sicker <mattsicker@apache.org>
`LogBuilder` suppliers in the `Logger` class always return a no-op
builder if the log message level is higher than the configured level.
This way global filters are always skipped.

This patch always returns a functional builder if we detect the presence
of a global filter.
@vy
Copy link
Member Author

vy commented Apr 24, 2023

h3. Progress report

I am done with code changes. I will proceed with the documentation update. garbagefree.adoc needs to be refactored (contains either outdated or incorrect information) and adapted for recyclers.

@rgoers
Copy link
Member

rgoers commented May 7, 2023

I was actually looking for this today as I will be needing to add another ThreadLocal (as much as I really don't want to). But in looking at this PR again I do have some concerns about it.

  1. It uses a Factory to provide a "specification" (really just a String key) to determine what kind of Recycler to provide. That seems somewhat ugly to me. I'd prefer a Spec object that provides attributes for what the characteristics are.
  2. It seems every ThreadLocal usage will simply be replaced by a Recycler instance. Instead, I would prefer that there is a Recylcer registry (i.e. - a Map) that contains all the Recyclers. That allows us to reference a Recycler from anywhere without needing them to be injected or passed around.

@vy
Copy link
Member Author

vy commented May 15, 2023

Thanks for taking time to share your feedback @rgoers, much appreciated! 🙇

I was actually looking for this today as I will be needing to add another ThreadLocal (as much as I really don't want to). But in looking at this PR again I do have some concerns about it.

  1. It uses a Factory to provide a "specification" (really just a String key) to determine what kind of Recycler to provide. That seems somewhat ugly to me. I'd prefer a Spec object that provides attributes for what the characteristics are.

RecyclerFactories.ofSpec(String) static factory method is there to only parse the textual recycler factory specification read from a user- provided property. Components will be automatically provided the global [user-set] RecyclerFactory in their Configuration instance. For those which need to run test against a particular RF implementation, they can directly reference them; e.g., ThreadLocalRecyclerFactory::new.

  1. It seems every ThreadLocal usage will simply be replaced by a Recycler instance. Instead, I would prefer that there is a Recylcer registry (i.e. - a Map) that contains all the Recyclers. That allows us to reference a Recycler from anywhere without needing them to be injected or passed around.

I assume you mean a static registry as in RecyclerFactoryRegistry.getDefaultRecyclerFactory() – please correct me otherwise.

  1. Static registries are proven to be a big problem both at runtime and [parallel] tests.
    1. When used at runtime, they make it impossible to provide different values for different components. Something conflicts with the DI principles.
    2. When used for tests, they cannot be mocked and/or changed per test. This is one of the main reasons we need to fork a new JVM for almost all tests currently.
  2. I am not able to see the benefit a global registry. Components should be agnostic to the provided implementation passed via Configuration#getRecyclerFactory(). Mind sharing some uses cases, please?

@rgoers
Copy link
Member

rgoers commented May 15, 2023

I am thinking of when Scoped Values come into play. Or if thread inheritance is required. I don't want to have to go into the code and change the "name" passed to the factory. Instead, I would like to do:

Recycler recycler = RecyclerRegistry.get("name", spec)

where name represents the particular instance I want to access and spec is the Specification object. The Spec might contain

private boolean inheritable;
private boolean threadRelated;

or other attributes. Based on this the factory could switch from using a ThreadLocal Recylcer to a ScopedVariableRecycler once it is available by only updating the Registry logic.
In addition, using a registry means I can access the recycler instance when it is needed. Typically, a class has to declare a ThreadLocal as a class member and methods within that class reference it. The implementation you have provided simply replaces that declaration with a Recycler. With what I am proposing each method simply calls the get method to obtain the Recycler instance so there is no need to declare the instance as a member. Also, this allows:

  1. Different use cases to potentially share the same ThreadLocal (depending on their spec declaration)
  2. Intelligent cleanup of all Recycler instances. This could be handled through callbacks that are part of the interface or other mechanism as we desire.

Since the Recyclers above are just thin veneer around a ThreadLocal or queue the above two items cannot really be done.

One other thing worth noting is that Scoped Values will be joined at the hip with Virtual Threads so the Recycler will inherently have to allow an instance to provide a method to "run" while the Recycler is in scope. Note that this is quite different from how the Recycler defined above behaves. So in addition to the method above we might also require:

<T> RecyclerRegistry.run(String name, Spec spec, Function<T, X>(X param))

or similar. Then from within the Function you can call RecyclerRegistry.get("name") to retrieve the value(s) associated with that name. This could be implemented with either a ThreadLocal or a ScopedValue.

Also, saying static registries are a problem flies in the face that Log4j has a ServiceRegistry, PropertiesUtil (the Property Registry), a Plugin registry, the LoggerContext is a registry of the application's Loggers, and the LoggerContextSelector is a registry of the LoggerContexts. The DI system is also essentially a registry much as Spring's IOC is. There are probably more but that is just what immediately came to mind. You would have to provide some concrete evidence that registries are a problem for me to buy that argument. FWIW, it is quite easy to create an intelligently designed registry be able to generate and use Mock objects for testing. For example, this is quite easy to do with Spring.

Copy link
Member

@rgoers rgoers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comments

@jvz
Copy link
Member

jvz commented May 15, 2023

That pattern Ralph describes is how resilience4j works. Check out https://resilience4j.readme.io/docs

@vy
Copy link
Member Author

vy commented May 16, 2023

Dependency Injection (DI) vs Static Registries

Also, saying static registries are a problem flies in the face that Log4j has a ServiceRegistry, PropertiesUtil (the PropertyRegistry), a Plugin registry, the LoggerContext is a registry of the application's Loggers, and the LoggerContextSelector is a registry of the LoggerContexts.

There are several JIRA tickets and mailing list threads from users (and committers, PMC members!) complaining that they cannot test logging without forking a fresh JVM each time. Since we both like Spring, let me continue with an example from there. Why do we need to inject Environment every time we need to deal with it rather than using a static PropertiesUtil just like Log4j does?

I think hardcoding the functionality in static registries was the right decision when it was first introduced to Log4j; it was (still is!) convenient to use, the core was small, and there weren't a DI system. But things have changed, in a positive way: we now have a powerful DI system. Let's put that into use!

The DI system is also essentially a registry much as Spring's IOC is. There are probably more but that is just what immediately came to mind.

Yes and with any other system run by a decent DI, you shouldn't need static access. JTL contains the most complicated DI usage out there and there is not a single call to static DI.* methods.

You would have to provide some concrete evidence that registries are a problem for me to buy that argument.

I have provided several examples demonstrating why static registries are bad:

  1. Hinders testing
  2. Hinders composition (e.g., one cannot provide a PropertiesUtil implementation, etc.)
  3. It is a bad-practice in a DI-backed system

I am curious to know if there is a flaw in my line of thinking.

FWIW, it is quite easy to create an intelligently designed registry be able to generate and use Mock objects for testing. For example, this is quite easy to do with Spring.

Definitely! But this is all possible since nothing is statically wired but glued together at runtime by DI.

Scoped Values

I am thinking of when Scoped Values come into play.

Scoped Values are indeed on the horizon and we should cater for that. We might indeed need to massage the current Recycler API to accommodate that. In particular, scoped values require a lambda (i.e., ScopedValue.where(var, val).run(Runnable)), whereas our current recycler requires explicit acquire/release calls. These ergonomics are pretty much orthongonal. I will try to flesh out some Recycler API support for this.

Requesting thread-related functionality while creating a recycler

Or if thread inheritance is required.

I have my reservations about this Ralph. Recycler is merely an efficient object pool. I think requesting thread-related functionality would break that assumption and sophisticate the design. Do you have a particular use case in mind that requires, e.g., thread inheritance?

Having a recycler registry

Recycler recycler = RecyclerRegistry.get("name", spec)
...

  1. Different use cases to potentially share the same ThreadLocal (depending on their spec declaration)

Do you have a [non-hypothetical] use case in mind for the recycler sharing you mention?

  1. Intelligent cleanup of all Recycler instances. This could be handled through callbacks that are part of the interface or other mechanism as we desire.

Let me again share a similar example from Spring. Imagine a Tree bean implementing Lifecycle. Other beans can inject this shared Tree bean and they don't need to manage the lifecycle of the Tree – Spring will take care of it. But, if the bean is prototype-scoped, that is, every injection point gets a new instance, Spring will not manage the bean's lifecycle anymore. I think what we are dealing with is somewhat analogous. If recyclers will be shared (of which I still need to hear a use case for), I understand the sentiment for having a single registry to shut them down. Otherwise, we can delegate that responsibility to the component creating it.

@rgoers
Copy link
Member

rgoers commented May 17, 2023

But things have changed, in a positive way: we now have a powerful DI system. Let's put that into use!

Unfortunately, the DI system as it is today is broken. I discovered this when working on context properties and Matt is aware of this. The problem is that we have a bunch of "Constants" that really just check a property when the "constant" is initialized. However, that makes it impossible to make those LoggerContext specific. In addition, if you don't happen to have the "correct" Injector instance you are kind of screwed. I'm not saying these can't be fixed, but the DI system is not going to solve every problem in Log4j. Certainly not the items I am raising here.

Yes and with any other system run by a decent DI, you shouldn't need static access. JTL contains the most complicated DI usage out there and there is not a single call to static DI.* methods.

Yes, the DI system doesn't have a static anchor. Instead we have many DI instances anchored in many places. Our DI system does have static methods (which I don't have a problem with). Again, the DI system really has nothing to do with what I am asking for.

I have provided several examples demonstrating why static registries are bad:

Hinders testing

No they don't. So long as you can register mock objects it makes no difference whether the registry has a static anchor or not.

Hinders composition (e.g., one cannot provide a PropertiesUtil implementation, etc.)

Huh? That would be a horrible mistake. That is like complaining because you cannot provide your own DI implementation. Somethings shouldn't be replaceable.

It is a bad-practice in a DI-backed system

If our DI system worked properly I might agree with you. It does not.

Note - a "Registry" does not necessarily require being anchored by a static. It could very well be anchored using the DI system, if that indeed was possible.

Scoped Values are indeed on the horizon and we should cater for that. We might indeed need to massage the current Recycler API to accommodate that. In particular, scoped values require a lambda (i.e., ScopedValue.where(var, val).run(Runnable)), whereas our current recycler requires explicit acquire/release calls. These ergonomics are pretty much orthongonal. I will try to flesh out some Recycler API support for this.

Here you are pretty much repeating back to me what I said.

I have my reservations about this Ralph. Recycler is merely an efficient object pool. I think requesting thread-related functionality would break that assumption and sophisticate the design. Do you have a particular use case in mind that requires, e.g., thread inheritance?

Well of course. I have two.

  1. Scoped Values. "Scoped values are automatically inherited by all child threads created using the StructuredTaskScope." Given that is the default I am sure we will want a way to disable it for certain use cases while leaving it enabled for others.
  2. I am using an InheritableThreadLocal in PropretiesUtil so that when configuration takes place any method called, even on a child thread, will get the correct property sources.

Do you have a [non-hypothetical] use case in mind for the recycler sharing you mention?

Well, maybe. I simply haven't looked. But we use so many ThreadLocals that I have a suspicion that a few of them share the same lifecycle. In that case, in my mind it makes sense to share the recycler. But again, without inspecting every use of ThreadLocal in Log4j, no I can't firmly give you an example.

Let me again share a similar example from Spring. Imagine a Tree bean implementing Lifecycle.

I have to stop you right there. The problem in Log4j is that we have several Lifecycles:

  1. The entire JVM
  2. The application instance (there can be several in a web container or OSGi). Generally, this will correlate to a LoggerContext.
  3. The configuration. As you know these can start and stop many times during the lifetime of a LoggerContext.
  4. AppenderManagers - These may live as long as the LoggerContext does or as short as each Configuration they are part of, depending on how the configuration changed.
  5. Even a LogEvent has a lifecycle of sorts.

Matt made a good attempt to make the DI system deal with all these different lifecycles in an appropriate manner, except it doesn't quite work (which probably isn't the fault of the DI code itself but other stuff that he didn't fix - like constants that shouldn't be constants.

I will note that you also didn't address the idea of using a Spec object instead of a simple String that directly names the implementation.

@vy
Copy link
Member Author

vy commented May 19, 2023

If our DI system worked properly I might agree with you. It does not.

If so, I think we should pause stacking features on top of it and first fix it. I would appreciate it if you can raise this observation of yours with some more detail in the mailing list. A solution proposal would also come pretty handy.

I have provided several examples demonstrating why static registries are bad:

Hinders testing

No they don't. So long as you can register mock objects it makes no difference whether the registry has a static anchor or not.

Ralph, I am tired of this. I am trying to be polite, giving you concrete examples, but you are slamming the door with hand waving. Either show me a green CI build with fork=false, parallel=true or please stop.

Do you have a particular use case in mind that requires, e.g., thread inheritance?

Well of course. I have two.

  1. Scoped Values. "Scoped values are automatically inherited by all child threads created using the StructuredTaskScope." Given that is the default I am sure we will want a way to disable it for certain use cases while leaving it enabled for others.

This is a concern of the ThreadLocalRecycler implementation, not the use-site. TLR needs to make sure that it doesn't return an already acquired object. As a matter of fact, this is what it does in its current form, since values are provided from a queue stored in a TL.

  1. I am using an InheritableThreadLocal in PropretiesUtil so that when configuration takes place any method called, even on a child thread, will get the correct property sources.

I don't see how your InheritableThreadLocal usage in PropretiesUtil has to anything to do with a recycler. As a matter of fact, the InheritableThreadLocal need there arises only because PropretiesUtil is accessed statically everywhere, instead of a singleton bean injected by DI.

Do you have a [non-hypothetical] use case in mind for the recycler sharing you mention?

... no I can't firmly give you an example.

I will postpone the idea of a recycler registry until a need arises. We can introduce one when needed in a backward compatible way.

I will note that you also didn't address the idea of using a Spec object instead of a simple String that directly names the implementation.

I did that on purpose, since I haven't heard a single use-case where it would be necessary. Until I hear one, I am inclined to save it for future.

@jvz
Copy link
Member

jvz commented Oct 15, 2023

Besides the merge conflict, I think you should take a look at the updated DI system to see if it fits in with the lifecycles you need.

@vy vy merged commit c4882c0 into main Nov 14, 2023
@vy vy deleted the recycler-api-3.x branch November 14, 2023 11:54
Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @vy!

Since we are introducing new interfaces and static register classes in log4j-api, I think we should take some time to choose were to "hide" those classes to minimize the disruption to Log4j implementors.

Implementors of the Log4j API need to provide an implementation of each interface in o.a.l.l.spi. As far as I understand semantic versioning of packages, each time we make a minor version bump of the package (e.g. we introduce a new implementation of RecyclerFactory), they need to check if it does not break their implementation.

So ideally the o.a.l.l.spi package should probably only contain interfaces and we should move other classes to other packages. E.g. abstract utility classes could be in o.a.l.l.spi.support, while our implementations in o.a.l.l.spi.internal.

What do you think?

public interface BlockingQueueFactory<E> {
public interface BlockingQueueFactory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface should be removed in favor of BoundedQueueFactory.

@vy vy mentioned this pull request Dec 7, 2023
@vy
Copy link
Member Author

vy commented Dec 21, 2023

Since we are introducing new interfaces and static register classes in log4j-api, I think we should take some time to choose were to "hide" those classes to minimize the disruption to Log4j implementors.

Implementors of the Log4j API need to provide an implementation of each interface in o.a.l.l.spi. As far as I understand semantic versioning of packages, each time we make a minor version bump of the package (e.g. we introduce a new implementation of RecyclerFactory), they need to check if it does not break their implementation.

So ideally the o.a.l.l.spi package should probably only contain interfaces and we should move other classes to other packages. E.g. abstract utility classes could be in o.a.l.l.spi.support, while our implementations in o.a.l.l.spi.internal.

What do you think?

I have applied this convention in the context of recyclers in #2104. For other packages/functionality, it is a whole different story.

@ppkarwasz
Copy link
Contributor

@vy,

Thanks. Remark that this is just my understanding of how SPIs should work. We should probably consult with experts and write some sort of policy for future extensions of the API. The old SPIs for BC are a whole different story, as you remark.

@tcmot
Copy link

tcmot commented Dec 23, 2023

@ppkarwasz

How does Recycler ensure thread safety?

org.apache.logging.log4j.util.Unbox,Replace the ThreadLocal in this class with Recycler?

Thanks.

@vy
Copy link
Member Author

vy commented Dec 29, 2023

How does Recycler ensure thread safety?

org.apache.logging.log4j.util.Unbox,Replace the ThreadLocal in this class with Recycler?

Thread-safety is a part of the recycler contract. All Log4j-provided implementations are thread-safe. @tcmot, if you happen to find a place where they aren't, please submit a bug report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Affects the public API performance Issues or PRs that affect performance, throughput, latency, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants