Document fixed rate scheduling with CRaC #33490

Closed
asm0dey opened this issue Sep 5, 2024 · 2 comments
Labels
in: core Issues in core modules (aop, beans, core, context, expression) type: documentation A documentation task

asm0dey commented Sep 5, 2024

Affects: Spring Boot 3.3.0, but I think every version supporting CRaC is affected


Consider the following simple application:

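import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.scheduling.annotation.EnableScheduling
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RestController
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

@SpringBootApplication
@EnableScheduling
class MyApp

fun main(args: Array<String>) {
    runApplication<MyApp>(*args)
}

@RestController
class SchedulingController {
    val data = AtomicInteger(0)

    // Increments and prints the counter once per second at a fixed rate.
    @Scheduled(timeUnit = TimeUnit.SECONDS, fixedRate = 1L)
    fun increment() {
        println(data.incrementAndGet())
    }

    // Exposes the current counter value over HTTP.
    @GetMapping("/")
    fun data() = data.get()
}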

My steps are as follows:

  1. Build the application with ./gradlew build
  2. Build an image from the following Dockerfile (docker build -t last_edit_pre .):

FROM bellsoft/liberica-runtime-container:jdk-crac-slim

ADD build/libs/last_edit-0.0.1-SNAPSHOT.jar /app/app.jar
WORKDIR /app
ENTRYPOINT java -XX:CRaCCheckpointTo=/app/checkpoint -jar /app/app.jar

  3. Run it with docker run --privileged -p 8081:8080 -it --name last_edit_pre last_edit_pre:latest and wait for some time (for example, until the count reaches 10)
  4. Create a snapshot with docker exec -it last_edit_pre jcmd 129 JDK.checkpoint
  5. Commit the snapshot to a new image with docker commit last_edit_pre last_edit_post
  6. Run the newly created image with docker run -it --rm --entrypoint java last_edit_post:latest -XX:CRaCRestoreFrom=/app/checkpoint

Here I observe an interesting behavior: the counter very quickly catches up from the checkpoint moment to the current time. The later I restore from the snapshot, the more iterations it replays in quick succession.

This is potentially dangerous: if the scheduled operation is CPU-intensive or performs a risky operation, the burst of replayed invocations can crash the application in a variety of ways.

I do realize that this behavior might sometimes be required; in that case it should probably be controlled by an application property.
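To put a rough number on the backlog described above, here is a minimal Kotlin sketch. The helper `missedRuns` is hypothetical and not part of Spring or CRaC; it only assumes that a fixed-rate task replays about one run per period that elapsed between checkpoint and restore.

```kotlin
import java.time.Duration

// Hypothetical helper (not a Spring or CRaC API): estimates how many
// fixed-rate runs fall between the checkpoint and the restore.
fun missedRuns(period: Duration, downtime: Duration): Long {
    require(period.toMillis() > 0) { "period must be at least 1 ms" }
    require(!downtime.isNegative) { "downtime must not be negative" }
    return downtime.toMillis() / period.toMillis()
}

fun main() {
    // fixedRate = 1 second, restored one hour after the checkpoint.
    println(missedRuns(Duration.ofSeconds(1), Duration.ofHours(1))) // prints 3600
}
```

Under that assumption, a snapshot restored a day later would replay on the order of 86,400 invocations back to back for a one-second rate.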

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Sep 5, 2024
@sdeleuze sdeleuze self-assigned this Sep 5, 2024
@sdeleuze sdeleuze added in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Sep 5, 2024
@sdeleuze sdeleuze added this to the 6.2.0-RC1 milestone Sep 5, 2024
Contributor

sdeleuze commented Sep 9, 2024

@asm0dey So please find below our findings and proposal.

First, be aware that only the x86 variant of bellsoft/liberica-runtime-container:jdk-crac-slim is available, so on my Mac M2 I used a modified version of https://github.com/sdeleuze/spring-boot-crac-demo to reproduce.

Second, the behavior you report is only visible in the "on-demand checkpoint/restore of a running application" mode, not in the "automatic checkpoint/restore at startup" one.

Third, if we take a step back, the behavior we see makes sense given that fixedRate is described as "execute the annotated method with a fixed period between invocations", with the first invocations being performed before the checkpoint. Interestingly, fixedDelay works without this side effect if you want a CRaC restoration to behave like just a faster startup, since its definition is "execute the annotated method with a fixed period between the end of the last invocation and the start of the next". Note that cron also works as you would expect here, since cron expressions are evaluated after every task execution as well.
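The difference between the two definitions can be illustrated with a small Kotlin model. This is a simplified sketch, not Spring's scheduler code: the function names are made up, runs are treated as instantaneous, and times are seconds on an imaginary clock where the process is frozen from `pauseAt` (checkpoint) until `resumeAt` (restore).

```kotlin
// Fixed rate: each run is due one period after the previous *due* time,
// so due times keep piling up while the process is frozen.
fun fixedRateStarts(period: Long, pauseAt: Long, resumeAt: Long, horizon: Long): List<Long> {
    val starts = mutableListOf<Long>()
    var due = 0L
    while (due <= horizon) {
        // A run that falls due during the freeze can only start at restore time.
        val start = if (due >= pauseAt && due < resumeAt) resumeAt else due
        starts += start
        due += period
    }
    return starts
}

// Fixed delay: each run is due one delay after the previous run *ended*,
// so only a single run is pending when the process resumes.
fun fixedDelayStarts(delay: Long, pauseAt: Long, resumeAt: Long, horizon: Long): List<Long> {
    val starts = mutableListOf<Long>()
    var due = 0L
    while (due <= horizon) {
        val start = if (due >= pauseAt && due < resumeAt) resumeAt else due
        starts += start
        due = start + delay // runs are instantaneous in this model
    }
    return starts
}

fun main() {
    // Checkpoint at t=3, restore at t=8, observe until t=10.
    println(fixedRateStarts(1, 3, 8, 10))  // prints [0, 1, 2, 8, 8, 8, 8, 8, 8, 9, 10]
    println(fixedDelayStarts(1, 3, 8, 10)) // prints [0, 1, 2, 8, 9, 10]
}
```

In this model the fixed-rate task starts six times at t=8, which corresponds to the burst reported in this issue, while the fixed-delay task simply resumes its one-second cadence.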

As you mention yourself, the current behavior is sometimes required and sometimes not, so I don't think we should change the default. And since fixedDelay and cron work as expected with CRaC when you want a restoration to behave like just a faster startup, I suggest turning this issue into a documentation one: add a scheduling section to the Spring CRaC reference documentation that warns about this side effect of on-demand checkpoints when fixedRate is used, and recommends fixedDelay or cron for that use case. Would that be OK from your POV?
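As an illustration of that recommendation, the scheduled method from the original report could be declared as follows. This is only a sketch: the class name `RestoreFriendlyTasks` and the second method are made up for the example, and it assumes a Spring Boot application with @EnableScheduling as in the report.

```kotlin
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical component name; requires @EnableScheduling on the application.
@Component
class RestoreFriendlyTasks {
    val data = AtomicInteger(0)

    // fixedDelay counts from the end of the previous invocation, so a restore
    // long after the checkpoint does not trigger a burst of catch-up runs.
    @Scheduled(timeUnit = TimeUnit.SECONDS, fixedDelay = 1L)
    fun increment() {
        println(data.incrementAndGet())
    }

    // Cron alternative: six-field Spring cron expression that fires every second.
    @Scheduled(cron = "* * * * * *")
    fun everySecond() {
        println("tick")
    }
}
```

Either variant keeps the one-second cadence of the original example; keep fixedRate only when replaying the missed invocations after a restore is actually wanted.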

Author

asm0dey commented Sep 9, 2024

@sdeleuze thank you for looking into it! Now that you have explained the intricacies of the behavior, it makes perfect sense!
And now that I understand the behavior, I totally agree that it's just a matter of documentation.

@sdeleuze sdeleuze added type: documentation A documentation task and removed type: enhancement A general enhancement labels Sep 9, 2024
@sdeleuze sdeleuze modified the milestones: 6.2.0-RC1, 6.1.13 Sep 9, 2024
@sdeleuze sdeleuze changed the title from "Restore from a CRaC Checkpoint backfills skipped @Scheduled iterations" to "Document fixed rate scheduling with CRaC" Sep 9, 2024