Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metadata-store] Fix metadata cache inconsistency on doing refresh #14283

Merged

Conversation

Demogorgon314
Copy link
Member

@Demogorgon314 Demogorgon314 commented Feb 15, 2022

Fixes #14110 #14096 #14219

Motivation

Currently, the metadata cache refresh has race condition.

The main reason is MetadataCacheImpl#refresh is not atomic operations.

// 1. Caffeine.asyncLoad already begin and reads an old value but is not done yet.
if (objCache.getIfPresent(path) != null) {
  objCache.synchronous().invalidate(path);
  // 2. Caffeine.asyncLoad done, the old value will store to cache.
  objCache.synchronous().refresh(path);
}

I added some log in MetadataCacheImpl in order to observe:

@Override
public CompletableFuture<Optional<CacheGetResult<T>>> asyncLoad(String key, Executor executor) {
  log.info("Async load for objCache {}", key);
  CompletableFuture<Optional<CacheGetResult<T>>> future = readValueFromStore(key);
  future.whenComplete((op, ex) -> {
    if (ex == null){
      op.ifPresent(tCacheGetResult -> log.info("Async load for objCache complete {} - {}", key, tCacheGetResult));
    }
  });
  return future;
}
---------------------
@Override
public void refresh(String path) {
  log.info("1. Refresh path {}", path);
  // Refresh object of path if only it is cached before.
  if (objCache.getIfPresent(path) != null) {
    objCache.synchronous().invalidate(path);
    objCache.synchronous().refresh(path);
    log.info("2. Refresh path complete {}:{}", path, get(path).join().get());
  } else {
    log.info("0. Skip Refresh path {}", path);
  }
}

Cache inconsistency, you can see the async load for objCache complete in mid of refresh :

2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552641536875 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=1), stat=Stat(path=/key-33552641536875, version=0, creationTimestamp=1644885186668, modificationTimestamp=1644885186668, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,669+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552641536875 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=1), stat=Stat(path=/key-33552641536875, version=0, creationTimestamp=1644885186668, modificationTimestamp=1644885186668, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552641536875:MetadataCacheTest.MyClass(a=a, b=1) {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552641536875 {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552641536875 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=2), stat=Stat(path=/key-33552641536875, version=1, creationTimestamp=1644885186668, modificationTimestamp=1644885186669, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552641536875:MetadataCacheTest.MyClass(a=a, b=2) {}

Correct log:

2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552632155000 {}
2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552632155000 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=1), stat=Stat(path=/key-33552632155000, version=0, creationTimestamp=1644885186659, modificationTimestamp=1644885186659, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [metadata-store-2436-1] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [metadata-store-2436-1] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552632155000 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=2), stat=Stat(path=/key-33552632155000, version=1, creationTimestamp=1644885186659, modificationTimestamp=1644885186659, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552632155000:MetadataCacheTest.MyClass(a=a, b=2) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552632155000 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=2), stat=Stat(path=/key-33552632155000, version=1, creationTimestamp=1644885186659, modificationTimestamp=1644885186659, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552632155000:MetadataCacheTest.MyClass(a=a, b=2) {}
2022-02-15T08:33:06,662+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552635148375 {}

Modifications

Make metadata cache refresh method atomic.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

Need to update docs?

  • no-need-doc

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 15, 2022
@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Feb 15, 2022
@merlimat merlimat added this to the 2.10.0 milestone Feb 15, 2022
@Demogorgon314 Demogorgon314 marked this pull request as draft February 15, 2022 01:58
@codelipenghui codelipenghui added the release/blocker Indicate the PR or issue that should block the release until it gets resolved label Feb 15, 2022
@Demogorgon314 Demogorgon314 force-pushed the fix/make-metadata-cache-refresh-async branch from a71692d to ab6ef2b Compare February 15, 2022 08:16
@Demogorgon314 Demogorgon314 changed the title [metadata-store] Fix race condition by making metadata cache refresh async [metadata-store] Fix metadata cache inconsistency on doing refresh Feb 15, 2022
@Demogorgon314 Demogorgon314 marked this pull request as ready for review February 15, 2022 09:18
@lhotari lhotari merged commit 2e16b43 into apache:master Feb 15, 2022
@Demogorgon314 Demogorgon314 deleted the fix/make-metadata-cache-refresh-async branch February 15, 2022 11:22
@gaozhangmin
Copy link
Contributor

@Demogorgon314 I still get test error: https://github.com/apache/pulsar/runs/5231565908?check_suite_focus=true

@Demogorgon314
Copy link
Member Author

@Demogorgon314 I still get test error: https://github.com/apache/pulsar/runs/5231565908?check_suite_focus=true

I'll push a PR to fix it.

@Demogorgon314
Copy link
Member Author

@gaozhangmin PTAL #14354 : )

congbobo184 pushed a commit that referenced this pull request Nov 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/blocker Indicate the PR or issue that should block the release until it gets resolved release/2.9.4 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky-test: MetadataCacheTest.insertionDeletion
9 participants