mds: relax certain asserts in mdlog replay thread #55639

Merged on Mar 7, 2024 (1 commit)

Conversation

vshankar
Contributor

Fixes: http://tracker.ceph.com/issues/57048

@vshankar vshankar added the cephfs Ceph File System label Feb 19, 2024
@vshankar vshankar requested a review from a team February 19, 2024 12:07
@vshankar
Contributor Author

jenkins test make check

Contributor

@mchangir mchangir left a comment

LGTM

@vshankar vshankar requested a review from a team February 27, 2024 04:36
@dparmar18
Contributor

So with this change, if the journaler turns unreadable by the time we hit this:

ceph/src/mds/MDLog.cc

Lines 1412 to 1417 in b11e6e2

// read it
uint64_t pos = journaler->get_read_pos();
bufferlist bl;
bool r = journaler->try_read_entry(bl);
if (!r && journaler->get_error())
  continue;

then we continue at line 1417, and this goes on until the journaler becomes readable again?

@vshankar
Contributor Author

then this will mean we continue at line 1417 and this goes on until the journaler becomes readable again?

I don't think so - it should enter the if (journaler->get_error()) block at the top of the while loop, and then the error should get handled.

EDIT: are you seeing any other case when the above is not true @dparmar18 ?

@dparmar18
Contributor

I'm trying to think of a case where we're stuck in this loop forever, i.e. the journaler is readable at if (journaler->get_error()) but then turns unreadable when we try to read an entry at the code block I highlighted above

@dparmar18
Contributor

this seems to be almost impossible to hit, btw do you think we should log it if the read failure occurred:

    if (!r && journaler->get_error()) {
      // log before retrying; anything placed after `continue` would be unreachable
      dout(0) << "journaler became unreadable, retrying" << dendl;
      // maybe log to cluster log too?
      mds->clog->error() << "journaler suddenly turns unreadable";
      continue;
    }
    ceph_assert(r);

@vshankar
Contributor Author

We can rely on the log messages from the retry.
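
The control flow discussed in this thread can be sketched in isolation. What follows is a minimal, self-contained C++ model, not the Ceph code: `FakeJournaler`, `ReplayResult`, `replay()`, and the `-2` error value are hypothetical stand-ins. It assumes that an error, once set, stays set. Under that assumption, a failed read followed by `continue` does not spin: the next pass through the loop hits the error check at the top and exits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Journaler; only the calls this sketch needs.
struct FakeJournaler {
  std::vector<std::string> entries;  // entries still to be replayed
  std::size_t fail_at;               // index at which a read error appears
  std::size_t next = 0;              // index of the next entry to read
  int error = 0;                     // latched error code (0 means none)

  FakeJournaler(std::vector<std::string> e, std::size_t f)
      : entries(std::move(e)), fail_at(f) {}

  int get_error() const { return error; }
  bool at_end() const { return next == entries.size(); }

  // Returns false and latches an error when the injected failure is reached.
  bool try_read_entry(std::string& out) {
    if (next == fail_at) {
      error = -2;  // assumed error value, standing in for something like -ENOENT
      return false;
    }
    if (next >= entries.size()) return false;
    out = entries[next++];
    return true;
  }
};

struct ReplayResult {
  int rc;                  // 0 on clean end of journal, else the latched error
  std::size_t replayed;    // number of entries successfully read
  std::size_t iterations;  // passes through the outer loop
};

ReplayResult replay(FakeJournaler& j) {
  ReplayResult res{0, 0, 0};
  while (true) {
    ++res.iterations;
    if (j.get_error()) {  // error check at the top of the loop
      res.rc = j.get_error();
      break;
    }
    if (j.at_end()) break;  // clean end of journal
    std::string bl;
    bool r = j.try_read_entry(bl);
    if (!r && j.get_error()) continue;  // handled by the check above on the next pass
    assert(r);
    ++res.replayed;
  }
  return res;
}
```

With entries `{"a", "b", "c"}` and a failure injected at index 1, `replay()` returns after three iterations with one entry replayed and `rc` set to the injected error.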

@gregsfortytwo
Member

An explanation in the commit message (and PR intro comment) about why this fixes anything would be great. It looks like it's just straight-up dropping an assert and anything that modifies those kinds of invariants should always come with the logic around why.

@vshankar
Contributor Author

vshankar commented Mar 1, 2024

An explanation in the commit message (and PR intro comment) about why this fixes anything would be great. It looks like it's just straight-up dropping an assert and anything that modifies those kinds of invariants should always come with the logic around why.

Sure!

The calls to journaler->is_readable() and journaler->get_error()
in MDLog::_replay_thread() will drop Journaler::lock between
invocations, so, theoretically, it's possible that in the initial check:

  // loop
  int r = 0;
  while (1) {
    // wait for read?
    while (!journaler->is_readable() &&
           journaler->get_read_pos() < journaler->get_write_pos() &&
           !journaler->get_error()) {
      C_SaferCond readable_waiter;
      journaler->wait_for_readable(&readable_waiter);
      r = readable_waiter.wait();
    }
    if (journaler->get_error()) {
      r = journaler->get_error();
      dout(0) << "_replay journaler got error " << r << ", aborting" << dendl;

journaler->is_readable() returned true, thereby breaking out of
the (inner) while loop and bypassing the journaler->get_error()
check, and by the time this hits the next set of checks:

    if (!journaler->is_readable() &&
        journaler->get_read_pos() == journaler->get_write_pos())
      break;

    ceph_assert(journaler->is_readable() || mds->is_daemon_stopping());

the journal may have become unreadable due to an error that
happened during prefetch. In short, these checks are racy.

So, remove the racy assert check, along with the journaler->is_readable()
check when validating the journal end, and rely on the next iteration
of reading the journal for error handling.

Fixes: http://tracker.ceph.com/issues/57048
Signed-off-by: Venky Shankar <vshankar@redhat.com>
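
The interleaving described in the commit message can be reproduced deterministically with a small model. This is a hedged sketch, not the MDS implementation: `MockJournaler`, `fail_prefetch()`, `Outcome`, `strict_iteration()`, and `relaxed_iteration()` are hypothetical names. The `between` callback stands in for whatever another thread does in the window where the lock is not held. Each accessor takes and releases the lock on its own, so two consecutive calls can observe different states.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical mock: each accessor locks independently, so state may change
// between two consecutive calls made by the replay thread.
class MockJournaler {
 public:
  bool is_readable() {
    std::lock_guard<std::mutex> l(lock_);
    return readable_;
  }
  int get_error() {
    std::lock_guard<std::mutex> l(lock_);
    return error_;
  }
  // Stands in for a prefetch failure reported from another thread.
  void fail_prefetch(int err) {
    std::lock_guard<std::mutex> l(lock_);
    readable_ = false;
    error_ = err;
  }

 private:
  std::mutex lock_;
  bool readable_ = true;
  int error_ = 0;
};

enum class Outcome { Replay, AssertWouldFire, ErrorHandled };

// Strict ordering: the error check passes, then a later readability
// assertion is evaluated against state that may already have changed.
Outcome strict_iteration(MockJournaler& j, const std::function<void()>& between) {
  if (j.get_error()) return Outcome::ErrorHandled;
  between();  // window in which the lock is not held
  if (!j.is_readable()) return Outcome::AssertWouldFire;
  return Outcome::Replay;
}

// Relaxed ordering: no assertion; an unreadable journal with an error set
// goes back to the error check at the top of the loop.
Outcome relaxed_iteration(MockJournaler& j, const std::function<void()>& between) {
  for (;;) {
    if (j.get_error()) return Outcome::ErrorHandled;
    between();
    if (!j.is_readable() && j.get_error()) continue;
    return Outcome::Replay;
  }
}
```

Injecting `fail_prefetch(-2)` through `between` makes `strict_iteration()` report `AssertWouldFire`. The same injection makes `relaxed_iteration()` report `ErrorHandled`, and with no injection it reports `Replay`.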
@vshankar vshankar force-pushed the wip-mdlog-handle-enoent branch from fc8138f to 90393de Compare March 4, 2024 01:38
@vshankar
Contributor Author

vshankar commented Mar 4, 2024

Commit message fixed and updated. Please check @ceph/cephfs

vshankar added a commit to vshankar/ceph that referenced this pull request Mar 4, 2024
* refs/pull/55639/head:
	mds: relax certain asserts in mdlog replay thread

Reviewed-by: Milind Changire <mchangir@redhat.com>
@vshankar
Contributor Author

vshankar commented Mar 6, 2024

jenkins retest this please
