Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PWX-21175: handle kernel crash in fastpath #229

Merged
merged 10 commits into from
Sep 14, 2021
Merged

PWX-21175: handle kernel crash in fastpath #229

merged 10 commits into from
Sep 14, 2021

Conversation

sulakshm
Copy link
Contributor

@sulakshm sulakshm commented Sep 5, 2021

Signed-off-by: Lakshmi Narasimhan Sundararajan lns@portworx.com

What this PR does / why we need it:

Which issue(s) this PR fixes (optional)
Closes # https://portworx.atlassian.net/browse/PWX-21175

Special notes for your reviewer:

PASSED last 6 consecutive runs in the automation test.

Lakshmi Narasimhan Sundararajan added 2 commits August 31, 2021 21:04
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
@sulakshm sulakshm requested a review from prabirpaul September 5, 2021 17:27
Lakshmi Narasimhan Sundararajan added 5 commits September 5, 2021 23:09
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
Copy link
Contributor

@prabirpaul prabirpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a description for the issue being fixed.

mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
#else
blk_set_queue_dying(pxd_dev->disk->queue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does QUEUE_FLAG_SET() not work for all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may. There are additional actions done while marking a device queue as removing, please look inside blk_set_queue_dying. It would be safe to use a direct exported API instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but the reason we were not using blk_set_queue_dying() is because it's not available on all kernels. Please check if PX_BLKMQ covers that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PX_BLKMQ is defined from kernel 4.18+ or if el8 is defined. And in vanilla kernel is available from 3.19. We should be safe with this change. Will look out for any compilation issues after merge.

BUG_ON("unexpected condition");
}
#endif
BUG_ON(!rq_is_special(rq));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rq_is_special() only checks for discard, Based on the below it should also check for WRITE_SAME and zeroout if there's an op for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. px device only supports read/write and discard, write same and write zeros are not supported on px device.
The below logic converts the received discard to map to the backing device (btrfs file or dmthin volume).
It may convert discard internally to write same or write zero if discard is not available.

@sulakshm
Copy link
Contributor Author

sulakshm commented Sep 8, 2021

Could you please add a description for the issue being fixed.

Please see detailed inline comments on the first PR #227

mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
#else
blk_set_queue_dying(pxd_dev->disk->queue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but the reason we were not using blk_set_queue_dying() is because it's not available on all kernels. Please check if PX_BLKMQ covers that.

rq_for_each_segment(bv, rq, rq_iter) nr_bvec++;
if (!specialops)
rq_for_each_segment(bv, rq, rq_iter) nr_bvec++;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. But we need a take a closer look at it.
Since the compilation doesn't fail a kernel upgrade on an existing px node will result in a kernel panic and likely in a loop. We either not use apis like this or have a way of sanity checking the arguments before the call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code cannot be future proof, since our driver is outside tree. I think the only focus we can make is issue these api calls only on a need basis and not assume they will behave the same otherwise. Like in this example, if this is a discard request, we do know there aren't any iovec carried, so skipping the call, but for read/writes this has to exist.
Yes exactly the latter from your argument, check if discard, then avoid making the call.

Lakshmi Narasimhan Sundararajan added 2 commits September 12, 2021 19:43
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
@sulakshm sulakshm changed the base branch from ln/master to master September 14, 2021 03:47
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
@sulakshm sulakshm merged commit 0252dbf into master Sep 14, 2021
sulakshm pushed a commit that referenced this pull request Sep 14, 2021
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
sulakshm added a commit that referenced this pull request Sep 14, 2021
…230)

* PWX-21175: handle kernel crash in fastpath(#229)
* compile fix - disable fastpath code during compile time
cherry picked changes into 2.10.0
Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants