PWX-21175: handle kernel crash in fastpath #229

sulakshm · 2021-09-05T17:27:34Z

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

What this PR does / why we need it:

Which issue(s) this PR fixes (optional)
Closes https://portworx.atlassian.net/browse/PWX-21175

Special notes for your reviewer:

Passed the last 6 consecutive runs of the automation test.

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

prabirpaul

Could you please add a description of the issue being fixed?

prabirpaul · 2021-09-08T07:03:10Z

pxd.c

-        mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
+		mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
+#else
+		blk_set_queue_dying(pxd_dev->disk->queue);


Why does QUEUE_FLAG_SET() not work for all cases?

It may. However, marking a device queue as dying involves additional actions; please look inside blk_set_queue_dying(). It is safer to use the exported API directly.

Right, but the reason we were not using blk_set_queue_dying() is that it's not available on all kernels. Please check whether PX_BLKMQ covers that.

PX_BLKMQ is defined for kernel 4.18+ or when el8 is defined, and the API has been available in the vanilla kernel since 3.19. We should be safe with this change. I will look out for any compilation issues after the merge.
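
The version gate discussed in this thread can be modelled in user space roughly as follows. This is only a sketch, assuming PX_BLKMQ is keyed on 4.18+ or el8 as stated above and that the exported API is the path taken when PX_BLKMQ is defined (the diff excerpt does not show the `#if` condition). `PX_KERNEL_VERSION`, `px_blkmq_enabled`, `px_choose_dying_path` and the enum are hypothetical names for illustration, not identifiers from pxd.c.

```c
#include <stdbool.h>

/* Same encoding as the classic KERNEL_VERSION() macro from <linux/version.h>. */
#define PX_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical model of the gate: blk-mq path on 4.18+ or on el8. */
static bool px_blkmq_enabled(unsigned int version_code, bool is_el8)
{
    return is_el8 || version_code >= PX_KERNEL_VERSION(4, 18, 0);
}

/* The two ways of marking the queue as dying that the diff switches between. */
enum px_dying_path {
    PX_SET_FLAG_UNDER_SYSFS_LOCK,   /* older path: set the flag by hand */
    PX_CALL_BLK_SET_QUEUE_DYING     /* exported API does the extra work */
};

/* Assumption: the exported API is only used where PX_BLKMQ would be defined. */
static enum px_dying_path px_choose_dying_path(unsigned int version_code,
                                               bool is_el8)
{
    return px_blkmq_enabled(version_code, is_el8)
        ? PX_CALL_BLK_SET_QUEUE_DYING
        : PX_SET_FLAG_UNDER_SYSFS_LOCK;
}
```

In the driver itself the choice would be made by the preprocessor at build time rather than at run time; the function form is used here only so the condition can be exercised outside the kernel.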

prabirpaul · 2021-09-08T07:12:46Z

pxd_bio_blkmq.c

-                BUG_ON("unexpected condition");
-        }
-#endif
+        BUG_ON(!rq_is_special(rq));


rq_is_special() only checks for discard. Based on the code below, it should also check for WRITE_SAME and zeroout if there's an op for it.

No. The px device supports only read/write and discard; write same and write zeroes are not supported on it.
The logic below maps the received discard onto the backing device (a btrfs file or a dmthin volume).
It may internally convert the discard to write same or write zeroes if discard is not available.
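
The mapping described in this reply can be sketched in user space as below. It is an illustration under stated assumptions, not the driver's code: all type and function names are hypothetical, and the fallback order (write zeroes before write same) is an assumption, since the thread only says that either may be used when discard is unavailable.

```c
#include <stdbool.h>

/* Ops the px device accepts, per the thread: read, write, discard. */
enum px_op { PX_OP_READ, PX_OP_WRITE, PX_OP_DISCARD };

/* Ops that may be issued to the backing device (hypothetical model). */
enum backing_op {
    BACKING_READ,
    BACKING_WRITE,
    BACKING_DISCARD,
    BACKING_WRITE_ZEROES,
    BACKING_WRITE_SAME,
    BACKING_UNSUPPORTED
};

/* What the backing device (btrfs file or dmthin volume) can do. */
struct backing_caps {
    bool discard;
    bool write_zeroes;
    bool write_same;
};

/* Mirrors the idea of rq_is_special(): only discard is special on px. */
static bool px_op_is_special(enum px_op op)
{
    return op == PX_OP_DISCARD;
}

/* Map a px request op to a backing op; only discard ever changes shape. */
static enum backing_op px_map_op(enum px_op op, const struct backing_caps *caps)
{
    switch (op) {
    case PX_OP_READ:
        return BACKING_READ;
    case PX_OP_WRITE:
        return BACKING_WRITE;
    case PX_OP_DISCARD:
        if (caps->discard)
            return BACKING_DISCARD;
        if (caps->write_zeroes)     /* assumed first fallback */
            return BACKING_WRITE_ZEROES;
        if (caps->write_same)       /* assumed second fallback */
            return BACKING_WRITE_SAME;
        return BACKING_UNSUPPORTED;
    }
    return BACKING_UNSUPPORTED;
}
```

The point of the model is that write same and write zeroes appear only on the backing side, so the check on the incoming request needs to recognise discard alone.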

sulakshm · 2021-09-08T11:12:40Z

Could you please add a description of the issue being fixed?

Please see the detailed inline comments on the first PR, #227.

prabirpaul · 2021-09-08T23:09:11Z

pxd.c

-        mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
+		mutex_unlock(&pxd_dev->disk->queue->sysfs_lock);
+#else
+		blk_set_queue_dying(pxd_dev->disk->queue);


Right, but the reason we were not using blk_set_queue_dying() is that it's not available on all kernels. Please check whether PX_BLKMQ covers that.

prabirpaul · 2021-09-10T06:57:14Z

pxd_bio_blkmq.c

-        rq_for_each_segment(bv, rq, rq_iter) nr_bvec++;
+        if (!specialops)
+                rq_for_each_segment(bv, rq, rq_iter) nr_bvec++;
+


This is fine, but we need to take a closer look at it.
Since compilation doesn't fail, a kernel upgrade on an existing px node will result in a kernel panic, likely in a loop. We should either not use APIs like this or have a way of sanity-checking the arguments before the call.

Code cannot be future-proof, since our driver is out of tree. I think the only thing we can do is issue these API calls only when needed and not assume they will behave the same otherwise. In this example, if this is a discard request, we know that no iovecs are carried, so we skip the call; for reads/writes the call has to exist.
So yes, exactly the latter of your two options: check whether the request is a discard, and if so avoid making the call.
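
The guard agreed on here (walk the segments only for requests that carry data) can be sketched in user space as follows. This is a simplified model with hypothetical types, not the kernel's `struct request` or `rq_for_each_segment`; it only illustrates the "check for discard first, then skip the call" pattern.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one data segment of a request. */
struct px_segment {
    size_t len;
    const struct px_segment *next;
};

/* Hypothetical stand-in for a block request. */
struct px_request {
    bool is_discard;                /* special op: carries no data segments */
    const struct px_segment *segs;  /* must not be trusted for a discard */
};

/* Count segments, but never touch the segment list of a discard request. */
static size_t px_count_bvecs(const struct px_request *rq)
{
    size_t nr_bvec = 0;
    const struct px_segment *s;

    if (rq->is_discard)
        return 0;   /* skip the walk entirely, as in the !specialops guard */

    for (s = rq->segs; s != NULL; s = s->next)
        nr_bvec++;
    return nr_bvec;
}
```

The design choice is to decide from the request type alone whether the iterator is safe to call, rather than relying on the iterator to behave sensibly on requests that have no payload.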

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

…230) * PWX-21175: handle kernel crash in fastpath(#229) * compile fix - disable fastpath code during compile time cherry picked changes into 2.10.0 Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

Lakshmi Narasimhan Sundararajan added 2 commits August 31, 2021 21:04

compile fix - disable fastpath code during compile time

c193b79

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

PWX-21175: handle kernel crash in fastpath

edb4834

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

sulakshm requested a review from prabirpaul September 5, 2021 17:27

Lakshmi Narasimhan Sundararajan added 5 commits September 5, 2021 23:09

cleanup

5cb4c1c

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

fix compilation in 3.10

df3bb97

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

Merge branch 'ln/master' into ln/PWX-21175

4e547c7

initialize bio size for discards

cfbc347

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

avoid freezing for deadlock

ab29747

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

prabirpaul reviewed Sep 8, 2021

prabirpaul approved these changes Sep 10, 2021

Lakshmi Narasimhan Sundararajan added 2 commits September 12, 2021 19:43

use global workqueue

869d9bf

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

fix compilation

2b7acba

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

sulakshm changed the base branch from ln/master to master September 14, 2021 03:47

Merge branch 'master' into ln/PWX-21175

a5940ac

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>

sulakshm merged commit 0252dbf into master Sep 14, 2021

sulakshm pushed a commit that referenced this pull request Sep 14, 2021

PWX-21175: handle kernel crash in fastpath(#229)

c646375

Signed-off-by: Lakshmi Narasimhan Sundararajan <lns@portworx.com>
