Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Daily crashes on latest build #7

Open
n5q opened this issue Apr 5, 2020 · 2 comments
Open

Daily crashes on latest build #7

n5q opened this issue Apr 5, 2020 · 2 comments

Comments

@n5q
Copy link

n5q commented Apr 5, 2020

My phone freezes up and crashes forcing a reboot about twice a day running the March 11 update. I occasionally get pink/green stripes across the screen when this happens which I assume is graphics array corruption. I notice that when my device crashes from this it restarts a second time on the powered by android screen. I do not have Magisk or any other modifications to the device.

@n5q
Copy link
Author

n5q commented Apr 5, 2020

This is for the 16.0 build

@n5q
Copy link
Author

n5q commented Apr 10, 2020

Managed to get a boot log right after a crash, see attached.

boot.log

fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 26, 2020
[ Upstream commit 347ab9480313737c0f1aaa08e8f2e1a791235535 ]

This patch fixes deadlock warning if removing PWM device
when CONFIG_PROVE_LOCKING is enabled.

This issue can be reproceduced by the following steps on
the R-Car H3 Salvator-X board if the backlight is disabled:

 # cd /sys/class/pwm/pwmchip0
 # echo 0 > export
 # ls
 device  export  npwm  power  pwm0  subsystem  uevent  unexport
 # cd device/driver
 # ls
 bind  e6e31000.pwm  uevent  unbind
 # echo e6e31000.pwm > unbind

[   87.659974] ======================================================
[   87.666149] WARNING: possible circular locking dependency detected
[   87.672327] 5.0.0 ivanmeler#7 Not tainted
[   87.675549] ------------------------------------------------------
[   87.681723] bash/2986 is trying to acquire lock:
[   87.686337] 000000005ea0e178 (kn->count#58){++++}, at: kernfs_remove_by_name_ns+0x50/0xa0
[   87.694528]
[   87.694528] but task is already holding lock:
[   87.700353] 000000006313b17c (pwm_lock){+.+.}, at: pwmchip_remove+0x28/0x13c
[   87.707405]
[   87.707405] which lock already depends on the new lock.
[   87.707405]
[   87.715574]
[   87.715574] the existing dependency chain (in reverse order) is:
[   87.723048]
[   87.723048] -> ivanmeler#1 (pwm_lock){+.+.}:
[   87.728017]        __mutex_lock+0x70/0x7e4
[   87.732108]        mutex_lock_nested+0x1c/0x24
[   87.736547]        pwm_request_from_chip.part.6+0x34/0x74
[   87.741940]        pwm_request_from_chip+0x20/0x40
[   87.746725]        export_store+0x6c/0x1f4
[   87.750820]        dev_attr_store+0x18/0x28
[   87.754998]        sysfs_kf_write+0x54/0x64
[   87.759175]        kernfs_fop_write+0xe4/0x1e8
[   87.763615]        __vfs_write+0x40/0x184
[   87.767619]        vfs_write+0xa8/0x19c
[   87.771448]        ksys_write+0x58/0xbc
[   87.775278]        __arm64_sys_write+0x18/0x20
[   87.779721]        el0_svc_common+0xd0/0x124
[   87.783986]        el0_svc_compat_handler+0x1c/0x24
[   87.788858]        el0_svc_compat+0x8/0x18
[   87.792947]
[   87.792947] -> #0 (kn->count#58){++++}:
[   87.798260]        lock_acquire+0xc4/0x22c
[   87.802353]        __kernfs_remove+0x258/0x2c4
[   87.806790]        kernfs_remove_by_name_ns+0x50/0xa0
[   87.811836]        remove_files.isra.1+0x38/0x78
[   87.816447]        sysfs_remove_group+0x48/0x98
[   87.820971]        sysfs_remove_groups+0x34/0x4c
[   87.825583]        device_remove_attrs+0x6c/0x7c
[   87.830197]        device_del+0x11c/0x33c
[   87.834201]        device_unregister+0x14/0x2c
[   87.838638]        pwmchip_sysfs_unexport+0x40/0x4c
[   87.843509]        pwmchip_remove+0xf4/0x13c
[   87.847773]        rcar_pwm_remove+0x28/0x34
[   87.852039]        platform_drv_remove+0x24/0x64
[   87.856651]        device_release_driver_internal+0x18c/0x21c
[   87.862391]        device_release_driver+0x14/0x1c
[   87.867175]        unbind_store+0xe0/0x124
[   87.871265]        drv_attr_store+0x20/0x30
[   87.875442]        sysfs_kf_write+0x54/0x64
[   87.879618]        kernfs_fop_write+0xe4/0x1e8
[   87.884055]        __vfs_write+0x40/0x184
[   87.888057]        vfs_write+0xa8/0x19c
[   87.891887]        ksys_write+0x58/0xbc
[   87.895716]        __arm64_sys_write+0x18/0x20
[   87.900154]        el0_svc_common+0xd0/0x124
[   87.904417]        el0_svc_compat_handler+0x1c/0x24
[   87.909289]        el0_svc_compat+0x8/0x18
[   87.913378]
[   87.913378] other info that might help us debug this:
[   87.913378]
[   87.921374]  Possible unsafe locking scenario:
[   87.921374]
[   87.927286]        CPU0                    CPU1
[   87.931808]        ----                    ----
[   87.936331]   lock(pwm_lock);
[   87.939293]                                lock(kn->count#58);
[   87.945120]                                lock(pwm_lock);
[   87.950599]   lock(kn->count#58);
[   87.953908]
[   87.953908]  *** DEADLOCK ***
[   87.953908]
[   87.959821] 4 locks held by bash/2986:
[   87.963563]  #0: 00000000ace7bc30 (sb_writers#6){.+.+}, at: vfs_write+0x188/0x19c
[   87.971044]  ivanmeler#1: 00000000287991b2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xb4/0x1e8
[   87.978872]  ivanmeler#2: 00000000f739d016 (&dev->mutex){....}, at: device_release_driver_internal+0x40/0x21c
[   87.988001]  ivanmeler#3: 000000006313b17c (pwm_lock){+.+.}, at: pwmchip_remove+0x28/0x13c
[   87.995481]
[   87.995481] stack backtrace:
[   87.999836] CPU: 0 PID: 2986 Comm: bash Not tainted 5.0.0 ivanmeler#7
[   88.005489] Hardware name: Renesas Salvator-X board based on r8a7795 ES1.x (DT)
[   88.012791] Call trace:
[   88.015235]  dump_backtrace+0x0/0x190
[   88.018891]  show_stack+0x14/0x1c
[   88.022204]  dump_stack+0xb0/0xec
[   88.025514]  print_circular_bug.isra.32+0x1d0/0x2e0
[   88.030385]  __lock_acquire+0x1318/0x1864
[   88.034388]  lock_acquire+0xc4/0x22c
[   88.037958]  __kernfs_remove+0x258/0x2c4
[   88.041874]  kernfs_remove_by_name_ns+0x50/0xa0
[   88.046398]  remove_files.isra.1+0x38/0x78
[   88.050487]  sysfs_remove_group+0x48/0x98
[   88.054490]  sysfs_remove_groups+0x34/0x4c
[   88.058580]  device_remove_attrs+0x6c/0x7c
[   88.062671]  device_del+0x11c/0x33c
[   88.066154]  device_unregister+0x14/0x2c
[   88.070070]  pwmchip_sysfs_unexport+0x40/0x4c
[   88.074421]  pwmchip_remove+0xf4/0x13c
[   88.078163]  rcar_pwm_remove+0x28/0x34
[   88.081906]  platform_drv_remove+0x24/0x64
[   88.085996]  device_release_driver_internal+0x18c/0x21c
[   88.091215]  device_release_driver+0x14/0x1c
[   88.095478]  unbind_store+0xe0/0x124
[   88.099048]  drv_attr_store+0x20/0x30
[   88.102704]  sysfs_kf_write+0x54/0x64
[   88.106359]  kernfs_fop_write+0xe4/0x1e8
[   88.110275]  __vfs_write+0x40/0x184
[   88.113757]  vfs_write+0xa8/0x19c
[   88.117065]  ksys_write+0x58/0xbc
[   88.120374]  __arm64_sys_write+0x18/0x20
[   88.124291]  el0_svc_common+0xd0/0x124
[   88.128034]  el0_svc_compat_handler+0x1c/0x24
[   88.132384]  el0_svc_compat+0x8/0x18

The sysfs unexport in pwmchip_remove() is completely asymmetric
to what we do in pwmchip_add_with_polarity() and commit 0733424c9ba9
("pwm: Unexport children before chip removal") is a strong indication
that this was wrong to begin with. We should just move
pwmchip_sysfs_unexport() where it belongs, which is right after
pwmchip_sysfs_unexport_children(). In that case, we do not need
separate functions anymore either.

We also really want to remove sysfs irrespective of whether or not
the chip will be removed as a result of pwmchip_remove(). We can only
assume that the driver will be gone after that, so we shouldn't leave
any dangling sysfs files around.

This warning disappears if we move pwmchip_sysfs_unexport() to
the top of pwmchip_remove(), pwmchip_sysfs_unexport_children().
That way it is also outside of the pwm_lock section, which indeed
doesn't seem to be needed.

Moving the pwmchip_sysfs_export() call outside of that section also
seems fine and it'd be perfectly symmetric with pwmchip_remove() again.

So, this patch fixes them.

Signed-off-by: Phong Hoang <phong.hoang.wz@renesas.com>
[shimoda: revise the commit log and code]
Fixes: 76abbdd ("pwm: Add sysfs interface")
Fixes: 0733424c9ba9 ("pwm: Unexport children before chip removal")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Hoan Nguyen An <na-hoan@jinso.co.jp>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: If7fc36ed28d5bd95a4f32666dc2010939cbf66e5
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 26, 2020
[ Upstream commit 443f2d5ba13d65ccfd879460f77941875159d154 ]

Observe a segmentation fault when 'perf stat' is asked to repeat forever
with the interval option.

Without fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.000211692  3,13,89,82,34,157      cycles
      10.000380119  1,53,98,52,22,294      cycles
      10.040467280       17,16,79,265      cycles
  Segmentation fault

This problem was only observed when we use forever option aka -r 0 and
works with limited repeats. Calling print_counter with ts being set to
NULL, is not a correct option when interval is set. Hence avoid
print_counter(NULL,..)  if interval is set.

With fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
   #           time             counts unit events
       5.019866622  3,15,14,43,08,697      cycles
      10.039865756  3,15,16,31,95,261      cycles
      10.059950628     1,26,05,47,158      cycles
       5.009902655  3,14,52,62,33,932      cycles
      10.019880228  3,14,52,22,89,154      cycles
      10.030543876       66,90,18,333      cycles
       5.009848281  3,14,51,98,25,437      cycles
      10.029854402  3,15,14,93,04,918      cycles
       5.009834177  3,14,51,95,92,316      cycles

Committer notes:

Did the 'git bisect' to find the cset introducing the problem to add the
Fixes tag below, and at that time the problem reproduced as:

  (gdb) run stat -r0 -I500 sleep 1
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
  (gdb) bt
  #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  ivanmeler#1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
  ivanmeler#2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
  ivanmeler#3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
  ivanmeler#4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
  ivanmeler#5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
  ivanmeler#6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
  (gdb)

Mostly the same as just before this patch:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
  (gdb) bt
  #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  ivanmeler#1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
      at util/stat-display.c:1172
  ivanmeler#2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
  ivanmeler#3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
  ivanmeler#4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
  ivanmeler#5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
  ivanmeler#6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
  ivanmeler#7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
  (gdb)

Fixes: d4f63a4741a8 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I03e14a64d8f774e0cd166298744216ebe9fb26bc
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 27, 2020
[ Upstream commit 443f2d5ba13d65ccfd879460f77941875159d154 ]

Observe a segmentation fault when 'perf stat' is asked to repeat forever
with the interval option.

Without fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.000211692  3,13,89,82,34,157      cycles
      10.000380119  1,53,98,52,22,294      cycles
      10.040467280       17,16,79,265      cycles
  Segmentation fault

This problem was only observed when we use forever option aka -r 0 and
works with limited repeats. Calling print_counter with ts being set to
NULL, is not a correct option when interval is set. Hence avoid
print_counter(NULL,..)  if interval is set.

With fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
   #           time             counts unit events
       5.019866622  3,15,14,43,08,697      cycles
      10.039865756  3,15,16,31,95,261      cycles
      10.059950628     1,26,05,47,158      cycles
       5.009902655  3,14,52,62,33,932      cycles
      10.019880228  3,14,52,22,89,154      cycles
      10.030543876       66,90,18,333      cycles
       5.009848281  3,14,51,98,25,437      cycles
      10.029854402  3,15,14,93,04,918      cycles
       5.009834177  3,14,51,95,92,316      cycles

Committer notes:

Did the 'git bisect' to find the cset introducing the problem to add the
Fixes tag below, and at that time the problem reproduced as:

  (gdb) run stat -r0 -I500 sleep 1
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
  (gdb) bt
  #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  ivanmeler#1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
  ivanmeler#2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
  ivanmeler#3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
  ivanmeler#4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
  ivanmeler#5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
  ivanmeler#6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
  (gdb)

Mostly the same as just before this patch:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
  (gdb) bt
  #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  ivanmeler#1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
      at util/stat-display.c:1172
  ivanmeler#2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
  ivanmeler#3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
  ivanmeler#4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
  ivanmeler#5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
  ivanmeler#6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
  ivanmeler#7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
  (gdb)

Fixes: d4f63a4741a8 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I03e14a64d8f774e0cd166298744216ebe9fb26bc
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 27, 2020
[ Upstream commit 443f2d5ba13d65ccfd879460f77941875159d154 ]

Observe a segmentation fault when 'perf stat' is asked to repeat forever
with the interval option.

Without fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.000211692  3,13,89,82,34,157      cycles
      10.000380119  1,53,98,52,22,294      cycles
      10.040467280       17,16,79,265      cycles
  Segmentation fault

This problem was only observed when we use forever option aka -r 0 and
works with limited repeats. Calling print_counter with ts being set to
NULL, is not a correct option when interval is set. Hence avoid
print_counter(NULL,..)  if interval is set.

With fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
   #           time             counts unit events
       5.019866622  3,15,14,43,08,697      cycles
      10.039865756  3,15,16,31,95,261      cycles
      10.059950628     1,26,05,47,158      cycles
       5.009902655  3,14,52,62,33,932      cycles
      10.019880228  3,14,52,22,89,154      cycles
      10.030543876       66,90,18,333      cycles
       5.009848281  3,14,51,98,25,437      cycles
      10.029854402  3,15,14,93,04,918      cycles
       5.009834177  3,14,51,95,92,316      cycles

Committer notes:

Did the 'git bisect' to find the cset introducing the problem to add the
Fixes tag below, and at that time the problem reproduced as:

  (gdb) run stat -r0 -I500 sleep 1
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
  (gdb) bt
  #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  ivanmeler#1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
  ivanmeler#2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
  ivanmeler#3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
  ivanmeler#4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
  ivanmeler#5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
  ivanmeler#6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
  (gdb)

Mostly the same as just before this patch:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
  (gdb) bt
  #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  ivanmeler#1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
      at util/stat-display.c:1172
  ivanmeler#2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
  ivanmeler#3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
  ivanmeler#4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
  ivanmeler#5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
  ivanmeler#6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
  ivanmeler#7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
  (gdb)

Fixes: d4f63a4741a8 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I03e14a64d8f774e0cd166298744216ebe9fb26bc
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 27, 2020
[ Upstream commit 443f2d5ba13d65ccfd879460f77941875159d154 ]

Observe a segmentation fault when 'perf stat' is asked to repeat forever
with the interval option.

Without fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.000211692  3,13,89,82,34,157      cycles
      10.000380119  1,53,98,52,22,294      cycles
      10.040467280       17,16,79,265      cycles
  Segmentation fault

This problem was only observed when we use forever option aka -r 0 and
works with limited repeats. Calling print_counter with ts being set to
NULL, is not a correct option when interval is set. Hence avoid
print_counter(NULL,..)  if interval is set.

With fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
   #           time             counts unit events
       5.019866622  3,15,14,43,08,697      cycles
      10.039865756  3,15,16,31,95,261      cycles
      10.059950628     1,26,05,47,158      cycles
       5.009902655  3,14,52,62,33,932      cycles
      10.019880228  3,14,52,22,89,154      cycles
      10.030543876       66,90,18,333      cycles
       5.009848281  3,14,51,98,25,437      cycles
      10.029854402  3,15,14,93,04,918      cycles
       5.009834177  3,14,51,95,92,316      cycles

Committer notes:

Did the 'git bisect' to find the cset introducing the problem to add the
Fixes tag below, and at that time the problem reproduced as:

  (gdb) run stat -r0 -I500 sleep 1
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
  (gdb) bt
  #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  ivanmeler#1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
  ivanmeler#2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
  ivanmeler#3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
  ivanmeler#4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
  ivanmeler#5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
  ivanmeler#6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
  (gdb)

Mostly the same as just before this patch:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
  (gdb) bt
  #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  ivanmeler#1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
      at util/stat-display.c:1172
  ivanmeler#2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
  ivanmeler#3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
  ivanmeler#4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
  ivanmeler#5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
  ivanmeler#6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
  ivanmeler#7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
  (gdb)

Fixes: d4f63a4741a8 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I03e14a64d8f774e0cd166298744216ebe9fb26bc
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 28, 2020
[ Upstream commit 42ffb0bf584ae5b6b38f72259af1e0ee417ac77f ]

There exists a deadlock with range_cyclic that has existed forever.  If
we loop around with a bio already built we could deadlock with a writer
who has the page locked that we're attempting to write but is waiting on
a page in our bio to be written out.  The task traces are as follows

  PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
   #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
   ivanmeler#1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
   ivanmeler#2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
   ivanmeler#3 [ffffc900297bb708] __lock_page at ffffffff811f145b
   ivanmeler#4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
   ivanmeler#5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
   ivanmeler#6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
   ivanmeler#7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
   ivanmeler#8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
   #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd

  PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
  "aio-dio-invalid"
   #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
   ivanmeler#1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
   ivanmeler#2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
   ivanmeler#3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
   ivanmeler#4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
   ivanmeler#5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
   ivanmeler#6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
   ivanmeler#7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
   ivanmeler#8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
   #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032

I used drgn to find the respective pages we were stuck on

page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874

As you can see the kworker is waiting for bit 0 (PG_locked) on index
7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
8148.  aio-dio-invalid has 7680, and the kworker epd looks like the
following

  crash> struct extent_page_data ffffc900297bbbb0
  struct extent_page_data {
    bio = 0xffff889f747ed830,
    tree = 0xffff889eed6ba448,
    extent_locked = 0,
    sync_io = 0
  }

Probably worth mentioning as well that it waits for writeback of the
page to complete while holding a lock on it (at prepare_pages()).

Using drgn I walked the bio pages looking for page
0xffffea00fbfc7500 which is the one we're waiting for writeback on

  bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
  for i in range(0, bio.bi_vcnt.value_()):
      bv = bio.bi_io_vec[i]
      if bv.bv_page.value_() == 0xffffea00fbfc7500:
	  print("FOUND IT")

which validated what I suspected.

The fix for this is simple, flush the epd before we loop back around to
the beginning of the file during writeout.

Fixes: b293f02 ("Btrfs: Add writepages support")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I5daee5ea01df5eb76df156cb489140e58826cab4
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 28, 2020
[ Upstream commit 54155ed4199c7aa3fd20866648024ab63c96d579 ]

A faulty userspace that calls destroy_session() before destroying the
connections can trigger the failure.  This patch prevents the issue by
refusing to destroy the session if there are outstanding connections.

------------[ cut here ]------------
kernel BUG at mm/slub.c:306!
invalid opcode: 0000 [ivanmeler#1] SMP PTI
CPU: 1 PID: 1224 Comm: iscsid Not tainted 5.4.0-rc2.iscsi+ ivanmeler#7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__slab_free+0x181/0x350
[...]
[ 1209.686056] RSP: 0018:ffffa93d4074fae0 EFLAGS: 00010246
[ 1209.686694] RAX: ffff934efa5ad800 RBX: 000000008010000a RCX: ffff934efa5ad800
[ 1209.687651] RDX: ffff934efa5ad800 RSI: ffffeb4041e96b00 RDI: ffff934efd402c40
[ 1209.688582] RBP: ffffa93d4074fb80 R08: 0000000000000001 R09: ffffffffbb5dfa26
[ 1209.689425] R10: ffff934efa5ad800 R11: 0000000000000001 R12: ffffeb4041e96b00
[ 1209.690285] R13: ffff934efa5ad800 R14: ffff934efd402c40 R15: 0000000000000000
[ 1209.691213] FS:  00007f7945dfb540(0000) GS:ffff934efda80000(0000) knlGS:0000000000000000
[ 1209.692316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1209.693013] CR2: 000055877fd3da80 CR3: 0000000077384000 CR4: 00000000000006e0
[ 1209.693897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1209.694773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1209.695631] Call Trace:
[ 1209.695957]  ? __wake_up_common_lock+0x8a/0xc0
[ 1209.696712]  iscsi_pool_free+0x26/0x40
[ 1209.697263]  iscsi_session_teardown+0x2f/0xf0
[ 1209.698117]  iscsi_sw_tcp_session_destroy+0x45/0x60
[ 1209.698831]  iscsi_if_rx+0xd88/0x14e0
[ 1209.699370]  netlink_unicast+0x16f/0x200
[ 1209.699932]  netlink_sendmsg+0x21a/0x3e0
[ 1209.700446]  sock_sendmsg+0x4f/0x60
[ 1209.700902]  ___sys_sendmsg+0x2ae/0x320
[ 1209.701451]  ? cp_new_stat+0x150/0x180
[ 1209.701922]  __sys_sendmsg+0x59/0xa0
[ 1209.702357]  do_syscall_64+0x52/0x160
[ 1209.702812]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1209.703419] RIP: 0033:0x7f7946433914
[...]
[ 1209.706084] RSP: 002b:00007fffb99f2378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1209.706994] RAX: ffffffffffffffda RBX: 000055bc869eac20 RCX: 00007f7946433914
[ 1209.708082] RDX: 0000000000000000 RSI: 00007fffb99f2390 RDI: 0000000000000005
[ 1209.709120] RBP: 00007fffb99f2390 R08: 000055bc84fe9320 R09: 00007fffb99f1f07
[ 1209.710110] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
[ 1209.711085] R13: 000055bc8502306e R14: 0000000000000000 R15: 0000000000000000
 Modules linked in:
 ---[ end trace a2d933ede7f730d8 ]---

Link: https://lore.kernel.org/r/20191226203148.2172200-1-krisman@collabora.com
Signed-off-by: Nick Black <nlb@google.com>
Co-developed-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Salman Qazi <sqazi@google.com>
Co-developed-by: Junho Ryu <jayr@google.com>
Signed-off-by: Junho Ryu <jayr@google.com>
Co-developed-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I6906a6593a01ddd315ff744843b61f9d50739697
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 28, 2020
[ Upstream commit 42ffb0bf584ae5b6b38f72259af1e0ee417ac77f ]

There exists a deadlock with range_cyclic that has existed forever.  If
we loop around with a bio already built we could deadlock with a writer
who has the page locked that we're attempting to write but is waiting on
a page in our bio to be written out.  The task traces are as follows

  PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
   #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
   ivanmeler#1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
   ivanmeler#2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
   ivanmeler#3 [ffffc900297bb708] __lock_page at ffffffff811f145b
   ivanmeler#4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
   ivanmeler#5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
   ivanmeler#6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
   ivanmeler#7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
   ivanmeler#8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
   #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd

  PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
  "aio-dio-invalid"
   #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
   ivanmeler#1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
   ivanmeler#2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
   ivanmeler#3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
   ivanmeler#4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
   ivanmeler#5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
   ivanmeler#6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
   ivanmeler#7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
   ivanmeler#8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
   #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032

I used drgn to find the respective pages we were stuck on

page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874

As you can see the kworker is waiting for bit 0 (PG_locked) on index
7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
8148.  aio-dio-invalid has 7680, and the kworker epd looks like the
following

  crash> struct extent_page_data ffffc900297bbbb0
  struct extent_page_data {
    bio = 0xffff889f747ed830,
    tree = 0xffff889eed6ba448,
    extent_locked = 0,
    sync_io = 0
  }

Probably worth mentioning as well that it waits for writeback of the
page to complete while holding a lock on it (at prepare_pages()).

Using drgn I walked the bio pages looking for page
0xffffea00fbfc7500 which is the one we're waiting for writeback on

  bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
  for i in range(0, bio.bi_vcnt.value_()):
      bv = bio.bi_io_vec[i]
      if bv.bv_page.value_() == 0xffffea00fbfc7500:
	  print("FOUND IT")

which validated what I suspected.

The fix for this is simple, flush the epd before we loop back around to
the beginning of the file during writeout.

Fixes: b293f02 ("Btrfs: Add writepages support")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I5daee5ea01df5eb76df156cb489140e58826cab4
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 28, 2020
[ Upstream commit 54155ed4199c7aa3fd20866648024ab63c96d579 ]

A faulty userspace that calls destroy_session() before destroying the
connections can trigger the failure.  This patch prevents the issue by
refusing to destroy the session if there are outstanding connections.

------------[ cut here ]------------
kernel BUG at mm/slub.c:306!
invalid opcode: 0000 [ivanmeler#1] SMP PTI
CPU: 1 PID: 1224 Comm: iscsid Not tainted 5.4.0-rc2.iscsi+ ivanmeler#7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__slab_free+0x181/0x350
[...]
[ 1209.686056] RSP: 0018:ffffa93d4074fae0 EFLAGS: 00010246
[ 1209.686694] RAX: ffff934efa5ad800 RBX: 000000008010000a RCX: ffff934efa5ad800
[ 1209.687651] RDX: ffff934efa5ad800 RSI: ffffeb4041e96b00 RDI: ffff934efd402c40
[ 1209.688582] RBP: ffffa93d4074fb80 R08: 0000000000000001 R09: ffffffffbb5dfa26
[ 1209.689425] R10: ffff934efa5ad800 R11: 0000000000000001 R12: ffffeb4041e96b00
[ 1209.690285] R13: ffff934efa5ad800 R14: ffff934efd402c40 R15: 0000000000000000
[ 1209.691213] FS:  00007f7945dfb540(0000) GS:ffff934efda80000(0000) knlGS:0000000000000000
[ 1209.692316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1209.693013] CR2: 000055877fd3da80 CR3: 0000000077384000 CR4: 00000000000006e0
[ 1209.693897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1209.694773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1209.695631] Call Trace:
[ 1209.695957]  ? __wake_up_common_lock+0x8a/0xc0
[ 1209.696712]  iscsi_pool_free+0x26/0x40
[ 1209.697263]  iscsi_session_teardown+0x2f/0xf0
[ 1209.698117]  iscsi_sw_tcp_session_destroy+0x45/0x60
[ 1209.698831]  iscsi_if_rx+0xd88/0x14e0
[ 1209.699370]  netlink_unicast+0x16f/0x200
[ 1209.699932]  netlink_sendmsg+0x21a/0x3e0
[ 1209.700446]  sock_sendmsg+0x4f/0x60
[ 1209.700902]  ___sys_sendmsg+0x2ae/0x320
[ 1209.701451]  ? cp_new_stat+0x150/0x180
[ 1209.701922]  __sys_sendmsg+0x59/0xa0
[ 1209.702357]  do_syscall_64+0x52/0x160
[ 1209.702812]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1209.703419] RIP: 0033:0x7f7946433914
[...]
[ 1209.706084] RSP: 002b:00007fffb99f2378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1209.706994] RAX: ffffffffffffffda RBX: 000055bc869eac20 RCX: 00007f7946433914
[ 1209.708082] RDX: 0000000000000000 RSI: 00007fffb99f2390 RDI: 0000000000000005
[ 1209.709120] RBP: 00007fffb99f2390 R08: 000055bc84fe9320 R09: 00007fffb99f1f07
[ 1209.710110] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
[ 1209.711085] R13: 000055bc8502306e R14: 0000000000000000 R15: 0000000000000000
 Modules linked in:
 ---[ end trace a2d933ede7f730d8 ]---

Link: https://lore.kernel.org/r/20191226203148.2172200-1-krisman@collabora.com
Signed-off-by: Nick Black <nlb@google.com>
Co-developed-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Salman Qazi <sqazi@google.com>
Co-developed-by: Junho Ryu <jayr@google.com>
Signed-off-by: Junho Ryu <jayr@google.com>
Co-developed-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I6906a6593a01ddd315ff744843b61f9d50739697
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 28, 2020
[ Upstream commit 42ffb0bf584ae5b6b38f72259af1e0ee417ac77f ]

There exists a deadlock with range_cyclic that has existed forever.  If
we loop around with a bio already built we could deadlock with a writer
who has the page locked that we're attempting to write but is waiting on
a page in our bio to be written out.  The task traces are as follows

  PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
   #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
   ivanmeler#1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
   ivanmeler#2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
   ivanmeler#3 [ffffc900297bb708] __lock_page at ffffffff811f145b
   ivanmeler#4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
   ivanmeler#5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
   ivanmeler#6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
   ivanmeler#7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
   ivanmeler#8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
   #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd

  PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
  "aio-dio-invalid"
   #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
   ivanmeler#1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
   ivanmeler#2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
   ivanmeler#3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
   ivanmeler#4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
   ivanmeler#5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
   ivanmeler#6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
   ivanmeler#7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
   ivanmeler#8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
   #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032

I used drgn to find the respective pages we were stuck on

page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874

As you can see the kworker is waiting for bit 0 (PG_locked) on index
7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
8148.  aio-dio-invalid has 7680, and the kworker epd looks like the
following

  crash> struct extent_page_data ffffc900297bbbb0
  struct extent_page_data {
    bio = 0xffff889f747ed830,
    tree = 0xffff889eed6ba448,
    extent_locked = 0,
    sync_io = 0
  }

Probably worth mentioning as well that it waits for writeback of the
page to complete while holding a lock on it (at prepare_pages()).

Using drgn I walked the bio pages looking for page
0xffffea00fbfc7500 which is the one we're waiting for writeback on

  bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
  for i in range(0, bio.bi_vcnt.value_()):
      bv = bio.bi_io_vec[i]
      if bv.bv_page.value_() == 0xffffea00fbfc7500:
	  print("FOUND IT")

which validated what I suspected.

The fix for this is simple, flush the epd before we loop back around to
the beginning of the file during writeout.

Fixes: b293f02 ("Btrfs: Add writepages support")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I5daee5ea01df5eb76df156cb489140e58826cab4
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 28, 2020
[ Upstream commit 54155ed4199c7aa3fd20866648024ab63c96d579 ]

A faulty userspace that calls destroy_session() before destroying the
connections can trigger the failure.  This patch prevents the issue by
refusing to destroy the session if there are outstanding connections.

------------[ cut here ]------------
kernel BUG at mm/slub.c:306!
invalid opcode: 0000 [ivanmeler#1] SMP PTI
CPU: 1 PID: 1224 Comm: iscsid Not tainted 5.4.0-rc2.iscsi+ ivanmeler#7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__slab_free+0x181/0x350
[...]
[ 1209.686056] RSP: 0018:ffffa93d4074fae0 EFLAGS: 00010246
[ 1209.686694] RAX: ffff934efa5ad800 RBX: 000000008010000a RCX: ffff934efa5ad800
[ 1209.687651] RDX: ffff934efa5ad800 RSI: ffffeb4041e96b00 RDI: ffff934efd402c40
[ 1209.688582] RBP: ffffa93d4074fb80 R08: 0000000000000001 R09: ffffffffbb5dfa26
[ 1209.689425] R10: ffff934efa5ad800 R11: 0000000000000001 R12: ffffeb4041e96b00
[ 1209.690285] R13: ffff934efa5ad800 R14: ffff934efd402c40 R15: 0000000000000000
[ 1209.691213] FS:  00007f7945dfb540(0000) GS:ffff934efda80000(0000) knlGS:0000000000000000
[ 1209.692316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1209.693013] CR2: 000055877fd3da80 CR3: 0000000077384000 CR4: 00000000000006e0
[ 1209.693897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1209.694773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1209.695631] Call Trace:
[ 1209.695957]  ? __wake_up_common_lock+0x8a/0xc0
[ 1209.696712]  iscsi_pool_free+0x26/0x40
[ 1209.697263]  iscsi_session_teardown+0x2f/0xf0
[ 1209.698117]  iscsi_sw_tcp_session_destroy+0x45/0x60
[ 1209.698831]  iscsi_if_rx+0xd88/0x14e0
[ 1209.699370]  netlink_unicast+0x16f/0x200
[ 1209.699932]  netlink_sendmsg+0x21a/0x3e0
[ 1209.700446]  sock_sendmsg+0x4f/0x60
[ 1209.700902]  ___sys_sendmsg+0x2ae/0x320
[ 1209.701451]  ? cp_new_stat+0x150/0x180
[ 1209.701922]  __sys_sendmsg+0x59/0xa0
[ 1209.702357]  do_syscall_64+0x52/0x160
[ 1209.702812]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1209.703419] RIP: 0033:0x7f7946433914
[...]
[ 1209.706084] RSP: 002b:00007fffb99f2378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1209.706994] RAX: ffffffffffffffda RBX: 000055bc869eac20 RCX: 00007f7946433914
[ 1209.708082] RDX: 0000000000000000 RSI: 00007fffb99f2390 RDI: 0000000000000005
[ 1209.709120] RBP: 00007fffb99f2390 R08: 000055bc84fe9320 R09: 00007fffb99f1f07
[ 1209.710110] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
[ 1209.711085] R13: 000055bc8502306e R14: 0000000000000000 R15: 0000000000000000
 Modules linked in:
 ---[ end trace a2d933ede7f730d8 ]---

Link: https://lore.kernel.org/r/20191226203148.2172200-1-krisman@collabora.com
Signed-off-by: Nick Black <nlb@google.com>
Co-developed-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Salman Qazi <sqazi@google.com>
Co-developed-by: Junho Ryu <jayr@google.com>
Signed-off-by: Junho Ryu <jayr@google.com>
Co-developed-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I6906a6593a01ddd315ff744843b61f9d50739697
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 28, 2020
commit 28936b62e71e41600bab319f262ea9f9b1027629 upstream.

inode->i_blocks could be accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in ext4_do_update_inode [ext4] / inode_add_bytes

 write to 0xffff9a00d4b982d0 of 8 bytes by task 22100 on cpu 118:
  inode_add_bytes+0x65/0xf0
  __inode_add_bytes at fs/stat.c:689
  (inlined by) inode_add_bytes at fs/stat.c:702
  ext4_mb_new_blocks+0x418/0xca0 [ext4]
  ext4_ext_map_blocks+0x1a6b/0x27b0 [ext4]
  ext4_map_blocks+0x1a9/0x950 [ext4]
  _ext4_get_block+0xfc/0x270 [ext4]
  ext4_get_block_unwritten+0x33/0x50 [ext4]
  __block_write_begin_int+0x22e/0xae0
  __block_write_begin+0x39/0x50
  ext4_write_begin+0x388/0xb50 [ext4]
  ext4_da_write_begin+0x35f/0x8f0 [ext4]
  generic_perform_write+0x15d/0x290
  ext4_buffered_write_iter+0x11f/0x210 [ext4]
  ext4_file_write_iter+0xce/0x9e0 [ext4]
  new_sync_write+0x29c/0x3b0
  __vfs_write+0x92/0xa0
  vfs_write+0x103/0x260
  ksys_write+0x9d/0x130
  __x64_sys_write+0x4c/0x60
  do_syscall_64+0x91/0xb05
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0xffff9a00d4b982d0 of 8 bytes by task 8 on cpu 65:
  ext4_do_update_inode+0x4a0/0xf60 [ext4]
  ext4_inode_blocks_set at fs/ext4/inode.c:4815
  ext4_mark_iloc_dirty+0xaf/0x160 [ext4]
  ext4_mark_inode_dirty+0x129/0x3e0 [ext4]
  ext4_convert_unwritten_extents+0x253/0x2d0 [ext4]
  ext4_convert_unwritten_io_end_vec+0xc5/0x150 [ext4]
  ext4_end_io_rsv_work+0x22c/0x350 [ext4]
  process_one_work+0x54f/0xb90
  worker_thread+0x80/0x5f0
  kthread+0x1cd/0x1f0
  ret_from_fork+0x27/0x50

 4 locks held by kworker/u256:0/8:
  #0: ffff9a025abc4328 ((wq_completion)ext4-rsv-conversion){+.+.}, at: process_one_work+0x443/0xb90
  ivanmeler#1: ffffab5a862dbe20 ((work_completion)(&ei->i_rsv_conversion_work)){+.+.}, at: process_one_work+0x443/0xb90
  ivanmeler#2: ffff9a025a9d0f58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
  ivanmeler#3: ffff9a00d4b985d8 (&(&ei->i_raw_lock)->rlock){+.+.}, at: ext4_do_update_inode+0xaa/0xf60 [ext4]
 irq event stamp: 3009267
 hardirqs last  enabled at (3009267): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
 hardirqs last disabled at (3009266): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
 softirqs last  enabled at (3009230): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
 softirqs last disabled at (3009223): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 65 PID: 8 Comm: kworker/u256:0 Tainted: G L 5.6.0-rc2-next-20200221+ ivanmeler#7
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
 Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work [ext4]

The plain read is outside of inode->i_lock critical section which
results in a data race. Fix it by adding READ_ONCE() there.

Link: https://lore.kernel.org/r/20200222043258.2279-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Iae2d68b7f2e5317c8b256a7e4fbc22fdcdf0a8a4
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue Jun 4, 2020
[ Upstream commit 76e752701a8af4404bbd9c45723f7cbd6e4a251e ]

Some servers seem to accept connections while booting but never send
the SMBNegotiate response neither close the connection, causing all
processes accessing the share hang on uninterruptible sleep state.

This happens when the cifs_demultiplex_thread detects the server is
unresponsive so releases the socket and start trying to reconnect.
At some point, the faulty server will accept the socket and the TCP
status will be set to NeedNegotiate. The first issued command accessing
the share will start the negotiation (pid 5828 below), but the response
will never arrive so other commands will be blocked waiting on the mutex
(pid 55352).

This patch checks for unresponsive servers also on the negotiate stage
releasing the socket and reconnecting if the response is not received
and checking again the tcp state when the mutex is acquired.

PID: 55352  TASK: ffff880fd6cc02c0  CPU: 0   COMMAND: "ls"
 #0 [ffff880fd9add9f0] schedule at ffffffff81467eb9
 ivanmeler#1 [ffff880fd9addb38] __mutex_lock_slowpath at ffffffff81468fe0
 ivanmeler#2 [ffff880fd9addba8] mutex_lock at ffffffff81468b1a
 ivanmeler#3 [ffff880fd9addbc0] cifs_reconnect_tcon at ffffffffa042f905 [cifs]
 ivanmeler#4 [ffff880fd9addc60] smb_init at ffffffffa042faeb [cifs]
 ivanmeler#5 [ffff880fd9addca0] CIFSSMBQPathInfo at ffffffffa04360b5 [cifs]
 ....

Which is waiting a mutex owned by:

PID: 5828   TASK: ffff880fcc55e400  CPU: 0   COMMAND: "xxxx"
 #0 [ffff880fbfdc19b8] schedule at ffffffff81467eb9
 ivanmeler#1 [ffff880fbfdc1b00] wait_for_response at ffffffffa044f96d [cifs]
 ivanmeler#2 [ffff880fbfdc1b60] SendReceive at ffffffffa04505ce [cifs]
 ivanmeler#3 [ffff880fbfdc1bb0] CIFSSMBNegotiate at ffffffffa0438d79 [cifs]
 ivanmeler#4 [ffff880fbfdc1c50] cifs_negotiate_protocol at ffffffffa043b383 [cifs]
 ivanmeler#5 [ffff880fbfdc1c80] cifs_reconnect_tcon at ffffffffa042f911 [cifs]
 ivanmeler#6 [ffff880fbfdc1d20] smb_init at ffffffffa042faeb [cifs]
 ivanmeler#7 [ffff880fbfdc1d60] CIFSSMBQFSInfo at ffffffffa0434eb0 [cifs]
 ....

Signed-off-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Aurélien Aptel <aaptel@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I588516ef6df5f4e7cb738f9cb602e59d94254a42
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 23, 2021
commit cf3591ef832915892f2499b7e54b51d4c578b28c upstream.

Revert the commit bd293d071ffe65e645b4d8104f9d8fe15ea13862. The proper
fix has been made available with commit d0a255e795ab ("loop: set
PF_MEMALLOC_NOIO for the worker thread").

Note that the fix offered by commit bd293d071ffe doesn't really prevent
the deadlock from occuring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   ivanmeler#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   ivanmeler#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   ivanmeler#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   ivanmeler#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   ivanmeler#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   ivanmeler#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   ivanmeler#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   ivanmeler#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  ivanmeler#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  ivanmeler#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  ivanmeler#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  ivanmeler#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  ivanmeler#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6cd6b254cca8cb8e72c197d44b91790c1a799554
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 29, 2021
[ Upstream commit ec9c7d19336ee98ecba8de80128aa405c45feebb ]

Exercising CPU hotplug on a 5.2 kernel with recent padata fixes from
cryptodev-2.6.git in an 8-CPU kvm guest...

    # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo c > /sys/kernel/pcrypt/pencrypt/parallel_cpumask
    # modprobe tcrypt mode=215

...caused the following crash:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [ivanmeler#1] SMP PTI
    CPU: 2 PID: 134 Comm: kworker/2:2 Not tainted 5.2.0-padata-base+ ivanmeler#7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-<snip>
    Workqueue: pencrypt padata_parallel_worker
    RIP: 0010:padata_reorder+0xcb/0x180
    ...
    Call Trace:
     padata_do_serial+0x57/0x60
     pcrypt_aead_enc+0x3a/0x50 [pcrypt]
     padata_parallel_worker+0x9b/0xe0
     process_one_work+0x1b5/0x3f0
     worker_thread+0x4a/0x3c0
     ...

In padata_alloc_pd, pd->cpu is set using the user-supplied cpumask
instead of the effective cpumask, and in this case cpumask_first picked
an offline CPU.

The offline CPU's reorder->list.next is NULL in padata_reorder because
the list wasn't initialized in padata_init_pqueues, which only operates
on CPUs in the effective mask.

Fix by using the effective mask in padata_alloc_pd.

Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I768fd4ffb3de1ebcd3a71499eff1baec3607b431
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 29, 2021
commit 420902c9d086848a7548c83e0a49021514bd71b7 upstream.

If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can
deadlock our own worker - mount blocks kworker/3:2, sleeps forever more.

crash> ps|grep UN
    715      2   3  ffff880220734d30  UN   0.0       0      0  [kworker/3:2]
   9369   9341   2  ffff88021ffb7560  UN   1.3  493404 123184  Xorg
   9665   9664   3  ffff880225b92ab0  UN   0.0   47368    812  udisks-daemon
  10635  10403   3  ffff880222f22c70  UN   0.0   14904    936  mount
crash> bt ffff880220734d30
PID: 715    TASK: ffff880220734d30  CPU: 3   COMMAND: "kworker/3:2"
 #0 [ffff8802244c3c20] schedule at ffffffff8144584b
 ivanmeler#1 [ffff8802244c3cc8] __rt_mutex_slowlock at ffffffff814472b3
 ivanmeler#2 [ffff8802244c3d28] rt_mutex_slowlock at ffffffff814473f5
 ivanmeler#3 [ffff8802244c3dc8] reiserfs_write_lock at ffffffffa05f28fd [reiserfs]
 ivanmeler#4 [ffff8802244c3de8] flush_async_commits at ffffffffa05ec91d [reiserfs]
 ivanmeler#5 [ffff8802244c3e08] process_one_work at ffffffff81073726
 ivanmeler#6 [ffff8802244c3e68] worker_thread at ffffffff81073eba
 ivanmeler#7 [ffff8802244c3ec8] kthread at ffffffff810782e0
 ivanmeler#8 [ffff8802244c3f48] kernel_thread_helper at ffffffff81450064
crash> rd ffff8802244c3cc8 10
ffff8802244c3cc8:  ffffffff814472b3 ffff880222f23250   .rD.....P2."....
ffff8802244c3cd8:  0000000000000000 0000000000000286   ................
ffff8802244c3ce8:  ffff8802244c3d30 ffff880220734d80   0=L$.....Ms ....
ffff8802244c3cf8:  ffff880222e8f628 0000000000000000   (.."............
ffff8802244c3d08:  0000000000000000 0000000000000002   ................
crash> struct rt_mutex ffff880222e8f628
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 65537
    }
  },
  wait_list = {
    node_list = {
      next = 0xffff8802244c3d48,
      prev = 0xffff8802244c3d48
    }
  },
  owner = 0xffff880222f22c71,
  save_state = 0
}
crash> bt 0xffff880222f22c70
PID: 10635  TASK: ffff880222f22c70  CPU: 3   COMMAND: "mount"
 #0 [ffff8802216a9868] schedule at ffffffff8144584b
 ivanmeler#1 [ffff8802216a9910] schedule_timeout at ffffffff81446865
 ivanmeler#2 [ffff8802216a99a0] wait_for_common at ffffffff81445f74
 ivanmeler#3 [ffff8802216a9a30] flush_work at ffffffff810712d3
 ivanmeler#4 [ffff8802216a9ab0] schedule_on_each_cpu at ffffffff81074463
 ivanmeler#5 [ffff8802216a9ae0] invalidate_bdev at ffffffff81178aba
 ivanmeler#6 [ffff8802216a9af0] vfs_load_quota_inode at ffffffff811a3632
 ivanmeler#7 [ffff8802216a9b50] dquot_quota_on_mount at ffffffff811a375c
 ivanmeler#8 [ffff8802216a9b80] finish_unfinished at ffffffffa05dd8b0 [reiserfs]
 #9 [ffff8802216a9cc0] reiserfs_fill_super at ffffffffa05de825 [reiserfs]
    RIP: 00007f7b9303997a  RSP: 00007ffff443c7a8  RFLAGS: 00010202
    RAX: 00000000000000a5  RBX: ffffffff8144ef12  RCX: 00007f7b932e9ee0
    RDX: 00007f7b93d9a400  RSI: 00007f7b93d9a3e0  RDI: 00007f7b93d9a3c0
    RBP: 00007f7b93d9a2c0   R8: 00007f7b93d9a550   R9: 0000000000000001
    R10: ffffffffc0ed040e  R11: 0000000000000202  R12: 000000000000040e
    R13: 0000000000000000  R14: 00000000c0ed040e  R15: 00007ffff443ca20
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ib963e18e5994a15e985b567c8b504ef91484e66d
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 30, 2021
[ Upstream commit e773ca7da8beeca7f17fe4c9d1284a2b66839cc1 ]

Actually, burst size is equal to '1 << desc->rqcfg.brst_size'.
we should use burst size, not desc->rqcfg.brst_size.

dma memcpy performance on Rockchip RV1126
@ 1512MHz A7, 1056MHz LPDDR3, 200MHz DMA:

dmatest:

/# echo dma0chan0 > /sys/module/dmatest/parameters/channel
/# echo 4194304 > /sys/module/dmatest/parameters/test_buf_size
/# echo 8 > /sys/module/dmatest/parameters/iterations
/# echo y > /sys/module/dmatest/parameters/norandom
/# echo y > /sys/module/dmatest/parameters/verbose
/# echo 1 > /sys/module/dmatest/parameters/run

dmatest: dma0chan0-copy0: result ivanmeler#1: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result ivanmeler#2: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result ivanmeler#3: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result ivanmeler#4: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result ivanmeler#5: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result ivanmeler#6: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result ivanmeler#7: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result ivanmeler#8: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000

Before:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 48 iops 200338 KB/s (0)

After this patch:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 179 iops 734873 KB/s (0)

After this patch and increase dma clk to 400MHz:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 259 iops 1062929 KB/s (0)

Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com>
Link: https://lore.kernel.org/r/1605326106-55681-1-git-send-email-sugar.zhang@rock-chips.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ifd84b76d02837e91073e5207961b50fae7f32429
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 31, 2021
[ Upstream commit ec4fbd64751de18729eaa816ec69e4b504b5a7a2 ]

Dmitry reported a lockdep splat [1] (false positive) that we can fix
by releasing the spinlock before calling icmp_send() from ip_expire()

This is a false positive because sending an ICMP message can not
possibly re-enter the IP frag engine.

[1]
[ INFO: possible circular locking dependency detected ]
4.10.0+ #29 Not tainted
-------------------------------------------------------
modprobe/12392 is trying to acquire lock:
 (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock
include/linux/spinlock.h:299 [inline]
 (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock
include/linux/netdevice.h:3486 [inline]
 (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>]
sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180

but task is already holding lock:
 (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
include/linux/spinlock.h:299 [inline]
 (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> ivanmeler#1 (&(&q->lock)->rlock){+.-...}:
       validate_chain kernel/locking/lockdep.c:2267 [inline]
       __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669
       ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713
       packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459
       deliver_skb net/core/dev.c:1834 [inline]
       dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890
       xmit_one net/core/dev.c:2903 [inline]
       dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923
       sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182
       __dev_xmit_skb net/core/dev.c:3092 [inline]
       __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
       neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308
       neigh_output include/net/neighbour.h:478 [inline]
       ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228
       ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672
       ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545
       ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
       ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
       ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
       raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
       __sys_sendmmsg+0x25c/0x750 net/socket.c:2075
       SYSC_sendmmsg net/socket.c:2106 [inline]
       SyS_sendmmsg+0x35/0x60 net/socket.c:2101
       do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
       return_from_SYSCALL_64+0x0/0x7a

-> #0 (_xmit_ETHER#2){+.-...}:
       check_prev_add kernel/locking/lockdep.c:1830 [inline]
       check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
       validate_chain kernel/locking/lockdep.c:2267 [inline]
       __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       __netif_tx_lock include/linux/netdevice.h:3486 [inline]
       sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
       __dev_xmit_skb net/core/dev.c:3092 [inline]
       __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
       neigh_hh_output include/net/neighbour.h:468 [inline]
       neigh_output include/net/neighbour.h:476 [inline]
       ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
       ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
       ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
       ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
       icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
       icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
       ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
       call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
       expire_timers kernel/time/timer.c:1307 [inline]
       __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
       run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
       __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:657 [inline]
       smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
       apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
       __read_once_size include/linux/compiler.h:254 [inline]
       atomic_read arch/x86/include/asm/atomic.h:26 [inline]
       rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
       __rcu_is_watching kernel/rcu/tree.c:1133 [inline]
       rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
       rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
       radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
       filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
       do_fault_around mm/memory.c:3231 [inline]
       do_read_fault mm/memory.c:3265 [inline]
       do_fault+0xbd5/0x2080 mm/memory.c:3370
       handle_pte_fault mm/memory.c:3600 [inline]
       __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
       handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
       __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
       do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
       page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&q->lock)->rlock);
                               lock(_xmit_ETHER#2);
                               lock(&(&q->lock)->rlock);
  lock(_xmit_ETHER#2);

 *** DEADLOCK ***

10 locks held by modprobe/12392:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>]
__do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336
 ivanmeler#1:  (rcu_read_lock){......}, at: [<ffffffff8188cab6>]
filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324
 ivanmeler#2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
spin_lock include/linux/spinlock.h:299 [inline]
 ivanmeler#2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
pte_alloc_one_map mm/memory.c:2944 [inline]
 ivanmeler#2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072
 ivanmeler#3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
lockdep_copy_map include/linux/lockdep.h:175 [inline]
 ivanmeler#3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258
 ivanmeler#4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
include/linux/spinlock.h:299 [inline]
 ivanmeler#4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
 ivanmeler#5:  (rcu_read_lock){......}, at: [<ffffffff8389a633>]
ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216
 ivanmeler#6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock
include/linux/spinlock.h:309 [inline]
 ivanmeler#6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock
net/ipv4/icmp.c:219 [inline]
 ivanmeler#6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>]
icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681
 ivanmeler#7:  (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>]
ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198
 ivanmeler#8:  (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>]
__dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324
 #9:  (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at:
[<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423

stack backtrace:
CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
 check_prev_add kernel/locking/lockdep.c:1830 [inline]
 check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
 validate_chain kernel/locking/lockdep.c:2267 [inline]
 __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:299 [inline]
 __netif_tx_lock include/linux/netdevice.h:3486 [inline]
 sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
 __dev_xmit_skb net/core/dev.c:3092 [inline]
 __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
 neigh_hh_output include/net/neighbour.h:468 [inline]
 neigh_output include/net/neighbour.h:476 [inline]
 ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:246 [inline]
 ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
 dst_output include/net/dst.h:486 [inline]
 ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
 expire_timers kernel/time/timer.c:1307 [inline]
 __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:657 [inline]
 smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline]
RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10
RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c
RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000
R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25
R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000
 </IRQ>
 rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
 filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
 do_fault_around mm/memory.c:3231 [inline]
 do_read_fault mm/memory.c:3265 [inline]
 do_fault+0xbd5/0x2080 mm/memory.c:3370
 handle_pte_fault mm/memory.c:3600 [inline]
 __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
RIP: 0033:0x7f83172f2786
RSP: 002b:00007fffe859ae80 EFLAGS: 00010293
RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970
RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000
R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040
R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I373ca4c676c87f8b606c93b9f0b1820ea80ce1c5
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue May 31, 2021
[ Upstream commit 2c0aa08631b86a4678dbc93b9caa5248014b4458 ]

Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 ivanmeler#1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 ivanmeler#2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 ivanmeler#3 [ffff898e35f15b60] no_context at ffffffff8104854c
 ivanmeler#4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 ivanmeler#5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 ivanmeler#6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 ivanmeler#7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 ivanmeler#8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 ivanmeler#10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 ivanmeler#11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 ivanmeler#1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 ivanmeler#2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 ivanmeler#3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 ivanmeler#4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 ivanmeler#5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 ivanmeler#6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 ivanmeler#7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 ivanmeler#8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I58ede1e99f457b5c910d8316148052999798ce0a
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue Jun 2, 2021
[ Upstream commit e8bd76ede155fd54d8c41d045dda43cd3174d506 ]

kernel panic trace looks like:

 ivanmeler#5 [ffffb9e08698fc80] do_page_fault at ffffffffb666e0d7
 ivanmeler#6 [ffffb9e08698fcb0] page_fault at ffffffffb70010fe
    [exception RIP: amp_read_loc_assoc_final_data+63]
    RIP: ffffffffc06ab54f  RSP: ffffb9e08698fd68  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8c8845a5a000  RCX: 0000000000000004
    RDX: 0000000000000000  RSI: ffff8c8b9153d000  RDI: ffff8c8845a5a000
    RBP: ffffb9e08698fe40   R8: 00000000000330e0   R9: ffffffffc0675c94
    R10: ffffb9e08698fe58  R11: 0000000000000001  R12: ffff8c8b9cbf6200
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff8c8b2026da0b
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 ivanmeler#7 [ffffb9e08698fda8] hci_event_packet at ffffffffc0676904 [bluetooth]
 ivanmeler#8 [ffffb9e08698fe50] hci_rx_work at ffffffffc06629ac [bluetooth]
 #9 [ffffb9e08698fe98] process_one_work at ffffffffb66f95e7

hcon->amp_mgr seems NULL triggered kernel panic in following line inside
function amp_read_loc_assoc_final_data

        set_bit(READ_LOC_AMP_ASSOC_FINAL, &mgr->state);

Fixed by checking NULL for mgr.

Signed-off-by: Gopal Tiwari <gtiwari@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I8c92e8881caba89aa00813a213501af24f12b65f
fcuzzocrea pushed a commit to fcuzzocrea/android_kernel_samsung_universal8890 that referenced this issue Jul 11, 2021
commit 2800aadc18a64c96b051bcb7da8a7df7d505db3f upstream.

It's possible for iwl_pcie_enqueue_hcmd() to be called with hard IRQs
disabled (e.g. from LED core). We can't enable BHs in such a situation.

Turn the unconditional BH-enable/BH-disable code into
hardirq-disable/conditional-enable.

This fixes the warning below.

 WARNING: CPU: 1 PID: 1139 at kernel/softirq.c:178 __local_bh_enable_ip+0xa5/0xf0
 CPU: 1 PID: 1139 Comm: NetworkManager Not tainted 5.12.0-rc1-00004-gb4ded168af79 ivanmeler#7
 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
 RIP: 0010:__local_bh_enable_ip+0xa5/0xf0
 Code: f7 69 e8 ee 23 14 00 fb 66 0f 1f 44 00 00 65 8b 05 f0 f4 f7 69 85 c0 74 3f 48 83 c4 08 5b c3 65 8b 05 9b fe f7 69 85 c0 75 8e <0f> 0b eb 8a 48 89 3c 24 e8 4e 20 14 00 48 8b 3c 24 eb 91 e8 13 4e
 RSP: 0018:ffffafd580b13298 EFLAGS: 00010046
 RAX: 0000000000000000 RBX: 0000000000000201 RCX: 0000000000000000
 RDX: 0000000000000003 RSI: 0000000000000201 RDI: ffffffffc1272389
 RBP: ffff96517ae4c018 R08: 0000000000000001 R09: 0000000000000000
 R10: ffffafd580b13178 R11: 0000000000000001 R12: ffff96517b060000
 R13: 0000000000000000 R14: ffffffff80000000 R15: 0000000000000001
 FS:  00007fc604ebefc0(0000) GS:ffff965267480000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055fb3fef13b2 CR3: 0000000109112004 CR4: 00000000003706e0
 Call Trace:
  ? _raw_spin_unlock_bh+0x1f/0x30
  iwl_pcie_enqueue_hcmd+0x5d9/0xa00 [iwlwifi]
  iwl_trans_txq_send_hcmd+0x6c/0x430 [iwlwifi]
  iwl_trans_send_cmd+0x88/0x170 [iwlwifi]
  ? lock_acquire+0x277/0x3d0
  iwl_mvm_send_cmd+0x32/0x80 [iwlmvm]
  iwl_mvm_led_set+0xc2/0xe0 [iwlmvm]
  ? led_trigger_event+0x46/0x70
  led_trigger_event+0x46/0x70
  ieee80211_do_open+0x5c5/0xa20 [mac80211]
  ieee80211_open+0x67/0x90 [mac80211]
  __dev_open+0xd4/0x150
  __dev_change_flags+0x19e/0x1f0
  dev_change_flags+0x23/0x60
  do_setlink+0x30d/0x1230
  ? lock_is_held_type+0xb4/0x120
  ? __nla_validate_parse.part.7+0x57/0xcb0
  ? __lock_acquire+0x2e1/0x1a50
  __rtnl_newlink+0x560/0x910
  ? __lock_acquire+0x2e1/0x1a50
  ? __lock_acquire+0x2e1/0x1a50
  ? lock_acquire+0x277/0x3d0
  ? sock_def_readable+0x5/0x290
  ? lock_is_held_type+0xb4/0x120
  ? find_held_lock+0x2d/0x90
  ? sock_def_readable+0xb3/0x290
  ? lock_release+0x166/0x2a0
  ? lock_is_held_type+0x90/0x120
  rtnl_newlink+0x47/0x70
  rtnetlink_rcv_msg+0x25c/0x470
  ? netlink_deliver_tap+0x97/0x3e0
  ? validate_linkmsg+0x350/0x350
  netlink_rcv_skb+0x50/0x100
  netlink_unicast+0x1b2/0x280
  netlink_sendmsg+0x336/0x450
  sock_sendmsg+0x5b/0x60
  ____sys_sendmsg+0x1ed/0x250
  ? copy_msghdr_from_user+0x5c/0x90
  ___sys_sendmsg+0x88/0xd0
  ? lock_is_held_type+0xb4/0x120
  ? find_held_lock+0x2d/0x90
  ? lock_release+0x166/0x2a0
  ? __fget_files+0xfe/0x1d0
  ? __sys_sendmsg+0x5e/0xa0
  __sys_sendmsg+0x5e/0xa0
  ? lockdep_hardirqs_on_prepare+0xd9/0x170
  do_syscall_64+0x33/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fc605c9572d
 Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 da ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 2e ef ff ff 48
 RSP: 002b:00007fffc83789f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 000055ef468570c0 RCX: 00007fc605c9572d
 RDX: 0000000000000000 RSI: 00007fffc8378a30 RDI: 000000000000000c
 RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
 R13: 00007fffc8378b80 R14: 00007fffc8378b7c R15: 0000000000000000
 irq event stamp: 170785
 hardirqs last  enabled at (170783): [<ffffffff9609a8c2>] __local_bh_enable_ip+0x82/0xf0
 hardirqs last disabled at (170784): [<ffffffff96a8613d>] _raw_read_lock_irqsave+0x8d/0x90
 softirqs last  enabled at (170782): [<ffffffffc1272389>] iwl_pcie_enqueue_hcmd+0x5d9/0xa00 [iwlwifi]
 softirqs last disabled at (170785): [<ffffffffc1271ec6>] iwl_pcie_enqueue_hcmd+0x116/0xa00 [iwlwifi]

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v12.0.0-rc3
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2103021125430.12405@cbobk.fhfr.pm
Signed-off-by: Jari Ruusu <jariruusu@protonmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Icfea01bc9ff259b537691aa2aaf7a1b97c9b3111
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant