Skip to content

I2C lost interrupt panics #2004

@mkeeter

Description

@mkeeter

On cadbury:c71, @rmustacc noticed Gimlet Issues when he connected a particular driver (??).

The thermal, gimlet_seq and i2c_driver generation counts are spinning up:

matt@cadbury ~ () $ export HUMILITY_TARGET=c71
matt@cadbury ~ (c71) $ h tasks
humility: attached to 0483:3754:001100184741500820383733 via ST-Link V3
system time = 563017
ID TASK                       GEN PRI STATE
 0 jefe                         0   0 recv, notif: fault timer(T+87)
 1 net                          0   5 recv, notif: eth-irq(irq61) wake-timer(T+429)
 2 sys                          0   1 recv, notif: exti-wildcard-irq(irq6/irq7/irq8/irq9/irq10/irq23/irq40)
 3 spi2_driver                  0   3 recv
 4 i2c_driver                4174   3 FAULT: stack overflow; sp=0x24058fa8 (was: ready)
 5 spd                          0   2 notif: i2c1-irq(irq31/irq32)
 6 packrat                      0   1 recv
 7 thermal                   2086   5 wait: send to i2c_driver/gen14
 8 power                        1   6 wait: send to gimlet_seq/gen20
 9 hiffy                        0   5 notif: bit31(T+128)
10 gimlet_seq                1044   4 wait: send to i2c_driver/gen14
11 gimlet_inspector             0   6 notif: socket
12 hash_driver                  0   2 recv
13 hf                           0   3 recv
14 update_server                0   3 recv
15 sensor                       0   4 recv
16 host_sp_comms                0   8 recv, notif: jefe-state-change usart-irq(irq82) multitimer(T+37) control-plane-agent
17 udpecho                      0   6 notif: socket
18 udpbroadcast                 0   6 notif: bit31(T+358)
19 control_plane_agent          0   7 recv, notif: usart-irq(irq37) socket timer
20 sprot                        0   4 recv
21 validate                     0   5 recv
22 vpd                          0   4 recv
23 user_leds                    0   2 recv, notif: timer
24 dump_agent                   0   6 recv, notif: socket
25 sbrmi                        0   4 recv
26 idle                         0   9 RUNNING
27 udprpc                       0   6 notif: socket

The actual crash is a stack overflow in i2c_driver, which is seemingly not found by our static detector (#1890). In the dump, it seems like it's an overflow while panicking, but I can't make sense of the full trace:

matt@cadbury ~ (c71) $ h -d hubris.core.i2c_driver.0 tasks -sl --guess
humility: attached to dump
system time = 117420
ID TASK                       GEN PRI STATE
 4 i2c_driver                   0   3 FAULT: stack overflow; sp=0x24058fa8 (was: ready)
   could not read registers: read of 32 bytes from invalid address: 0x24058fa8
   guessing at stack trace using saved frame pointer
   |
   +--->  0x24059028 0x080493d0 core::fmt::Formatter::pad_integral
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/mod.rs:1340
          0x24059028 0x080492fa core::fmt::num::imp::fmt_u32
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/num.rs:213
          0x24059028 0x08049444 core::fmt::num::imp::<impl core::fmt::Display for u32>::fmt
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/num.rs:313
          0x24059088 0x080493d0 core::fmt::Formatter::pad_integral
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/mod.rs:1340
          0x24059088 0x080492fa core::fmt::num::imp::fmt_u32
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/num.rs:213
          0x24059088 0x08049444 core::fmt::num::imp::<impl core::fmt::Display for u32>::fmt
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/num.rs:313
          0x240590e0 0x08049d06 core::fmt::write
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/mod.rs:1168
          0x24059128 0x0804a61a <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/result.rs:1985
          0x24059128 0x0804a5dc <core::panic::panic_info::PanicInfo as core::fmt::Display>::fmt
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/panic/panic_info.rs:145
          0x24059128 0x0804a61a <&T as core::fmt::Display>::fmt
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/mod.rs:2390
          0x24059180 0x08049d06 core::fmt::write
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/mod.rs:1168
          0x240591b8 0x0804a8bc rust_begin_unwind
                     @ /hubris/sys/userlib/src/lib.rs:1382
          0x240591b8 0x0804954a core::fmt::Formatter::pad_integral::write_prefix
                     @ /rustc/c52c23b6f44cd19718721a5e3b2eeb169e9c96ff/library/core/src/fmt/mod.rs:1361

(note that this is using --guess to use the saved frame pointer, so we may be able to stare at the memory by hand)

The I2C ringbuf shows a call to stm32xx_i2c::panic, which logged the whole state and then panicked:

 NDX LINE      GEN    COUNT PAYLOAD
  30  671     7016        1 WriteWait(ISR, 0x8020)
  31  671     7016        2 WriteWait(ISR, 0x8021)
  32  671     7016        1 WriteWait(ISR, 0x8061)
  33  736     7016        2 Read(ISR, 0x8021)
  34  736     7016        1 Read(ISR, 0x8025)
  35  779     7016        1 ReadWait(ISR, 0x8021)
  36  779     7016        1 ReadWait(ISR, 0x8061)
  37  555     7016        1 Wait(ISR, 0x21)
  38  641     7016        1 Write(ISR, 0x21)
  39  641     7016        2 Write(ISR, 0x8021)
  40  641     7016        1 Write(ISR, 0x8023)
  41  671     7016        1 WriteWait(ISR, 0x8020)
  42  671     7016        2 WriteWait(ISR, 0x8021)
  43  671     7016        1 WriteWait(ISR, 0x8061)
  44  736     7016        2 Read(ISR, 0x8021)
  45  736     7016        1 Read(ISR, 0x8025)
  46  779     7016        1 ReadWait(ISR, 0x8021)
  47  779     7016        1 ReadWait(ISR, 0x8061)
   0  555     7017        1 Wait(ISR, 0x21)
   1  641     7017        1 Write(ISR, 0x21)
   2  641     7017        2 Write(ISR, 0x8021)
   3  641     7017        1 Write(ISR, 0x8023)
   4  671     7017        1 WriteWait(ISR, 0x8020)
   5  671     7017        2 WriteWait(ISR, 0x8021)
   6  671     7017        1 WriteWait(ISR, 0x8061)
   7  736     7017        2 Read(ISR, 0x8021)
   8  736     7017        1 Read(ISR, 0x8025)
   9  779     7017        1 ReadWait(ISR, 0x8021)
  10  779     7017        1 ReadWait(ISR, 0x8061)
  11  555     7017        1 Wait(ISR, 0x21)
  12  641     7017        1 Write(ISR, 0x21)
  13  641     7017        2 Write(ISR, 0x8021)
  14  641     7017        1 Write(ISR, 0x8121)
  15  431     7017        1 Reset(CR1, 0x1000d6)
  16  440     7017        1 Reset(CR2, 0x100e2)
  17  555     7017        3 Wait(ISR, 0x8001)
  18  555     7017        1 Wait(ISR, 0x1)
  19  641     7017        3 Write(ISR, 0x8001)
  20  530     7017        1 LostInterrupt
  21  481     7017        1 Panic(CR1, 0x1000d7)
  22  482     7017        1 Panic(CR2, 0x120e0)
  23  483     7017        1 Panic(OAR1, 0x0)
  24  484     7017        1 Panic(OAR2, 0x0)
  25  485     7017        1 Panic(TIMINGR, 0x3060767f)
  26  486     7017        1 Panic(TIMEOUTR, 0x84c4)
  27  487     7017        1 Panic(ISR, 0x8001)
  28  488     7017        1 Panic(PECR, 0x0)
  29  491     7017        1 IrqStatus { notification: 0x1, pending: false, enabled: true, posted: false }

Cores are in /staff/cores/hubris-2004/

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions