[stm32] Prevent SPI RX FIFO overflow in SPI master #1223
In full-duplex mode (the only mode supported by SpiMaster), every transmitted byte produces a corresponding received byte, which is pushed into the RX FIFO with some delay. The existing TX loop does not limit the number of outstanding transactions to what fits in the RX FIFO. If the SPI PHY transmits a frame while the HAL is in its TX loop, the HAL will keep pushing TX frames until the TX FIFO is full. These extra frames may overflow the RX FIFO if the HAL has not yet popped the incoming frames by the time they are received. This could be solved by checking the SUSP flag (suspended to prevent overflow) during TX, or by using the DXP bit instead of TXP/RXP to coordinate transactions. This commit instead tracks the number of outstanding frames using the existing counters and stops transmitting once the outstanding frames would no longer fit in the RX FIFO. This implementation assumes an 8-bit (one-byte) SPI data size, which is an existing limitation of SpiMaster.
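The outstanding-frame limit can be illustrated with a minimal host-side sketch. The sketch assumes a loopback peripheral model. `FakeSpi`, `transfer`, the `onWrite` hook and the 16-frame FIFO depths are all hypothetical. They are not modm's SpiMaster API, and they are not the actual FIFO sizes of any particular STM32H7 SPI instance. The sketch shows why gating on `txIndex - rxIndex` is enough: every frame counted there is either still waiting to be shifted or already sitting in the RX FIFO. The RX FIFO therefore cannot overflow, however long the CPU is preempted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical loopback model of a full-duplex SPI peripheral (not a real API).
// Each tick() is one frame time: the oldest TX FIFO byte is clocked out and the
// byte clocked in (MISO tied to MOSI) is pushed into the RX FIFO.
struct FakeSpi
{
    std::size_t txDepth, rxDepth;
    std::deque<std::uint8_t> txFifo, rxFifo;
    std::size_t overruns = 0;  // frames dropped because the RX FIFO was full

    bool txHasSpace() const { return txFifo.size() < txDepth; }
    bool rxNotEmpty() const { return !rxFifo.empty(); }
    void write(std::uint8_t b) { assert(txHasSpace()); txFifo.push_back(b); }
    std::uint8_t read() { assert(rxNotEmpty()); std::uint8_t b = rxFifo.front(); rxFifo.pop_front(); return b; }

    void tick()
    {
        if (txFifo.empty()) return;
        std::uint8_t b = txFifo.front();
        txFifo.pop_front();
        if (rxFifo.size() == rxDepth) ++overruns;  // frame lost
        else rxFifo.push_back(b);
    }
};

// Bounded transfer loop: a frame is only queued while the number of frames
// pushed but not yet popped (txIndex - rxIndex) is below the RX FIFO depth.
// `onWrite` is a test hook standing in for an ISR that preempts the CPU after
// a write while the peripheral keeps shifting frames.
std::vector<std::uint8_t> transfer(FakeSpi& spi, const std::vector<std::uint8_t>& tx,
                                   const std::function<void(std::size_t)>& onWrite)
{
    std::vector<std::uint8_t> rx(tx.size());
    std::size_t txIndex = 0, rxIndex = 0;
    // Bail out on a lost frame instead of waiting forever for it.
    while (rxIndex < tx.size() && spi.overruns == 0)
    {
        while (txIndex < tx.size() && spi.txHasSpace() && txIndex - rxIndex < spi.rxDepth)
        {
            spi.write(tx[txIndex++]);
            onWrite(txIndex);
        }
        while (rxIndex < tx.size() && spi.rxNotEmpty())
            rx[rxIndex++] = spi.read();
        spi.tick();  // the peripheral keeps clocking while the CPU polls
    }
    return rx;
}
```

For example, a hook such as `[&](std::size_t n) { if (n == 12) for (int i = 0; i < 100; ++i) spi.tick(); }` simulates a long preemption after the 12th write. With the bound in place, the transfer still completes with no overruns and the received bytes match the transmitted ones.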
Thanks, good catch! Not considering the RX FIFO capacity is clearly a bug.
Setting
According to the reference manual it seems the DMA implementation has a similar failure mode. The SPI DMA subsection in the H723/5 reference manual states:
It could be worth adding some RX overflow handling to the SPI DMA driver; I'd first check what the CubeHAL does. However, it seems fair to assume that this condition is less likely to occur in practice. To create it, a steady stream of TX data would need to be delivered to the peripheral while RX data is transferred comparatively slowly. In most cases TX and RX DMA use the same data path, so I'd assume that whatever lets the RX buffer overflow would also stall transmissions.
My reading of the Cube implementation is that it does the same as the modm one: it configures RX, configures TX, then initiates the transfer. It doesn't appear to check the SUSP bit or do any special handling of the SUSP interrupt until EOT. So as far as I can tell, it also assumes that the RX stream can't backpressure the TX. If my reading is correct, I figure the existing DMA implementation in modm is fine.
It seems to me that all the AHB buses and the clock driving the built-in SRAMs are locked together; the clock tree doesn't have any dividers after HPRE. So I guess this couldn't be done with SRAM1-SRAM4. Are you thinking of an external SRAM? Or am I misunderstanding the reference manual?
You're right. I somehow misremembered what the clock tree looks like. My suggestion wouldn't work.
Follow-up to 678fd9a and modm-io#1223. It seems that once the RX FIFO has been cleared and emptied, the next transmitted byte always enters SUSP mode. It is unclear why this happens. When MASRX is not set, neither SUSP nor OVR are seen for the same transmit sequence.
I was doing moderately high-bandwidth streaming over SPI on an STM32H7. When I added some UART debug printing elsewhere in my application, it caused an indefinite hang in the SPI transfer() routine in this loop:
modm/src/modm/platform/spi/stm32h7/spi_master.cpp.in, lines 73 to 84 (at commit 28c87e4)
I could see that txIndex had run far ahead of rxIndex (by more than 30 bytes). The TX loop had pushed all of those bytes in its first pass, before the RX loop got a chance to pop anything. The peripheral had hit an overrun and entered the suspended state. As a result, the TX FIFO was completely backpressured and the loop was spinning while waiting for RX frames that never arrived.
After some investigation I think there are two bugs/oversights in the SpiMaster implementation.
I believe the UART byte-by-byte TX interrupts were occurring while in the SPI transmit loop, delaying execution enough for the SPI peripheral to get ahead.
This PR includes a fix for (1) by computing the number of outstanding bytes and bailing out of the TX loop when it hits the RX FIFO capacity. This seems to match the Cube HAL implementation.
The non-H7 STM32 driver does a byte-by-byte transfer, so it shouldn't be affected by this bug. I took a cursory look at the H7 DMA implementation, and it isn't obvious to me whether it has a similar failure mode. The DMA implementation seems to assume that the RX DMA will drain the RX FIFO faster than the FIFO is populated; I don't see any explicit overflow handling. So it might make sense to detect a suspension and resume the TX DMA once RX is flushed. Or it might be unnecessary if that assumption holds. Since TX and RX DMA run in parallel, the assumption seems fair to me.
I don't have a great understanding of the STM32 SPI IP so I would appreciate a double-check that my diagnosis and fix are appropriate. I haven't done any testing beyond confirming that my Fiber-based test application no longer hangs and produces the expected result. I would also appreciate any thoughts on (2) above. Also, I would appreciate any suggestions on how to reliably hit this condition in a unit test.
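One possible way to hit this condition deterministically in a unit test is to run the transfer loop against a host-side peripheral model and inject a simulated ISR, that is, a burst of frame times during which only the peripheral advances, after a chosen write. This is a sketch under stated assumptions. `FakeSpi`, `run`, `Result`, the 16-frame FIFO depths and the rate of one frame per 4 CPU operations are hypothetical and do not come from modm or the reference manual. `bounded` switches between the original TX-fill loop and the outstanding-frame limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical loopback model of a full-duplex SPI peripheral (not a real API).
// tick() = one frame time: pop the oldest TX FIFO byte and push the received copy
// into the RX FIFO, or count an overrun if the RX FIFO is already full.
struct FakeSpi
{
    std::size_t txDepth, rxDepth;
    std::deque<std::uint8_t> txFifo, rxFifo;
    std::size_t overruns = 0;

    bool txHasSpace() const { return txFifo.size() < txDepth; }
    bool rxNotEmpty() const { return !rxFifo.empty(); }
    void write(std::uint8_t b) { assert(txHasSpace()); txFifo.push_back(b); }
    std::uint8_t read() { assert(rxNotEmpty()); std::uint8_t b = rxFifo.front(); rxFifo.pop_front(); return b; }
    void tick()
    {
        if (txFifo.empty()) return;
        std::uint8_t b = txFifo.front();
        txFifo.pop_front();
        if (rxFifo.size() == rxDepth) ++overruns;  // frame lost
        else rxFifo.push_back(b);
    }
};

struct Result { std::size_t overruns; std::size_t maxInFlight; bool dataOk; };

// One 64-byte transfer using a TX-fill / RX-drain loop. `bounded` caps
// txIndex - rxIndex at the RX FIFO depth. The peripheral shifts one frame every
// 4 CPU operations; after the stallAt-th write the CPU is preempted for
// stallTicks frame times (a stand-in for e.g. a UART TX ISR).
Result run(bool bounded, std::size_t stallAt, std::size_t stallTicks)
{
    FakeSpi spi{16, 16, {}, {}};
    unsigned ops = 0;
    auto cpuOp = [&] { if (++ops % 4 == 0) spi.tick(); };
    std::vector<std::uint8_t> tx(64), rx(64);
    for (std::size_t i = 0; i < tx.size(); ++i) tx[i] = static_cast<std::uint8_t>(i);
    std::size_t txIndex = 0, rxIndex = 0, maxInFlight = 0;
    // Stop on a lost frame: the real loop would spin forever waiting for it.
    while (rxIndex < tx.size() && spi.overruns == 0)
    {
        while (txIndex < tx.size() && spi.txHasSpace() &&
               (!bounded || txIndex - rxIndex < spi.rxDepth))
        {
            spi.write(tx[txIndex++]);
            if (txIndex - rxIndex > maxInFlight) maxInFlight = txIndex - rxIndex;
            if (txIndex == stallAt)
                for (std::size_t i = 0; i < stallTicks; ++i) spi.tick();
            cpuOp();
        }
        while (rxIndex < tx.size() && spi.rxNotEmpty()) { rx[rxIndex++] = spi.read(); cpuOp(); }
        cpuOp();  // cost of one polling pass
    }
    return {spi.overruns, maxInFlight, rx == tx};
}
```

With these parameters, the unbounded loop completes cleanly when it is not preempted. It loses frames when preempted right after the 21st write, which is the point where it has just filled the TX FIFO while the RX FIFO already holds data. The bounded loop never has more than 16 frames in flight for any preemption point from 1 to 64. A test can sweep every preemption point in the same way.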