Poul-Henning Kamp
2024-05-30 06:42:07 UTC
> We perhaps could gracefully handle such lengthy buffer IO operations by
> adding a timeout in bwait() - like say 10 minutes. If the buffer IO is not
> completed in a few mins, it probably would not complete forever and/or
> would be slowing down the entire system. So it is better to stop such
> faulty IO operations.
I agree that the symptoms are bad, but disagree about putting a workaround
in bread(), because you get system corruption if the I/O operation
completes anyway after the timeout.
The fundamental issue with timing out I/O is stopping the operation
in progress.
If you do an "I'm not waiting for this any more", you have to sequester
the destination of the I/O operation, until you have 100% confirmation
that the operation has either been completed or successfully neutered.
(As a policy choice, you may also want to write-protect the source.)
This is why hi-rel systems never allow direct(-mapped) I/O: By
insisting that data go through dedicated I/O buffers, failing buffers
can be sequestered as long as necessary, without complicating the
application logic.
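
The sequestering rule can be sketched as a small state machine. This is a
userland illustration only, not FreeBSD's buffer cache: the iobuf type, the
state names and the iobuf_* functions are all hypothetical. The point it tries
to show is that a timeout parks the buffer instead of freeing it, and only a
confirmed completion or abort makes the memory reusable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer states; these are not FreeBSD buf(9) flags. */
enum iobuf_state {
	IOBUF_FREE,		/* may be handed out for new I/O */
	IOBUF_BUSY,		/* I/O in flight, a thread is waiting */
	IOBUF_SEQUESTERED	/* waiter gave up, device may still write here */
};

struct iobuf {
	enum iobuf_state state;
	char data[512];
};

#define NBUF 2
static struct iobuf pool[NBUF];	/* zero-initialised: all IOBUF_FREE */

/* Hand out a free buffer for a new I/O, or NULL if none is available. */
struct iobuf *
iobuf_alloc(void)
{
	for (size_t i = 0; i < NBUF; i++) {
		if (pool[i].state == IOBUF_FREE) {
			pool[i].state = IOBUF_BUSY;
			return (&pool[i]);
		}
	}
	return (NULL);
}

/* The waiter stops waiting. The buffer is NOT freed: it is parked. */
void
iobuf_timeout(struct iobuf *bp)
{
	assert(bp->state == IOBUF_BUSY);
	bp->state = IOBUF_SEQUESTERED;
}

/*
 * Called once the device has either completed the transfer or confirmed
 * that it was aborted. Only now may the memory be reused.
 */
void
iobuf_io_dead(struct iobuf *bp)
{
	assert(bp->state != IOBUF_FREE);
	bp->state = IOBUF_FREE;
}
```

With both buffers handed out and one of them timed out, iobuf_alloc() keeps
returning NULL until iobuf_io_dead() is called on the parked buffer. The
application never sees the sequestered memory again until the device can no
longer touch it.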
Before Virtual Memory, the UNIX buffer-cache worked that way, and
MERT did that. (MERT = Early five-nines UNIX for telephone switches.)
Between "intelligent I/O controllers" with DMA access, virtual
memory and direct-mapped I/O, we /have/ to make sure the underlying
I/O operation is /guaranteed/ dead, before we wake up the thread.
The only place that can and should happen is in the device driver,
possibly assisted by infrastructure such as CAM.
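
A rough sketch of that ordering, as it might look inside a driver. Everything
here is a hypothetical model, not a real FreeBSD or CAM interface: the request
structure, the drv_* functions and the "woken" flag (standing in for a wakeup
of the sleeping thread) are assumptions made for illustration. The timer only
starts an abort; the waiter is woken when the hardware has either finished or
confirmed the abort.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request model; not a real FreeBSD or CAM API. */
enum req_state { REQ_INFLIGHT, REQ_ABORTING, REQ_DONE };

struct request {
	enum req_state state;
	int error;	/* valid once state == REQ_DONE */
	bool woken;	/* stands in for waking the sleeping thread */
};

/* Finish the request and wake the waiter; the only place that wakes. */
static void
req_finish(struct request *rq, int error)
{
	assert(rq->state != REQ_DONE);
	rq->error = error;
	rq->state = REQ_DONE;
	rq->woken = true;
}

/* Driver timer fired: ask the hardware to abort, but do not wake anyone. */
void
drv_timeout(struct request *rq)
{
	if (rq->state == REQ_INFLIGHT)
		rq->state = REQ_ABORTING;	/* abort issued to the device */
}

/* Hardware confirmed the abort: no more DMA will touch the buffer. */
void
drv_abort_confirmed(struct request *rq)
{
	if (rq->state == REQ_ABORTING)
		req_finish(rq, ETIMEDOUT);
}

/* Hardware completed the transfer, possibly racing with the abort. */
void
drv_complete(struct request *rq)
{
	if (rq->state != REQ_DONE)
		req_finish(rq, 0);
}
```

Note that drv_timeout() on its own never sets "woken". If the transfer
completes while the abort is pending, the request finishes normally and a
later abort confirmation is ignored.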
You need to find out which device driver is ultimately responsible
for the hanging bread(), because that's where the timeout should
happen.
Poul-Henning
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.