
Re: [Full-disclosure] "IO wait chains" in Linux??



Thank you for your detailed reply!

Here's the kind of thing I was looking for (this is just a mockup):

21000 - /usr/local/sbin/nginx - [D]
 - /tmp/.somefile
    - other PIDs waiting on this file (not just children of the parent)
        - 51283 - /usr/local/sbin/apache (4.6 seconds)
        - 31028 - /usr/local/sbin/python2.6 (1.9 seconds)

Sadly, I don't know much about how the kernel and the IO schedulers handle
these things behind the scenes, so what I'm asking for may be impossible
(short of your other suggestion of using the watchdog and dmesg).
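The file and the per-waiter durations in the mockup are not exposed directly by the kernel, but a rough approximation is possible: `/proc/<pid>/wchan` names the kernel function a sleeping process is blocked in, so processes in state D can at least be grouped by where they are waiting. A minimal sketch (Python 3, Linux only; `read_blocked`, `group_by_wchan` and `render` are hypothetical names), assuming `wchan` is readable, which depends on kernel version and privileges:

```python
import collections
import os

def read_blocked(proc_root="/proc"):
    """Yield (pid, comm, wchan) for each process whose main thread is in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            continue  # process exited, or wchan is not readable for us
        # stat format: "pid (comm) state ..."; comm may contain spaces/parens.
        head, _, tail = stat.rpartition(")")
        comm = head.partition("(")[2]
        if tail.split()[:1] == ["D"]:
            yield int(entry), comm, wchan

def group_by_wchan(tasks):
    """Map each kernel wait point to the sorted list of (pid, comm) blocked there."""
    groups = collections.defaultdict(list)
    for pid, comm, wchan in tasks:
        groups[wchan].append((pid, comm))
    return {w: sorted(members) for w, members in groups.items()}

def render(groups):
    """Format the groups in a layout loosely modelled on the mockup."""
    lines = []
    for wchan in sorted(groups):
        lines.append(wchan)
        for pid, comm in groups[wchan]:
            lines.append(f"    - {pid} - {comm} - [D]")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render(group_by_wchan(read_blocked())))
```

Processes sharing a wait point are not necessarily waiting on the same file, and nothing here recovers how long each one has waited; it only narrows the search.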

On Mon, Feb 7, 2011 at 4:28 PM, <Valdis.Kletnieks@xxxxxx> wrote:

> On Mon, 07 Feb 2011 06:41:53 GMT, "Cal Leeming [Simplicity Media Ltd]" said:
>
> > Is anyone aware of a Linux based CLI equivalent, which will show the
> > processes stuck in IO wait, in a tree format?
>
> ps ax | grep ' [D] ' gives a pretty good approximation of "currently in
> I/O wait".
> But remember that each process (or actually, each thread within a process)
> can individually be stuck in I/O wait, so it's unclear what the "tree format"
> would consist of, exactly.  If you have a process that has parent, siblings,
> and children, what else would show up in the tree if it's in an I/O wait?
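The grep above only matches a STAT column that is exactly `D`, so entries such as `Ds` or `D+` are missed. A minimal sketch (Python 3, Linux only; `parse_stat` and `d_state_threads` are hypothetical names) that instead reads the state letter from `/proc/<pid>/task/<tid>/stat`, so every thread is checked individually:

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of a /proc .../stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the LAST ')'.
    pid_part, rest = text.split("(", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid_part), comm, state

def d_state_threads(proc_root="/proc"):
    """Yield (pid, tid, comm) for every thread currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        task_dir = os.path.join(proc_root, entry, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited after we listed proc_root
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    _, comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited, or the file was empty/unreadable
            if state == "D":
                yield int(entry), int(tid), comm

if __name__ == "__main__":
    for pid, tid, comm in d_state_threads():
        print(f"{pid:>7} {tid:>7} {comm}")
```

Like the ps/grep pipeline, this is a single snapshot; a thread may enter or leave state D while the loop is running.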
>
> There's the slightly more difficult issue that if you're trying to do
> system-level analysis, you're looking at really bad race conditions.
> Processes often go into and leave I/O wait status in literally milliseconds.
> At best, you can run through the process list several times and get a
> statistical view of "these 4 processes are in I/O wait most of the time".
> 'pstree' mostly avoids that issue because if the system is small enough that
> the pstree output is still useful, the fork/exec rate is low enough that
> pstree can mostly ignore it.  That's not true for I/O.
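A minimal sketch of that statistical approach (Python 3, Linux only; `snapshot`, `tally` and `sample_d_state` are hypothetical names, and the default of 50 samples 0.1 s apart is an arbitrary choice): take repeated snapshots of which processes are in state D and report the fraction of samples in which each one appeared. It only inspects each process's main thread.

```python
import collections
import glob
import time

def snapshot(proc_glob="/proc/[0-9]*/stat"):
    """Return the set of (pid, comm) pairs whose main thread is in state D now."""
    blocked = set()
    for path in glob.glob(proc_glob):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # process exited between glob() and open()
        # Format: "pid (comm) state ..."; comm may contain spaces or parens.
        head, _, tail = text.rpartition(")")
        pid, _, comm = head.partition(" (")
        if tail.split()[:1] == ["D"]:
            blocked.add((int(pid), comm))
    return blocked

def tally(samples):
    """Map (pid, comm) to the fraction of samples in which it was in state D."""
    counts = collections.Counter()
    for sample in samples:
        counts.update(sample)
    n = len(samples)
    return {key: hits / n for key, hits in counts.items()} if n else {}

def sample_d_state(rounds=50, interval=0.1):
    """Take `rounds` snapshots `interval` seconds apart and tally them."""
    samples = []
    for _ in range(rounds):
        samples.append(snapshot())
        time.sleep(interval)
    return tally(samples)

if __name__ == "__main__":
    results = sample_d_state()
    for (pid, comm), frac in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{pid:>7} {comm:<20} {frac:6.1%}")
```

A process seen in most samples is a far better suspect than one caught once; PID reuse during a long run can blur the figures slightly.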
>
> If you're trying to identify processes that are truly and literally *stuck*
> in I/O wait due to a hardware or kernel error, you're probably better off
> enabling the watchdog timer in the kernel and watching dmesg for it
> triggering.
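On most Linux kernels the feature that reports tasks stuck in state D is the hung-task detector (`CONFIG_DETECT_HUNG_TASK`, with its threshold in `/proc/sys/kernel/hung_task_timeout_secs`), which logs lines of the form `INFO: task <comm>:<pid> blocked for more than <N> seconds.` A minimal sketch that pulls those reports out of dmesg output, assuming that message format (`parse_hung_tasks` is a hypothetical name):

```python
import re
import subprocess

# Assumed format of the hung-task detector's report:
#   INFO: task <comm>:<pid> blocked for more than <secs> seconds.
HUNG_TASK_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")

def parse_hung_tasks(log_text):
    """Return (comm, pid, seconds) for every hung-task report in log_text."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in HUNG_TASK_RE.finditer(log_text)]

if __name__ == "__main__":
    # dmesg may require root when kernel.dmesg_restrict is set to 1.
    out = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    for comm, pid, secs in parse_hung_tasks(out):
        print(f"{pid:>7} {comm:<20} blocked > {secs}s")
```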
>
>
_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/