Thursday, December 17, 2009
One More Time
Well, those were all great ideas. Or so I thought.
BH still hung, with the same symptoms. Back to the legendary drawing board.
Since time immemorial (the 90s), when I had a stack of 10mB NE2000 NIC clones, I learned a Wonderful Thing about Linux and NIC driver modules, specifically, of course, NE2000 NICs.
If you had a system with two NE2000s, without fail the kernel would only detect the first one at boot time (and, yes I tried boot time parameters). If you had one NE2000 and one SMC NIC, the kernel would detect them both. I eventually found that if you compiled the NE2000 driver into the kernel that both NICs would be detected. So that's what I did.
Time went by and the stack of NE2000s went into the landfill to be replaced by a stack of RealTek 100mB NICs (RTL8139s). I found the same issue and ran across another: not all RTL8139s were created equal. Some liked one driver (8139cp) and some liked another (8139too). I ended up building both drivers into the kernel because for one thing, you can't tell which NIC will like which driver just by looking at them (I have at least one, now marked with a Big Black "X", that doesn't like either).
This may not have been a good idea, although once again this is exactly how EXPIV is set up, except EXPIV has only one NIC.
So tonight I rebuilt the kernel with only the 8139too driver, which seems to be the preferred way to go from what I've been reading on the Web.
I am not optimistic, but one of the symptoms appears to be that the CPU is spending 100% of its cycles on IO when the system hangs, at least according to the Gnome panel system applet that runs on the desktop. That, combined with the odd network behavior BH shows when it's hung, leads me to believe the drivers may be confused about which one is in charge.
Maybe I'm grasping at straws at this point, but time will tell.