Saturday, December 26, 2009

EXP /// Risen From The Ashes


I dragged the old NetVista out of its resting place and experimented with dropping connections when players request a local file.

It doesn't look like it's gonna happen, boys and girls. Although there is a unique byte sequence at the beginning of every UT mod file, the server apparently doesn't start sending data exactly from byte zero. Sometimes it does, sometimes it doesn't. But mostly it doesn't. Scratch that idea (but I do have a Plan B).

This old box didn't take well to sitting in a corner for the past year. Booting it was iffy. Sometimes it took six power on/off cycles to come up, so on Christmas Morning, in true Busman's Holiday style, I replaced the boot drive, which was a tiny 6G Maxtor with a manufacturing date of 10/30/1998.

Whoa. No wonder it had problems booting.

I replaced it with a 20G Maxtor that had "GOOD" written on it in Sharpie marker. That's the only way I can keep track of these things anymore (I have a huge box of "BAD" Sharpie'd CD ROM drives in the basement. I'd throw them out but the spiders think they're upscale condos).

I used Clonezilla to copy the drive. It was the first time I've ever used it and I was quite pleased with the result, although the user interface leaves much to be desired.

I moved the /var partition to the "new" drive and put the box back together. Then just for the Hell of it I upgraded the kernel to 2.6.32.2 (yes, while you weren't looking they up-revved it twice). Re-compiling the kernel killed about four hours.

Now it runs better than ever, with no sign of the DMA issues BOT House had, even though I raided its RAM to beef up the old iPaq I've been twiddling around with (it suffered a horrible fate when I attempted to upgrade it to Ubuntu 9.10, BTW).

The only complaint I had with the new kernel was that damned Linux penguin (Tux) appearing out of nowhere on the boot screen. I've always hated that little rat bastard ever since they made it the official logo (this is blasphemy, BTW) and I had one Hell of a time getting rid of the little fucker. It happens there's a little-known kernel boot parameter - logo.nologo - that shuts him off, although I've never had to use it on any other box.
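
For the record, it goes on the kernel command line. On a Debian box with GRUB legacy that's a one-line edit to menu.lst (the kernel path and root device here are a sketch, not necessarily this box):

# /boot/grub/menu.lst - tack logo.nologo onto the kernel line
kernel /boot/vmlinuz-2.6.32.2 root=/dev/hda1 ro logo.nologo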

So that got me wondering if I could put my own logo on the boot screen. It's possible, but you have to recompile the kernel to make it happen. I thought it would be cute to have a little Hinky head in there, but I just couldn't seem to get a good 80x80 pixel version in the required 16 color (not 16 bit color) console format. But I wasted hours playing around with it.

HOURS.
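
If you want to play the same losing game, the netpbm pipeline goes something like this (a sketch - the source image name is made up, and the vga16 format technically wants its 16 colors snapped to the standard VGA console palette on top of the quantizing):

# scale to 80x80, cut to 16 colors, write an ASCII PPM
jpegtopnm hinky.jpg | pnmscale -xysize 80 80 | ppmquant 16 | pnmnoraw > logo_linux_vga16.ppm
# then drop it in drivers/video/logo/ and recompile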

So anyway, that was my Christmas. I hope you had more fun than I did.

Thursday, December 24, 2009

More BOT House Twitter Pollution


I was somewhat amused to find BOT House's tweets echoed on this real estate page for Henrietta, Florida.



I captured the page for posterity.

Tuesday, December 22, 2009

70+ Hours On 2.6.32


I am declaring myself the victor in this epic battle against kernel 2.6.32!

Disabling DMA on all IDE interfaces did the trick. If BOT House was a file server I'd probably be pissed at the resulting loss in performance, but it's not, so I'm a happy camper. And if there has been a drop in performance I haven't noticed it (nor do I have a baseline to compare any tests against - tests which I haven't performed anyway).
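
For anyone keeping score at home, one way to pull this off without gutting the kernel config is the ide_core boot parameter (straight out of the 2.6.32 docs - the interface.device numbers are a sketch and depend on the box):

# kernel command line: disallow DMA, one entry per interface.device
ide_core.nodma=0.0 ide_core.nodma=0.1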

UT Files seems to work OK. One music file used by a Classic ]I[ map formerly in BITCH House (DM-Clementine) had issues with the remote version, but otherwise it is a very fast redirector.

Stunningly fast.

In fact, it's almost too good to be true. I don't see it lasting long, although the guy has over 1000 UT servers as clients. But he depends (???) on donations, so how long can that last? The cheapest plan at his hosting provider is $75/mo ($900/year).

Maybe he makes the Big Bucks with ads (I wouldn't know, I haven't seen an Internet ad in years). Regardless, I have no Plan B if this guy goes down, which bugs the HELL out of me.

With that in mind I'm going to take another look at trashing the connections of players who download directly from my servers. I didn't find anything the first time around, but I have devised a better approach this time, using one of the many VMs I have with UT99 support built-in (or perhaps the old EXP III server itself - it hasn't been powered up in months and the last time it was powered up it was having "issues"). So that's back in the Master Plan.

I still plan to consolidate the GoDaddy servers onto their buggy Linux platform. That issue is really weird. Sometimes it redirects, other times it doesn't. To make matters worse, the log files available to me don't bother to report a 302 redirect when it happens. This must be some kind of weird load-balancing hardware that they haven't ported to their "beefy" Windows servers yet (hence the "not a good long-term solution" comment by "Just Jonathan" of GoDaddy fame). Internally, something appears to be juggling IP addresses around. Although the server's external IP is 97.74.26.128, the logs show the last octet changing frequently. BUT the different addresses never show up in the redirect responses. WTF is up with that?

ATTENTION: "Just Jonathan" at GoDaddy



I told you I'd have the #1 Google Search for "just jonathan godaddy" on Sunday and now I have it.

Enjoy.

You bastard!

Monday, December 21, 2009

Fuck GoDaddy and "Just Jonathan"


And the horse they rode in on.

I went to The Unreal Admin Page originally to rant about GoDaddy and their suck-ass Tech Support but I ended up just browsing through the forums. While there, I found someone promoting an Unreal mod redirection service, ut-files.com.

Since they had all the files required for Classic ]I[ Online, I set them up as the redirector. I'm not really comfortable with this decision, because I can't stand relying on someone else to always do the Right Thing, another reason I set up the GoDaddy account originally. But since GoDaddy has demonstrated they are incapable of doing the Right Thing anyway... well, why not use ut-files.com?
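
For the uninitiated, pointing a UT99 server at a redirector is a small change in UnrealTournament.ini (the URL below is a stand-in, not necessarily ut-files.com's real path):

[IpDrv.HTTPDownload]
RedirectToURL=http://ut-files.com/redirect/
UseCompression=True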

I contemplated this switch for some time, but in the end the deciding factor was speed. UT Files turned out to be three times faster than GoDaddy's junky servers. And, there's no "Terms of Service" agreement at UT Files (or at least none that I can find so far).

So I opened an account and I'll be uploading the files they're missing that are required to run BITCH House (not many, actually), at which point I'll move it over. BOT House has nothing special besides FuckIdlers, my custom-hacked version of KickIdlers.

And apparently everything needed by the Too Many Mods server is there as well.

There's another part of the puzzle I'm looking into: dropping the connection when a player requests a file from poor, bandwidth-limited me instead of the redirection service (naturally, via iptables). If I can nail that one down, you can kiss lag good-bye forever.
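
The rough idea looks like this (strictly a sketch - the port is UT's default, and the bytes assume the Unreal package signature, 0x9E2A83C1, shows up little-endian at the start of the transfer, which is exactly the part that needs verifying):

# kill outbound game traffic that looks like an in-band mod download
iptables -A OUTPUT -p udp --sport 7777 -m string --algo bm --hex-string '|C1832A9E|' -j DROP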

UPTIME: Forty-Five Hours


This is the longest-running BOT House 2.6.32 kernel yet.

I was going to bounce it this morning "just because" but I think I'll let it run. Besides, Pinky Dink has the day off and I can have her bounce it if worse comes to worst.

I am looking into some 3rd party hosting options for the UT mods and in the meantime I've been twiddling file attributes at proxyobsession.net to see if I can get around the 302 redirects reliably, regardless of whether I'm violating GoDaddy's Terms of Service or not. That seems to be working, but it "seemed to be working" last time I messed around with it, too.

During all that messing around I discovered a few mod files were missing, so I uploaded them. One was BP4Handler7C.u.uz, which is part of UTPure. It's a file everyone needs when they play BH. Since it was missing it was probably the cause of some short-lived lag on the server.

Anyway, it's Monday morning. Gotta run.

Sunday, December 20, 2009

GoDaddy SCREWS THE POOCH AGAIN!


As if I don't have enough problems...

I just got off the phone with "Just Jonathan" of GoDaddy. He's not getting a favorable Customer Satisfaction survey from me.

Here's the scoop: as I mentioned earlier, I have two hosting accounts at GoDaddy. A new, Linux-based account and good old reliable mrhinkydink.com, which is Windows.

Up since 2005, mrhinkydink.com is where you get your UT mods from when you play on any of my UT servers. Since Day One, that was the reason I opened the account, and it has worked well for that purpose. However, getting it hosted on Windows was a mistake in the first place. Now, I want to put everything on the Linux server.

Sadly, it can't handle UT mods!

When you ask for a mod, your UT console sends an HTTP request for the file - a regular, "nothing special" HTTP request like the millions of others made every microsecond. The Web server then sends you that file, no questions asked. Sounds easy? Maybe it's too easy. GoDaddy's Linux servers send back a "302 Moved Temporarily" redirect, which UT does not understand.

For instance, if you ask for SkeletalChars.u.uz in the "utmods" folder, their Linux (Apache) server responds like this:

HTTP 302 Moved Temporarily
Location: /utmods/SkeletalChars.u.uz?29e15220

... whereas the Windows (IIS6) server responds:

HTTP 200 OK

...and sends the file. Notice the appended "?29e15220"? What the HELL is that? UT doesn't understand and drops back to downloading it from my server. I don't have the bandwidth for that. I never have had the bandwidth for that, which is why I got the GoDaddy account in the first place.
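
You can watch the whole train wreck from any shell with curl - same path, two completely different answers depending on which account serves it (the hex tag changes from request to request):

curl -sI http://proxyobsession.net/utmods/SkeletalChars.u.uz
# Linux box:   HTTP/1.1 302 Moved Temporarily
#              Location: /utmods/SkeletalChars.u.uz?29e15220
# Windows box: HTTP/1.1 200 OK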

I explain the situation to "Just Jonathan", who puts me on hold while he goes to talk to Someone Who Should Know About These Things.

"Just Jonathan" comes back to tell me that BY SERVING UT MODS I'M VIOLATING THE TERMS OF THE SERVICE AGREEMENT and they don't have to support my issue. He says it's not a good "long term solution" (although I've been doing it for ALMOST FIVE YEARS NOW on a GODDAMNED WINDOWS BOX I NEVER WANTED IN THE FIRST PLACE) and if I really want to do this kind of thing I need to buy a dedicated hosting account.

Some "Tech Support" that is. Can't fix a problem? Upsell the sucker to a dedicated account and let him do his own fucking tech support.

UnFUCKINGbelievable.

Losing Battle?


Although uptime is now 20 hours - the best since the initial install last weekend - I am not optimistic about 2.6.32 on BH.

I jumped the gun yesterday and re-compiled for SMP & Hyperthreading. And I did a quick test to see if I could lose the SiS5513 IDE driver (which has ZERO native options for disabling DMA) and run with the generic EIDE driver. No such luck. That experiment left me with an unbootable system (easily repaired, though).

I also considered the possibility that the drive is going bad. Since it supports S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) natively I downloaded a few tools to see what the drive had to say about itself. It appears to be happy and healthy, although the "uptime" value is blatantly wrong. It says it's been up for 191 days, but in reality that drive has been running 24x7 for almost two years now (do the math). If that's wrong, the health report may be bogus as well.
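
For the curious, the interrogation is simple enough with smartmontools (device name per the old IDE subsystem this box uses):

smartctl -H /dev/hda    # overall health self-assessment
smartctl -A /dev/hda    # raw attributes - Power_On_Hours is attribute 9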

The thought occurred to me to check a Release Candidate (RC) version of 2.6.33, but I ran across this little bit of bad news in the ChangeLog...

The current Kconfig text for CONFIG_IDE doesn't give a hint to users that this subsystem is currently in maintenance mode and isn't actively developed.

Yes, kiddies, IDE is nearly extinct. This motherboard doesn't do SATA and since it only has a grand total of ONE PCI slot (currently occupied by the second NIC), SATA isn't going to happen. But the thought occurred to me that someone, somewhere must make a combination SATA/Ethernet card so I started searching and actually found one.

One.

And it's no longer in production, probably because it could do SATA or Ethernet, but not both at the same time.

So right now I'm waiting on the next crash. In preparation for that I think I'll fire up a Deb 4 VM and see what I can get out of the latest stock Debian kernel.

Friday, December 18, 2009

So much for THAT theory...


And now for something completely different.

I had set up netconsole to monitor BH's problem remotely. Netconsole sends debugging messages across the network to another system so you can see what's happening up to the moment of a crash.
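
The setup is a one-liner on the sick box plus a listener on a healthy one (the IPs, MAC, and interface below are placeholders, not my actual network):

# on BH: local port@IP/interface, then remote port@IP/MAC
modprobe netconsole netconsole=6665@192.168.1.2/eth0,6666@192.168.1.10/00:11:22:33:44:55
# on the monitoring box:
nc -u -l -p 6666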

And here we have it:

hda: ide_dma_sff_timer_expiry: DMA status (0x20)
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: DMA timeout: status=0x58 { DriveReady SeekComplete DataRequest }
hda: possibly failed opcode: 0x35
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: possibly failed opcode: 0x35

hda: drive not ready for command
ide0: reset: success

It's not a NIC driver issue after all. It's the drive hanging. Or rather, some kind of DMA issue with the drive and the controller.

So it's back to the drawing board.

But right now I have shut off DMA on the drive manually, so I expect things to be OK while I re-hack the kernel.
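
"Manually" meaning hdparm (again, old IDE subsystem, hence /dev/hda):

hdparm -d0 /dev/hda    # switch DMA off
hdparm -d /dev/hda     # verify: using_dma = 0 (off)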

Looking Good...


... so far everything seems O.K.

Uptime is over 20 hours. If this works out, I'm going to re-hack the EXPIV kernel and get rid of the 8139cp driver on it as well. Even though there aren't any visible problems there, better to get rid of it if it's not needed.

If BOT House makes it until Sunday, I'm going to re-hack the BOT House kernel to re-enable SMP and Hyperthreading, since it was never an issue in the past.

I took some time to browse the code for both RTL8139 drivers to see if there are any obvious clues to problems like the ones I have been having. Nothing in there jumps out at me. They seem to share a lot of common code, but at first blush the 8139too driver appears to be not quite as complex as the 8139cp driver.

But who knows?

Last night after I took out the 8139cp driver, I had to wait out a bunch of players. It took over THREE HOURS for everyone to leave, but it was worth the wait. Since I didn't have anything to do but wait, I jumped in and played too. It was a good crowd until Koga showed up and started beating everyone's asses. She managed to clear everyone out pretty quick.

I knew she was good for something.

Thursday, December 17, 2009

One More Time


Well, those were all great ideas. Or so I thought.

BH still hung, with the same symptoms. Back to the legendary drawing board.

Since time immemorial (the 90s), when I had a stack of 10Mbps NE2000 NIC clones, I learned a Wonderful Thing about Linux and NIC driver modules - specifically, of course, about NE2000 NICs.

If you had a system with two NE2000s, without fail the kernel would only detect the first one at boot time (and, yes, I tried boot-time parameters). If you had one NE2000 and one SMC NIC, the kernel would detect them both. I eventually found that if you compiled the NE2000 driver into the kernel, both NICs would be detected. So that's what I did.
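
In kernel config terms, that's the difference between these two lines (the NE2000 symbol is from memory, so consider it a sketch):

CONFIG_NE2000=m    # module: only the first card shows up
CONFIG_NE2000=y    # built in: both cards detected at boot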

Time went by and the stack of NE2000s went into the landfill, replaced by a stack of 100Mbps RealTek NICs (RTL8139s). I found the same issue and ran across another: not all RTL8139s were created equal. Some liked one driver (8139cp) and some liked another (8139too). I ended up building both drivers into the kernel because, for one thing, you can't tell which NIC will like which driver just by looking at them (I have at least one, now marked with a Big Black "X", that doesn't like either).

This may not have been a good idea, although once again this is exactly how EXPIV is set up, except EXPIV has only one NIC.

So tonight I rebuilt the kernel with only the 8139too driver, which seems to be the preferred way to go from what I've been reading on the Web.
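
In .config terms the rebuild boils down to this (assuming you build the driver in, like I do, rather than as a module):

CONFIG_8139TOO=y
# CONFIG_8139CP is not set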

I am not optimistic, but one of the symptoms appears to be that the CPU is spending 100% of its cycles on IO when the system hangs, at least according to the Gnome panel system applet that runs on the desktop. That, combined with the odd network behavior BH shows when it's hung, leads me to believe the drivers may be confused about which one is in charge.

Maybe I'm grasping at straws at this point, but time will tell.

Wednesday, December 16, 2009

This Shit Is Killing Me


As you may have realized by now, the glowing reports about all being fine with the kernel upgrade on BOT House were premature.

BH is now choked. It's not taking new connections. Oddly enough the old connections are working fine. Odder still, anything that passes through BH (which, among other things, is the firewall here on DinkNet) is hunky-dory.

In fact I'm passing through BH right now to blog about this happy horseshit.

BH survived the initial upgrade for about 36 hours. When it died Monday morning (when I was absolutely unprepared to do anything about it), everything else on the network was humming along fine. I bounced it and things were, again, fine. Sometime during the day it choked again, but my connections from work to home stayed up. But I could do nothing on BH remotely.

This really pissed me off, but it made me all the more determined to figure out WHAT THE FUCKING FUCK IS WRONG WITH THIS FUCKING BOX.

Monday night when I bounced it again I started to simplify the firewall rules. And I installed conntrackd to do some statistics on the firewall.

Tuesday morning, all was well. I zipped off to work, nearly getting killed in the process (long story - almost got run off the road but I swerved clear and the car that almost hit me hit someone else and they both ended up crashing into the restraining wall), sat down at my desk and at about 10AM everything died again.

Plus, my connection from work went with it. I could not reconnect, but, again, everything going through BH was fine, even new connections. It was starting to look more and more like a firewall issue.

So Tuesday evening I took a closer look at everything. I shut down the proxy server on BH, which left ssh and nfs as the only services on the box (besides UT, that is).

I played a few EXCELLENT rounds of UT on BH (people started hitting it as soon as it was back up) and hit the sack at about 11:30PM.

I woke up the next morning (Wednesday) to find the box fucked again. It appeared everything choked right after midnight.

In fact it was becoming clear that every time it choked, it was at xx:02 AM or PM, which is meaningful since that is when the proxy project box does all of its dirty work (this particular system continues to crank away while BH is down, BTW).

So I bounced the box, went off to work, and the thing dies once again on the hour of 10AM plus change.

This time around I had set up an alternate, pass-through ssh connection so I wouldn't get locked out like I was on Tuesday. It tunnels through BH directly to the box running EXP IV, which still shows no adverse reaction to the same kernel upgrade (apples to oranges? Same everything except it's a 64bit AMD dual core and BH is a 32bit Intel single core... hmmm...).

So that's where we were at Wednesday. Down.

When I got home Wednesday, I noted the time of the last conntrack log (once again "on the hour"), rebooted, and sat down to generate another 2.6.32 kernel image, which takes about two hours with the Debian make-kpkg tool.
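
The incantation, for the record (a sketch - the revision string is whatever I feel like typing, and the resulting package name will vary):

make-kpkg clean
fakeroot make-kpkg --initrd --revision=bh.1 kernel_image
dpkg -i ../linux-image-2.6.32.2_bh.1_i386.deb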

This time around I took out SMP (Symmetrical Multi Processing) and Hyperthreading support (actually, hyperthreading simply disappeared as an option after SMP was removed). It is a single CPU box, after all, but it is a P4 and SMP support never seemed to matter in kernels past. While it was cranking away at the code I hit BH to see how that affected performance of UT, since building the kernel was chewing up most of the CPU cycles. No problems there.

Once the kernel & modules were built and installed, I had to rebuild iptables, ipset, conntrackd support and tools/libs, and reboot one more time.

Now, on Thursday (12/17/2009), BH has been running for a little less than twelve hours. Whether it will keep running is anyone's guess. I am optimistic that removing SMP was the way to go, since EXP IV, a true multicore system, has had zero problems with this kernel version.

And now I can get back to my other projects, like messing around with the TOO MANY MODS Windows UT99 server (I will be taking requests, so if you have some favorite maps or other UT99 extensions, let me know).

One slightly positive outcome of all this is that I leveraged a private (at the moment) Websense hack to get the pass-through connection back to EXP IV to work through the corporate proxy. It's an extremely small and elegant hack for completely bypassing monitoring and filtering that I've been working on for a few months now. I have contacted Websense but they seem to be ignoring me. It may be a unique flaw in our environment (I suspect it could be the Microsoft ISA servers), but my testing facilities are limited. We have a Websense upgrade scheduled for January and if the hack survives the upgrade I plan on shoving it up Websense's ass.

Or, I may just keep it in my private toolbox.

Sunday, December 13, 2009

Classic ]I[ Files Moved


Since August I've been running two hosting accounts at GoDaddy, www.mrhinkydink.com and proxyobsession.net. Proxy Obsession is WordPress, which I really like. Mr. Hinky Dink is (ugh) some kind of FrontPage-compatible Windows box, which was a mistake from Day One.

It seems I clicked the "back" button when I was signing up and the account defaulted back to Windows (I wanted Linux). But since back then (2006) all it was doing was holding UT mod files, I didn't bother moving it to Linux like I wanted in the first place.

Anyway, I want to change all that and move everything to the Linux box and consolidate the two accounts/domains.

With that in mind today I moved the Classic ]I[ files to the Linux box. Soon I hope to have everything moved, but it's a lot of files and transferring them really sucks up the UT bandwidth. Downloading them isn't bad, but uploading is a KILLER.

And it turns out you can't simply transfer files between boxes at GoDaddy. You have to download/upload to move things around. Luckily the Proxy Obsession box has SSH/SFTP, so I can trim the speed down to a rate (~64kBps) that still lets people play UT without (much) lag. Naturally, it takes longer but I have the time to waste.
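
WinSCP has a throttle built in, but the same trick works from any command line (a sketch - scp takes its limit in kilobits, so ~64kBps works out to roughly 512, and the user/path are stand-ins):

# cap the upload so UT players don't feel it
scp -l 512 *.u.uz dink@proxyobsession.net:utmods/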

The upgrades appear to have gone without incident. It's now been over 24 hours and everything is hunky-dory so far.

Let's hope it stays that way.

UPDATE 13:40 EST


The Classic ]I[ test worked well, so I'm moving everything else over to Proxy Obsession. But what an incredible pain in the ass this has been!

The fact that it's slow doesn't bother me (it's been an hour and a half so far and I'm only halfway done), but GoDaddy keeps dropping the SSH connection on me. At least WinSCP is smart enough to pick up where it left off, but Jesus Fuck this is annoying. I hope it's not a sign of things to come. If it's OK with the small transfers (the Proxy List, the Map, etc.) I'll be a happy camper.

If this goes well, I may put up a server I use locally. It's called "Too Many Mods" and - you guessed it - it has a lot of mods. It's also a Windows server, which could get ugly, but I've been wanting to experiment with it. All the scripts & shit still work, since they get all their data from the Web interface anyway. It runs on the box that runs the Linux VM that generates the Proxy List. A few more CPU cycles shouldn't be an issue.

FWIW, since I wrote this I ran across a "green ghost" in EXP IV. It might have something to do with the upgrade, but it went away after the game was over. I think I remember those green artifacts from when the new EXP IV box went online earlier this year, but I'm not sure.

Saturday, December 12, 2009

What A Pain In The Ass


Linux kernel 2.6.32 + ipset v4.1 + iptables 1.4.6 are now current on BOT House and EXP IV.

Just like last time, the upgrade went without a hitch on EXP IV. Everything is smooth as silk. No issues at all. Period. Well, maybe one. Once again, the NVidia driver had to be replaced because of the kernel upgrade, but that too compiled on the first shot and installed without a problem.

BOT House, on the other hand, would not cooperate.

Sure, the kernel compiled on the first attempt, but then the fun began. The NFS (Network File System) server choked on reboot. This was not good because EXP IV gets all its scripts from an NFS share on BOT House. This turned out to be the Debian NFS start-up script. Fixed. Rebooted. Then iptables wouldn't come up. It suddenly didn't like the order it was started in, so I changed that. Bingo.

BOT House took about four and a half hours to upgrade, two and a half for the kernel build and another two hacking around with the side issues. Now if it stays up without a problem for 24 hours I will be a happy fella. The upgrade to 2.6.31 in September was fine until the next day, when the whole system ground to a halt. I think I did something silly during the configure back then, so this time around I used the stock Debian config. Only time will tell.

Anyway, I'll be playing on and off the whole weekend. See you there.

Friday, December 11, 2009

iptables-1.4.6


Here we go again.

EXP IV now has the latest, allegedly greatest, version of both ipset and iptables. And of course kernel 2.6.31, which is now (naturally) .01 revision behind the times. Or .8 revisions, depending on how you look at it (it went through 7 revisions during its lifetime, the last being 2.6.31.7).

BOT House had some serious issues with 2.6.31 and I have been threatening to try to upgrade it again for some time now. This looks like a good weekend to do so. It's been running a stock Debian kernel (2.6.28.5) without ipsets for a long time now and it bugs the HELL out of me.

None of this should impact playing, but there's always the possibility of the IP address of the server changing.

Keep that in mind if you can't connect.

Monday, December 07, 2009

Google Pollution via Twitter+UT99


Try this search for ZSnDYcunt on Google.

Great lulz.

Of course, ZSnDYcunt (ZSnDY for short) is a bot on EXPIV. Only the first two Google hits are auto-tweets from the server; the rest are various other (sometimes bizarre) sites that have picked up on the tweets from EXPIV for whatever reason.

Real interesting. Just goes to show there's a lot of crap on the Web. I'm glad to know at least some of it is mine!