TFTS 1: What NOT to do when your server crashes randomly.
I recently built a little server whose main task is to serve files. I bought a motherboard with a decent number of SATA ports, added two additional SATA controller cards and packed the whole case full of disks. As the server is at home, I didn’t bother with enterprise-grade hardware.
Fun was had, but soon I noticed it crashing when doing heavy I/O. First, I suspected the mainboard, and some quick googling suggested turning off the Enhanced Halt State (C1E), among some other newer CPU features, as this apparently caused the mainboard to hiccup. When I did this, the system seemed to be stable. Happily, I went on.
But soon the crashes started again, most often when heavy I/O was going on. As it’s not uncommon for cheap SATA chipsets to overheat, I now suspected one of the additional controllers to be the culprit; more precisely, the one used when accessing the “big” RAID in this system (the other one had just a few small disks connected and never moved a lot of data). What should have given me pause was that it wasn’t exactly cheap back in the day, I had never had problems with it, and while it wasn’t a big-name brand, the manufacturer is still around and enjoys a good reputation. But I just figured that its age (over 3 years) was taking its toll and bought a brand-new Adaptec controller.
Now the system was okay for a while, but as you may have guessed: crashes again. In desperation, I replaced the second controller as well, but no dice.
Now that I had the controllers out of the picture, I wondered what else could crash the server that randomly. The RAM? Nah, c’mon, usually that’s solid. It either works or not. But still, can’t hurt to test.
After 2 minutes of memtest86+, I got errors left, right, and centre.
Moral of the story: Don’t believe the RAM you buy for your home systems has the same reliability as the enterprise RAM you use in your employer’s server room. And always check the RAM first if you can’t make out exactly what’s failing in your system.