kevin gilmore Posted November 21, 2007

So in this politically correct society we now all live in, a moron, as in:

1. A stupid person; a dolt.
2. Psychology: A person of mild mental retardation having a mental age of from 7 to 12 years and generally having communication and social skills enabling some degree of academic or vocational education. The term belongs to a classification system no longer in use and is now considered offensive.

is considered to be unacceptable. I'm sure that there may be acceptable ways to say the same thing, but that is not the point.

Now I have called Jude a Maroon, which by many definitions is also considered offensive. My definition comes from the Bugs Bunny page on Wikipedia, which has since been deleted:

1) A person who acts like a Moron when they clearly are NOT so.

Jude, like him or hate him, is clearly a person of above average intelligence. So what (politically correct term) do you call someone that runs a major website with great influence that has no hot failover capabilities and no backups? It is clear that this is going to cost Jude hugely, and in many different ways, for a significant amount of time. And it is not necessarily over yet either. Will Jude learn from this? I certainly think so.
ph0rk Posted November 21, 2007

So what (politically correct term) do you call someone that runs a major website with great influence that has no hot failover capabilities and no backups?

I'd say amateur. Though from my time working for the man, it was at times hard to get companies with deep pockets to foot the bill for a proper backup system, so it isn't entirely surprising. I'd have thought his colo provider would have sold the hell out of a backup solution, though.
spritzer Posted November 21, 2007

I would call him reckless, as he put the fate of his company in the hands of fragile electronics with no backup. All electronic components will fail, and it is pretty irresponsible to assume otherwise, even if it was for only a short while during upgrades.
slwiser Posted November 21, 2007

When I was designing a web site for myself and another guy, a database-oriented site, I planned on having two geographically diverse locations, with automatic duplicate data sets at both, just to keep any outage from hitting both. The thought was that if anything, and I mean anything, happened, I would still have one site working, even if it were slower. Oh well, we can't all do what is optimum at the right time. I am sure Jude will not allow this to happen again, at least not in the same manner.
flashbak Posted November 21, 2007

His status messages early on indicate he did indeed have backups, but they were dated. Looks like he gambled and lost. However, with 24 drives incorporated in that disk array, one would have thought that controller redundancy by the vendor would have been part of their hardware strategy!
ph0rk Posted November 21, 2007

The problem with a RAID array as redundancy is that it won't protect you from an errant DROP TABLE command or the like - good backups are as much for user error as for hardware. Something like "drop table posts, users, attachments;" would make even a RAID 16 array useless, as it would just mirror the change instantly.
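To make the point concrete (and not as a description of anything Head-Fi actually ran), here's a minimal sketch of the kind of nightly dated dump that an errant DROP TABLE can't touch. It assumes a MySQL database, the stock mysqldump client, and credentials supplied via ~/.my.cnf; the database name and backup path are made up for the example:

```python
#!/usr/bin/env python
# Minimal nightly-dump sketch. A mirrored array copies a bad DROP TABLE the
# instant it happens; a dated dump from last night does not, so it can still
# be restored afterwards. The DB name and backup path below are placeholders,
# not anything from the actual Head-Fi setup; mysqldump is assumed to pick
# up credentials from ~/.my.cnf.
import datetime
import gzip
import subprocess
from pathlib import Path

DB_NAME = "forum_db"                      # hypothetical database name
BACKUP_DIR = Path("/var/backups/mysql")   # hypothetical backup location

def nightly_dump() -> Path:
    """Write a dated, compressed logical dump and return its path."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    out_path = BACKUP_DIR / f"{DB_NAME}-{stamp}.sql.gz"

    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking the forum for the duration of the dump.
    dump = subprocess.run(
        ["mysqldump", "--single-transaction", DB_NAME],
        check=True, capture_output=True,
    )
    with gzip.open(out_path, "wb") as fh:
        fh.write(dump.stdout)
    return out_path

if __name__ == "__main__":
    print(f"wrote {nightly_dump()}")
```

Restoring yesterday's dump after a bad DROP TABLE is then just a matter of feeding the file back through the mysql client, which is exactly the escape hatch a mirrored array can't give you: the array faithfully mirrors the mistake the instant it happens.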
tom_hankins Posted November 21, 2007

I'm sure he has learned his lesson. Could arrogance have played a part in any of this? Or do you think he was sitting around with his fingers crossed until he got to it?
mjg Posted November 21, 2007

Being careless at times does not make one a moron. I don't think he is any kind of superlative person, just a normal human being who makes mistakes. He does have a preachy, self-righteous, and somewhat hypocritical attitude, but having met the guy personally, and knowing about successful people in general, most are, I feel. You need to have strict fiber, drink plenty of your own Kool-Aid, and step on lots of toes to be the best.
n_maher Posted November 21, 2007

My 2c is that the recent debacle was really just an error of omission and a tendency that most of us have to stick with what has worked in the past and not further complicate things, even if they really should be more complicated. There's no doubt that Jude has learned a painful lesson; I hope not too painful.
grawk Posted November 21, 2007

I'm nowhere near perfect enough to throw stones over this. I feel his pain, and hope it doesn't hurt him too much long term.
pabbi1 Posted November 21, 2007

Well, after having my IT fucktards crap out my 'old' external portal over the past weekend, and working late into the night last night to recover it (we won't even get into why an IT group is hacking code on a production site without dev <sigh> - petty fucking politics), I feel his pain yet again. Another vacation ruined by my 'team mates'. Unilateral fuckups are now being considered 'team building' and 'character'. If I deliver shit for code, it is a cataclysm.

Surprised he hasn't put up an announcement drafted by legal counsel... this is really bad for him. My experience in systems has always been that it isn't how badly you screw up, but how fast you recover. Slow recovery in this case may be lethal.
mjg Posted November 21, 2007

All the more reason why I'm glad I don't do IT stuff anymore, at all. What a skull fuck. What I do is easy in comparison to dealing with disasters like you guys do.
JBLoudG20 Posted November 21, 2007

My thoughts mirror those of Nate and Dan. I wish Jude the best of luck in recovering all his important data.
Dusty Chalk Posted November 21, 2007

I'm nowhere near perfect enough to throw stones over this.

Pretty much where I am. That said, I can't argue with Kevin -- one should be able to call someone out for running websites so shabbily. It should be said -- I know everyone's thinking, "oh, he probably learned his lesson," but there's a difference between knowing something and hearing someone else say it to your face. Repeatedly. Ad nauseam. From lots of different people.

I suspect the correct phrase is "lazy ass". It was laziness that made him not do something -- anything -- about his backups. He knew better, and he just didn't do it.
ph0rk Posted November 21, 2007

In my experience, it takes a data loss of some significance before most people take backups seriously. The idea of a hard drive going tits-up is the furthest thing from their mind. Jude was just unlucky enough in this case for the data loss to be very public.
pabbi1 Posted November 21, 2007

Many of us who do have paying customers hitting the site (mine: 18k++) are always publicly humiliated, and have the marketecture ready to roll (Welcome to our site - sorry about your impetuous nature) when facing these types of things. But a few million for a hot backup is where the rubber meets the road, especially when we aren't talking x? millions in revenue. Sometimes, like this week, many are on vacation and not really pounding us. Unfortunately, we do eat our own dog food and use the site for services engagements and managing subcontractors, so maybe my management will wake up and fork over. The sad part is they entrust this to the twits who are fucking us over time and again - maybe they feel it is just good $,$$$,$$$ after bad.
archosman Posted November 21, 2007

Isn't it possible that the company hosting the servers could have promised him that they would be doing backups... and didn't? Or would that be his responsibility?
tkam Posted November 21, 2007

Isn't it possible that the company hosting the servers could have promised him that they would be doing backups... and didn't? Or would that be his responsibility?

He co-locates the hardware, which means he pays the provider for rack or cabinet space, power, and bandwidth. Everything else is his responsibility.
aerius Posted November 21, 2007

Cheapass beancounter who doesn't learn. Head-fi has gone down more times than I can count, and he never bothered to spend the time and money to fix it right. Rather, it was one half-assed patch job after another until the whole house of cards came crashing down. I have no sympathy for him; a crash like this one was inevitable and he took no steps to avoid it, which is completely inexcusable given Head-fi's past downtime and bug history.
pabbi1 Posted November 21, 2007

Too late to edit... lest anyone think a real moron _can_ do this stuff, here is a simplified diagram of my newest system (the other isn't quite as robust), missing a few bolt-on pieces and other system (domain) interactions. We will grow this until we hit 20 servers, then outsource the entire mess. Affinity, load balancing, and firewalling increase the complexity.

http://home.swbell.net/pabbi/Picture1.jpg
itsborken Posted November 21, 2007

His status messages early on indicate he did indeed have backups, but they were dated. Looks like he gambled and lost. However, with 24 drives incorporated in that disk array, one would have thought that controller redundancy by the vendor would have been part of their hardware strategy!

Controller redundancy won't help when the controller doesn't outright fail. If they are running 2x 12-bay arrays with RAID 6, there is a good bit of cost there, and I'd expect redundant controllers to be a given.
itsborken Posted November 21, 2007

The problem with a RAID array as redundancy is that it won't protect you from an errant DROP TABLE command or the like - good backups are as much for user error as for hardware. Something like "drop table posts, users, attachments;" would make even a RAID 16 array useless, as it would just mirror the change instantly.

NAS snapshots are normally used to recover from this situation, but when the controller flakes out, the production volume and the snapshots will be equally hosed. The only thing that saves you from severe controller corruption is a tape backup from a time before the corruption began. It is possible for corruption to go on for a while before anyone notices its severity. Depending on the depth of the backup pool, the last good set may be overwritten before the problem is noticed. Oh, and since people don't generally test their backups by restoring them, they go through the motions only to find out the tapes are worn out, the tape drive is marginally faulty, etc., and the backup can't be recovered. People and companies tend to neglect Murphy way too often.
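On that last point, a restore test doesn't have to be elaborate to be worth running. As a rough sketch (assuming nightly MySQL dumps like the ones described above; the scratch database name, backup path, table name, and row-count threshold are invented for the example), something like this, run periodically, catches a dead backup before you actually need it:

```python
#!/usr/bin/env python
# Sketch of a periodic restore test: load the newest dump into a scratch
# database and sanity-check it, instead of assuming the backups are good.
# The scratch DB name, backup path, and threshold are invented placeholders.
import gzip
import subprocess
from pathlib import Path

BACKUP_DIR = Path("/var/backups/mysql")   # hypothetical dump location
SCRATCH_DB = "restore_test"               # throwaway database, rebuilt each run
MIN_POSTS = 1_000_000                     # rough sanity threshold, made up

def latest_dump() -> Path:
    dumps = sorted(BACKUP_DIR.glob("*.sql.gz"))
    if not dumps:
        raise RuntimeError("no dumps found - the backup job itself is broken")
    return dumps[-1]

def restore_and_check() -> None:
    # Recreate the scratch database from the newest dump. Reading the whole
    # dump into memory is fine for a sketch, not for a 24-drive array's worth.
    subprocess.run(["mysqladmin", "--force", "drop", SCRATCH_DB], check=False)
    subprocess.run(["mysqladmin", "create", SCRATCH_DB], check=True)
    with gzip.open(latest_dump(), "rb") as fh:
        sql = fh.read()
    subprocess.run(["mysql", SCRATCH_DB], input=sql, check=True)

    # If the restore silently produced a near-empty posts table, the backup
    # is not worth the tape it is written on.
    count = subprocess.run(
        ["mysql", "--batch", "--skip-column-names", SCRATCH_DB,
         "-e", "SELECT COUNT(*) FROM posts;"],
        check=True, capture_output=True, text=True,
    )
    if int(count.stdout.strip()) < MIN_POSTS:
        raise RuntimeError("restore test failed: posts table looks empty")

if __name__ == "__main__":
    restore_and_check()
    print("restore test passed")
```

It only proves the logical dumps are restorable, not the tapes themselves, but it's the kind of going-through-the-motions check that would have flagged a dated or broken backup long before a controller ate the array.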
pabbi1 Posted November 21, 2007

We do test disaster recovery on some of our AS/400-based systems with IBM, as they have a facility to do exactly that in Colorado somewhere. Microsoft _will_ do it in one of the Technology Centers, IFF you are a big enough fish to warrant it, or can shoehorn it in on some other technology research 'adventure'.
archosman Posted November 21, 2007

He co-locates the hardware, which means he pays the provider for rack or cabinet space, power, and bandwidth. Everything else is his responsibility.

oops...