IT Guy

From: Queeg 500 (JESUSONEEZ) 30 Jan 2013 11:15
To: ALL 1 of 19
Our IT guy is a knob.

I'm the Database Manager. Wednesday last week, my database went down (it is (or was) a 4D Server running on an oldish Mac Xserve).

Very simply, "something" happened, the RAID card was beeping, he couldn't get it booting, couldn't get anything off the drives, couldn't get anything off the backups and everything was lost (this process took about a day and a half).

I found a data recovery firm, and the recovered contents are winging their merry way back to me as I type.

It seems the following happened.

One of the drives in the RAID 5 array had failed probably months ago and he did nothing.
He also decided to get rid of our multi-tape backup solution and go cloud, except he didn't add the 4D Server contents. He presumably thought the "just in case" external USB/Firewire drive backup of just the data files was sufficient.
At some point he didn't have enough sockets on the UPS units in the server room and unplugged my server and the USB drive from it and plugged them straight into the mains (you can see where this is going).

He thinks a power surge or something happened, and strangely it killed the RAID card, buggered the remaining hard drives and also the external USB drive, which, unbeknownst to me, was the only bloody backup.

Do I get to kick him in the balls?
EDITED: 30 Jan 2013 11:16 by JESUSONEEZ
From: Dan (HERMAND) 30 Jan 2013 11:16
To: Queeg 500 (JESUSONEEZ) 2 of 19
Yes.
From: Queeg 500 (JESUSONEEZ) 30 Jan 2013 11:16
To: Dan (HERMAND) 3 of 19
Thanks. I will.
From: Serg (NUKKLEAR) 30 Jan 2013 18:24
To: Queeg 500 (JESUSONEEZ) 4 of 19
Yes. Seconded.
From: Kriv 30 Jan 2013 18:41
To: Queeg 500 (JESUSONEEZ) 5 of 19
Instant dismissal in our place, provided that the loss of the data resulted in significant financial loss.
From: Ken (SHIELDSIT) 31 Jan 2013 08:09
To: Queeg 500 (JESUSONEEZ) 6 of 19
Yeah, that's just plain negligence. Kick him two times. Once in each nad.
From: Ken (SHIELDSIT) 31 Jan 2013 08:10
To: Queeg 500 (JESUSONEEZ) 7 of 19
Shit, looks like I need to update my signature to include ears as well!
From: Queeg 500 (JESUSONEEZ) 31 Jan 2013 12:45
To: Ken (SHIELDSIT) 8 of 19
I think I'll show him this. I don't think he quite realises what all the fuss is about, since he doesn't use the system day in, day out like nearly everyone else.

I think he will when the board haul him in for a kicking in the next few days.

Data came back this morning and I'm currently running a bunch of tools (repair, re-index, compact and all the usual stuff) but it all looks good so far.

Thank fuck.

If we were running Windows 8 we'd no doubt be screwed in the panties and the ear.
From: Dan (HERMAND) 31 Jan 2013 12:56
To: Queeg 500 (JESUSONEEZ) 9 of 19
I'm sure you have your reasons (old server being one) but RAID5 is generally considered bad practice now. 
From: Ken (SHIELDSIT) 31 Jan 2013 13:21
To: Queeg 500 (JESUSONEEZ) 10 of 19
My SQL & Exchange servers are the two that I make absolutely sure are being backed up without issue.  Our SQL server has every bit of inventory and financials on it, we'd be straight up fucked if it died, and I'd imagine most other companies are that way.
From: Queeg 500 (JESUSONEEZ) 31 Jan 2013 17:47
To: Dan (HERMAND) 11 of 19
Really? Wonder if my IT guy knows that. He may but probably doesn't give a shit given he ignored a dead drive in the array.

Why is it bad practice and what's replaced it as good practice?
From: Queeg 500 (JESUSONEEZ) 31 Jan 2013 17:48
To: Ken (SHIELDSIT) 12 of 19
Financials are elsewhere, but he probably doesn't back that up either.
From: 99% of gargoyles look like (MR_BASTARD) 31 Jan 2013 21:27
To: Queeg 500 (JESUSONEEZ) 13 of 19
quote:
Why is it bad practice and what's replaced it as good practice?

There's a shitload of stuff on why it's bad practice.

I'm still confused as to what's good practice though.

From: Serg (NUKKLEAR) 31 Jan 2013 23:39
To: 99% of gargoyles look like (MR_BASTARD) 14 of 19
RAID1 or RAID10, plus reliable backups at the right frequency, depending on how often the data changes. Real-time data sync as well if it's that important.
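
For anyone wanting to compare the levels mentioned in this thread side by side, here is a minimal Python sketch of the textbook trade-off between usable capacity and the number of disk failures an array is guaranteed to survive. The function name raid_summary, the 4-disk count and the 2 TB disk size are made-up illustration values, not anyone's actual setup. RAID10 is assumed to mean striped two-way mirrors, which can survive more than one failure if the failures land in different mirror pairs, but only one is guaranteed.

```python
# Textbook properties of common RAID levels. The disk count and size used
# in the example loop are made-up values, not anyone's real setup.

MIN_DISKS = {"RAID1": 2, "RAID5": 3, "RAID6": 4, "RAID10": 4}


def raid_summary(level, disks, size_tb):
    """Return (usable_tb, failures_always_survived) for one array."""
    if level not in MIN_DISKS:
        raise ValueError(f"unknown level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"{level} needs at least {MIN_DISKS[level]} disks")
    if level == "RAID1":
        # Every disk holds a full copy of the data.
        return size_tb, disks - 1
    if level == "RAID5":
        # One disk's worth of space goes to parity.
        return (disks - 1) * size_tb, 1
    if level == "RAID6":
        # Two disks' worth of space goes to parity.
        return (disks - 2) * size_tb, 2
    # RAID10, assumed here to be striped two-way mirrors.
    if disks % 2:
        raise ValueError("RAID10 here assumes an even number of disks")
    return (disks // 2) * size_tb, 1


for level in ("RAID1", "RAID5", "RAID6", "RAID10"):
    usable, survives = raid_summary(level, disks=4, size_tb=2)
    print(f"{level}: {usable} TB usable, survives any {survives} disk failure(s)")
```

None of this replaces the backups: RAID only covers disk failure, not a dead controller or a power surge that takes out every drive at once.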
From: Dan (HERMAND) 1 Feb 2013 10:22
To: Queeg 500 (JESUSONEEZ) 15 of 19
It's bad practice primarily because RAID5 can only sustain one disk failure. The problem with this is that when you replace the failed disk, the rebuild has to read every remaining disk in full, which puts a huge amount of strain on them - meaning that the chance of another failure (and therefore of losing the whole array) during the rebuild is surprisingly high.

This is compounded even more by the fact that people tend to buy all their disks from the same supplier at the same time - meaning they all come from the same batch so will probably fail at around the same time anyway.

Depending on use, RAID1, RAID6 or RAID10 are considered good day to day setups now.
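
To put rough numbers on the rebuild risk, here is a minimal Python sketch using a plain binomial model. The function name p_array_loss and the 2% per-disk failure chance during the rebuild window are hypothetical, chosen only for illustration. The model also assumes disks fail independently, which the same-batch point above suggests is optimistic, so treat the output as a lower bound at best.

```python
from math import comb


def p_array_loss(remaining_disks, p_fail, extra_failures_tolerated):
    """Chance that more disks fail during a rebuild than the array can absorb.

    Assumes each remaining disk fails independently with probability p_fail
    during the rebuild window (a simplifying assumption).
    """
    n, p = remaining_disks, p_fail
    # Probability that the number of further failures stays within tolerance.
    p_ok = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(extra_failures_tolerated + 1)
    )
    return 1 - p_ok


P_FAIL = 0.02   # hypothetical per-disk failure chance during one rebuild
REMAINING = 5   # hypothetical: a 6-disk array that has already lost one disk

# RAID5 with one disk already dead cannot absorb any further failure.
print(f"RAID5: {p_array_loss(REMAINING, P_FAIL, 0):.2%} chance of losing the array")
# RAID6 with one disk already dead can still absorb one more.
print(f"RAID6: {p_array_loss(REMAINING, P_FAIL, 1):.2%} chance of losing the array")
```

The gap between the two figures is the argument for RAID6 over RAID5: after the first failure, RAID5 has no margin left, while RAID6 still has one disk of slack.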
From: ANT_THOMAS 1 Feb 2013 10:50
To: Dan (HERMAND) 16 of 19
quote:
This is compounded even more by the fact that people tend to buy all their disks from the same supplier at the same time - meaning they all come from the same batch so will probably fail at around the same time anyway.

What do you recommend to prevent that?

Simply buying the same size drives from different manufacturers? Or the same drives from different resellers to try and avoid the same batch?

From: Dan (HERMAND) 1 Feb 2013 11:15
To: ANT_THOMAS 17 of 19
Either one of those is the theory, but in truth, few people do. It's even harder when you consider that, really, to get the best support and reliability you should be buying disks from your server / SAN manufacturer. So, for example, we put in an HP SAN last week with about 70 disks - we pretty much have to buy them from HP, so you're kind of stuck there.

It's just one of those things to be aware of, and a good reason to avoid RAID5 at nearly all costs.
From: Queeg 500 (JESUSONEEZ) 1 Feb 2013 12:44
To: Dan (HERMAND) 18 of 19
OK, that makes sense, thanks.
From: 99% of gargoyles look like (MR_BASTARD) 1 Feb 2013 22:12
To: Serg (NUKKLEAR) 19 of 19
Ahh, yes, I seem to recall that that's where I was heading...except that no one who pontificates over these things on t'webz actually makes a hard-and-fast recommendation that those of us with goldfish-like attention spans can follow.

Until now...until /YOU/!