Friday, 5 June 2015

RAID: Is It Worth It?


Mark Pickavance looks at RAID and what you can do to make this technology work for you

I’ve written numerous features on RAID over the years, as it progressed from being an exotic technology used only in servers to something that’s available to almost every desktop PC owner.

Early solutions used SCSI, but eventually IDE and then SATA and SAS became the chosen connection technologies. The advent of SATA drives (and later SAS) made for much slicker cabling and better hot-swap drive trays, which can be useful in RAID. Even cheap boards now have some simple RAID options, even if they’re just variations of striping (RAID 0) or mirroring (RAID 1). The latest Intel platforms also offer parity arrays (RAID 5) that combine performance advantages with resilience.


The basic premise of RAID is to take multiple drives, called the ‘pack’ or ‘array’, and make them appear to the system as a single drive. Depending on exactly how they’re put together, this can result in a high-performance system, a damage-resistant one, or even a subtle mix of those two objectives.

But as attractive as that might seem, there is always a little pain to go with any gain. RAID 0 ‘striped’ arrays configured for high speed achieve this by trading the survivability of the volume against the life of each disk in it. If you have a six-drive pack, you might see it outperform a single disk, but the death of any one drive will kill the entire volume, and with six drives that failure is roughly six times more likely to happen.
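To put a rough number on that trade-off, here’s a minimal sketch in Python (the 5% annual failure rate per drive is an assumed figure, and it treats drive failures as independent, which is a simplification):

def raid0_failure_probability(p_single, drives):
    """Chance of losing the whole stripe when each drive
    independently fails with probability p_single."""
    return 1 - (1 - p_single) ** drives

# Assumed 5% annual failure rate per drive
for n in (1, 2, 4, 6):
    print(n, "drives:", round(raid0_failure_probability(0.05, n), 2))
# Prints roughly 0.05, 0.1, 0.19 and 0.26 - close to n x p while p
# is small, which is where 'six times more likely' comes from.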

Conversely, a RAID 1 ‘mirror’ array, or any other protective organisation that allows drives to expire without the data being lost, trades overall capacity for the protection offered. If data is in two locations, then it takes up twice as much space, as a rule. Given all these limitations and the huge-capacity drives available, why do people use this technology?

Often they’re looking to combine existing resources to gain an advantage or enhance an existing system in a cost effective way.

RAID can achieve those objectives if it’s used well, though it doesn’t come without some associated risks.

Those who decide to take the RAID route first need to decide how they’re going to implement it, as there are several ways to cook this particular goose.

Soft, Firm Or Hard?


It might not be something you’re ever aware of when using a PC, but some subsystems, like SATA hard drives, don’t have much computing power of their own. They do largely what they’re told by the computer, so the more of them that are connected, the greater the drain they place on its computing resources.

This is less of an issue these days, because machines are so powerful, but when RAID first arrived, it was a problem, which is why SCSI was a popular solution.

In that technology, the SCSI interface has its own microprocessor dedicated to driving the attached hardware, so operations can take place without the direct involvement of the host PC. For example, a file might be copied from one SCSI drive to another attached to the same controller. This would happen in isolation from the computer, which would be informed only of progress, errors and completion. But what does this have to do with RAID?

A RAID system involves more complicated drive control and data reorganisation than normal, both things that naturally lend themselves to a dedicated drive controller. A hardware controller is the most expensive way to get RAID, but the performance offered by these solutions is exceptional, and they have the ability to work independently of the PC to rebuild the RAID pack or activate hot spares.

It’s also very easy to take a controller card and its drives and move them to another PC relatively painlessly, should the computer fail. And the more sophisticated controllers will even hold write data in a battery-backed cache, so a power failure won’t cause corruption in the RAID pack.

The type of RAID offered by most motherboards these days is more like software RAID, where the PC does most of the controlling, though some functionality is handled by the chipset. It can’t rebuild while the array is in use, or do any of the other neat tricks that only dedicated hardware can. It’s also very dependent on the firmware and chipset configuration of that specific PC, and in that respect, moving it elsewhere is highly problematic.

And finally, there’s entirely software RAID that’s done in the OS, where everything is handled by the computer, creating a form of virtual RAID. This is transportable between systems but tends to have other limitations, like you generally can’t boot from the array.

To summarise, the options are laid out in the boxouts at the end of this feature.

What RAID Is Right For You?


If you refer to the ‘Many Flavours of RAID’ boxout, you’ll see that there are plenty of possible structures for an array depending on what features you’re looking for and how many drives you have available.

The three most commonly used are RAID 0 (stripe), RAID 1 (mirror) and RAID 5 (parity). Striping gives you the performance of multiple drives running simultaneously but costs you resilience. Mirroring keeps your data safe but does nothing for performance and reduces the available storage by 50%.
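To make the striping idea concrete, here’s a toy sketch of how a simple RAID 0 layout spreads consecutive logical blocks across the pack (the three-drive pack and one-block stripe unit are arbitrary assumptions):

def stripe_location(logical_block, drives):
    """Map a logical block to (drive, block on that drive) for a
    simple RAID 0 layout with one block per stripe unit."""
    return (logical_block % drives, logical_block // drives)

# Eight consecutive blocks across a three-drive stripe
for lb in range(8):
    print("logical", lb, "->", stripe_location(lb, 3))
# Blocks 0, 1 and 2 land on drives 0, 1 and 2 respectively, so a
# big sequential read keeps all three drives busy at once.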

The attraction of parity packs is that you get most of the performance on offer from striping, but with some resilience. The minimum pack size for parity RAID 5 is three drives, and you forfeit the capacity of a single drive to provide the redundancy. The downside of this option is that it needs more computing power than the other choices, and if two drives in the pack die, the data on it won’t be recoverable.
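The resilience of a parity pack rests on a simple property of XOR: the parity block is the XOR of the data blocks, so any one missing block can be rebuilt from the others. A minimal sketch, with short byte strings standing in for whole stripe units:

# One stripe across a three-drive RAID 5 pack: two data blocks plus parity
a = bytes([1, 2, 3, 4])
b = bytes([9, 8, 7, 6])
parity = bytes(x ^ y for x, y in zip(a, b))

# The drive holding 'b' dies: rebuild its block from the survivors
rebuilt = bytes(x ^ y for x, y in zip(a, parity))
assert rebuilt == b

# Lose two of the three blocks and there's nothing left to XOR
# against, which is why a second drive failure kills the pack.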

If you use RAID 0, then assume that at some point you’ll have a drive failure (it’s inevitable, really) and only put data on it that you can recover from elsewhere.

Those who use RAID 1 should consider RAID 10 if they want a performance boost, but that requires a minimum of four drives, and you’ll only get the capacity of two of them.

The parity option is an attractive one if you have plenty of drives, ideally four or five, you keep a close eye on the status of the pack, and you have a spare drive handy in case a failure forces a rebuild.
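To see the capacity side of all these trade-offs in one place, here’s a small sketch (four 1.5TB drives as an assumed example; a pack generally treats every member as the size of the smallest drive):

def usable_capacity(level, drives, size_tb):
    """Usable space for the common RAID levels discussed here."""
    if level == 0:                  # stripe: every byte is usable
        return drives * size_tb
    if level == 1:                  # mirror: each drive holds a full copy
        return size_tb
    if level == 5:                  # parity: one drive's worth forfeited
        return (drives - 1) * size_tb
    if level == 10:                 # stripe of mirrors: half is copies
        return (drives // 2) * size_tb
    raise ValueError("unsupported level")

for level in (0, 1, 5, 10):
    print("RAID", level, "->", usable_capacity(level, 4, 1.5), "TB")
# RAID 0 -> 6.0, RAID 1 -> 1.5, RAID 5 -> 4.5, RAID 10 -> 3.0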

Whatever option you choose, I wouldn’t make this the only data security you have, because the added complexity that RAID naturally introduces also has the potential to increase the likelihood of failure, so be prepared.

Assuming I haven’t put you off the idea entirely, how do you proceed? Well, firstly, here are some tips on precisely what not to do.

RAID: What Not To Do


While I was testing out some of the theories I’ve presented here, I ran into some problems that highlight what a minefield RAID can be to the unwary. When I initially started testing RAID on a modern system, my first experiment was to try the entirely software RAID that Microsoft has included with Windows for some time now. It’s built into the Disk Management console, so implementing it only requires that you cable up the bare drives and then right-click on them in the console to define the array you’d like.

That might seem painless, and as Windows software RAID can do many of the things that RAID on the motherboard can offer, surely you should use that, right? No, you really shouldn’t.

In my first test, I took two 1.5TB Samsung drives and decided to turn them into a RAID 0 stripe, giving me a 3TB volume and a performance boost. Setting this up is remarkably easy. You just right-click on one drive, select ‘Striped Volume’ and then define which drives it includes.

That’s fine, but what then proceeds isn’t, because the system needs to check every single byte of those drives before it will sign off this configuration, and that takes seemingly forever. On my test system, it took about six hours to do two 1.5TB drives – a timescale that hints at silly waiting times for systems with larger drives and more of them.
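That six-hour figure tallies with some simple arithmetic, assuming the verification pass reads both drives in parallel at around 70MB/s (my assumed average for a desktop drive of that era):

capacity_bytes = 1.5e12           # each 1.5TB drive, checked in parallel
throughput = 70e6                 # assumed average, in bytes per second
hours = capacity_bytes / throughput / 3600
print(round(hours, 1), "hours")   # -> 6.0

Double the drive size or halve the speed, and you’re into half-day territory.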

There is one use I’d put this aspect of Windows to, which isn’t always supported by motherboard RAID, and that’s spanning. While technically not actual RAID of any number, this feature allows you to take a collection of drives of any size and glue them into what appears to be a single volume. This might be useful if you have some unused drives of various sizes, though you’ll experience data loss if any of them subsequently dies.

Motherboard RAID


The best combination of cost and performance is probably offered by motherboard RAID, which is really software RAID but linked more directly to the capabilities of the chipset and firmware. All the latest AMD and Intel chipsets offer this technology, and they have done for a number of years, so unless you have a really old PC, then it’s a feature you’ll have access to.

However, before anyone starts down this path, I’d seriously recommend you consider all the implications of adding RAID to your existing system, the downsides and the upsides, and I’d also make sure you have a complete and recent backup of the computer in case things go sideways for whatever reason. For my example, I’ve used a reasonably modern ASRock Z97-based motherboard, and while the exact details might differ, the basic methodology is broadly similar on earlier chipsets and on AMD platforms.

What you need to do beforehand is find the RAID driver files, usually from the motherboard maker’s website, and gather all the hardware you intend to deploy. In my example, I’m going to add some disks to an existing system and RAID them, but not include the boot drive. If you intend to boot from your RAID volume, you might need to reinstall Windows, as an existing installation will usually have been set up in AHCI mode.

Here’s how it should work, assuming you have a setup a little like the one I used.

Enough Theory, Let’s Test


With the system now in RAID mode, the time was right to do some testing to reveal what advantages, or otherwise, each of the possible configurations offered. With four drives to choose from, I have three possible RAID 0 modes (two, three and four drives), a couple of three- and four-drive RAID 5 layouts, a few mirror options with and without striping, and a single drive alone.

That last one is effectively the standard against which all other choices will be compared, because what we’re trying to do is improve on that baseline. What this obviously doesn’t consider is how reliable any choice is; at this stage, the options are being evaluated purely on speed.

In interpreting these results, I need to clarify some anomalies, for which I have some explanations. But before those, the first obvious conclusion is that RAID 0 really works in terms of boosting drive performance. And the more drives you include, the greater the effect.

Generally, each subsequent drive adds another 85% of a single drive’s performance, and with six drives you’d probably have a system that operated at SSD levels but with conventional drives. You also have lots of space, because none of your drives are set aside for protection. The flipside of that equation is that any drive failure will be fatal for whatever data you put on this volume, and the more drives you have, the more likely that scenario becomes.
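Taking that 85% figure at face value, here’s a quick sketch of how stripe throughput might scale with drive count (the 120MB/s single-drive baseline is an assumption for illustration, not a measured result):

def raid0_throughput(single_mb_s, drives, scaling=0.85):
    """Estimated throughput if each extra drive adds another
    85% of a single drive's performance."""
    return single_mb_s * (1 + scaling * (drives - 1))

for n in (1, 2, 4, 6):
    print(n, "drives:", raid0_throughput(120, n), "MB/s")
# 1 -> 120.0, 2 -> 222.0, 4 -> 426.0, 6 -> 630.0 MB/s - that
# six-drive figure is indeed SATA SSD territory.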

RAID 1 doesn’t have this problem, but it takes the shine off what performance you had already, so redundancy has a cost both in performance and in available capacity. You can get some performance back using RAID 10, but you only get two drives’ worth of space from four drives. The answer should be RAID 5, balancing speed and total capacity, but as these numbers show, it is dire at writing. Or rather, it appeared to be.

Confused as to why it should be so bad, I did some research and discovered that, for whatever reason, the driver doesn’t automatically configure its cache control to the correct mode for RAID 5, and Intel doesn’t point this out to you either.

Anyone testing this would immediately conclude that RAID 5 isn’t a good choice, but with some work it can suddenly turn into the perfect combination. What you need to do is change the cache from ‘Write Through’ to ‘Write Back’ – something the Intel RST tool allows you to do under the ‘performance’ tab.

Once this change was made, the transformation was dramatic when I retested the array. Read speed was generally unaffected, but write performance was a stunning 600% better, and above what I got with four drives in RAID 0 mode.

Though I didn’t have time to test it, I suspect RAID 0 write speeds would also have been improved by this cache mode. The only caveat is that, in general, Write Back isn’t as safe as Write Through in terms of avoiding corruption in the event of a power failure, in case you wondered why this wasn’t the default setting.
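The difference between the two modes comes down to when a write is actually committed. Here’s a toy model of the general idea (not Intel’s implementation, just the principle):

def commit_to_disk(block):
    """Stand-in for the slow physical write to the platters."""
    pass

class DriveCache:
    """Toy model of the two cache policies."""
    def __init__(self, write_back):
        self.write_back = write_back
        self.dirty = []                # acknowledged but not yet on disk

    def write(self, block):
        if self.write_back:
            self.dirty.append(block)   # fast: acknowledge now, commit later
        else:
            commit_to_disk(block)      # slow: wait for the disk every time

    def flush(self):
        for block in self.dirty:       # a power cut before this point
            commit_to_disk(block)      # loses everything still in 'dirty'
        self.dirty.clear()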

In conclusion, RAID 5 can be good, if you’re prepared to tune the settings and accept the potential for disaster in a power cut. If you want this level of performance and redundancy without worrying about corruption, then you need to invest in a hardware controller that has a battery-backed cache.

Final Thoughts


Before I talk about anything else, I need to cover Windows 10 and Intel RST, because this wasn’t something I could easily cover in my instructions. If you’re using Windows 7, you need to change the Start value of each of these keys from 3 to 0, before installing the drivers and rebooting:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\pciide
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\msahci
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\iaStorV
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\iaStor
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\atapi

Those using Windows 8 should alter these two in a similar fashion (a scripted example of this kind of edit follows the list):

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\storahci
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\storahci\StartOverride
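If you’d rather script those edits than click around in regedit, here’s a short sketch using Python’s standard winreg module, run from an elevated prompt. The DWORD being changed in each key is Start, and the key list is the Windows 7 one from above; adjust it accordingly for Windows 8:

import winreg

# The Windows 7 services listed above; Start moves from 3 to 0
SERVICES = ["pciide", "msahci", "iaStorV", "iaStor", "atapi"]

for service in SERVICES:
    path = r"SYSTEM\CurrentControlSet\Services" + "\\" + service
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, 0)

As ever with registry edits, take a backup first.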

And, in theory, Windows 10 should be the same as Windows 8, but it isn’t. With Windows 10, I was confronted by the Intel RST installation routine that insisted on .NET 4.5 being on the machine, when this OS has .NET 4.6 pre-installed. You can’t put a prior version on, and the install routine won’t work without it.

How we’ve got this far down the line with Windows 10 without anybody at Intel noticing how badly they wrote the install routine for RAID, I’ve no idea. What I did to solve this was to use WinZip to pull the driver files out of the installation package and install them manually. And I then stole the management console apps from a Windows 7 PC that had them to get around that part of the roadblock. It all worked in the end, though I did have a few hair-tearing moments along the way. My fault for using an unreleased OS, I guess.

Once those hurdles were overcome, the system worked flawlessly, and I could probe the value of RAID in an analytical way.

These test numbers demonstrate perfectly why RAID is often worth having, irrespective of exactly how you deliver it. And the RAID offered by motherboards these days is no longer the poor cousin of the hardware version that it once was.

However, when I first did these types of benchmarks years ago, there wasn’t any other way to get these levels of performance, and these days we have SSDs. A single SSD can deliver the same level of performance as four fast conventional hard drives in a RAID 0 stripe, yet it’s inherently more reliable and uses significantly less electrical power.

While I didn’t throw any numbers up to support this, there isn’t any reason why you can’t use SSD drives in a RAID array, and the performance level, as you might expect, is astonishing. The only problems are that with drive size generally capped at about 1TB, the arrays won’t be that big, and the cost might be excessive for personal use.

For numerous reasons, SSDs work better in RAID than physical hard drives, as they don’t have heads to move around and they don’t suffer from fragmentation issues.

Personally, I’d generally dissuade people from using RAID 5 on software or BIOS configurations, because it can’t be rebuilt while the system is running should a drive die. Downtime isn’t good, so don’t go there.

But the same drives organised with a hardware controller make for a good option, as you get some of the performance of stripes without the danger of trashing the pack, should one disk suddenly expire.

For those using computers at work, if you have a small server, then hardware RAID is the smart choice generally. I’d also consider it as a good means of providing resilience in a critical workstation, where downtime has major financial implications.

Also, while some software controllers offer ‘hot spares’, only the hardware controllers can bring these into use off their own bat, and return an array from being critical back to full operation unattended.

For home users, this is only worth considering if you have a collection of same-size unused drives sat around or you get a great deal on a set. Just don’t come to me when you’ve converted your system to RAID 0 and then trashed it entirely a few months later. You’ve been well warned.


The Many Flavours Of RAID


Here’s an overview of what the more common RAID levels are called and how they’re organised.
• RAID 0: Striped set
• RAID 1: Mirrored set
• RAID 1e: Combined mirror and stripe
• RAID 3: Striped with dedicated parity
• RAID 5: Striped set with distributed parity
• RAID 6: RAID 5 with additional parity block
• RAID 01: A mirror of stripes
• RAID 10: A stripe of mirrors
• RAID 50: A stripe across multiple RAID 5 (distributed parity) arrays
• RAID 60: A stripe across multiple RAID 6 arrays
• RAID 51: A mirror of striped sets with distributed parity
• RAID 100: A stripe of a stripe of mirrors

Hardware RAID


Advantages

Performance – Because the CPU isn’t involved directly in drive control, it can be doing other things, improving overall PC performance. These controllers also often include their own caches to smooth read and write cycles.
Flexibility – They’re capable of migrating between different schemes, should your plans change. They’re also bootable.
Transportability – Easily moved to another PC.
Resilience – Battery backed cache memory reduces the chance of corruption.
Scalability – Controllers that can handle 16, 32 or even 64 drives are available.

Disadvantages

Cost – Hardware controllers can be very expensive, depending on what features you’d like.

Motherboard RAID


Advantages

Cost – It’s included in the price.
Performance – Generally good, even with a relatively low performance PC.
Flexibility – While they can’t generally migrate between schemes, they can be used as a boot volume.

Disadvantages

Transportability – Not easily moved to another PC.
Repair Time – A rebuild can’t happen while the PC is being used.
Resilience – Crashes or power cuts can damage the array.
Install – Switching an existing system from AHCI or IDE to RAID can be technically involved.

Software RAID


Advantages

Cost – It’s included in the price of the OS.
Performance – Generally acceptable, even with a relatively low performance PC.
Easy install – No drivers required.
Spanning – Makes lots of different drives into one big volume.

Disadvantages

Repair Time – A rebuild can’t happen while the PC is being used.
Resilience – Crashes or power cuts can damage the array.
Flexibility – They can’t normally be used as a boot volume.