RAID disk arrays: what they are and why they are needed. Types of RAID arrays

If you have ever thought about purchasing servers or NAS storage, you have probably heard the magical term “RAID”. RAID stands for Redundant Array of Independent Disks. In general, RAID systems use two or more hard drives to improve performance, fault tolerance, or both. Fault tolerance here means that the equipment (for example, a server) keeps operating and no data is lost even if one (or, depending on the configuration, more than one) of the disks fails.

To understand exactly how RAID improves performance and fault tolerance, you need to understand what RAID levels are. Which level to choose depends on how many disks you have, how critical a disk failure would be, and how important system speed is. For business applications, for example, data safety in the event of a component failure is usually far more important, while for home users speed may be the deciding factor. The RAID levels represent different trade-offs between performance, fault tolerance, and cost.

RAID Technology Overview

As a rule, RAID is used in companies where fault tolerance and performance are a necessity rather than a luxury. Servers and NAS storage are in most cases equipped with RAID controllers: hardware modules that manage arrays of SATA or SAS drives, whether hard disks or SSDs. Most modern operating systems also support software RAID, in which the disks and arrays are managed by the operating system itself.

What RAID level do I need?

As already mentioned, there are several levels of RAID, depending on what you want to achieve - greater performance, greater reliability, or both. It is also important whether hardware or software RAID is used. Software RAID does not support all levels, and if you use hardware RAID, you need to think about choosing the appropriate controller.

The most common RAID levels

RAID0 is used to improve performance. It is also known as a "striped" or "interleaved" array: instead of writing everything to one disk, the data stream is split into blocks that are distributed across several disks. Reads and writes can then proceed on several disks in parallel, which speeds up the work. RAID0 requires at least two disks and is supported by both hardware and software solutions. Its disadvantage is the complete absence of fault tolerance: if any disk fails, the information on the array is lost.

RAID1 is used to improve reliability. It is also known as a "mirrored" array: information is written to two disks simultaneously, resulting in two identical copies of the data, two "mirrors". If one of the disks fails, the second one keeps working and no data is lost. This is the simplest and a relatively inexpensive way to increase fault tolerance. The downside is a slight decrease in write performance. RAID1 requires at least two drives and can be built either in software or with a hardware controller.

RAID5 is probably the most common RAID configuration. It performs better than mirroring while still providing fault tolerance. In a RAID5 configuration, data blocks and so-called parity (an additional block used for recovery) are striped across three or more disks, with the parity blocks distributed among all of them. If one of the disks fails, its data is reconstructed automatically and transparently from the remaining data blocks and parity, and the system keeps working, although in a degraded mode. RAID5 arrays are also commonly used with "hot swap", the ability to replace a disk without stopping the system (server or storage), provided the hardware supports it. A drawback of RAID5 is a sharp drop in performance while data is being rebuilt onto a newly replaced disk. RAID5 is also relatively demanding on computing resources, so a hardware controller is recommended, although RAID5 can be created in software as well.

RAID10 is a combination of RAID1 mirroring and RAID0 striping. It provides good performance and fault tolerance but is fairly expensive: it requires at least four disks, and the total capacity of the array equals half the combined capacity of the physical disks.

There are other RAID levels as well, such as RAID2, RAID4, RAID7, RAID50 and RAID01; most of them are specific combinations or variants of the configurations already described. For small businesses and typical solutions, the most common levels are 0, 1, 5 and 10.

It is worth mentioning that if you use disks of different capacities, each disk in the array is used only up to the capacity of the smallest one. For example, a RAID1 of a 1000 GB disk and a 500 GB disk has a capacity of 500 GB. It is therefore recommended to build RAID from disks of the same capacity.
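The capacity rules used throughout this article can be gathered into one small function. The sketch below is a minimal Python illustration, assuming (as the text says) that every member disk is truncated to the size of the smallest one; the function name `usable_capacity` and the string level names are hypothetical, and real controllers may also reserve a little space for their own metadata.

```python
def usable_capacity(level: str, sizes_gb: list[int]) -> int:
    """Usable capacity (GB) of an array built from disks of the given sizes.

    Assumes every disk is used only up to the size of the smallest one,
    and ignores space a controller may reserve for its own metadata.
    """
    n = len(sizes_gb)
    smallest = min(sizes_gb)
    minimum_disks = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == "0":
        return n * smallest          # striping: all capacity holds data
    if level == "1":
        return smallest              # every disk holds the same copy
    if level == "5":
        return (n - 1) * smallest    # one disk's worth of parity
    if level == "6":
        return (n - 2) * smallest    # two disks' worth of parity
    if n % 2:
        raise ValueError("this sketch assumes RAID 10 built from disk pairs")
    return (n // 2) * smallest       # RAID 10: half the disks are mirror copies


print(usable_capacity("1", [1000, 500]))      # -> 500 (mixed sizes)
print(usable_capacity("5", [500, 500, 500]))  # -> 1000
print(usable_capacity("6", [500] * 5))        # -> 1500
```

The mixed-size example reproduces the 1000 GB + 500 GB mirror from the paragraph above.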

Also, for performance and reliability reasons, it is recommended to use disks of the same model, preferably from the same batch. Different disks, especially from different manufacturers, can wear out at different rates and introduce unpredictable delays.

Remember that RAID is not a replacement for backup. RAID can greatly improve reliability and performance, but it is only one part of a data protection strategy.

Hard drives play an important role in a computer: they store the user's data, the operating system boots from them, and so on. Hard drives do not last forever and have a limited service life, and every model has its own characteristics.

You have probably heard at some point that so-called RAID arrays can be built from ordinary hard drives. They are used to improve drive performance and to make data storage more reliable. Such arrays come in different levels, denoted by numbers (0, 1, 2, 3, 4, etc.). In this article we will tell you about RAID arrays.

RAID is a group of hard drives combined into a disk array. As we have already said, such an array makes data storage more reliable and/or increases the speed of reading and writing. There are various RAID configurations, denoted by the numbers 0, 1, 2, 3, 4, etc., which differ in what they do. RAID 0, for example, gives a significant performance improvement. RAID 1 protects your data against the failure of one drive, since the same information is also stored on the second hard drive.

Essentially, a RAID array is two or more hard drives connected to a motherboard that supports creating RAID. The RAID configuration, that is, how these disks should work together, is selected in the settings, which on such a motherboard are accessed through the BIOS.

To set up the array, you need a motherboard that supports RAID and two hard drives that are identical in every respect, connected to that motherboard. In the BIOS, set the SATA Configuration parameter to RAID. When the computer boots, press Ctrl+I and configure the RAID there. After that, install Windows as usual.

Note that creating or deleting a RAID array erases all information on the drives, so back it up first.

Let's look at the RAID configurations we have mentioned. There are several of them: RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, RAID 6, etc.

RAID-0 (striping), also known as a zero-level or "null" array. This level significantly increases the speed of working with disks but provides no fault tolerance. In fact, this configuration is a RAID array only formally, because it has no redundancy. Data is written in blocks, which go to the disks of the array in turn. The main disadvantage is unreliable storage: if one disk of the array fails, all information is destroyed. Why does this happen? Because each file can be written in blocks to several hard drives at once, and if any of them fails, the file loses its integrity and cannot be restored. If you value performance and make regular backups, this level can be used on a home PC and will give a noticeable performance boost.

RAID-1 (mirroring), or "mirror mode". This could be called the paranoid level: it gives almost no increase in system performance, but protects your data against the failure of a disk. If one of the disks fails, an exact copy of its contents remains on the other disk. Like RAID-0, this mode can also be used on a home PC by people who value the data on their disks very highly.

RAID 2

RAID 2 arrays protect data with Hamming codes, an error-correction algorithm developed in 1950 by the American mathematician Richard Hamming to correct errors in electromechanical computers. The controller divides the disks into two groups: one stores the data, the other stores the error-correction codes.

This type of RAID has not become widespread in home systems because it needs so many redundant drives: for example, in an array of seven hard drives only four hold data. As the number of disks grows, the share of redundancy decreases, as reflected in the table below.

The main advantage of RAID 2 is the ability to correct errors on the fly without reducing the speed of data exchange between the disk array and the central processor.
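To make the idea of RAID 2 concrete, here is a textbook Hamming(7,4) code in Python, where each of the seven bits of a code word would go to its own disk: four data disks and three error-correction disks, matching the seven-drive example above. This is a minimal sketch, not how any particular controller implemented RAID 2; the function names are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit code word (bit positions 1..7).

    Positions 1, 2 and 4 hold parity bits; positions 3, 5, 6, 7 hold data.
    In a RAID 2 array each position would live on a separate disk.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = [0] + list(word)  # shift to 1-based positions
    syndrome = (
        (c[1] ^ c[3] ^ c[5] ^ c[7])
        | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
        | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2
    )
    if syndrome:            # a non-zero syndrome is the position of the bad bit
        c[syndrome] ^= 1
    return [c[3], c[5], c[6], c[7]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                      # simulate a bad bit on "disk" 5
print(hamming74_decode(word))     # [1, 0, 1, 1]
```

Because the syndrome points directly at the faulty position, the error is fixed during the read itself, which is the "on the fly" correction mentioned above.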

RAID 3 and RAID 4

These two types of disk arrays are very similar in design. Both spread data across several hard drives, one of which is used exclusively for storing checksums. Three hard drives are enough to create RAID 3 or RAID 4. Unlike RAID 2, data cannot be recovered "on the fly": the information is restored only after the failed hard drive has been replaced, and the process takes some time.

The difference between RAID 3 and RAID 4 is how finely the data is split. RAID 3 splits information into individual bytes, so every request involves all data disks, which causes serious slowdowns when reading or writing a large number of small files. RAID 4 splits data into larger blocks, so a small file usually fits on a single disk. As a result, small files are processed faster, which matters for personal computers. For this reason, RAID 4 has become more widespread.
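The difference in granularity can be illustrated by counting which data disks a single request touches. A minimal Python sketch, assuming four data disks, a one-byte unit for RAID 3 and a hypothetical 64 KiB block for RAID 4 (real block sizes are configurable); `disks_touched` is an illustrative helper, not a controller API.

```python
def disks_touched(offset: int, length: int, data_disks: int, unit: int) -> set[int]:
    """Return the data disks holding bytes [offset, offset + length).

    Consecutive units of `unit` bytes go to the data disks in turn
    (the checksum disk is ignored, since it is separate).
    """
    first_unit = offset // unit
    last_unit = (offset + length - 1) // unit
    if last_unit - first_unit + 1 >= data_disks:
        return set(range(data_disks))      # the request wraps around every disk
    return {u % data_disks for u in range(first_unit, last_unit + 1)}


DATA_DISKS = 4
small_file = 4096  # a 4 KiB read at the start of the array

print(disks_touched(0, small_file, DATA_DISKS, unit=1))          # RAID 3: {0, 1, 2, 3}
print(disks_touched(0, small_file, DATA_DISKS, unit=64 * 1024))  # RAID 4: {0}
```

With byte-level splitting every small read keeps all data disks busy, so they cannot serve other requests in parallel; with block-level splitting the remaining disks stay free.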

A significant disadvantage of both arrays is the increased load on the checksum drive: it takes part in every write, which makes it a bottleneck and noticeably shortens its service life.
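The effect of a dedicated checksum disk can be shown by counting writes per disk. The Python sketch below makes simplifying assumptions (every data block is rewritten once with a small write, and each small write touches one data disk plus the checksum disk of its stripe) and compares a fixed checksum disk, as in RAID 4, with a checksum that moves from stripe to stripe, as in RAID 5. The rotation rule is one simple choice; real controllers use several layouts.

```python
from collections import Counter


def parity_disk(stripe: int, n_disks: int, rotate: bool) -> int:
    """Disk that holds the checksum block of the given stripe."""
    if not rotate:
        return n_disks - 1                      # RAID 4: always the last disk
    return (n_disks - 1) - stripe % n_disks     # RAID 5: shift one disk per stripe


def write_load(n_disks: int, n_stripes: int, rotate: bool) -> Counter:
    """Count physical writes per disk when every data block is rewritten once."""
    load = Counter()
    for stripe in range(n_stripes):
        p = parity_disk(stripe, n_disks, rotate)
        for d in range(n_disks):
            if d == p:
                continue
            load[d] += 1   # write the new data block
            load[p] += 1   # update the checksum block of the same stripe
    return load


print(sorted(write_load(4, 8, rotate=False).items()))  # [(0, 8), (1, 8), (2, 8), (3, 24)]
print(sorted(write_load(4, 8, rotate=True).items()))   # [(0, 12), (1, 12), (2, 12), (3, 12)]
```

With a fixed checksum disk, that disk receives three times as many writes as each data disk; rotating the checksum spreads the same work evenly.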

RAID-5 is a fault-tolerant array of independent disks with distributed checksum storage. On an array of n disks, the capacity of n-1 disks is available for data, and the capacity of one disk is used for checksums. To explain more clearly, imagine that we need to write a file. It is divided into portions of equal length, which are written cyclically across n-1 disks, and the checksum of the portions in each stripe is written to the remaining disk. The checksum is computed with a bitwise XOR operation, and the disk that holds it changes from stripe to stripe, so the checksums are spread evenly over all disks.
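The XOR checksum described above can be sketched in a few lines of Python: the checksum block is the bytewise XOR of the data blocks of a stripe, and XOR-ing the surviving blocks with the checksum gives back the missing block. The block contents and the `xor_blocks` helper are illustrative.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same length")
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# One stripe of a 4-disk RAID 5: three data blocks and their checksum.
stripe = [b"RAID", b"five", b"demo"]
parity = xor_blocks(stripe)

# The disk holding the second block fails: XOR everything that survived.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
print(rebuilt)  # b'five'
```

This works because XOR-ing a value with itself gives zero, so every surviving block cancels out and only the lost one remains.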

Be aware that if any disk fails, the array switches to degraded (emergency) mode, which significantly reduces performance, because reading a file requires extra calculations to restore its "missing" parts. If two or more disks fail at the same time, the data on the array cannot be restored. Overall, a RAID 5 array provides fairly high access speed, parallel access to different files, and good fault tolerance.

To a large extent, this problem is solved by RAID 6. Here, capacity equal to two hard drives is reserved for checksums, which are likewise distributed cyclically and evenly across the disks. Two different checksums are calculated instead of one, which makes it possible to recover the data even if two hard drives in the array fail at the same time.
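Recovering from two failures needs a second, independent checksum. One widely used construction, for example in Linux software RAID, computes P as the plain XOR of the data blocks and Q as a weighted sum in the finite field GF(2^8), where the weight of data disk i is 2^i. The Python sketch below follows that idea to rebuild two lost data blocks from P and Q. It is a minimal illustration with hypothetical helper names, not a production implementation, and it assumes fewer than 255 data disks and that both checksum blocks survived.

```python
GF_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, reduction polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (carry-less multiply with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254, since a^255 = 1 for any non-zero a."""
    return gf_pow(a, 254)


def compute_pq(blocks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of the blocks, Q = sum of 2^i * block_i in GF(2^8)."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coef = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(blocks: list, p: bytes, q: bytes) -> dict[int, bytes]:
    """Rebuild exactly two data blocks marked as None using P and Q."""
    x, y = [i for i, b in enumerate(blocks) if b is None]
    # Missing blocks contribute nothing when replaced by zeros.
    zeros = bytes(len(p))
    p_part, q_part = compute_pq([zeros if b is None else b for b in blocks])
    pxy = bytes(a ^ b for a, b in zip(p, p_part))  # = Dx xor Dy
    qxy = bytes(a ^ b for a, b in zip(q, q_part))  # = 2^x*Dx xor 2^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    # (2^x xor 2^y) * Dx = Qxy xor 2^y * Pxy
    dx = bytes(gf_mul(inv, qb ^ gf_mul(gy, pb)) for pb, qb in zip(pxy, qxy))
    dy = bytes(a ^ b for a, b in zip(pxy, dx))
    return {x: dx, y: dy}


data = [b"disk-0", b"disk-1", b"disk-2", b"disk-3"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]   # two disks lost at once
print(recover_two(damaged, p, q))          # {1: b'disk-1', 3: b'disk-3'}
```

P alone only gives the XOR of the two missing blocks; Q supplies a second, independent equation, so the two unknowns can be solved for.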

The advantages of RAID 6 are a high degree of information security and less performance loss than in RAID 5 during data recovery when replacing a damaged disk.

The disadvantage of RAID 6 is that overall data throughput drops by approximately 10% because of the additional checksum calculations and the larger volume of data that has to be written and read.

Combined RAID types

In addition to the basic types discussed above, various combinations of them are widely used to compensate for the disadvantages of the simple levels. RAID 10 and RAID 0+1 are especially common. In RAID 10, mirrored pairs are combined into a RAID 0; in RAID 0+1, conversely, two RAID 0 arrays are mirrored. In both cases, the performance of RAID 0 is combined with the data protection of RAID 1.
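The practical difference between RAID 10 and RAID 0+1 shows up when two disks fail. The Python sketch below enumerates every two-disk failure in a four-disk array; the disk numbering (disks 0 and 1 form the first mirror or stripe, disks 2 and 3 the second) is an assumption made for illustration.

```python
from itertools import combinations

PAIRS = [(0, 1), (2, 3)]  # hypothetical grouping of four disks


def raid10_survives(failed: set[int]) -> bool:
    """Stripe of mirrors: data is lost only if both disks of one mirror fail."""
    return all(not set(pair) <= failed for pair in PAIRS)


def raid01_survives(failed: set[int]) -> bool:
    """Mirror of stripes: at least one stripe set must be completely intact."""
    return any(not set(stripe) & failed for stripe in PAIRS)


failures = [set(f) for f in combinations(range(4), 2)]
print(sum(raid10_survives(f) for f in failures), "of", len(failures))  # RAID 10: 4 of 6
print(sum(raid01_survives(f) for f in failures), "of", len(failures))  # RAID 0+1: 2 of 6
```

Under these assumptions RAID 10 survives four of the six possible double failures and RAID 0+1 only two, because in RAID 0+1 one failure already takes a whole stripe set out of service.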

To further protect especially important information, RAID 51 or RAID 61 schemes are sometimes used: mirroring arrays that are already well protected ensures exceptional data safety in the event of failures. However, such arrays are impractical at home because of their excessive redundancy.

Building a disk array - from theory to practice

Any RAID array is built and managed by a specialized RAID controller. Fortunately for the average PC user, in most modern motherboards such a controller is already built into the chipset's southbridge. So, to build an array of hard drives, all you have to do is buy the required number of them and select the desired RAID type in the corresponding section of the BIOS settings. After that, instead of several hard drives the system will see only one, which can be divided into partitions and logical drives if desired. Note that Windows XP users will need to install an additional driver.

And finally, one more piece of advice: to create a RAID, buy hard drives of the same capacity, from the same manufacturer, of the same model, and preferably from the same batch. They will then have the same electronics and firmware, and the array will work most stably.

Leonid Borislavsky, 16 January 2017

Today we will talk about RAID arrays. Let's figure out what it is, why we need it, what it is like and how to use all this magnificence in practice.

So, first things first: what is a RAID array, or simply RAID? The abbreviation stands for "Redundant Array of Independent Disks". Put simply, a RAID array is a set of physical disks combined into one logical disk.

Usually it is the other way around: one physical disk is installed in the computer, and we split it into several logical ones. Here the situation is reversed: several hard drives are first combined into one, and the operating system then perceives them as a single disk. That is, the OS firmly believes that it physically has only one disk.

RAID arrays can be hardware or software.

Hardware RAID arrays are created before the OS boots, using special utilities built into the RAID controller, something like a BIOS. Because the array is created this way, the OS installer already "sees" a single disk at the installation stage.

Software RAID arrays are created by the operating system. That is, during boot the OS "understands" that it has several physical disks, and only after it has started are the disks combined into arrays by software. In this scheme, the operating system itself is not located on the RAID array, since it is installed before the array is created.

"Why is all this needed?" - you ask? The answer is: to increase the speed of reading/writing data and/or increase fault tolerance and security.

"How can a RAID array increase speed or protect data?" To answer this question, let's consider the main types of RAID arrays, how they are built, and what each of them gives as a result.

RAID-0, also called "stripe". Two or more hard drives are combined into one, and their capacities add up. That is, if we take two 500 GB disks and combine them into RAID-0, the operating system will see one 1 TB disk. The read/write speed of this array can be up to twice that of a single disk, because requests are spread across both disks: for example, if a database is split across two disks, one user can read data from one disk while another user writes to the other disk at the same time. If the database is on one disk, that disk has to handle the read/write tasks of different users one after another. RAID-0 allows reads and writes to proceed in parallel. Consequently, the more disks in a RAID-0 array, the faster it works: ideally the speed grows N times, where N is the number of disks in the array, although in practice the gain is somewhat smaller.
RAID-0 has only one drawback, but it outweighs all the advantages: a complete lack of fault tolerance. If one of the physical disks of the array dies, the entire array dies. There is an old joke about this: "What does the '0' in RAID-0 mean? The amount of information you can recover after the array dies!"
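The statement that the whole array dies with any single disk can be turned into simple arithmetic. The sketch below assumes that disk failures are independent, uses a hypothetical 2% chance that a disk fails within a year, and ignores repair: a RAID-0 of N disks survives only if all N disks survive, while a RAID-1 mirror is lost only if all of its disks fail.

```python
def raid0_survival(p_fail: float, n_disks: int) -> float:
    """Probability that a RAID-0 survives: every disk must survive."""
    return (1 - p_fail) ** n_disks


def raid1_survival(p_fail: float, n_disks: int) -> float:
    """Probability that a mirror survives: at least one copy must survive."""
    return 1 - p_fail ** n_disks


P_FAIL = 0.02  # hypothetical yearly failure probability of one disk
for n in (1, 2, 4, 8):
    print(n, round(raid0_survival(P_FAIL, n), 4))
print("RAID-1 of 2 disks:", round(raid1_survival(P_FAIL, 2), 4))
```

Each extra disk makes RAID-0 faster but also makes losing the whole array more likely, while mirroring moves the risk in the opposite direction.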

RAID-1, also called "mirror". Two or more hard drives are combined into one in parallel. That is, if we take two 500 GB disks and combine them into RAID-1, the operating system will see one 500 GB disk. The write speed of this array is the same as that of a single disk, since every piece of information is written to both disks simultaneously. RAID-1 does not provide a gain in speed, but it provides greater fault tolerance: if one of the hard drives dies, a complete duplicate of the information is always available on the other drive. Keep in mind that this fault tolerance protects only against the death of one of the array disks. If data is deleted, even deliberately, it is deleted from all disks of the array at the same time!

RAID-5 is a safer variant of RAID-0. The capacity of the array is calculated using the formula (N - 1) * DiskSize, where N is the number of disks and DiskSize is the size of each disk; so RAID-5 built from three 500 GB disks gives a 1 TB array. The idea of RAID-5 is that the data is striped across the disks as in RAID-0, and the capacity of one disk is used for the so-called "checksum": service information that allows the contents of a dead disk to be restored. The checksum blocks are distributed across all disks rather than kept on one of them. The write speed of RAID-5 is somewhat lower, since time is spent calculating and writing checksums, but the read speed is close to that of RAID-0.
If one of the disks of a RAID-5 array dies, the read/write speed drops sharply, since every operation requires additional calculations. In terms of fault tolerance, RAID-5 effectively turns into RAID-0, and if the array is not restored promptly, there is a significant risk of losing all the data.
A RAID-5 array can be used with a so-called spare disk, i.e. a standby disk. While the array works normally, this disk sits idle. If a disk fails, however, rebuilding starts automatically: the contents of the failed disk are restored onto the spare using the data and checksums on the surviving disks.
RAID-5 requires at least three disks and survives the failure of one disk. If two disks fail at the same time, RAID-5 cannot recover the data.

RAID-6 is an improved version of RAID-5. The idea is the same, but the checksums take up the capacity of two disks instead of one, and the two checksums are calculated using different algorithms, which significantly increases the fault tolerance of the array as a whole. RAID-6 requires at least four disks. The capacity of the array is (N - 2) * DiskSize, where N is the number of disks in the array and DiskSize is the size of each disk. That is, RAID-6 built from five 500 GB disks gives a 1.5 TB array.
The write speed of RAID-6 is about 10-15% lower than that of RAID-5 because of the additional time spent calculating and writing checksums.
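A common rule of thumb for small random writes shows where the extra cost comes from: one logical write costs 2 disk operations on RAID-1 and RAID-10 (both mirrors), 4 on RAID-5 (read the old data and old checksum, write the new data and new checksum) and 6 on RAID-6 (two checksums to read and write). The sketch below applies these penalties. The 150 IOPS per disk is a hypothetical figure, and the estimate covers only worst-case small random writes, so it is not the same measure as the 10-15% figure for overall write speed given above.

```python
# Disk operations needed per small random write (common rule of thumb).
WRITE_PENALTY = {"0": 1, "1": 2, "10": 2, "5": 4, "6": 6}


def random_write_iops(level: str, n_disks: int, disk_iops: float) -> float:
    """Rough estimate of random-write operations per second for the whole array."""
    return n_disks * disk_iops / WRITE_PENALTY[level]


DISK_IOPS = 150  # hypothetical figure for one hard drive
for level in ("10", "5", "6"):
    print(f"RAID {level}: {random_write_iops(level, 6, DISK_IOPS):.0f} IOPS")
```

Large sequential writes that fill a whole stripe avoid the read-modify-write cycle, which is why the real-world difference between RAID-5 and RAID-6 is often much smaller than these worst-case ratios.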

RAID-10, also called RAID 1+0, combines RAID-0 and RAID-1 and is built from at least four disks. It is often confused with RAID 0+1: in RAID 0+1, two RAID-0 stripes are built to increase read/write speed and are then mirrored as RAID-1 to increase fault tolerance, while in RAID-10 the disks are first paired into RAID-1 mirrors and the mirrors are then striped as RAID-0. Either way, the array combines the advantages of the first two levels: it is both fast and fault-tolerant.

RAID-50 is, by analogy with RAID-10, a combination of RAID-0 and RAID-5: several RAID-5 groups are built, and data is striped across them as in RAID-0. Thus RAID-50 gives very good read/write speed while keeping the fault tolerance and reliability of RAID-5 within each group.

RAID-60 follows the same idea: data is striped as in RAID-0 across several RAID-6 groups.

There are also combined arrays RAID 5+1 and RAID 6+1, which combine RAID-5 or RAID-6 with RAID-1 mirrors rather than with RAID-0 stripes.

As you can see, the combined arrays RAID-10, RAID-50, RAID-60 and the RAID X+1 variants are direct descendants of the basic RAID-0, RAID-1, RAID-5 and RAID-6 arrays. They inherit the functionality of their parent types and serve only to increase either read/write speed or fault tolerance.

If we move on to practice and talk about the use of certain RAID arrays in life, the logic is quite simple:

RAID-0: we do not use it on its own at all;

RAID-1: we use it where read/write speed is not particularly important but fault tolerance is. For example, RAID-1 is a good place for operating systems: nothing except the OS accesses these disks, the speed of the hard drives themselves is quite sufficient, and fault tolerance is ensured;

RAID-5: we use it where both speed and fault tolerance are needed but there is not enough money for more hard drives, or where damaged arrays must be restored without stopping work, in which case spare drives help. A common application of RAID-5 is data storage;

RAID-6: used where the death of several disks in the array at once is a frightening prospect or a real threat. In practice it is quite rare, mainly among the paranoid;

RAID-10: used where work must be both fast and reliable. Its main applications are file servers and database servers.

Simplifying further, we conclude that where there is no large-scale work with files, RAID-1 is quite enough: operating system, AD, TS, mail, proxy, etc. Where serious work with files is required, use RAID-5 or RAID-10.

The ideal solution for a database server seems to be a machine with six physical disks, two of which are combined into a mirror RAID-1 and the OS is installed on it, and the remaining four are combined into RAID-10 for fast and reliable data processing.

If, after reading all of the above, you decide to set up RAID arrays on your servers but don't know how or where to start, contact us! We will help you choose the necessary equipment and carry out the installation work to implement RAID arrays.

Greetings to blog readers!
Today there will be another article on a computer topic, devoted to the Raid disk array. I'm sure this term means nothing to many people, and even those who have heard of it somewhere have no idea what it is. Let's figure it out together!

Without going into terminology, a Raid array is a set of several hard drives that work together and share the work between them more efficiently. How do we usually install hard drives in a computer? We connect one hard drive to a SATA port, then another, then a third, and drives D, E, F and so on appear in the operating system. We can store files on them or install Windows, but these are essentially separate disks: if we remove one of them, we will notice nothing (unless the OS was installed on it) except that we lose access to the files stored on it. But there is another way: combine these disks into a system and give them an algorithm for working together, which significantly increases either the reliability of data storage or the speed of their operation.

But before creating such a system, we need to know whether the motherboard supports Raid arrays. Many modern motherboards already have a built-in Raid controller that lets you combine hard drives. The supported array schemes are listed in the motherboard's specifications. As an example, let's take the first board that caught my eye on Yandex Market, the ASRock P45R2000-WiFi.

Here, a description of the supported Raid arrays is displayed in the "Sata Disk Controllers" section.

In this example, we see that the SATA controller supports creating Raid arrays 0, 1, 5 and 10. What do these numbers mean? They denote different types of arrays in which the disks interact according to different schemes, designed, as I already said, either to speed up their work or to increase protection against data loss.

If the computer's motherboard does not support Raid, you can buy a separate Raid controller in the form of a PCI card, which is inserted into a PCI slot on the motherboard and lets it create disk arrays. After installing the controller, you will also need to install the Raid driver, which either comes on a disk with the controller or can be downloaded from the Internet. It is best not to economize on this device and to buy one from a well-known manufacturer, for example Asus, with an Intel chipset.

I suspect that you still don’t have a good idea of what we’re talking about, so let’s take a closer look at each of the most popular types of Raid arrays to make everything clearer.

RAID 1 array

Raid 1 array is one of the most common and budget options that uses 2 hard drives. This array is designed to provide maximum protection for user data, because all files will be simultaneously copied to 2 hard drives. In order to create it, we take two hard drives of equal size, for example 500 GB each, and make the appropriate settings in the BIOS to create the array. After this, your system will see one hard drive measuring not 1 TB, but 500 GB, although physically two hard drives work - the calculation formula is given below. And all files will be simultaneously written to two disks, that is, the second will be a full backup copy of the first. As you understand, if one of the disks fails, you will not lose a single piece of your information, since you will have a second copy of this disk.

The failure will also go unnoticed by the operating system, which will continue working with the second disk; you will be notified of the problem only by a special program that monitors the array. You just need to remove the faulty disk and connect an identical working one: the system will automatically copy all the data from the remaining working disk to it and continue working.

The disk volume that the system will see is calculated here using the formula:

V = 1 x Vmin, where V is the total capacity and Vmin is the storage capacity of the smallest hard drive.

RAID 0 array

Another popular scheme is designed to increase not storage reliability but, on the contrary, speed. It also consists of two HDDs, but in this case the OS sees the full combined capacity of the two disks: if you combine two 500 GB disks into Raid 0, the system will see one 1 TB disk. Reading and writing are faster because blocks of files are written alternately to the two disks, but the fault tolerance of this system is minimal: if one of the disks fails, almost all files will be damaged, and you will lose the part of the data that was written to the broken disk. You will then have to recover the information at a service center.

The formula for calculating the total disk capacity visible to Windows looks like this:

V = N x Vmin, where N is the number of hard drives.

If, before reading this article, you were not particularly worried about the fault tolerance of your system but would like to increase its speed, you can buy an additional hard drive and use this type without hesitation. By and large, most home users do not store super-important information, and important files can be copied to a separate external hard drive.

Raid 10 (0+1) array

As the name suggests, this type of array combines the properties of the two previous ones: it is like two Raid 0 arrays combined into Raid 1. Four hard drives are used: information is written to two of them block by block, as in Raid 0, and the other two hold full copies of the first two. The system is very reliable and at the same time quite fast, but expensive to build. It requires 4 HDDs, and the system will see the total capacity given by the formula:

V = N x Vmin / 2, where N is the number of hard drives.

That is, if we take 4 disks of 500 GB, then the system will see 1 disk of 1 TB in size.

This type, as well as the next one, is most often used in organizations, on server computers, where it is necessary to ensure both high speed and maximum safety from loss of information in case of unforeseen circumstances.

RAID 5 array

The Raid 5 array is the optimal combination of price, speed and reliability. In this array, a minimum of 3 HDDs can be used; the volume is calculated using a more complex formula:

V = N x Vmin – 1 x Vmin, where N is the number of hard drives.

So, let's say we have 3 disks of 500 GB each. The volume visible to the OS will be 1 TB.

The array works as follows: blocks of split files are written to the first two disks (or three, depending on their number), and the checksum of those blocks is written to the third (or fourth); in an actual Raid 5 array, the disk that stores the checksum changes from one stripe to the next. If one of the disks fails, its contents can easily be restored using the checksum. Such an array is slower than Raid 0 but, like Raid 1 or Raid 10, survives the failure of a disk, and it is cheaper than Raid 10 because you save a fourth hard drive.

The diagram below shows a Raid 5 layout of four HDDs.

There are also other modes, such as Raid 2, 3, 4, 6 and 30, but they are largely derived from those listed above.

How to install Raid disk array on Windows?

I hope the theory is clear. Now let's move on to practice. Inserting a Raid controller into a PCI slot and installing its drivers should not be difficult for experienced PC users.

How do you create a Raid array from the connected hard drives for Windows?

It is best, of course, to do this when you have just purchased and connected clean hard drives without an installed OS. First, we restart the computer and go into the BIOS settings - here we need to find the SATA controllers to which our hard drives are connected and set them to RAID mode.

After that, save the settings and restart the PC. On a black screen, information will appear that you have Raid mode enabled and about the key with which you can access its settings. The example below asks you to press the "TAB" key.

Depending on the Raid controller model, the key may be different, for example "Ctrl+F".

We go into the configuration utility and click something like “Create array” or “Create Raid” in the menu - the labels may differ. Also, if the controller supports several Raid types, you will be asked to choose which one you want to create. In my example, only Raid 0 is available.

After this, we return to the BIOS, and in the boot order settings we no longer see several separate disks, but a single array.

That's all - RAID is configured, and the computer will now treat your disks as one. This is how, for example, the array appears when installing Windows.

I think you already understand the benefits of using RAID. Finally, here is a comparison table of read and write speeds for a single disk and for the same disks in different RAID modes; the result, as they say, speaks for itself.

RAID (English: redundant array of independent disks) is an array of several disks managed by a controller, interconnected by high-speed channels and perceived by the external system as a single whole. Depending on the type of array, it can provide varying degrees of fault tolerance and performance. It serves to increase the reliability of data storage and/or the speed of reading and writing. Originally, the abbreviation stood for “redundant array of inexpensive disks”: such arrays were built as a backup alternative to storage based on random access memory (RAM), which was expensive at that time. Over time, the abbreviation acquired a second meaning, “independent disks”, implying the use of several physical disks rather than partitions of one disk and reflecting the fact that the equipment needed to build such an array was far from cheap (today it is, relatively speaking, just a few disks).

Let's look at what RAID arrays there are. First we will cover the levels proposed by researchers from Berkeley, then their combinations and unusual modes. Note that if disks of different sizes are used (which is not recommended), each of them is used only up to the capacity of the smallest one; the extra capacity of the larger disks will simply not be available.
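To make the capacity rules concrete, here is a simplified calculator for the common levels. It is a sketch that assumes the usual minimum disk counts, ignores controller metadata and vendor rounding, and uses a hypothetical function name `usable_capacity`; every disk counts only up to the size of the smallest one, as described above.

```python
# Simplified sketch; usable_capacity and MIN_DISKS are hypothetical names.
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}

def usable_capacity(level, sizes_gb):
    """Usable capacity in GB, counting each disk only up to the smallest one."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported level: {level}")
    n = len(sizes_gb)
    if n < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    smallest = min(sizes_gb)
    if level == "0":
        return n * smallest            # all space is used, no redundancy
    if level == "1":
        return smallest                # every disk holds the same data
    if level == "5":
        return (n - 1) * smallest      # one disk's worth of parity
    if level == "6":
        return (n - 2) * smallest      # two disks' worth of parity
    if n % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return n // 2 * smallest           # RAID 10: half the disks are mirrors

print(usable_capacity("5", [500, 500, 500]))  # 1000 (the 3 x 500 GB example)
print(usable_capacity("0", [500, 1000]))      # 1000: the 1000 GB disk counts as 500
```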

RAID 0. Striped disk array without fault tolerance/parity (Stripe)

It is an array where data is divided into blocks (the block size can be set when creating the array) and then written to separate disks. In the simplest case, there are two disks: one block is written to the first disk, the next to the second, then again to the first, and so on. This mode is also called “interleave”, because the disks being written to alternate from block to block. Blocks are read back in the same alternating order. This way, I/O operations are executed in parallel, resulting in better performance: where previously we could read one block per unit of time, now we can read from several disks at once. The main advantage of this mode is precisely its high data transfer rate.
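The round-robin placement described above can be expressed as simple arithmetic. The sketch below maps a logical block number to a disk and an offset on that disk; `chunk_blocks` stands for the block (stripe unit) size chosen when the array is created, and all names are hypothetical:

```python
# Hypothetical helper illustrating RAID 0 address mapping.
def raid0_locate(lba, num_disks, chunk_blocks):
    """Return (disk index, block offset on that disk) for logical block lba."""
    chunk = lba // chunk_blocks      # which chunk the block falls into
    within = lba % chunk_blocks      # position inside that chunk
    disk = chunk % num_disks         # chunks rotate across the disks
    stripe = chunk // num_disks      # full stripes that precede this chunk
    return disk, stripe * chunk_blocks + within

# Two disks, chunks of 4 blocks: blocks 0-3 go to disk 0, 4-7 to disk 1, 8-11 to disk 0...
for lba in (0, 5, 9):
    print(lba, raid0_locate(lba, 2, 4))
```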

However, miracles do not happen, and if they do, they are rare. Performance does not increase N times (N being the number of disks), but less. First of all, the disk access time does not decrease, and it is already high relative to other computer subsystems. The quality of the controller matters just as much: if it is not the best, the speed may barely differ from that of a single disk. The interface through which the RAID controller is connected to the rest of the system also has a significant influence. All this can lead not only to a less than N-fold increase in linear read speed, but also to a limit on the number of disks beyond which adding more gives no increase at all, or even slightly reduces speed. In real tasks with a large number of requests, the chance of encountering this phenomenon is minimal, because the speed depends very much on the hard drive and its capabilities.

As you can see, this mode has no redundancy as such: all disk space is used. However, if one of the disks fails, all information is obviously lost.

RAID 1. Mirroring

The essence of this RAID mode is to create a copy (mirror) of a disk in order to increase fault tolerance. If one disk fails, work does not stop but continues on the remaining disk. This mode requires an even number of disks. The idea is close to backup, but everything happens on the fly, including recovery after a failure (which is sometimes very important), so there is no need to spend time on it.
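A toy model of mirroring, assuming in-memory byte arrays in place of real disks and a hypothetical `Mirror` class: every write is duplicated to all members, and a read is served by any disk that is still alive, so the failure of one disk does not interrupt work.

```python
# Toy RAID 1 model; the Mirror class is hypothetical, not a real driver API.
class Mirror:
    def __init__(self, num_disks, size):
        self.disks = [bytearray(size) for _ in range(num_disks)]
        self.failed = set()

    def write(self, offset, data):
        # Duplicate the write to every healthy member disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Any surviving member holds a full copy of the data.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return bytes(disk[offset:offset + length])
        raise IOError("all mirror members have failed")

    def fail(self, i):
        self.failed.add(i)

m = Mirror(num_disks=2, size=16)
m.write(0, b"hello")
m.fail(0)              # the first disk dies...
print(m.read(0, 5))    # ...but the data is still served: b'hello'
```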

Disadvantages: high redundancy, since you need twice as many disks to create such an array. Another disadvantage is that there is no performance gain - after all, a copy of the data from the first is simply written to the second disk.

RAID 2. Array using a fault-tolerant Hamming code.

This code can correct single errors and detect double errors. It is actively used in error-correcting (ECC) memory. In this mode, the disks are divided into two groups: one is used for data storage and works similarly to RAID 0, splitting data blocks across different disks; the other is used to store the ECC codes.
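To illustrate how a Hamming code locates an error, here is the classic Hamming(7,4) code, which protects 4 data bits with 3 check bits; conceptually, in RAID 2 each bit of the codeword could sit on its own disk. This is a sketch with hypothetical function names. The plain (7,4) code corrects one error; detecting double errors needs an extra overall parity bit, which is omitted here.

```python
# Hamming(7,4) sketch; function names are hypothetical.
# Codeword bit positions 1..7 are laid out as: p1 p2 d1 p3 d2 d3 d4.
def hamming74_encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(c)
    # Re-check each parity group; the three results spell out the
    # 1-based position of the bad bit (0 means no error).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                      # simulate one bad bit ("one failed disk")
print(hamming74_correct(word))    # [1, 0, 1, 1]
```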

The advantages include on-the-fly error correction and high data streaming speed.

The main disadvantage is high redundancy: with a small number of disks it is almost double (n-1). As the number of disks grows, the share of disks storing ECC codes decreases (the relative redundancy drops). The second disadvantage is low speed when working with small files. Due to its bulkiness and high redundancy with a small number of disks, this RAID level is no longer used, having given way to higher levels.

RAID 3. Fault-tolerant array with bit striping and parity.

This mode writes data block by block to different disks, like RAID 0, but uses another disk for parity storage. Thus, the redundancy is much lower than in RAID 2 and is only one disk. If one disk fails, the speed remains virtually unchanged.

The main disadvantage is low speed when working with small files and many requests. This is because all the check codes are stored on one disk and must be rewritten on every write operation, so the speed of this disk limits the speed of the entire array. Parity bits are written only when data is written and are checked when data is read, which creates an imbalance between read and write speed. Reading small files one at a time is also slow, because the disks cannot serve different requests independently and in parallel.

RAID 4

Data is written in blocks to different disks, and one disk is used to store parity bits. The difference from RAID 3 is that data is divided not into bits and bytes, but into sectors. The advantages include high transfer speeds when working with large files. The speed of handling a large number of read requests is also high. Among the shortcomings are those inherited from RAID 3: an imbalance between read and write speeds and conditions that make parallel access to data difficult.

RAID 5. Disk array with striping and distributed parity.

The method is similar to the previous one, but instead of allocating a separate disk for parity bits, this information is distributed among all disks. That is, if N disks are used, the capacity of N-1 disks will be available. The capacity of one disk is taken up by parity bits, as in RAID 3 and 4, but they are not stored on a separate disk; they are spread across all of them. Each disk holds (N-1)/N data and 1/N parity. If one disk in the array fails, the array remains operational (the data stored on the failed disk is calculated “on the fly” from the parity and the data on the other disks). That is, the failure is transparent to the user, sometimes with only a minimal drop in performance (depending on the computing power of the RAID controller). Among the advantages are high read and write speeds, both with large volumes of data and with a large number of requests. Disadvantages: complex data recovery and lower read speed than RAID 4.
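The difference between a dedicated parity disk (RAID 4) and distributed parity (RAID 5) is easiest to see by printing which disk holds parity in each stripe. The rotation rule below is one common choice (parity moves one disk to the left per stripe); real controllers use several layouts, and the function names are hypothetical:

```python
# Hypothetical helpers; the RAID 5 rotation shown is one of several real layouts.
def parity_disk(stripe, num_disks, level):
    """Index of the disk holding parity for the given stripe."""
    if level == 4:
        return num_disks - 1                        # always the same dedicated disk
    if level == 5:
        return (num_disks - 1 - stripe) % num_disks  # rotates every stripe
    raise ValueError("level must be 4 or 5")

def show_layout(num_disks, stripes, level):
    for s in range(stripes):
        p = parity_disk(s, num_disks, level)
        row = ["P" if d == p else "D" for d in range(num_disks)]
        print(f"stripe {s}: " + " ".join(row))

show_layout(4, 4, 5)
# stripe 0: D D D P
# stripe 1: D D P D
# stripe 2: D P D D
# stripe 3: P D D D
```

Over N consecutive stripes each disk holds parity exactly once, matching the (N-1)/N data and 1/N parity split; with `level=4` every parity update lands on the last disk, which is the bottleneck that RAID 5 avoids.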

RAID 6. Disk array with striping and double distributed parity.

The difference from RAID 5 is that two parity schemes are used, so the system can tolerate the failure of two disks. The main difficulty is that more operations have to be performed on every write, which makes write speed considerably lower.
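RAID 6 implementations differ. One widely used construction (described, for example, for the Linux md driver) computes P as a plain XOR and Q as a weighted sum over the finite field GF(2^8). The sketch below works on one byte per disk, assumes the field polynomial 0x11D with generator 2, and uses hypothetical function names; it shows how two lost data bytes are recovered from P and Q, and why every write costs extra computation.

```python
# P+Q sketch over GF(2^8); all function names are hypothetical.
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the reducing polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D         # reduce back into 8 bits
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a nonzero byte: a^254, since a^255 == 1."""
    return gf_pow(a, 254)

def compute_pq(data):
    """P is the XOR of the data bytes; Q weights disk i by 2^i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data, x, y, p, q):
    """Rebuild the bytes at positions x and y (their values in data are ignored)."""
    p_rest, q_rest = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            p_rest ^= d
            q_rest ^= gf_mul(gf_pow(2, i), d)
    # Now p_rest = Dx ^ Dy and q_rest = g^x*Dx ^ g^y*Dy: solve the 2x2 system.
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(q_rest ^ gf_mul(gy, p_rest), gf_inv(gx ^ gy))
    dy = p_rest ^ dx
    return dx, dy

data = [0x12, 0x34, 0x56, 0x78]
p, q = compute_pq(data)
damaged = [0x12, 0, 0x56, 0]       # disks 1 and 3 have failed
print([hex(v) for v in recover_two(damaged, 1, 3, p, q)])  # ['0x34', '0x78']
```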

Combined (nested) RAID levels.

Since RAID arrays are transparent to the OS, it soon became possible to create arrays whose elements are not disks but arrays of other levels. They are usually written with a plus sign. The first digit denotes the level of the arrays used as elements, and the second denotes the organization of the top level that combines them.

RAID 0+1

A combination that is a RAID 1 array built on the basis of RAID 0 arrays. As in a RAID 1 array, only half the disk capacity will be available. But, as with RAID 0, the speed will be higher than with a single disk. To implement such a solution, a minimum of 4 disks are required.

RAID 1+0

Also known as RAID 10. It is a stripe of mirrors, that is, a RAID 0 array built from RAID 1 arrays. It is very similar to the previous solution.
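Both layouts give up half of the capacity, but they react differently to a second disk failure. A small enumeration, assuming four disks and the hypothetical pairings below, counts how many of the six possible two-disk failures each layout survives:

```python
from itertools import combinations

# Illustrative sketch; function names and disk pairings are hypothetical.
def raid10_survives(failed, pairs):
    """Stripe of mirrors: survives unless some mirror pair lost both disks."""
    return all(not set(pair) <= failed for pair in pairs)

def raid01_survives(failed, halves):
    """Mirror of stripes: survives while at least one stripe set is intact."""
    return any(not (set(half) & failed) for half in halves)

disks = range(4)
pairs = [(0, 1), (2, 3)]    # RAID 10: disks 0/1 and 2/3 mirror each other
halves = [(0, 1), (2, 3)]   # RAID 0+1: stripe {0,1} mirrored by stripe {2,3}

double_failures = [set(c) for c in combinations(disks, 2)]
print(sum(raid10_survives(f, pairs) for f in double_failures), "of", len(double_failures))   # 4 of 6
print(sum(raid01_survives(f, halves) for f in double_failures), "of", len(double_failures))  # 2 of 6
```

In RAID 0+1 the first failure already takes down a whole stripe set, so any failure in the other set is fatal; in RAID 10 only the loss of both disks of the same mirror is.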

RAID 0+3

Array with dedicated parity over stripes. It is a RAID 3 array in which data is divided into blocks and written to RAID 0 arrays. Combinations other than the simplest 0+1 and 1+0 require specialized controllers, which are often quite expensive. This type is less reliable than the next option.

RAID 3+0

Also known as RAID 30. It is a stripe (RAID 0 array) of RAID 3 arrays. It has a very high data transfer speed combined with good fault tolerance. The data is first divided into blocks (as in RAID 0) and distributed among the element arrays. There it is divided into blocks again, their parity is calculated, and the blocks are written to all disks except one, which receives the parity bits. In this case, one disk in each of the RAID 3 arrays may fail.

RAID 5+0 (50)

It is created by combining RAID 5 arrays into a RAID 0 array. It has a high speed of data transfer and query processing. It has an average data recovery speed and good fault tolerance. The RAID 0+5 combination also exists, but more theoretically, as it provides too few advantages.

RAID 5+1 (51)

A combination of mirroring and striping with distributed parity. RAID 15 (1+5) is also an option. Both have very high fault tolerance: the 1+5 array can keep working after any three drive failures, and the 5+1 array can survive the failure of up to five out of eight drives.

RAID 6+0 (60)

Striping with double distributed parity; in other words, a stripe of RAID 6 arrays. As already mentioned for RAID 0+5, RAID 6 built from stripes (0+6) has not become widespread. Such techniques (striping arrays with parity) can increase the speed of the array. Another advantage is that capacity can easily be increased without adding to the delays needed to calculate and write more parity bits.

RAID 100 (10+0)

RAID 100, also spelled RAID 10+0, is a stripe of RAID 10. In essence, it is similar to the wider RAID 10 array, which uses twice as many disks. But this “three-story” structure has its own explanation. Most often, RAID 10 is made in hardware, that is, using the controller, and stripes are made from them in software. This trick is resorted to in order to avoid the problem that was mentioned at the beginning of the article - controllers have their own scalability limitations and if you plug double the number of disks into one controller, under some conditions you may not see any growth at all. Software RAID 0 allows you to create it on the basis of two controllers, each of which contains RAID 10 on board. Thus, we avoid the “bottleneck” represented by the controller. Another useful point is to work around the problem with the maximum number of connectors on one controller - by doubling their number, we double the number of available connectors.

Non-standard RAID modes

Double parity

A common addition to the RAID levels listed above is double parity, sometimes implemented as, and therefore called, “diagonal parity.” Double parity is already implemented in RAID 6, but here parity is calculated over different groups of data blocks. The RAID 6 specification was recently expanded, so diagonal parity can also be considered RAID 6. In RAID 6, parity is the modulo-2 sum of the bits in a row (the first bit on the first disk, the first bit on the second disk, and so on), whereas diagonal parity is calculated over bits shifted along a diagonal. Operating in disk-failure mode is not recommended, because it is difficult to calculate the lost bits from the checksums.
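A simplified illustration of row versus diagonal parity on a tiny bit matrix, where `grid[r][c]` is bit `r` on disk `c`. It only shows how the two groupings differ; it is not the exact layout of any commercial scheme (NetApp's RAID-DP, for example, arranges its diagonals differently), and the function names are hypothetical:

```python
# Illustrative sketch; not the layout of any specific product.
def row_parity(grid):
    """Modulo-2 sum of each row: bit r across all disks."""
    return [sum(row) % 2 for row in grid]

def diagonal_parity(grid):
    """XOR along wrapped diagonals: bit (r, c) belongs to diagonal (r + c) % rows."""
    rows = len(grid)
    diag = [0] * rows
    for r, row in enumerate(grid):
        for c, bit in enumerate(row):
            diag[(r + c) % rows] ^= bit
    return diag

grid = [
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
print(row_parity(grid))       # [1, 0, 1]
print(diagonal_parity(grid))  # [1, 1, 0]
```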

RAID-DP

Developed by NetApp, this is a RAID array with double parity that falls under the updated definition of RAID 6. It uses a data recording scheme different from the classic RAID 6 implementation. Data is first written to an NVRAM cache backed by an uninterruptible power supply, to prevent data loss during a power outage. Whenever possible, the controller software writes only whole blocks to the disks. This scheme provides more protection than RAID 1 and is faster than regular RAID 6.

RAID 1.5

It was proposed by Highpoint, but is now used very often in RAID 1 controllers, without any emphasis on this feature. The essence comes down to simple optimization - data is written as to a regular RAID 1 array (which is what 1.5 essentially is), and data is read interleaved from two disks (as in RAID 0). In a specific implementation from Highpoint, used on DFI boards of the LanParty series on the nForce 2 chipset, the increase was barely noticeable, and sometimes even zero. This is probably due to the low speed of the controllers of this manufacturer in general at that time.
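A toy model of the RAID 1.5 idea, assuming two in-memory disks and a hypothetical class name: writes are duplicated as in RAID 1, while successive reads alternate between the disks as in RAID 0. It does not reflect Highpoint's actual algorithm.

```python
from itertools import cycle

# Toy model; InterleavedMirrorReader is a hypothetical name.
class InterleavedMirrorReader:
    def __init__(self, num_blocks):
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self._next = cycle(range(2))   # alternates 0, 1, 0, 1, ...
        self.served_by = []            # which disk answered each read

    def write(self, block, data):
        for disk in self.disks:        # RAID 1 behaviour: duplicate the write
            disk[block] = data

    def read(self, block):
        d = next(self._next)           # RAID 0-like: alternate the source disk
        self.served_by.append(d)
        return self.disks[d][block]

r = InterleavedMirrorReader(4)
for i in range(4):
    r.write(i, f"block{i}")
print([r.read(i) for i in range(4)])  # ['block0', 'block1', 'block2', 'block3']
print(r.served_by)                    # [0, 1, 0, 1]
```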

RAID 1E

Combines RAID 0 and RAID 1 and requires a minimum of three disks. Data is written interleaved across three disks, and a copy of it is written shifted by one disk. If a block is split across three disks, a copy of the first part is written to the second disk, a copy of the second part to the third disk, and a copy of the third part to the first disk. With an even number of disks, it is, of course, better to use RAID 10.
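A minimal sketch of the shifted-copy placement described above, assuming that data rows and copy rows alternate on each disk and that chunks are counted from zero; the function name is hypothetical and real controllers may order the rows differently:

```python
# Hypothetical helper; positions are (disk, row) in chunk units.
def raid1e_locate(chunk, num_disks):
    """Return ((disk, row) of the original, (disk, row) of the copy)."""
    stripe = chunk // num_disks
    disk = chunk % num_disks
    original = (disk, 2 * stripe)                       # data row
    copy = ((disk + 1) % num_disks, 2 * stripe + 1)     # copy row, shifted one disk
    return original, copy

for c in range(3):
    print(c, raid1e_locate(c, 3))
# 0 ((0, 0), (1, 1))
# 1 ((1, 0), (2, 1))
# 2 ((2, 0), (0, 1))
```

Because the copy always lands on a different disk from the original, any single disk can fail, and the layout works with an odd number of disks.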

RAID 5E

Typically, when building RAID 5, one disk is left free (a spare), so that in the event of a failure the system immediately begins to rebuild the array. During normal operation, this drive sits idle. RAID 5E uses this disk as an element of the array: the capacity of the spare is distributed throughout the array and located at the end of the disks. The minimum number of disks is four. The available capacity equals n-2 disks: the capacity of one disk is used (distributed among all of them) for parity, and that of another is kept free. When a disk fails, the array is compressed to three disks (using the minimum number as an example) by filling the free space. The result is a regular RAID 5 array that can withstand the failure of one more disk. When a new disk is connected, the array expands and occupies all the disks again. Note that during compression and expansion, the array cannot survive the failure of another disk. It is also unavailable for reading and writing at this time. The main advantage is higher speed, since striping occurs across a larger number of disks. The downside is that this spare cannot be assigned to several arrays at once, which is possible with a regular RAID 5 array.

RAID 5EE

It differs from the previous one only in that the areas of free space on the disks are not reserved in one piece at the end of the disk, but are interleaved in blocks with parity bits. This technology significantly speeds up recovery after a system failure. Blocks can be written directly to free space, without having to move around the disk.

RAID 6E

Similar to RAID 5E, it uses an additional disk to improve performance and load distribution. The free space is divided among the other disks and located at the end of the disks.

RAID 7

This technology is a registered trademark of Storage Computer Corporation. It is an array based on RAID 3 and 4, optimized for performance. Its main advantage is read/write caching. Data transfer requests are executed asynchronously. SCSI disks are used to build it. The speed is approximately 1.5 to 6 times higher than that of RAID 3 and 4 solutions.

Intel Matrix RAID

A technology introduced by Intel in its southbridges starting with the ICH6R. The idea is to combine RAID arrays of different levels built on disk partitions rather than on whole disks. For example, you can create two partitions on each of two disks: two of the partitions form a RAID 0 array that holds the operating system, and the other two, working in RAID 1 mode, store copies of documents.

Linux MD RAID 10

This is a Linux kernel RAID driver that makes it possible to create a more advanced version of RAID 10. While classic RAID 10 requires an even number of disks, this driver can also work with an odd number. For three disks, the principle is the same as in RAID 1E: each copy is shifted by one disk, and the blocks are striped as in RAID 0. For four disks, this is equivalent to a regular RAID 10. In addition, you can specify in which area of the disk a copy will be stored. For example, the original can be in the first half of the first disk and its copy in the second half of the second disk; for the second half of the data, it is the other way around. Data can be duplicated several times. Storing copies in different parts of the disk makes it possible to achieve higher access speeds thanks to the non-uniformity of the hard drive (access speed depends on where the data is located on the platter; the difference is usually about twofold).
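A rough sketch of the placement described above, loosely following the idea behind the driver's “far” layout (in mdadm this corresponds to a RAID 10 layout such as `f2`). The exact on-disk arithmetic in the kernel may differ; the function and its parameters are hypothetical, and positions are given in chunks:

```python
# Hypothetical helper; a simplified model of a two-copy "far" layout.
def far2_locate(chunk, num_disks, chunks_per_half):
    """Return ((disk, offset) of the original, (disk, offset) of the copy)."""
    stripe = chunk // num_disks
    if stripe >= chunks_per_half:
        raise ValueError("chunk does not fit in the array")
    disk = chunk % num_disks
    # First half of every disk: a plain RAID 0 stripe of the originals.
    original = (disk, stripe)
    # Second half: the copies, rotated by one disk.
    copy = ((disk + 1) % num_disks, chunks_per_half + stripe)
    return original, copy

# Three disks (an odd number), each with room for 4 chunks per half.
print(far2_locate(0, 3, 4))  # ((0, 0), (1, 4))
print(far2_locate(5, 3, 4))  # ((2, 1), (0, 5))
```

Every original lives in the first half of some disk, so sequential reads can be served from the first halves alone, while the copies in the second halves provide the redundancy.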

RAID-K

Developed by Kaleidescape for use in its media devices. It is similar to RAID 4 with double parity but uses a different fault-tolerance method. The user can easily expand the array simply by adding disks; if a new disk already contains data, that data is added to the array instead of being erased, as is usually required.

RAID-Z

Developed by Sun. The biggest problem of RAID 5 is the loss of information during a power failure, when data in the disk cache (which is volatile memory, that is, it does not retain data without power) has not yet been saved to the magnetic platters. This mismatch between the cache and the disk is called incoherence. The organization of the array itself is tied to the Sun Solaris file system, ZFS. Forced flushing of the disk cache is used, and not only an entire disk but also a single block can be restored “on the fly” when its checksum does not match. Another important aspect is the ZFS philosophy: it does not modify data in place. Instead, it writes the updated data and then, after making sure that the operation succeeded, changes the pointer to it. This avoids data loss during modification. Small files are duplicated instead of being protected by checksums. This, too, is handled by the file system itself, because it knows the data structure (the RAID array) and can allocate space for this purpose. There is also RAID-Z2, which, like RAID 6, can survive two drive failures by using two checksums.

JBOD

Something that is not RAID in principle but is often used together with it. The name stands for “just a bunch of disks”. The technology combines all the disks installed in the system into one large logical drive: instead of three disks, one large one is visible. The entire total disk capacity is used. There is no gain in either speed or reliability.
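Since JBOD simply concatenates the disks, locating a logical block is a matter of finding which disk's range it falls into. A minimal sketch, assuming sizes in blocks and a hypothetical function name:

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical helper illustrating linear concatenation of disks.
def jbod_locate(lba, disk_sizes):
    """Map a logical block to (disk index, block on that disk)."""
    ends = list(accumulate(disk_sizes))    # cumulative end of each disk
    if not 0 <= lba < ends[-1]:
        raise ValueError("block outside the volume")
    disk = bisect_right(ends, lba)         # first disk whose end exceeds lba
    start = ends[disk - 1] if disk else 0
    return disk, lba - start

sizes = [100, 250, 50]            # disks of different sizes, all capacity used
print(jbod_locate(99, sizes))     # (0, 99)
print(jbod_locate(100, sizes))    # (1, 0)
print(jbod_locate(399, sizes))    # (2, 49)
```

Unlike RAID 0, consecutive blocks stay on the same disk until it is full, which is why JBOD gives no speed-up.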

Drive Extender

A feature built into Windows Home Server. It combines JBOD and RAID 1. When a copy needs to be created, it does not duplicate the file immediately but places a marker on the NTFS partition pointing to the data. When the system is idle, it copies the file so that disk space is used as fully as possible (disks of different sizes can be used). This provides many of the advantages of RAID: fault tolerance, the ability to easily replace a failed disk and rebuild it in the background, and transparency of file location (regardless of which disk a file is on). Using these markers, it is also possible to access different disks in parallel, achieving performance similar to RAID 0.

UNRAID

Developed by Lime Technology LLC. This scheme differs from conventional RAID arrays in that it allows you to mix SATA and PATA drives, as well as drives of different sizes and speeds, in one array. A dedicated disk is used for the checksum (parity). Data is not striped across disks. If one drive fails, only the files stored on it are lost, and even they can be restored using parity. UNRAID is implemented as an add-on to Linux MD (multidisk).

Most types of RAID arrays are not widespread; some are used only in narrow areas of application. The most widespread, from home users to entry-level servers, are RAID 0, 1, 0+1/10, 5 and 6. Whether you need a RAID array for your tasks is up to you to decide. Now you know how they differ from each other.