• Practical tips for creating RAID arrays on home PCs

    (+) : High reliability: the array works as long as at least one disk in it is functioning. The probability of two disks failing at once is the product of the failure probabilities of each disk (assuming the failures are independent). In practice, when one of the disks fails, immediate action must be taken to restore redundancy. For this purpose, hot spare disks are recommended with any RAID level (except zero). The advantage of this approach is that availability is maintained continuously.

    (-) : You pay for two hard drives but get the usable capacity of only one.

    RAID 1+0 and RAID 0+1

    Mirroring across many disks: RAID 1+0 or RAID 0+1. RAID 10 (RAID 1+0) is the configuration in which two or more RAID 1 arrays are combined into a RAID 0. RAID 0+1 can mean two options:

    RAID 2

    Arrays of this type are based on Hamming code. Disks are divided into two groups, one for data and one for error correction codes: if data is stored on 2^n - n - 1 disks, then n disks are needed to store the correction codes. Data is distributed across the data disks in the same way as in RAID 0, i.e. it is split into small blocks according to the number of disks. The remaining disks store error correction codes, which can be used to restore information if any hard disk fails. The Hamming method has long been used in ECC memory and allows on-the-fly correction of single errors and detection of double errors.

    The advantage of RAID 2 is faster disk operations compared with a single disk.

    The disadvantage of RAID 2 is that the minimum number of disks at which it makes sense is 7. In that case the array needs almost twice as many disks as the data alone would (for n=3 the data is stored on 4 disks), so this type of array is not widespread. With about 30-60 disks, the overhead falls to 11-19%.
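
    The disk counts quoted above can be checked with a short sketch. It assumes the standard Hamming layout, in which n check disks protect 2^n - n - 1 data disks (this matches the n=3 case: 4 data disks, 7 disks in total). The function name raid2_layout is made up for this illustration, and the overhead is measured as check disks relative to data disks.

```python
def raid2_layout(n_check_disks: int) -> dict:
    """Disk counts for a RAID 2 group using a Hamming code with n check disks.

    Assumption: n check disks protect 2**n - n - 1 data disks.
    """
    if n_check_disks < 2:
        raise ValueError("a Hamming code needs at least 2 check disks")
    total = 2 ** n_check_disks - 1          # all disks in the group
    data = total - n_check_disks            # disks that hold user data
    overhead_pct = 100.0 * n_check_disks / data
    return {
        "check": n_check_disks,
        "data": data,
        "total": total,
        "overhead_pct": overhead_pct,
    }


if __name__ == "__main__":
    for n in (3, 5, 6):
        layout = raid2_layout(n)
        print(
            f"n={n}: {layout['data']} data + {layout['check']} check "
            f"= {layout['total']} disks, overhead {layout['overhead_pct']:.1f}%"
        )
```

    With n=5 the group has 31 disks and with n=6 it has 63, which is where the 30-60 disk range and the 11-19% overhead come from.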


    RAID 3

    In a RAID 3 array, data is split into chunks smaller than a sector (down to individual bytes) or into blocks and distributed across the disks. One additional disk stores the parity blocks. RAID 2 used several disks for this purpose, but most of the information on the check disks served on-the-fly error correction, whereas most users are satisfied with simply restoring information after a disk failure, and the information needed for that fits on one dedicated hard drive.

    Differences between RAID 3 and RAID 2: the inability to correct errors on the fly and less redundancy.

    Advantages:

    • high read and write speed;
    • the minimum number of disks to create an array is three.

    Disadvantages:

    • an array of this type is only good for single-task work with large files, since the access time to an individual sector spread across the disks equals the longest of the access times on each disk. For small blocks, the access time is much longer than the read time.
    • the check disk is heavily loaded and, as a result, its reliability drops significantly compared with the disks storing data.


    RAID 4

    RAID 4 is similar to RAID 3, but the data is divided into blocks rather than bytes. This partially overcomes the problem of low transfer speed for small amounts of data. Writing is slow because the parity for each block is generated during the write and stored on a single disk. Among widely used storage systems, RAID 4 is used on storage devices from NetApp (NetApp FAS), where its shortcomings are successfully eliminated by running the disks in a special grouped-write mode determined by the WAFL file system used internally on the devices.

    RAID 5

    The main disadvantage of RAID levels 2 to 4 is the inability to perform parallel write operations, since a separate check disk is used to store parity information. RAID 5 does not have this disadvantage. Data blocks and checksums are written cyclically to all disks of the array; there is no asymmetry in the disk configuration. Here a checksum means the result of an XOR (exclusive or) operation. XOR has a property that RAID 5 relies on: any operand can be replaced with the result, and applying XOR again yields the missing operand. For example: a xor b = c (where a, b, c are three disks of the RAID array); if a fails, we can recover it by putting c in its place and computing XOR between c and b: c xor b = a. This holds regardless of the number of operands: a xor b xor c xor d = e. If c fails, e takes its place, and the XOR yields c: a xor b xor e xor d = c. This method is essentially what gives level 5 its fault tolerance. Storing the XOR result requires the capacity of only one disk, equal in size to any other disk in the array.
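
    The XOR property described above can be tried out on a few bytes. This is only a sketch of the idea, not of how a controller lays data out on disk: the four byte strings stand in for blocks a, b, c, d, and the helper name xor_blocks is invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four data blocks (a, b, c, d) and their checksum e = a xor b xor c xor d.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Suppose the disk holding block c (index 2) fails.
# Put the checksum in its place and XOR everything that is left.
survivors = data[:2] + data[3:] + [parity]
recovered = xor_blocks(survivors)

print(recovered == data[2])
```

    The same call recovers any single missing block, because every surviving block cancels itself out of the XOR.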

    (+) : RAID 5 has become widespread, primarily due to its cost-effectiveness. The capacity of a RAID 5 array is calculated using the formula (n-1)*hddsize, where n is the number of disks in the array and hddsize is the size of the smallest disk. For example, for an array of 4 disks of 80 gigabytes each, the total capacity will be (4 - 1) * 80 = 240 gigabytes. Writing to a RAID 5 volume requires additional resources and performance decreases, since additional calculations and write operations are needed, but reading (compared with a single hard drive) is faster because data streams from several disks of the array can be processed in parallel.

    (-) : RAID 5 performance is noticeably lower, especially on random write operations, where performance drops by 10-25% relative to RAID 0 (or RAID 10), since more disk operations are required (each server write operation is replaced on the RAID controller by three: one read operation and two write operations). The disadvantages of RAID 5 show up when one of the disks fails: the entire volume goes into critical (degraded) mode, all write and read operations are accompanied by additional manipulations, and performance drops sharply. In this state the reliability level falls to that of a RAID 0 with the corresponding number of disks (that is, n times lower than the reliability of a single disk). If, before the array is fully rebuilt, another disk fails or an unrecoverable read error occurs on at least one more disk, the array is destroyed and the data on it cannot be restored by conventional methods. It should also be taken into account that RAID reconstruction (recovery of RAID data through redundancy) after a disk failure places an intensive, continuous read load on the disks for many hours. This can cause one of the remaining disks to fail during the least protected period of RAID operation, and can also expose previously undetected read errors in cold data (data that is not accessed during regular operation of the array, such as archived and inactive data), which increases the risk of failure during recovery. The minimum number of disks used is three.

    RAID 5EE

    Note: not supported on all controllers. RAID level-5EE is similar to RAID-5E but uses the spare disk more efficiently and has a shorter rebuild time. Like RAID level-5E, this RAID level stripes data and checksums across all drives in the array. RAID-5EE provides improved protection and performance. With RAID level-5E, the capacity of a logical volume is reduced by the capacity of two physical hard drives of the array (one for parity, one spare). The spare disk is part of the RAID level-5EE array. However, unlike RAID level-5E, which uses unpartitioned free space as the spare, in RAID level-5EE the checksum blocks are interleaved with the spare space, as shown below in the example. This allows data to be rebuilt quickly if a physical disk fails. With this configuration, the spare disk cannot be shared with other arrays. If another array needs a spare drive, you must provide a separate spare hard drive. RAID level-5E requires a minimum of four drives and, depending on the firmware level and drive capacity, supports from 8 to 16 drives. RAID level-5E requires specific firmware. Note: with RAID level-5EE, only one logical volume can be used in the array.

    Advantages:

    • 100% data protection
    • Greater usable disk capacity than RAID-1 or RAID-1E
    • Higher performance than RAID-5
    • Faster rebuild than RAID-5E

    Disadvantages:

    • Lower performance than RAID-1 or RAID-1E
    • Supports only one logical volume per array
    • The spare disk cannot be shared with other arrays
    • Not supported by all controllers

    RAID 6

    RAID 6 is similar to RAID 5 but more reliable: the capacity of 2 disks is allocated for checksums, and the two checksums are calculated using different algorithms. It requires a more powerful RAID controller. The array keeps working after the simultaneous failure of two disks, which protects against multiple failures. A minimum of 4 disks is required to build an array. Typically, RAID-6 causes approximately a 10-15% drop in disk group performance compared with a similar RAID-5, because of the larger amount of processing for the controller (a second checksum must be calculated, and more disk blocks must be read and rewritten as each block is written).

    RAID 7

    RAID 7 is a registered trademark of Storage Computer Corporation and is not a separate RAID level. The structure of the array is as follows: data is stored on disks, and one disk is used to store parity blocks. Writes to the disks are cached in RAM, so the array requires a UPS; in the event of a power failure, data corruption occurs.

    RAID 10

    RAID 10 architecture diagram

    RAID 10 is a mirrored array in which data is written sequentially to several disks, as in RAID 0. This architecture is a RAID 0 array, the segments of which are RAID 1 arrays instead of individual disks. Accordingly, an array of this level must contain at least 4 disks. RAID 10 combines high fault tolerance and performance.

    Current controllers use this mode by default for RAID 1+0. That is, one disk is the primary, the second is its mirror, and data is read from them alternately. RAID 10 and RAID 1+0 can now be considered just different names for the same disk mirroring method. The claim that RAID 10 is the most reliable option for data storage is wrong: although a RAID of this level can preserve data integrity when half of the disks fail, the array is irreversibly destroyed when just two disks fail if they belong to the same mirrored pair.

    Combined levels

    In addition to the basic RAID 0 - RAID 5 levels described in the standard, there are combined RAID 1+0, RAID 3+0, RAID 5+0, RAID 1+5 levels, which are interpreted differently by different manufacturers.

    • RAID 1+0 is a combination of mirroring and striping (see above).
    • RAID 5+0 is striping across level-5 volumes.
    • RAID 1+5 is a RAID 5 built from mirrored pairs.

    Combined levels inherit both the advantages and disadvantages of their “parents”: adding striping at the RAID 5+0 level does not add any reliability, but it does improve performance. RAID level 1+5 is probably very reliable, but it is not the fastest and, moreover, extremely uneconomical: the usable capacity of the volume is less than half the total capacity of the disks...

    It is worth noting that the number of hard drives in combined arrays will also change. For example, for RAID 5+0, 6 or 8 hard drives are used, for RAID 1+0 - 4, 6 or 8.

    Comparison of standard levels

    Level    | Number of disks | Effective capacity* | Fault tolerance | Advantages | Disadvantages
    0        | from 2          | S*N                 | none            | highest performance | very low reliability
    1        | 2               | S                   | 1 disk          | reliability | -
    1E       | from 3          | S*N/2               | 1 disk**        | high data security and good performance | double cost of disk space
    10 or 01 | from 4, even    | S*N/2               | 1 disk***       | highest performance and highest reliability | double cost of disk space
    5        | from 3 to 16    | S*(N - 1)           | 1 disk          | economical, high reliability, good performance | performance below RAID 0
    50       | from 6, even    | S*(N - 2)           | 2 disks**       | high reliability and performance | high cost and difficulty of maintenance
    5E       | from 4          | S*(N - 2)           | 1 disk          | cost-effective, high reliability, speed higher than RAID 5 | -
    5EE      | from 4          | S*(N - 2)           | 1 disk          | fast data reconstruction after a failure, cost-effective, high reliability, speed higher than RAID 5 | performance is lower than RAID 0 and 1, the backup drive is idling and not checked
    6        | from 4          | S*(N - 2)           | 2 disks         | economical, highest reliability | performance below RAID 5
    60       | from 8, even    | S*(N - 2)           | 2 disks         | high reliability, large volume of data | -
    61       | from 8, even    | S*(N - 2)/2         | 2 disks**       | very high reliability | high cost and complexity of organization

    * N is the number of disks in the array, S is the capacity of the smallest disk.
    ** Information will not be lost if all disks within one mirror fail.
    *** Information will not be lost if two disks within different mirrors fail.
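
    The effective-capacity column is easy to turn into a small calculator. The sketch below covers only levels 0, 1, 5, 6 and 10, uses the capacity formulas and minimum disk counts from the comparison above, and takes S as the size of the smallest disk; the function name usable_capacity is made up for the example.

```python
# level -> (minimum number of disks, capacity formula of N disks of size S)
RAID_LEVELS = {
    "0": (2, lambda n, s: s * n),
    "1": (2, lambda n, s: s),
    "5": (3, lambda n, s: s * (n - 1)),
    "6": (4, lambda n, s: s * (n - 2)),
    "10": (4, lambda n, s: s * n / 2),
}


def usable_capacity(level: str, n_disks: int, smallest_disk: float) -> float:
    """Usable capacity of an array, in the same unit as smallest_disk."""
    if level not in RAID_LEVELS:
        raise ValueError(f"level {level!r} is not covered by this sketch")
    min_disks, formula = RAID_LEVELS[level]
    if n_disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    if level == "10" and n_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return formula(n_disks, smallest_disk)


if __name__ == "__main__":
    print(usable_capacity("5", 4, 80))    # four 80 GB disks in RAID 5
    print(usable_capacity("10", 4, 1.0))  # four 1 TB disks in RAID 10
```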

    Matrix RAID

    Matrix RAID is a technology implemented by Intel in its chipsets starting with ICH6R. Strictly speaking, it is not a new RAID level (an analogue exists in high-end hardware RAID controllers); it allows one or several arrays of levels RAID 1, RAID 0 and RAID 5 to be organized simultaneously on a small number of disks. This makes it possible, for relatively little money, to provide increased reliability for some data and high access speed and performance for other data.

    Additional features of RAID controllers

    Many RAID controllers are equipped with a set of additional features:

    • "Hot Swap"
    • "Hot Spare"
    • Stability check.

    Software RAID

    To implement RAID you can use not only hardware but also purely software components (drivers). For example, systems based on the Linux kernel have special kernel modules, and RAID devices can be managed with the mdadm utility. Software RAID has its advantages and disadvantages. On the one hand, it costs nothing (unlike hardware RAID controllers, which cost $250 or more). On the other hand, software RAID uses CPU resources, and at times of peak load on the disk system the processor can spend a significant portion of its power servicing RAID devices.

    Linux kernel 2.6.28 (the last one released in 2008) supports software RAID of the following levels: 0, 1, 4, 5, 6, 10. The implementation allows you to create RAID on separate disk partitions, which is similar to the Matrix RAID described above. Booting from RAID is supported.
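
    As an illustration of what managing such an array with mdadm looks like, the sketch below only assembles the command line for creating an array and prints it; it does not run anything. The device names (/dev/md0, /dev/sdb1 and so on) are hypothetical placeholders, and the list of levels is the one quoted above.

```python
import shlex

# Levels listed above as supported by Linux software RAID.
SUPPORTED_LEVELS = {0, 1, 4, 5, 6, 10}


def mdadm_create_command(md_device, level, member_devices):
    """Build (but do not execute) an 'mdadm --create' command line."""
    if level not in SUPPORTED_LEVELS:
        raise ValueError(f"RAID level {level} is not in the supported list")
    return [
        "mdadm",
        "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(member_devices)}",
        *member_devices,
    ]


# Hypothetical partitions; replace with real ones before using the command.
cmd = mdadm_create_command(
    "/dev/md0", 10, ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1"]
)
print(shlex.join(cmd))
```

    Because the members are partitions rather than whole disks, the same physical drives can contribute other partitions to a second array of a different level.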

    Further development of the RAID idea

    The idea of RAID arrays is to combine disks, each of which is treated as a set of sectors, so that the file system driver “sees” what looks like a single disk and works with it without regard to its internal structure. However, significant improvements in the performance and reliability of the disk system can be achieved if the file system driver “knows” that it is working not with one disk but with a set of disks.

    Moreover, if any of the disks in RAID-0 are destroyed, all information in the array will be lost. But if the file system driver places each file on one disk, and the directory structure is correctly organized, then if any of the disks is destroyed, only the files located on that disk will be lost; and the files located entirely on the preserved disks will remain accessible.

    Daniel Olson, an employee of Y-E Data, the world's largest manufacturer of USB floppy drives, created as an experiment a RAID array of four

    Greetings to all, dear readers of the blog site! Earlier, I already published an article about, I highly recommend reading it. There I only briefly talked about what a tenth level raid array is, or “1+0” - as it is also called. This article will contain a detailed description of all the advantages and disadvantages of this type of Raid array, as well as its comparison with the fifth raid.

    As you know, Raid 10 combines the best of Raid 0 and Raid 1: increased access speed and increased data reliability, respectively. Raid 10 is a kind of “stripe” of mirrors made up of pairs of hard drives, each pair combined into a first-level raid. In other words, the disks of the nested array are joined in pairs to form first-level “mirror” raids, and these nested arrays are in turn combined into a common zero-level array using data striping.

    The features of a raid 10 array boil down to this:

    • if any one disk in the nested raid 1 arrays breaks down, no data is lost. That is, if the tenth raid contains only four disks, which is the minimum acceptable number, then it can painlessly survive the failure of as many as two disks at the same time, provided they belong to different mirrors;
    • the next feature (rather a disadvantage) is that damaged drives cannot be replaced, unless of course the array is equipped with “hot spare” technology;
    • judging by the statements of device manufacturers and numerous tests, raid “1+0” provides the best throughput compared to other types, except for raid zero, of course.
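
    Whether a four-disk raid 10 survives the loss of two disks depends on which two are lost. A small sketch makes this concrete; it assumes disks 0 and 1 form one mirror and disks 2 and 3 the other, a numbering chosen only for the illustration.

```python
from itertools import combinations

# Assumed layout: two mirrored pairs striped together.
MIRROR_PAIRS = [(0, 1), (2, 3)]


def raid10_survives(failed_disks, mirror_pairs=MIRROR_PAIRS):
    """The array survives as long as no mirror has lost both of its disks."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


two_disk_failures = list(combinations(range(4), 2))
survivable = [f for f in two_disk_failures if raid10_survives(f)]
fatal = [f for f in two_disk_failures if not raid10_survives(f)]

print("survivable:", survivable)
print("fatal:", fatal)
```

    Of the six possible pairs of failed disks, four leave the array intact and two (both disks of the same mirror) destroy it.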

    Number of disks

    Answering the question of how many disks raid 10 requires: such an array needs an even number of them. The minimum allowed number of hard drives is 4, and the maximum is 16. There is also an opinion that raid “1+0” (aka 10) and “0+1” are somehow different. This is true, but the difference is only in the order in which the arrays are nested.

    The last digit indicates the type of the top-level array. For example, raid “0+1” denotes a mirror of stripes, in which two raid 0 arrays (4 hard drives in total) are combined into one raid 1; this is just an example, and there may be more raid 0 arrays here. From the outside, these two subtypes of raid 10 look no different, and purely theoretically they have an equal degree of resistance to failures.

    In practice, most manufacturers now use Raid 1+0 instead of Raid 0+1, explaining this by the greater resistance of the first option to errors and failures.

    So many disks can fail and no data loss will occur

    I repeat, the main disadvantage of raid 10 remains the need to include a “hot spare” disk in the array. The rule of thumb is roughly one spare drive for every 5 working drives. Now a few words about disk capacity. The peculiarity of raid 1 is that only half of the total hard drive space is ever available to you. In RAID 10, out of 4 disks with a total capacity of 4 terabytes, only 2 TB will be available for data. In general, the available volume is easily calculated with the formula F*G/2, where F is the number of disks in the array and G is the capacity of each.

    Comparison raid 10 vs raid 5

    When talking about choosing between the “tenth” raid and any other, the thought of raid 5 usually comes to mind. Raid 5 is similar to the first in its purpose, with the only difference being that it requires at least 3 drives. Moreover, one of them will not be available as a place for recording data; only service information will be stored on it.

    The fifth raid is able to survive the loss (breakage) of only one hard drive; the breakdown of the second will entail the loss of all data. However, a level 5 raid is a good and cheap way to extend the life of drives and reduce the likelihood of them breaking. In order for our comparison to be effective and clear, I will try to sort out the advantages and disadvantages of the fifth raid over the tenth:

    1. The capacity of a raid 5 array equals the total disk capacity minus the capacity of one disk, whereas in raid 10 only half of the total capacity is available.
    2. During read/write operations, data streams from several disks can be handled in parallel, so the read or write speed is higher than that of a single hard disk. However, without a good raid controller the speed will not be very high.
    3. The performance of raid 5 in random block reading/writing operations is 10–25% lower compared to raid 10. If one of the disks in the fifth raid fails, the entire array goes into critical mode - all write and read operations are accompanied by additional manipulations, and performance drops sharply.

    So, what do we have in the end: raid 10 has better fault tolerance and speed compared to raid 5. However, not everyone can afford to assemble such an array of disks. Raid 5 is some kind of intermediate solution between a zero array and a mirror (raid 1). How to make raid 10 from four disks will be discussed below, although I have already touched upon this topic “in passing” in the article, the link to which is indicated at the top. Of course, for this purpose it is better to use the hardware level - you need a special controller, but good equipment is expensive.

    The so-called “fake raid” (built into the motherboard) is neither reliable nor fast, and I do not recommend using it. In that case it is better to set everything up in software. Now, a detailed example of creating an array of four disks using a raid controller. To begin, open the appropriate utility through the BIOS.

    Then, in the utility menu, select the “driver initialization” item.

    Select all our disks.

    We return to the main menu of the utility again and select the “create array” item.

    And at the last step, we indicate the type of the array, its size and other parameters.

    A short but, I hope, well-argued reply to the topic “Why RAID-5 is a ‘mustdie’?”.
    Below I give the simplest reliability calculation for RAID10 and RAID5, compare their characteristics, and point out some fundamental disadvantages of RAID1 and RAID10.

    A little introductory note:

    We will consider the simplest cases - RAID10 of 4 disks and RAID5 of 3 disks. Let's assume that all disks in the system are the same.
    The original version of the article mentioned RAID0+1 instead of RAID10, but this creates unnecessary confusion. The correct name is, of course, RAID10; mea culpa.

    Let n be the probability of failure of one disk;

    So - RAID10:

    Number of disks in the array - 4;
    The price of the array equals the cost of four disks;
    The capacity of the array equals twice the capacity of one disk;
    The maximum data read speed is twice the speed of one disk;
    Probability of array failure in the best case (when the controller implements RAID1+0 as a single matrix and can combine drives in any way):
    Probability that one specific disk fails: P1=n(1-n)^3;
    Probability that two specific disks fail: P2=(n^2)*(1-n)^2;
    Probability that three specific disks fail: P3=(n^3)*(1-n);
    Probability that all four disks fail: P4=n^4;
    Probability of failure-free operation: P0=(1-n)^4;
    Total probability: 4*P1+6*P2+4*P3+P4+P0=1;
    Array failure probability: P(RAID10)=2*P2+4*P3+P4;
    * The first term has 2 instead of 6, since only in two cases (when both disks holding the same data are damaged) the array cannot be restored.

    Separately, I note that most controllers do not know how to combine drives, which means the failure of any two drives leads to data loss, and the reliability of the array as a whole is much lower.

    RAID5:

    Number of disks in the array - 3;
    The price of the array is equal to the cost of three disks;
    The capacity of the array is equal to the capacity of two disks;
    The maximum reading speed is one and a half times the reading speed of one disk;
    The probability of an array failure is equal to the probability of failure of two disks in it:
    Probability that one specific disk fails: P1=n(1-n)^2;
    Probability that two specific disks fail: P2=(n^2)*(1-n);
    Probability that all three disks fail: P3=n^3;
    Probability of failure-free operation: P0=(1-n)^3;
    Total probability: 3*P1+3*P2+P3+P0=1;
    Array failure probability: P(RAID5)=3*P2+P3;
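
    Both failure probabilities can be cross-checked by brute force: list every combination of failed and working disks, add up the probabilities of the fatal ones, and compare with the formulas above. This is only a sketch under the same assumptions as the calculation (independent failures with the same probability n per disk); the disk numbering, with disks 0-1 and 2-3 as the two mirrors, is chosen for the example.

```python
from itertools import product


def failure_probability(n_disks, is_fatal, n):
    """Sum the probabilities of all disk states for which the array is lost.

    A state is a tuple of booleans, True meaning that disk has failed.
    """
    total = 0.0
    for state in product([False, True], repeat=n_disks):
        if is_fatal(state):
            failed = sum(state)
            total += n ** failed * (1 - n) ** (n_disks - failed)
    return total


def raid10_fatal(state):
    # Lost only if both disks of the same mirror (0-1 or 2-3) have failed.
    return (state[0] and state[1]) or (state[2] and state[3])


def raid5_fatal(state):
    # Lost as soon as any two disks have failed.
    return sum(state) >= 2


n = 0.05  # example per-disk failure probability
p_raid10 = failure_probability(4, raid10_fatal, n)
p_raid5 = failure_probability(3, raid5_fatal, n)

# Closed forms from the text.
formula_raid10 = 2 * n**2 * (1 - n) ** 2 + 4 * n**3 * (1 - n) + n**4
formula_raid5 = 3 * n**2 * (1 - n) + n**3

print(p_raid10, formula_raid10)
print(p_raid5, formula_raid5)
print("difference:", p_raid10 - p_raid5)
```

    For any n between 0 and 1 the difference comes out negative, so under these assumptions the four-disk RAID10 is slightly less likely to fail than the three-disk RAID5.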

    Conclusions:

    Let's start, of course, with the probability of failure - subtract the probability of failure of RAID5 from the probability of failure of RAID10:
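    P(RAID10)-P(RAID5)=2*n^2*(n-1)^2-n^3+n^4+3*n^2*(n-1)-4*n^3*(n-1)=-n^2*(1-n)^2
    Since this expression is negative for n->0 (and indeed for any 0<n<1), P(RAID10)-P(RAID5)<0, i.e. the reliability of RAID5 is LOWER than the reliability of RAID10. The difference is quite small, but it is in favor of RAID10;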
    If we assume that drives cannot be combined in any way, then RAID5 is more reliable.
    Price ratio: RAID5 is 1.333 times cheaper.
    Speed ratio: RAID5 is 1.333 times slower than RAID10, but one and a half times faster than a single drive.
    Attention, the question is which option is better? The one that is more expensive and less reliable, although a little faster. Or the one that is cheaper and more reliable?
    Personally, my opinion no longer leans either way (it used to lean towards the “more reliable and cheaper” RAID5).

    Addition:
    In the comments, the respected track reasonably pointed out that in some cases RAID-5 may be much slower than RAID1. In my humble opinion, these should be very, very specific cases, but it is worth keeping in mind.

    All kinds of comments:

    Recovery time:
    RAID10 recovery is ideally equal to the time it takes to copy the entire amount of data.
    For RAID5 the situation is more complicated, since data recovery using correction codes is required.
    When implemented in software, RAID5 recovery time will be determined by processor speed.
    When implemented in hardware, the recovery time of RAID5 is equal to the recovery time of RAID10.
    Considering that modern processors can easily handle data flows of about 100MB/s (the approximate peak read speed of modern drives), we can say that if implemented correctly, software RAID5 will not be much slower than RAID10.
    About reliability during recovery. For the case under consideration, there is no need to talk about this at all - backup copies need to be made! In general, it should be taken into account that at the time of recovery the number of disks in RAID10 is greater than in RAID5, which means the probability of failure is higher, and we cannot say that at the time of recovery RAID10 is definitely more reliable.
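
    To get a feel for the recovery times discussed above, here is a rough estimate of how long it takes just to copy a whole disk at a given sustained speed. The 100 MB/s default is the figure mentioned above; the disk sizes in the example are arbitrary, and real rebuilds are slower because the array keeps serving other requests.

```python
def copy_time_hours(capacity_gb: float, throughput_mb_s: float = 100.0) -> float:
    """Hours needed to read or write capacity_gb at a sustained throughput.

    Uses decimal units (1 GB = 1000 MB), as drive vendors do.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = capacity_gb * 1000 / throughput_mb_s
    return seconds / 3600


if __name__ == "__main__":
    for size_gb in (500, 1000, 2000):
        print(f"{size_gb} GB: about {copy_time_hours(size_gb):.1f} h")
```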

    Addition:
    If RAID-5EE is used, then in case of the first failure it is “compressed” into RAID-5, which can take a very long time. However, it should be taken into account that the result is a full-fledged RAID-5, which is resistant to single failures, i.e. in fact (with some limitations) the system can survive two failures in a row.

    CPU load:
    Software implementation of RAID5 loads the processor. For modern processors, this is usually not critical, but for fast drives you need to keep in mind that the faster the drive, the greater the load on the processor.
    And again reliability is the last nail in the coffin:
    For some reason, when talking about RAID10 and especially RAID1, everyone loses sight of one very important point.
    Yes, in the event of a physical drive failure, it will ensure data recovery from the copy, but what happens if the drives return different data? After all, in RAID1 there is no way to know which data is correct! You can try to determine the reliability of data by its content, but this is not a trivial task that can only be done manually, and not always.
    It is for this reason that I do not consider RAID1 here at all - it does not provide a mechanism for monitoring the reliability of data. And RAID10 in general too.
    And RAID5 (6?) in the general case very well ensures that if one of the three drives returns incorrect data, then it will be clearly known that they are not reliable.
    How can this (unreliability of data) happen?
    Problems with disk overheating. Power supply problems. Problems with disk firmware. Lots of options! Up to the complete burnout of the electronics when the computer's power supply fails. In that case you can try to revive the disks by installing boards from similar devices, but there is no guarantee that all the data on the disks is reliable.
    And one more nail in the same coffin. The topic where it all started has a lot written about BER (bit error rate). Without going into details, I'll just note that, firstly, for hard drives it is still more customary to talk about MTBF (mean time between failures); secondly, if we do talk about BER, then it should be UBER (uncorrectable bit error rate); and, thirdly, this is an argument in favor of RAID5: if the drives return distorted data (which has already gone through all the correction procedures), how will you know which drive to trust?

    Addition:
    Wiki says the opposite: the recovery information is not used until one of the disks fails. My own experience, however, says otherwise, but that was a long time ago and I don't even remember which controller it was (perhaps it was one of the non-standard RAID levels). So we can speak with certainty about data reliability only for ZFS/RAID-6.

    Verdict:

    The verdict is simple: if you don't want unnecessary problems out of the blue, there is no point in building either RAID1 or RAID0+1; look towards RAID5, 5E, 6, ZFS
    The verdict regarding “pure” RAID5 is not clear :)

    Update:
    I corrected the probability calculation - the conclusion did not change. Corrected “RAID0+1” to “RAID10”. I note that in the case described, “RAID0+1” is identical to “RAID1+0”. But the correct name is of course “RAID10”.

    Update2:
    And so, simply and casually, the meaning of the article changed, if not to the opposite, then certainly radically.

    Today we will talk about RAID arrays. Let's figure out what it is, why we need it, what it is like and how to use all this magnificence in practice.

    So, in order: what is a RAID array, or simply RAID? This abbreviation stands for "Redundant Array of Independent Disks". To put it simply, a RAID array is a collection of physical disks combined into one logical disk.

    Usually it happens the other way around: one physical disk is installed in the system unit, and we split it into several logical ones. Here the situation is reversed: several hard drives are first combined into one, and the operating system then perceives them as a single disk. That is, the OS firmly believes that it physically has only one disk.

    RAID arrays can be hardware or software.

    Hardware RAID arrays are created before the OS is loaded, using special utilities built into the RAID controller, something like a BIOS. Because such a RAID array is created beforehand, the installer already “sees” a single disk at the OS installation stage.

    Software RAID arrays are created by means of the OS. That is, during boot the operating system “understands” that it has several physical disks, and only after the OS starts are the disks combined into arrays in software. Naturally, the operating system itself is not located on the RAID array, since it is installed before the array is created.

    "Why is all this needed?" you ask. The answer: to increase the speed of reading/writing data and/or to increase fault tolerance and data safety.

    "How can a RAID array increase speed or protect data?" To answer this question, let's consider the main types of RAID arrays, how they are formed, and what they provide.

    RAID-0. Also called a "stripe". Two or more hard drives are combined into one: data is split into blocks that are written to the disks in turn, and the capacities are summed. That is, if we take two 500GB disks and build RAID-0 from them, the operating system will see one terabyte disk. The read/write speed of this array can be up to twice that of a single disk. For example, if a database is physically spread across two disks this way, one user can read data from one disk while another user writes to the other disk at the same time. If the database sits on a single disk, the hard disk has to perform the read/write tasks of different users one after another. RAID-0 allows reading and writing in parallel. As a consequence, the more disks in a RAID-0 array, the faster the array works. In the ideal case the dependence is directly proportional: the speed increases up to N times, where N is the number of disks in the array.
    RAID-0 has only one drawback, but it outweighs all the advantages of using it: the complete lack of fault tolerance. If one of the physical disks of the array dies, the entire array dies. There's an old joke about this: "What does the '0' in RAID-0 mean? The amount of information restored after the array dies!"
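
    The way a stripe spreads data can be sketched in a few lines. This is a minimal illustration that assumes simple round-robin placement of fixed-size blocks; the function name locate_block is hypothetical, and real controllers work with configurable stripe sizes.

```python
def locate_block(logical_block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    disk = logical_block % n_disks      # blocks rotate across the disks in turn
    offset = logical_block // n_disks   # each full rotation moves one row down
    return disk, offset


# Lay out the first six logical blocks on a two-disk array.
layout = [locate_block(b, n_disks=2) for b in range(6)]
print(layout)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

    Neighbouring blocks land on different disks, which is why they can be read or written in parallel, and also why losing either disk leaves every file with holes in it.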

    RAID-1. Also called a "mirror". Two or more hard drives are combined into one, with the same data written to every disk in parallel. That is, if we take two 500GB disks and build RAID-1 from them, the operating system will see one 500GB disk. The read/write speed of this array will be roughly the same as that of one disk, since information is written to both disks simultaneously. RAID-1 gives no gain in speed but provides greater fault tolerance: if one of the hard drives dies, there is always a complete duplicate of the information on the second drive. Remember that this fault tolerance covers only the death of one of the array's disks. If data is deleted deliberately, it is deleted from all disks of the array simultaneously!

    RAID-5. A more fault-tolerant variant of RAID-0. The volume of the array is calculated using the formula (N - 1) * DiskSize, where N is the number of disks in the array and DiskSize is the size of each disk. That is, when creating RAID-5 from three 500GB disks, we get an array of 1 terabyte. The essence of RAID-5 is that the disks are striped as in RAID-0, while one disk's worth of capacity holds the so-called "checksum" (parity): service information intended to restore one of the array's disks if it dies. In RAID-5 the parity blocks are spread across all disks rather than kept on one dedicated disk. RAID-5 write speed is somewhat lower, since time is spent calculating and writing the checksum, but the read speed is about the same as in RAID-0.
    If one of the disks in a RAID-5 array dies, the read/write speed drops sharply, since every operation is accompanied by additional manipulations. In effect RAID-5 turns into RAID-0, and if the array is not rebuilt in time there is a significant risk of losing the data completely.
    A RAID-5 array can be given a so-called spare disk. During stable operation of the array this disk is idle and unused. In a critical situation, however, recovery of the array starts automatically: the contents of the failed disk are rebuilt onto the spare disk using the checksums and the data on the surviving disks.
    RAID-5 is built from at least three disks and protects against the failure of a single disk. If failures occur on different disks at the same time, RAID-5 does not save the data.
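
    The "checksum" idea can be shown with plain XOR parity, which is the usual basis for RAID-5. The sketch below is only an illustration for one stripe of a three-disk array; the helper name xor_blocks and the sample data are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-disk array: two data blocks plus their parity.
data_a = b"RAID"
data_b = b"five"
parity = xor_blocks([data_a, data_b])

# The disk holding data_b dies: rebuild its block from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b)  # b'five'
```

    The same XOR rebuilds any single missing block, but with two blocks of a stripe gone there is nothing left to solve for, which is why a second failure is fatal.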

    RAID-6 is an improved version of RAID-5. The essence is the same, except that the checksums take up the capacity of two disks instead of one, and the two checksums are calculated using different algorithms, which significantly increases the fault tolerance of the whole array. RAID-6 is assembled from at least four disks. The formula for the volume of the array is (N - 2) * DiskSize, where N is the number of disks in the array and DiskSize is the size of each disk. That is, when creating RAID-6 from five 500GB disks, we get an array of 1.5 terabytes.
    RAID-6 write speed is lower than that of RAID-5 by about 10-15%, owing to the additional time spent calculating and writing checksums.

    RAID-10 is also sometimes called RAID 0+1 or RAID 1+0. It is a symbiosis of RAID-0 and RAID-1. The array is built from at least four disks: a RAID-0 on the first channel and a RAID-0 on the second channel to increase read/write speed, with the two joined in a RAID-1 mirror to increase fault tolerance. Thus RAID-10 combines the advantages of the first two options: it is fast and fault-tolerant.

    RAID-50, by analogy with RAID-10, is a symbiosis of RAID-0 and RAID-5: several RAID-5 arrays are built and then combined into a RAID-0 stripe. Thus RAID-50 gives very good read/write speed and retains the stability and reliability of RAID-5.

    RAID-60 is the same idea: several RAID-6 arrays combined into a RAID-0 stripe.

    There are also other combined arrays, RAID 5+1 and RAID 6+1. They resemble RAID-50 and RAID-60, the only difference being that the RAID-5 or RAID-6 arrays are joined not by a RAID-0 stripe but by a RAID-1 mirror.

    As you can see, the combined RAID arrays (RAID-10, RAID-50, RAID-60 and the RAID X+1 variants) are direct descendants of the basic array types RAID-0, RAID-1, RAID-5 and RAID-6. They serve only to increase either read/write speed or fault tolerance, while carrying the functionality of their basic, parent RAID types.

    If we move on to practice and talk about using particular RAID arrays in real life, the logic is quite simple:

    RAID-0 is not used in its pure form at all;

    RAID-1 is used where read/write speed is not particularly important but fault tolerance is. For example, RAID-1 is a good place to install operating systems: nothing but the OS accesses the disks, the speed of the hard disks themselves is quite sufficient, and fault tolerance is ensured;

    RAID-5 is installed where speed and fault tolerance are needed but there is not enough money to buy more hard drives, or where arrays must be rebuilt after damage without stopping work; spare drives help here. A common application of RAID-5 is data storage;

    RAID-6 is used where the risk is simply frightening or there is a real threat of several disks in the array dying at once. In practice it is quite rare, mainly among paranoid people;

    RAID-10 is used where it is necessary to work quickly and reliably. The main areas of use for RAID-10 are file servers and database servers.

    Again, simplifying further, we conclude that where there is no heavy, high-volume work with files, RAID-1 is quite enough: operating system, AD, TS, mail, proxy, etc. Where serious work with files is required, use RAID-5 or RAID-10.

    The ideal solution for a database server is a machine with six physical disks: two are combined into a RAID-1 mirror with the OS installed on it, and the remaining four are combined into RAID-10 for fast and reliable data processing.

    RAID-10 in the standard two-copy mode will survive the loss of any one disk without data loss; surviving the loss of a second is not guaranteed. Whether it is assembled as a mirror over stripes, a stripe over mirrors, or as mdadm's raid10 (which is neither one nor the other) makes no difference: survival is guaranteed only for the loss of any one disk. Note the emphasis on "any disk".

    Each block of data in RAID-10 is mirrored on two disks, which is why half of the total capacity is lost. But it also means that if you are unlucky and, even out of 10 disks, exactly the two that held the mirrors of the same data block drop out, there is nowhere left to read that block from. If you are very lucky and exactly the right disks drop out, you can lose up to half of the array's drives.

    For example, Linux software RAID (mdadm) lets you specify how many copies of the data should be kept across the disks. With 3 copies of the data on 6 disks you can survive the loss of any two disks, and with luck up to 4, though not just any 4. The price for this is the usable capacity of the array: you get the capacity of only two disks out of 6.
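
    The "any disk" versus "lucky disks" distinction can be checked by brute force. The sketch below uses a simplified placement model that is an assumption for illustration, not a description of any particular implementation: copy k of chunk c lives on disk (c * copies + k) % n_disks. The function names are hypothetical.

```python
from itertools import combinations


def survives(n_disks: int, copies: int, failed: set[int]) -> bool:
    """True if every chunk still has at least one copy on a working disk."""
    for chunk in range(n_disks):  # the placement pattern repeats within n_disks chunks
        holders = {(chunk * copies + k) % n_disks for k in range(copies)}
        if holders <= failed:     # every copy of this chunk sits on a dead disk
            return False
    return True


def surviving_sets(n_disks: int, copies: int, n_failed: int) -> tuple[int, int]:
    """Return (failure sets that lose no data, all failure sets) of the given size."""
    sets = list(combinations(range(n_disks), n_failed))
    ok = sum(survives(n_disks, copies, set(s)) for s in sets)
    return ok, len(sets)


print(surviving_sets(10, 2, 2))  # (40, 45): 5 of the 45 possible pairs are fatal
print(surviving_sets(6, 3, 2))   # (15, 15): any two disks may fail
print(surviving_sets(6, 3, 4))   # (9, 15): four can fail, but only the right four
```

    Under this model, two copies on 10 disks lose data for 5 of the 45 possible two-disk failures, while three copies on 6 disks survive every two-disk failure and 9 of the 15 possible four-disk failures.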

    RAID5 and RAID6, which are mentioned in the comments, will survive the loss of one and two disks, respectively. Failure of any second disk in RAID5 or any third in RAID6 is fatal and means the loss of the entire array. The purpose of these RAID levels is to insure against the death of a disk while being cheaper than a mirror. RAID5 reduces the formatted capacity of the array by the size of only one disk, RAID6 by only two disks, rather than by half as RAID1 or RAID10 do.

    For example, from twelve 1 TB disks you can assemble:

    • RAID5 with a capacity of 11 TB, you can lose any 1 disk
    • RAID6 with a capacity of 10 TB, you can lose any 2 disks
    • RAID10 with a capacity of 6 TB, you can lose any 1 disk
    • RAID10 with a capacity of 4 TB (configured to keep three copies), you can lose any 2 disks
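
    The capacities in this list follow from simple arithmetic, which can be written down as a small helper. This is a sketch assuming identical disks; the function name usable_capacity and the copies parameter (number of data copies in RAID10) are hypothetical names chosen for the example.

```python
def usable_capacity(level: str, n_disks: int, disk_tb: float, copies: int = 2) -> float:
    """Usable capacity in TB for an array of identical disks."""
    if level == "raid0":
        return n_disks * disk_tb
    if level == "raid1":
        return disk_tb
    if level == "raid5":
        return (n_disks - 1) * disk_tb    # one disk's worth of parity
    if level == "raid6":
        return (n_disks - 2) * disk_tb    # two disks' worth of parity
    if level == "raid10":
        return n_disks * disk_tb / copies  # every block is stored `copies` times
    raise ValueError(f"unknown RAID level: {level}")


# The twelve 1 TB disks from the list above.
options = [("RAID5", "raid5", 2), ("RAID6", "raid6", 2),
           ("RAID10, 2 copies", "raid10", 2), ("RAID10, 3 copies", "raid10", 3)]
for label, level, copies in options:
    print(f"{label}: {usable_capacity(level, 12, 1.0, copies)} TB")
```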

    It would seem, why then is RAID10 actively used with such a difference in capacity? The answer: performance. In RAID10 a read request can be served by either disk in a pair, so in a well-implemented RAID10 read requests can be parallelized across different disks. With RAID5/6, each original block of data is stored in only one place. To recover it from the redundant data, you have to read the corresponding segment from all the other disks and apply a little mathematics. RAID5/6 is also slower on writes. And a much more dramatic difference shows in degraded mode, i.e. when one disk has dropped out: RAID5/6 performance suffers more than noticeably.
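
    One reason parity arrays are slower on writes is the read-modify-write cycle for small updates. The sketch below assumes XOR parity as in RAID5 and a write that changes a single data block; the function name small_write and the sample data are illustrative only.

```python
def small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> tuple[bytes, int]:
    """Return (new parity, number of disk operations) for overwriting one data block."""
    # XOR the old data out of the parity and the new data in.
    new_parity = bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))
    disk_ops = 4  # 2 reads (old data, old parity) + 2 writes (new data, new parity)
    return new_parity, disk_ops


# A stripe with two data blocks; only the first one is overwritten.
other_block = b"abcd"
old_block = b"RAID"
old_parity = bytes(a ^ b for a, b in zip(other_block, old_block))

new_parity, ops = small_write(old_block, old_parity, b"raid")
print(ops)  # 4
```

    A two-way mirror needs only two writes for the same update, and nothing has to be read first.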

    How many disks can be lost also matters. Let me remind you that when a failed disk is replaced with a new one and array resynchronization begins, this is a very dangerous time: the load on the old disks rises sharply and another one may die. This is why RAID5 is used fairly rarely; RAID6 is not much more expensive for these tasks, yet it keeps the array protected during the rebuild.

    And another important point that should always be made when talking about RAID: RAID is not a backup. You should have a backup anyway.