• Define file system. Basic elements of a file system

    Files on a computer are created and stored according to certain system principles. Thanks to these, the user can conveniently access the information he needs without thinking about the complex algorithms behind that access. How are file systems organized? Which ones are the most popular today? How do the file systems used on PCs differ from those used in mobile devices such as smartphones and tablets?

    File Systems: Definition

    According to a common definition, a file system is a set of algorithms and standards used to organize effective access for a PC user to data located on the computer. Some experts consider it part of the operating system. Other IT experts, while recognizing that it is closely related to the OS, regard the file system as an independent component of computer data management.

    How was data managed before the file system was invented? Computer science, as a discipline, records that for a long time data management was carried out by structuring data within the algorithms embedded in specific programs. Thus, one of the defining criteria of a file system is the existence of standards that are the same for most programs accessing the data.

    How file systems work

    The file system is, first of all, a mechanism that makes use of the computer's hardware resources. As a rule, we are talking about magnetic or optical media: hard drives, CDs, DVDs, flash drives, and floppy disks, which have not yet become entirely obsolete. To understand how such a system works, let us first define what a file itself is.

    According to the definition generally accepted among IT experts, a file is a data area of a fixed size, expressed in basic units of information: bytes. A file is located on disk media, usually in the form of several interconnected blocks that have a specific access "address". The file system determines these coordinates and "reports" them to the OS, which in turn passes the relevant data to the user. Data is accessed in order to read it, modify it, or create new data. The specific algorithm for working with file "coordinates" may vary; it depends on the type of computer, the OS, the specifics of the stored data, and other conditions. This is why there are various types of file systems, each optimized for use on a specific OS or for working with certain types of data.

    Adapting disk media for use with the algorithms of a particular file system is called formatting. The corresponding hardware elements of the disk (clusters) are prepared for the subsequent writing of files to them, as well as for reading them, in accordance with the standards laid down in the given data management system. How can the file system be changed? In most cases, only by reformatting the storage medium, which, as a rule, erases the files on it. There is, however, an option in which, using special programs, the data management system can still be changed while leaving the files untouched, although this usually takes a lot of time.

    File systems do not work flawlessly: there may be occasional failures in the organization of data blocks. In most cases, however, these are not critical, and fixing the file system or eliminating errors is usually straightforward. Windows, in particular, has built-in software solutions accessible to any user, such as the Check Disk (chkdsk) utility.

    Varieties

    What types of file systems are the most common? Probably, first of all, those used by the most popular PC operating system in the world, Windows. The main Windows file systems are FAT, FAT32, NTFS, and their various modifications. Alongside computers, smartphones and tablets have gained popularity; most of them, if we consider the global market without distinguishing technology platforms, run Android or iOS. These operating systems use their own algorithms for working with data, different from those that characterize the Windows file systems.

    Standards open to all

    Note that lately the global electronics market has seen some unification of standards for how operating systems work with various types of data. This can be seen in two respects. First, different devices running two dissimilar operating systems often use the same file system, equally compatible with each OS. Second, modern versions of an OS are usually able to recognize not only their native file systems but also those traditionally used in other operating systems, both through built-in algorithms and with third-party software. For example, modern Linux versions, as a rule, recognize file systems designed for Windows without problems.

    File system structure

    Despite the fact that file systems come in a fairly large number of types, they generally work according to very similar principles (we outlined the general scheme above) and with similar structural elements, or objects. Let us look at them. What are the main objects of a file system?

    One of the key objects is the directory: an isolated data area in which files can be placed. The directory structure is hierarchical. What does this mean? One or more directories may reside within another, which in turn is part of a "superior" one. The topmost is the root directory. In the Windows file systems (whether in Windows 7, 8, XP, or another version), the root directory is a logical drive designated by a letter, usually C, D, or E (though any letter of the English alphabet can be configured). In Linux, by contrast, the root directory is the storage medium as a whole; in this OS and others based on its principles, such as Android, logical drives are not used. Is it possible to store files without directories? Yes, but it is not very convenient; indeed, user comfort is one of the reasons the principle of distributing data into directories was introduced into file systems. Incidentally, directories may go by different names: in Windows they are called folders, and in Linux essentially the same, although the traditional name used in that OS for many years is "directories", as in the predecessors of Windows and Linux: DOS and Unix.

    Among IT specialists there is no consensus on whether the file should be considered a structural element of the corresponding system. Those who believe this is not entirely correct argue that the system can exist perfectly well without files, even if it is then useless from a practical point of view: even if no files are written to the disk, the corresponding system may still be present. Magnetic media sold in stores typically contain no files, yet they already carry a file system. The opposing view holds that files should be considered an integral part of the systems that manage them. Why? Because, according to these experts, the system's algorithms are adapted primarily to working with files within certain standards; the systems in question are not intended for anything else.

    Another element present in most file systems is the shortcut (link): a data area containing information about the placement of a specific file in a specific location. That is, a shortcut can be placed in one part of the disk while providing access to a data area located in another part of the medium. Shortcuts can be considered full-fledged objects of the file system if one agrees that files are such as well.

    One way or another, it would not be a mistake to say that all three types of data (files, shortcuts, and directories) are elements of their respective systems; at least, this thesis corresponds to one of the common points of view. Another important aspect characterizing how a file system works is the principles for naming files and directories.

    File and directory names on different systems

    If we agree that files are indeed components of the systems that manage them, it is worth considering their basic structure. What should be noted first? To make files easier to access, most modern data management systems provide a two-level file naming structure. The first level is the name; the second is the extension. Take the music file Dance.mp3 as an example: Dance is the name and mp3 is the extension. The name reveals to the user the essence of the file's contents (and serves the program as a guide for quick access); the extension indicates the file type. If it is mp3, it is easy to guess that we are dealing with music; files with the extension doc are usually documents, jpg are pictures, and html are web pages.
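The two-level naming scheme is easy to demonstrate in code. A short Python sketch (the file name is the example from above):

```python
import mimetypes
import os.path

# Split a file name into its two levels: name and extension.
name, ext = os.path.splitext("Dance.mp3")
print(name)  # Dance
print(ext)   # .mp3

# The extension hints at the file type; here it maps to an audio format.
print(mimetypes.guess_type("Dance.mp3")[0])  # audio/mpeg
```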

    Directories, in turn, have a single-level structure: they have only a name, no extension. When comparing different types of data management systems, the first thing to note is the file and directory naming principles implemented in them. For Windows, the specifics are as follows. In the world's most popular operating system, files can be named in any language, but the maximum name length is limited. The exact limit depends on the data management system used; typically it is in the range of 200-260 characters.

    A rule common to all operating systems and their data management systems is that files with the same name cannot be located in the same directory. In Linux this rule is somewhat "liberalized": files whose names differ only in letter case, such as Dance.mp3 and DANCE.mp3, may coexist in the same directory. This is not possible in Windows. The same rules apply to placing directories within others.

    Addressing files and directories

    Addressing files and directories is an essential element of the corresponding system. On Windows, the user-level address format may look like this: C:/Documents/Music/ (the address of the Music directory). For a specific file, the address may look like C:/Documents/Music/Dance.mp3. Why "user-level"? Because at the level of hardware and software interaction between computer components, the structure of file access is much more complex: the file system determines the location of file blocks and interacts with the OS in largely hidden operations. However, a PC user rarely needs any other "address" format; files are almost always accessed in the standard form shown.
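This user-level addressing can be modeled with Python's pathlib, which understands Windows path conventions even when run on other platforms (a sketch using the example path above):

```python
from pathlib import PureWindowsPath

# Build the user-level "address" of a file from its chain of directories.
p = PureWindowsPath("C:/", "Documents", "Music", "Dance.mp3")
print(p)         # C:\Documents\Music\Dance.mp3
print(p.parent)  # C:\Documents\Music
print(p.name)    # Dance.mp3
```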

    Comparison of file systems for Windows

    We have studied the general principles of how file systems function. Let us now consider the features of their most common types. The most commonly used file systems in Windows are FAT, FAT32, NTFS, and exFAT. The first in this series is considered obsolete; for a long time it was the industry's flagship of sorts, but as PC technology advanced, its capabilities ceased to meet the needs of users and the resource demands of software.

    FAT32 was designed to replace the FAT file system. According to many IT experts, it is now the most popular file system on the Windows PC market. It is most often used for storing files on hard drives and flash drives, and it is also regularly used in the memory modules of various digital devices such as phones and cameras. The main advantage of FAT32 highlighted by IT experts is that, despite being created by Microsoft, most modern operating systems, including those installed on the aforementioned digital devices, can work with data using its algorithms.

    The FAT32 system also has a number of disadvantages. First of all, a single file cannot be larger than 4 GB. Also, the built-in Windows tools cannot create a FAT32 logical drive larger than 32 GB, although this can be done with additional specialized software.

    Another popular file management system developed by Microsoft is NTFS. According to some IT experts, it is superior to FAT32 in most respects, but this holds when we are talking about a computer running Windows: NTFS is not as versatile as FAT32, and the peculiarities of its operation make it not always comfortable to use, particularly on mobile devices. One of the key advantages of NTFS is reliability: for example, if the hard drive suddenly loses power, the likelihood of file damage is minimized thanks to the data duplication algorithms provided in NTFS.

    One of the newest file systems from Microsoft is exFAT. It is best adapted to flash drives. Its basic principles of operation are the same as in FAT32, but there are significant modernizations in some respects: for example, there is no limit on the size of a single file. At the same time, as many IT experts note, exFAT has low versatility: on non-Windows computers, working with exFAT-formatted media may be difficult, and even in some versions of Windows itself, such as XP, data on disks formatted with exFAT may not be readable without installing an additional driver.

    Note that because such a wide range of file systems is used with Windows, the user may occasionally run into compatibility problems between various devices and the computer. In some cases, for example, it is necessary to install a WPD (Windows Portable Devices, a technology used when working with portable devices) driver. If the user does not have it at hand, the OS may not recognize the external medium. The WPD driver may require additional adaptation to the operating environment of a specific computer, and in some cases the user will have to turn to IT specialists to solve the problem.

    How does one determine which file system (exFAT, NTFS, or perhaps FAT32) is optimal in a specific case? IT specialists' recommendations generally boil down to two main approaches. According to the first, one should distinguish between file systems typical for hard drives and those better adapted to flash drives: FAT and FAT32, according to many experts, are better suited for flash drives, and NTFS for hard drives (owing to the technological features of how it works with data).

    In the second approach, the size of the medium matters. For a relatively small disk or flash drive, FAT32 will do. For a larger disk, exFAT can be tried, but only if the medium is not intended for use on other computers, especially ones not running the latest versions of Windows. Large hard drives, including external ones, are best formatted in NTFS. These, approximately, are the criteria for choosing the optimal file system (exFAT, NTFS, or FAT32): the size of the medium, its type, and the version of the OS on which the drive will primarily be used.

    File systems for Mac

    Another popular hardware and software platform on the global computer market is Apple's Macintosh. PCs in this line run the Mac OS operating system. What are the features of working with files on Mac computers? Most modern Apple PCs use the Mac OS Extended file system; previously, data on Mac computers was managed according to the HFS standard.

    The main thing to note about its characteristics is that a disk managed by the Mac OS Extended file system can accommodate very large files, on the order of several million terabytes.

    File system in Android devices

    The most popular OS for mobile devices, a class of electronics now rivaling the PC in popularity, is Android. How are files managed on such devices? Note, first of all, that this operating system is in fact a "mobile" adaptation of Linux, which, thanks to its open source code, can be modified for use on a wide range of devices. File management on Android devices is therefore carried out broadly according to the same principles as in Linux, some of which we noted above. In particular, file management in Linux is carried out without dividing the medium into logical drives, as happens in Windows. What else is interesting about the Android file system?

    The root directory in Android is usually a data area called /mnt, so the address of a file may look something like /mnt/sd/photo.jpg. In addition, this mobile OS has another feature of its data management system: a device's flash memory is usually divided into several partitions, such as System and Data, and the initially specified size of each cannot be changed. An approximate analogy for this technological aspect: the sizes of logical drives in Windows likewise cannot be changed (unless special software is used); they are fixed.

    Another interesting feature of file organization in Android is that the operating system, as a rule, writes new data to one specific area of the disk, Data; the System partition, for example, is not written to. Therefore, when the user activates the function of resetting the smartphone or tablet software to "factory" settings, in practice this means that the files written to the Data area are simply erased, while the System partition generally remains unchanged. Moreover, without specialized software the user cannot make any changes to the contents of System. The procedure of updating the system storage area of an Android device is called flashing. This is not the same as formatting, although the two operations are often performed together; flashing is usually done to install a newer version of the Android OS on the mobile device.

    Thus, the key principles of the Android file system are the absence of logical drives and strict differentiation of access to system and user data. This approach is not fundamentally different from the one implemented in Windows; however, according to many IT experts, Microsoft's OS gives users somewhat greater freedom in working with files. Yet, as some experts believe, this cannot be considered an unambiguous advantage of Windows: the "liberal" file management regime is exploited not only by users but also by computer viruses, to which Windows is very susceptible (unlike Linux and its "mobile" implementation, Android). This, according to experts, is one of the reasons there are so few viruses for Android devices: from a purely technological point of view, they cannot fully function in an operating environment built on strict control of file access.

    Sooner or later, a novice computer user encounters the concept of a file system (FS). As a rule, the first acquaintance with the term occurs when formatting a storage medium: a logical drive or connected media (flash drives, memory cards, an external hard drive).

    Before formatting, the Windows operating system prompts you to select the type of file system for the medium, the cluster size, and the formatting method (quick or full). Let's figure out what a file system is and why it is needed.

    All information is recorded on the medium in the form of files, which must be arranged in a certain order; otherwise the operating system and programs will not be able to operate on the data. This order is organized by the file system, using certain algorithms and rules for placing files on the medium.

    When a program needs a file stored on disk, it does not need to know how or where it is stored. All that is required of the program is to know the file name, its size and attributes in order to transfer this data to the file system, which will provide access to the required file. The same thing happens when writing data to a medium: the program transfers information about the file (name, size, attributes) to the file system, which saves it according to its own specific rules.

    To better understand, imagine a librarian giving a book to a client based on its title. Or in reverse order: the client returns the book he read to the librarian, who places it back into storage. The client does not need to know where and how the book is stored; this is the responsibility of the establishment's employee. The librarian knows the rules of library cataloging and, according to these rules, searches for the publication or places it back, i.e. performs its official functions. In this example, the library is a storage medium, the librarian is a file system, and the client is a program.
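In code, the librarian analogy looks like this: the program names the file, and the file system silently handles where the bytes actually live (a sketch using a temporary file with a hypothetical name):

```python
import os
import tempfile

# The program refers to the file only by name; which disk blocks hold the
# bytes is the file system's concern, not the program's.
path = os.path.join(tempfile.gettempdir(), "library_demo.txt")

with open(path, "w", encoding="utf-8") as f:   # "return the book to the librarian"
    f.write("War and Peace")

with open(path, "r", encoding="utf-8") as f:   # "request the book by title"
    data = f.read()
print(data)  # War and Peace

os.remove(path)
```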

    Basic File System Functions

    The main functions of the file system are:

    • placing and organizing data on the storage medium in the form of files;
    • determining the maximum supported amount of data on the storage medium;
    • creating, reading and deleting files;
    • assigning and changing file attributes (size, creation and modification times, file owner and creator, read-only, hidden, temporary, archive, executable, maximum file name length, etc.);
    • determining the file structure;
    • directory organization for logical organization of files;
    • file protection in case of system failure;
    • protecting files from unauthorized access and changing their contents.

    Information recorded on a hard drive or any other medium is placed there on the basis of cluster organization. A cluster is a kind of cell of a certain size into which a whole file, or part of one, fits.

    If the file fits within the cluster size, it occupies only one cluster. If the file size exceeds the cell size, it is placed in several cluster cells, and these clusters need not be adjacent: they may be scattered over the physical surface of the disk. This arrangement allows disk space to be used most efficiently when storing files. The task of the file system is to distribute the file across free clusters in an optimal way when writing, and to reassemble it and hand it to the program or operating system when reading.
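The cluster arithmetic can be sketched in a few lines (the cluster and file sizes below are illustrative):

```python
import math

def clusters_needed(file_size: int, cluster_size: int) -> int:
    """A file always occupies a whole number of clusters."""
    return math.ceil(file_size / cluster_size)

# A 10 KB file on a volume with 4 KB clusters spans 3 clusters; the unused
# 2 KB at the end of the last cluster is wasted ("slack") space.
size, cluster = 10 * 1024, 4 * 1024
n = clusters_needed(size, cluster)
print(n)                   # 3
print(n * cluster - size)  # 2048 bytes of slack
```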

    Types of file systems

    In the course of the evolution of computers, storage media, and operating systems, a large number of file systems have been created. As a result of this evolutionary selection, work with hard drives and external storage devices (flash drives, memory cards, external hard drives, CDs) today relies mainly on the following file systems:

    1. NTFS
    2. FAT32
    3. exFAT
    4. Ext3 / Ext4
    5. HFS Plus
    6. ISO9660

    The last of these is designed for working with CDs. The Ext3 and Ext4 file systems are used by Linux-based operating systems, and HFS Plus is the file system of the OS X operating system used on Apple computers.

    The most widely used file systems are NTFS and FAT32, and this is not surprising, since they are designed for Windows operating systems, which run the vast majority of the world's computers.

    FAT32 is now actively being replaced by the more advanced NTFS, owing to the latter's greater reliability in data safety and protection. Moreover, the latest versions of Windows simply will not install onto a hard drive partition formatted in FAT32; the installer will ask you to format the partition as NTFS.

    The NTFS file system supports disks with a capacity of hundreds of terabytes and a single file size of up to 16 terabytes.

    The FAT32 file system supports disks of up to 8 terabytes and a single file size of up to 4 GB. Most often this FS is used on flash drives and memory cards; external drives come formatted in FAT32 from the factory.

    However, the 4 GB file size limit is a serious disadvantage today: with the spread of high-quality video, a movie file can exceed this limit, making it impossible to record onto the medium.


    General. Computer science theory defines three main types of data structures: linear, tabular, and hierarchical. A book provides an example of each: the sequence of its pages is a linear structure; its parts, sections, chapters, and paragraphs form a hierarchy; and the table of contents is a table that links the hierarchical structure with the linear one. Structured data acquires a new attribute: an address. Thus:

        Linear structures (lists, vectors). In a regular list, the address of each element is uniquely determined by its number. If all elements of the list have equal length, it is a data vector.

        Tabular structures (tables, matrices). The difference from a list is that each element is identified by an address consisting of not one but several parameters. The most common example is a matrix, where the address consists of two parameters: the row number and the column number. Multidimensional tables generalize this.

        Hierarchical structures. Used to represent irregular data. The address is determined by a route from the top of the tree; a computer's file system is an example. (The route may exceed the size of the data itself; in a dichotomy there are always exactly two branches, left and right.)

    Ordering data structures. The main method is sorting. Note that adding a new element to an ordered structure may change the addresses of existing elements. For hierarchical structures, indexing is used: each element receives a unique number, which is then used in sorting and searching.
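The three kinds of address can be illustrated in a few lines of Python (the data is invented for the example):

```python
# Linear structure: the address of an element is a single number (its index).
pages = ["page one", "page two", "page three"]
print(pages[1])                          # page two

# Tabular structure: the address consists of two parameters (row, column).
matrix = [[1, 2, 3],
          [4, 5, 6]]
print(matrix[1][0])                      # 4

# Hierarchical structure: the address is a route from the top of the tree,
# just like a file path in a file system.
tree = {"root": {"docs": {"file.txt": "contents"}}}
print(tree["root"]["docs"]["file.txt"])  # contents
```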

      Basic elements of a file system

    Historically, the first step in data storage and management was the use of file management systems.

    A file is a named area of external memory that can be written to and read from. It is characterized by three parameters:

      it is a sequence of an arbitrary number of bytes;

      it has a unique proper name (in effect, an address);

      it contains data of a single kind, which determines the file type.

    The rules for naming files, how the data stored in a file is accessed, and the structure of that data depend on the particular file management system and possibly on the file type.

    The first advanced file system in the modern sense was developed by IBM for its System/360 series (1965-1966), though it is practically not used in current systems. It used list data structures (volume, section, file).

    Most of you are familiar with the file systems of modern operating systems: first of all MS DOS and Windows, and some of you with the file system design of various UNIX variants.

    File structure. A file is a collection of data blocks located on an external medium. To exchange data with a magnetic disk at the hardware level, one must specify the cylinder number, the surface (head) number, the block number on the corresponding track, and the number of bytes to be written or read from the beginning of that block. Therefore, all file systems explicitly or implicitly have a basic level that presents a file as a set of directly addressable blocks in an address space.
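The hardware-level addressing described above can be sketched with the classic cylinder/head/sector-to-block translation (the disk geometry below is hypothetical; real drives vary):

```python
# Hypothetical disk geometry: 16 surfaces per cylinder, 63 sectors per track.
HEADS = 16
SECTORS_PER_TRACK = 63

def chs_to_block(cylinder: int, head: int, sector: int) -> int:
    """Translate a (cylinder, head, sector) triple into a flat block number.
    Sectors are traditionally numbered from 1 within a track."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

print(chs_to_block(0, 0, 1))  # 0: the very first block on the disk
print(chs_to_block(0, 1, 1))  # 63: first block of the next surface
print(chs_to_block(1, 0, 1))  # 1008: first block of the next cylinder
```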

    Naming files. All modern file systems support multi-level file naming by maintaining additional specially structured files in external memory: directories. Each directory contains the names of the directories and/or files it holds. Thus, the full name of a file consists of a list of directory names plus the name of the file within the directory that immediately contains it. File systems differ in where this chain of names begins (Unix vs. DOS/Windows).

    File protection. File management systems must provide authorization of access to files. In general, the approach is that, for each registered user of a given computing system and each existing file, the actions allowed or prohibited to that user are recorded. Attempts were made to implement this approach in full, but it caused too much overhead, both in storing the redundant information and in using it to check access eligibility. Therefore, most modern file management systems use the file protection scheme first implemented in UNIX (1974). In this scheme, each registered user is associated with a pair of integer identifiers: the identifier of the group to which the user belongs and the user's own identifier within that group. For each file, the full identifier of the user who created it is stored, along with a record of what actions the owner may perform on the file, what actions are available to other users of the same group, and what users of other groups may do. This information is very compact, requires few steps to check, and this method of access control is satisfactory in most cases.
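A sketch of this UNIX-style check, using Python's stat constants for the rwx permission bits (the user/group IDs and the mode are invented for the example):

```python
import stat

def may_read(mode: int, file_uid: int, file_gid: int, uid: int, gid: int) -> bool:
    """Decide read access the UNIX way: the owner's bits apply to the owner,
    the group bits to other members of the file's group, the 'others' bits
    to everyone else."""
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if gid == file_gid:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

mode = 0o640  # rw-r-----: owner reads/writes, group reads, others get nothing
print(may_read(mode, file_uid=1000, file_gid=100, uid=1000, gid=100))  # True
print(may_read(mode, file_uid=1000, file_gid=100, uid=2000, gid=100))  # True
print(may_read(mode, file_uid=1000, file_gid=100, uid=2000, gid=200))  # False
```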

    Multi-user access mode. If the operating system supports multi-user operation, two or more users may well try to work with the same file simultaneously. If all of them only intend to read the file, nothing bad happens; but if at least one of them modifies it, the group needs mutual synchronization to work correctly. Historically, file systems have taken the following approach: the open operation (the first and mandatory operation of any session of work with a file) specifies, among other parameters, the mode of operation (reading or modification), and special procedures exist for synchronizing user actions; unsynchronized concurrent writes are not permitted.

      Journaling in file systems. General principles.

    Running a file system check (fsck) on a large file system can take a long time, which is unfortunate given today's high-speed systems. The reason a file system loses integrity may be an incorrect unmount, for example if the disk was being written to at the moment of shutdown. Applications may have been updating data contained in files, and the system may have been updating file system metadata, the "data about the file system's data": information about which blocks belong to which files, which files are located in which directories, and so on. Errors (loss of integrity) in data files are bad, but errors in file system metadata are much worse, as they can lead to file loss and other serious problems.

    To minimize integrity problems and system restart time, a journaled file system keeps a list of the changes it is going to make to the file system before actually writing them. These records are stored in a separate part of the file system called the "journal" or "log". Once the journal entries are safely written, the journaled file system applies the changes to the file system and then deletes the corresponding entries from the log. Journal entries are organized into sets of related file system changes, much as changes added to a database are organized into transactions.

    A journaled file system improves the likelihood of integrity because journal entries are made before the changes themselves, and because the file system retains those entries until they have been fully and safely applied. When a computer with a journaled file system is rebooted, the mount program can ensure the file system's integrity simply by checking the journal for changes that were intended but not made, and writing them to the file system. In most cases the system does not need to run a full integrity check, so a computer with a journaled file system is available for use almost immediately after a reboot, and the chances of data loss due to file system problems are significantly reduced.
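The write-ahead idea behind journaling can be reduced to a toy model (this is an illustrative sketch, not the layout of any real file system):

```python
class JournaledStore:
    """A toy journaled store: record the intended change, apply it, then
    retire the journal entry. After a 'crash', replaying the journal
    brings the store back to a consistent state."""

    def __init__(self):
        self.data = {}       # plays the role of the file system
        self.journal = []    # plays the role of the log

    def write(self, name, contents):
        self.journal.append((name, contents))  # 1. log the intent
        self.data[name] = contents             # 2. apply the change
        self.journal.pop()                     # 3. retire the log entry

    def recover(self):
        # Re-apply any change that was logged but possibly not completed.
        for name, contents in self.journal:
            self.data[name] = contents
        self.journal.clear()

store = JournaledStore()
store.write("a.txt", "hello")
# Simulate a crash after step 1 but before step 2 completed:
store.journal.append(("b.txt", "world"))
store.recover()    # the replay performed at mount time
print(store.data)  # {'a.txt': 'hello', 'b.txt': 'world'}
```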

    In its classic form, a journaled file system records only changes to file system metadata in the journal; some systems additionally journal changes to all file system data, including the contents of the files themselves.

      The MS-DOS file system (FAT)

    The MS-DOS file system is a tree-structured file system for small disks and simple directory structures: the root of the tree is the root directory, and the leaves are files and (possibly empty) directories. Files managed by this file system are placed in clusters, whose size can range from 4 KB to 64 KB; contiguity is not required when allocating disk space. For example, the figure shows three files. File1.txt is fairly large and occupies three consecutive blocks. The small file File3.txt uses only one allocated block. The third file, File2.txt, is a large fragmented file. In each case the directory entry points to the first block belonging to the file. If a file uses several blocks, each block's table entry points to the next one in the chain. The value FFF marks the end of the chain.
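    The chain-following scheme just described can be sketched in a few lines of Python. This is a toy model (the table contents and the file names are invented), not real FAT code:

```python
# A minimal sketch of following a FAT-style cluster chain.
# The 0xFFF end-of-chain marker follows the description above;
# the cluster numbers are purely illustrative.
END_OF_CHAIN = 0xFFF

def read_chain(fat, first_cluster):
    """Collect the clusters of a file by following FAT links."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]  # each table entry points to the next cluster
    return chain

# Toy table: one contiguous file (clusters 2,3,4) and one
# fragmented file (clusters 5,7,9), like File1.txt and File2.txt above.
fat = {2: 3, 3: 4, 4: END_OF_CHAIN, 5: 7, 7: 9, 9: END_OF_CHAIN}
print(read_chain(fat, 2))  # the contiguous file
print(read_chain(fat, 5))  # the fragmented file
```

    Note that the chain gives the clusters in file order regardless of where they lie on disk, which is exactly why fragmentation is invisible to programs but costly for the disk head.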

    FAT disk partition

    For efficient access to files, a file allocation table (FAT) is used, located at the beginning of the partition (or logical drive). The file system takes its name, FAT, from this allocation table. To protect the partition, two copies of the FAT are stored on it, in case one becomes corrupted. In addition, the file allocation tables must be placed at strictly fixed addresses so that the files needed to start the system can be located correctly.

    The file allocation table consists of 16-bit entries and records one of the following states for each cluster of the logical disk:

      the cluster is not used;

      the cluster is used by a file;

      the cluster is bad;

      the cluster is the last cluster of a file.

    Since each cluster must be assigned a unique 16-bit number, FAT supports at most 2^16, or 65,536, clusters on one logical disk (and also reserves some clusters for its own needs). This gives a maximum disk size served by MS-DOS of 4 GB. The cluster size can be increased or decreased depending on the disk size, but once the disk exceeds a certain size the clusters become too large, which leads to internal fragmentation of the disk. Besides information about files, the file allocation table also holds information about directories, which are treated as special files with a 32-byte entry for each file they contain. The root directory has a fixed size of 512 entries on a hard disk; on floppy disks its size is determined by the size of the floppy. The root directory is located immediately after the second copy of the FAT because it contains the files needed by the MS-DOS boot loader.
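    The 4 GB figure quoted above follows directly from the arithmetic, which is easy to verify:

```python
# Maximum FAT16 volume size: 16-bit cluster numbers times the
# largest cluster size mentioned above (64 KB).
clusters = 2 ** 16            # 65,536 addressable clusters
cluster_size = 64 * 1024      # 64 KB per cluster
max_volume = clusters * cluster_size
print(max_volume)                       # bytes
print(max_volume // (1024 ** 3), "GB")  # -> 4 GB
```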

    When searching for a file on disk, MS-DOS has to walk the directory structure. For example, to run the executable file C:\Program\NC4\nc.exe, it does the following:

      reads the root directory of drive C: and looks for the Program directory in it;

      reads the initial cluster of the Program directory and looks in it for an entry for the NC4 subdirectory;

      reads the initial cluster of the NC4 subdirectory and looks in it for an entry for the nc.exe file;

      reads all the clusters of the nc.exe file.
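    The walk above can be sketched with the disk modeled as nested dictionaries (directory names and cluster placeholders are invented for illustration):

```python
# A toy model of the component-by-component path lookup described above.
# Each directory is a dict of entries; leaf values stand in for cluster chains.
disk = {
    "Program": {"NC4": {"nc.exe": "<clusters of nc.exe>"}},
    "autoexec.bat": "<clusters>",
}

def lookup(root, path):
    """Resolve a \\Program\\NC4\\nc.exe style path, one directory read per component."""
    entry = root
    for component in path.strip("\\").split("\\"):
        entry = entry[component]
    return entry

print(lookup(disk, "\\Program\\NC4\\nc.exe"))
```

    Each loop iteration corresponds to one directory read from disk, which is why deep nesting makes the search slower.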

    This search method is not the fastest among current file systems, and the deeper the directory nesting, the slower the search. To speed up searching, keep the directory structure shallow and balanced.

    Advantages of FAT

      It is the best choice for small logical drives because it starts with minimal overhead. On disks up to about 500 MB it performs acceptably.

    Disadvantages of FAT

      Since a file entry is limited to 32 bytes and must include the file size, date, attributes, and so on, the length of the file name is also limited and cannot exceed 8+3 characters per file. The use of such short file names makes FAT less attractive than other file systems.

      Using FAT on disks larger than 500 MB is wasteful because of internal fragmentation.

      The FAT file system does not have any security features and supports minimal information security capabilities.

      The speed of FAT operations decreases with the depth of directory nesting and with disk size.

      UNIX file systems (ext3)

    The modern, powerful and free Linux operating system provides wide scope for developing modern systems and custom software. Some of the most exciting developments in recent Linux kernels are new, high-performance technologies for managing the storage, placement and updating of data on disk. One of the most interesting of these is the ext3 file system, integrated into the Linux kernel starting with version 2.4.16 and already available by default in Linux distributions from Red Hat and SuSE.

    The ext3 file system is a journaling file system that is 100% compatible with all the utilities for creating, managing and fine-tuning the ext2 file system, which has been used on Linux systems for the last few years. Before describing the differences between ext2 and ext3 in detail, let us clarify the terminology of file systems and file storage.

    At the system level, all data on a computer exists as blocks of data on some storage device, organized using special data structures into partitions (logical sets on a storage device), which in turn are organized into files, directories and unused (free) space.

    File systems are created on disk partitions to simplify the storage and organization of data in the form of files and directories. Linux, like Unix, uses a hierarchical file system made up of files and directories, which in turn contain either files or directories. Files and directories in a Linux file system become available to the user by mounting them (the "mount" command), which is usually part of the system boot process. The list of file systems available for mounting is stored in the /etc/fstab file (FileSystem TABle); the list of currently mounted file systems is kept in /etc/mtab (Mount TABle).

    When a file system is mounted during boot, a bit in its header (the "clean bit") is cleared, indicating that the file system is in use and that the data structures controlling the placement and organization of its files and directories are subject to change.

    A file system is considered consistent if every data block either belongs to a file or is free, each allocated data block is occupied by one and only one file or directory, and every file and directory can be reached by traversing a chain of other directories in the file system. When a Linux system is shut down deliberately by operator command, all file systems are unmounted. Unmounting a file system at shutdown sets the "clean bit" in its header, indicating that the file system was unmounted properly and can therefore be considered consistent.

    Years of debugging and reworking of the file system code and improved algorithms for writing data to disk have greatly reduced corruption caused by applications or by the Linux kernel itself, but eliminating corruption and data loss due to power failures and other system problems remains a difficult task. If a Linux system crashes, or is simply switched off without the standard shutdown procedure, the "clean bit" in the file system header is not set. At the next boot the mount process detects that the file system is not marked "clean" and physically checks its integrity with the Linux/Unix file system check utility "fsck" (File System ChecK).

    Several journaling file systems are available for Linux. The best known are XFS, a journaling file system developed by Silicon Graphics but now released as open source; ReiserFS, a journaling file system designed specifically for Linux; JFS, a journaling file system originally developed by IBM but now released as open source; and ext3, a file system developed by Stephen Tweedie at Red Hat, along with several others.

    The ext3 file system is a journaled Linux version of the ext2 file system. The ext3 file system has one significant advantage over other journaling file systems - it is fully compatible with the ext2 file system. This makes it possible to use all existing applications designed to manipulate and customize the ext2 file system.

    The ext3 file system is supported by Linux kernels 2.4.16 and later and must be enabled in the Filesystems section of the kernel configuration when building the kernel. Linux distributions such as Red Hat 7.2 and SuSE 7.3 already include native ext3 support. You can use the ext3 file system only if ext3 support is built into your kernel and you have recent versions of the "mount" and "e2fsprogs" utilities.

    In most cases, converting a file system from one format to another means backing up all the data it contains, reformatting the partitions or logical volumes holding the file system, and then restoring all the data. Thanks to the compatibility of ext2 and ext3, none of these steps is needed; the conversion can be done with a single command (run with root privileges):

    # /sbin/tune2fs -j <partition-name>

    For example, converting an ext2 file system located on the /dev/hda5 partition to an ext3 file system can be done using the following command:

    # /sbin/tune2fs -j /dev/hda5

    The "-j" option of the "tune2fs" command creates an ext3 journal on an existing ext2 file system. After converting an ext2 file system to ext3, you must also update the corresponding entries in the /etc/fstab file to indicate that the partition is now "ext3". You can also rely on auto-detection of the partition type (the "auto" option), but it is still recommended to specify the file system type explicitly. The following /etc/fstab fragment shows the entry for the /dev/hda5 partition before and after the conversion:

    /dev/hda5 /opt ext2 defaults 1 2

    /dev/hda5 /opt ext3 defaults 1 0

    The last field in /etc/fstab specifies the stage of the boot process at which the file system's integrity should be checked with the "fsck" utility. With the ext3 file system you can set this value to "0", as in the example above: the "fsck" program will then never check the integrity of the file system, because integrity is guaranteed by replaying the journal.

    Converting the root file system to ext3 requires a special approach and is best done in single-user mode after creating a RAM disk that supports the ext3 file system.

    In addition to compatibility with the ext2 utilities and the simple conversion from ext2 to ext3, the ext3 file system also offers several different journaling modes.

    The ext3 file system supports three journaling modes, which can be selected via the /etc/fstab file. They are as follows:

      journal – records all changes to both file system data and metadata. The slowest of the three modes, it minimizes the chance of losing the changes you make to files.

      ordered – records changes to file system metadata only, but writes file data updates to disk before the changes to the associated metadata. This is the default ext3 journaling mode.

      writeback – records changes to file system metadata only, relying on the normal write process for the file data itself. This is the fastest journaling mode.
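    The write-ahead idea underlying all three modes can be sketched as a toy model (the file names and the in-memory "file system" are invented; a real journal records disk blocks, not strings):

```python
# A toy write-ahead journal in the spirit of ext3's "journal" mode:
# record the intended change first, apply it, then discard the entry.
# After a "crash", replaying the journal restores consistency.
filesystem = {}
journal = []

def journaled_write(name, data):
    journal.append((name, data))   # 1. log the intended change
    filesystem[name] = data        # 2. apply it to the file system
    journal.pop()                  # 3. discard the committed entry

def replay(journal):
    """On mount, finish any change that was logged but not applied."""
    for name, data in journal:
        filesystem[name] = data
    journal.clear()

journaled_write("a.txt", "hello")
journal.append(("b.txt", "world"))  # simulate a crash between steps 1 and 2
replay(journal)                     # mount-time recovery
print(filesystem)                   # both files present, journal empty
```

    Mount-time recovery is a simple replay of this log, which is why a journaled file system comes up almost instantly instead of running a full fsck.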

    The differences between these modes are both subtle and profound. In journal mode the ext3 file system writes every change twice: first to the journal and then to the file system itself. This can reduce overall performance, but it is the mode most valued by users because it minimizes the chance of losing data changes: both metadata changes and file data changes are written to the ext3 journal and can be replayed when the system reboots.

    In "ordered" mode, only changes to file system metadata are journaled, which reduces the duplication between writes to the file system and writes to the journal and therefore makes this mode faster. Although file data changes are not journaled, they must be written out before the associated metadata changes are committed by the ext3 journaling daemon, which may slightly reduce performance. This mode guarantees that the files in the file system never get out of sync with their metadata.

    The writeback mode is faster than the other two because it journals only metadata changes and does not wait for the associated file data to be written before updating things like the file size and directory information. Since file data is updated asynchronously with respect to the journaled metadata changes, files in the file system may exhibit metadata errors after a crash, for example a block shown as belonging to a file although its update had not completed by the time of the reboot. This is not fatal, but it can be unpleasant for the user.

    The journaling mode of an ext3 file system is specified in its /etc/fstab entry. "Ordered" mode is the default, but you can select a different mode by changing the options of the desired partition in /etc/fstab. For example, an /etc/fstab entry selecting writeback mode looks like this:

    /dev/hda5 /opt ext3 data=writeback 1 0

      The Windows NT family file system (NTFS)

        Physical structure of NTFS

    Let's start with the general facts. An NTFS partition can, in theory, be of almost any size. There is a limit, of course, but I will not even quote it, since it will be sufficient for the next hundred years of computing at any growth rate. In practice things are almost the same: the maximum size of an NTFS partition is currently limited only by the size of hard drives. NT4, however, will have trouble installing onto a partition if any part of it lies more than 8 GB from the physical beginning of the disk, but this problem affects only the boot partition.

    A lyrical digression. The way NT 4.0 installs onto an empty disk is rather peculiar and can suggest wrong conclusions about the capabilities of NTFS. If you tell the installer that you want to format the drive as NTFS, the maximum size it will offer you is only 4 GB. Why so small, if the size of an NTFS partition is in practice unlimited? Because the installer simply does not know this file system :) It formats the disk as ordinary FAT, whose maximum size in NT is 4 GB (using a rather non-standard huge 64 KB cluster), and installs NT onto that FAT. During the very first boot of the operating system (still in the installation phase) the partition is quickly converted to NTFS, so the user notices nothing except the strange "limit" on the NTFS size during installation. :)

        Partition structure - general view

    Like any other file system, NTFS divides all usable space into clusters - blocks of data addressed as a unit. NTFS supports almost any cluster size, from 512 bytes to 64 KB, with a 4 KB cluster considered the de facto standard. There are no anomalies in NTFS's cluster structure, so there is not much to say on this rather mundane topic.

    An NTFS disk is conventionally divided into two parts. The first 12% of the disk is set aside for the so-called MFT zone - the space into which the MFT metafile grows (more on this below). No ordinary data can be written into this area. The MFT zone is always kept empty so that the most important service file (the MFT) does not become fragmented as it grows. The remaining 88% of the disk is ordinary file storage space.

    Free disk space, however, includes all physically free space, so the unfilled portions of the MFT zone are counted as well. The mechanism works like this: when files can no longer be written to the regular space, the MFT zone is simply reduced (in current versions of the operating system, exactly in half), freeing up room for files. When space is freed in the regular area, the MFT zone can grow again. Ordinary files may then remain inside the zone, and there is no anomaly in that: the system tried to keep the zone free, but failed. Life goes on... The MFT metafile can still become fragmented, even though that is undesirable.

        MFT and its structure

    The NTFS file system is an outstanding achievement of structuring: every element of the system is a file, even the service information. The most important file in NTFS is called the MFT, or Master File Table. It resides in the MFT zone and is the centralized directory of all the other files on the disk and, paradoxically, of itself. The MFT is divided into records of a fixed size (usually 1 KB), and each record corresponds to a file in the general sense of the word. The first 16 files are service files inaccessible to the operating system; they are called metafiles, and the very first metafile is the MFT itself. These first 16 elements of the MFT are the only part of the disk with a fixed position. Interestingly, a copy of the first three records is kept, for reliability (they are very important), exactly in the middle of the disk. The rest of the MFT file can lie, like any other file, anywhere on the disk; its position can be recovered from the file itself, starting from the very base - the first MFT element.
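    A consequence of the fixed record size is that finding record number N inside the $MFT file is pure arithmetic, which can be sketched as (the 1 KB record size is the typical value mentioned above):

```python
# With fixed-size MFT records, record N sits at a fixed offset
# within the $MFT file - no search needed.
RECORD_SIZE = 1024  # typical MFT record size, 1 KB

def mft_record_offset(record_number):
    """Byte offset of a record inside the $MFT file."""
    return record_number * RECORD_SIZE

print(mft_record_offset(0))   # $MFT itself, the very first record
print(mft_record_offset(16))  # the first record after the 16 metafiles
```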

          Metafiles

    The first 16 NTFS files (metafiles) are service files, each responsible for some aspect of the system's operation. The advantage of this modular approach is its remarkable flexibility: on FAT, for instance, physical damage within the FAT area itself is fatal to the whole disk, while NTFS can relocate, and even fragment across the disk, all of its service areas, working around any surface defects except in the first 16 MFT elements.

    Metafiles live in the root directory of an NTFS disk; their names begin with the character "$", although it is hard to obtain any information about them by standard means. Curiously, even these files have a perfectly real size reported - you can find out, for example, how much the operating system spends on cataloging your entire disk by looking at the size of the $MFT file. The following table lists the metafiles currently in use and their purpose.

    $MFT – the MFT itself

    $MFTmirr – a copy of the first 16 MFT records, placed in the middle of the disk

    $LogFile – journaling support file (see below)

    $Volume – service information: volume label, file system version, etc.

    $AttrDef – list of standard file attributes on the volume

    $. – root directory

    $Bitmap – volume free space map

    $Boot – boot sector (if the partition is bootable)

    $Quota – a file recording users' rights to use disk space (began working only in NT5)

    $Upcase – a table of correspondence between uppercase and lowercase characters in file names on the current volume. It is needed mainly because file names in NTFS are written in Unicode, which comprises 65,536 different characters, and finding the upper- and lowercase equivalents is far from trivial.

          Files and streams

    So, the system has files - and nothing but files. What does this concept include on NTFS?

      First of all, the mandatory element: a record in the MFT, since, as said earlier, all of the disk's files are mentioned in the MFT. All information about a file is stored there, except the data itself: the file name, size, the location on disk of its individual fragments, and so on. If one MFT record is not enough, several are used, and not necessarily consecutive ones.

      An optional element: the file's data streams. The word "optional" may seem odd, but there is nothing strange here. Firstly, a file may have no data at all, in which case it consumes no disk space proper. Secondly, the file may be quite small. In that case a rather elegant solution applies: the file's data is stored right in the MFT, in the space left over from the main data within a single MFT record. Files occupying a few hundred bytes usually have no "physical" embodiment in the main file area; all the data of such a file is kept in one place - the MFT.

    The situation with file data is quite interesting. Every file on NTFS has a somewhat abstract structure: it has no data as such, only streams. One of the streams carries the familiar meaning - the file's data. But most file attributes are streams too! Thus the file has only one fundamental property - its number in the MFT - and everything else is optional. This abstraction can be used for quite handy things: for example, you can "attach" another stream to a file and write any data into it, such as information about the author and contents of the file, as Windows 2000 does (the rightmost tab in the file properties dialog, viewed from Explorer). Interestingly, these additional streams are invisible to standard tools: the file size you observe is only the size of the main stream, which holds the traditional data. You could, for example, have a zero-length file whose deletion frees up 1 GB of space, simply because some cunning program or technology attached an extra gigabyte-sized stream (alternative data) to it. But in practice streams are hardly used at the moment, so such situations are nothing to fear, although they are hypothetically possible. Just bear in mind that a file on NTFS is a deeper and more general concept than one might imagine by simply browsing the disk's directories. And finally: a file name may contain any characters, including the entire set of national alphabets, since names are stored in Unicode - a 16-bit representation giving 65,536 different characters. The maximum file name length is 255 characters.

        Catalogs

    An NTFS directory is a special file that stores links to other files and directories, creating a hierarchical structure of data on the disk. The directory file is divided into blocks, each containing a file name, its basic attributes, and a link to the MFT element that holds the complete information about the directory entry. The internal structure of a directory is a binary tree. Here is what that means: to find a file with a given name in a linear directory, such as FAT's, the operating system has to look through all the directory entries until it finds the right one. A binary tree arranges the file names so that the search proceeds faster, by obtaining yes-or-no answers to questions about the file's location. The question a binary tree can answer is: in which group, relative to a given element, does the name you are looking for lie - above or below? We start by asking this about the middle element, and each answer halves the search area on average. The files are, say, sorted alphabetically, and the question is answered in the obvious way - by comparing the leading letters. The half-sized search area is then explored in the same way, again starting from its middle element.

    The conclusion: to find one file among 1000, FAT will have to make about 500 comparisons on average (most likely the file will be found halfway through the search), while a tree-based system needs only about 10 (2^10 = 1024). The savings in search time are obvious. Do not think, though, that traditional systems (FAT) are hopelessly behind: firstly, maintaining the file list as a binary tree is fairly laborious, and secondly, even FAT as implemented by a modern system (Windows 2000 or Windows 98) uses a similar search optimization. That is just one more fact to add to your store of knowledge. I would also like to dispel the common misconception (which I myself shared until quite recently) that adding a file to a tree-shaped directory is harder than adding one to a linear directory: these operations take comparable time. To add a file to a directory you first have to make sure that no file of that name is there yet :) - and in a linear system we then face the search difficulties described above, which more than compensate for the simplicity of appending the file to the directory.
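    The 500-versus-10 estimate is easy to check by counting comparisons directly. Here is a sketch with 1000 invented file names, a linear scan versus a classic binary search over a sorted list (the sorted list stands in for the directory's tree):

```python
# Count name comparisons for the 1000-file example above.
names = sorted(f"file{i:04}.txt" for i in range(1000))
target = "file0777.txt"

# Linear scan, as in a FAT directory: one comparison per entry examined.
linear = names.index(target) + 1

# Binary search, as in a sorted (tree-like) directory.
binary = 0
lo, hi = 0, len(names)
while lo < hi:
    mid = (lo + hi) // 2
    binary += 1                   # one "above or below?" question
    if names[mid] < target:
        lo = mid + 1
    elif names[mid] > target:
        hi = mid
    else:
        break                     # found

print(linear, binary)  # hundreds of comparisons vs. about ten
```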

    What information can be obtained simply by reading a directory file? Exactly what the dir command shows. For simple disk navigation there is no need to go into the MFT for every file; only the most general information about the files needs to be read from the directory files. The main directory of the disk, the root, is no different from ordinary directories except for a special link to it from the beginning of the MFT metafile.

        Logging

    NTFS is a fault-tolerant system that can quite easily restore itself to a correct state after practically any real failure. Like any modern file system it is based on the concept of the transaction - an action that is performed entirely and correctly, or not at all. NTFS simply has no intermediate (erroneous or half-finished) states: the quantum of data change cannot be split by a failure into a "before" and an "after" that bring destruction and confusion - it is either committed or cancelled.

    Example 1: data is being written to disk. Suddenly it turns out that it was not possible to write to the place where we had just decided to write the next piece of data - physical damage to the surface. The behavior of NTFS in this case is quite logical: the write transaction is rolled back entirely - the system realizes that the write was not performed. The location is marked as failed, and the data is written to another location - a new transaction begins.

    Example 2: a more complex case - data is being written to disk and suddenly, bang, the power goes off and the system reboots. At which phase did the write stop? Where is the data, and where is garbage? Here another system mechanism comes to the rescue - the transaction log. Before attempting the write, the system recorded its intention in the $LogFile metafile. On reboot this file is examined for unfinished transactions that were interrupted by the crash and whose result is unpredictable; all such transactions are cancelled: the place being written to is marked free again, indexes and MFT elements are returned to the state they were in before the failure, and the system as a whole remains stable. But what if the error occurred while writing to the log itself? That is also fine: the transaction either had not started yet (only the intention to perform it was being recorded), or it had already finished - that is, the record that the transaction was actually completed was being written. In the latter case, at the next boot the system will work out that everything was in fact written correctly anyway and will ignore the "unfinished" transaction.

    Still, remember that journaling is not an absolute panacea, only a means of significantly reducing the number of errors and failures. The average NTFS user is unlikely ever to notice a file system error or be forced to run chkdsk; experience shows that NTFS restores itself to a fully correct state even after crashes at moments of heavy disk activity. You can even defragment the disk and press reset in the middle of the process, and the probability of data loss will still be very low. It is important to understand, however, that NTFS recovery guarantees the correctness of the file system, not of your data. If you were writing to the disk and got a crash, your data may not have been written. There are no miracles.

        Compression

    NTFS files have one quite useful attribute: "compressed". NTFS has built-in support for disk compression - the kind of thing you previously needed Stacker or DoubleSpace for. Any file or directory can individually be stored on disk in compressed form, and the process is completely transparent to applications. Compression is very fast and has only one large drawback - the huge virtual fragmentation of compressed files, which, however, does not really bother anyone. Compression works in blocks of 16 clusters and uses so-called "virtual clusters" - once again an extremely flexible solution that allows interesting effects: for example, half of a file may be compressed and the other half not. This is possible because storing information about which fragments are compressed is very similar to ordinary file fragmentation. For example, a typical record of the physical layout of a real, uncompressed file:

    file clusters 1 to 43 are stored in disk clusters starting at 400; file clusters 44 to 52 are stored in disk clusters starting at 8530...

    Physical layout of a typical compressed file:

    file clusters 1 to 9 are stored in disk clusters starting at 400; file clusters 10 to 16 are not stored anywhere; file clusters 17 to 18 are stored in disk clusters starting at 409; file clusters 19 to 36 are not stored anywhere...

    Clearly, a compressed file has "virtual" clusters containing no real information. As soon as the system sees such virtual clusters, it understands that the data of the preceding 16-cluster block must be decompressed, and the resulting data will exactly fill the virtual clusters - that, in essence, is the whole algorithm.
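    The run list for the compressed file above can be modeled directly. This is a toy sketch of the idea only (real NTFS run lists use a compact binary encoding, not Python dicts):

```python
# A sketch of the "virtual cluster" run list shown above: runs with a
# disk location (lcn) hold stored data; runs with lcn None are virtual
# clusters whose contents the decompressor reconstructs.
runs = [
    {"vcn": range(1, 10),  "lcn": 400},   # stored on disk
    {"vcn": range(10, 17), "lcn": None},  # virtual - filled by decompression
    {"vcn": range(17, 19), "lcn": 409},   # stored on disk
    {"vcn": range(19, 37), "lcn": None},  # virtual
]

def is_stored(cluster):
    """Does this file cluster physically exist on disk?"""
    for run in runs:
        if cluster in run["vcn"]:
            return run["lcn"] is not None
    raise ValueError("cluster outside the file")

print(is_stored(5), is_stored(12))
```

    The same structure that describes ordinary fragmentation thus also marks which blocks need decompression, which is why partially compressed files come for free.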

        Security

    NTFS contains many facilities for delineating the rights of objects; it is believed to be the most advanced file system of all those currently existing. In theory this is undoubtedly true, but in current implementations the system of rights is unfortunately quite far from ideal: it is a rigid, yet not always logical, set of characteristics. The rights assigned to any object and strictly observed by the system have evolved - major changes and additions were made several times, and by Windows 2000 they finally arrived at a fairly reasonable set.

    The rights of the NTFS file system are inextricably tied to the system itself - that is to say, another system, given physical access to the disk, is not obliged to respect them. To prevent physical access, Windows 2000 (NT5) introduced a standard feature - more on this below. The rights system in its current state is quite complex, and I doubt that I can tell the general reader anything both interesting and useful for everyday life. If the topic interests you, many books on NT's network architecture describe it in more detail.

    At this point the description of the structure of the file system can be considered complete; it remains to describe only a number of simply practical or original features.

        Hard Links

    This feature has been in NTFS since time immemorial but has been used very rarely. A Hard Link is when one and the same file has two names (several file-directory pointers, from the same or different directories, refer to the same MFT record). Say a file has the names 1.txt and 2.txt: if the user deletes file 1, file 2 remains; if he deletes 2, file 1 remains. That is, both names are completely equal from the moment of creation. The file is physically erased only when its last name is deleted.
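    NTFS Hard Links behave like POSIX hard links, so the behavior described above can be demonstrated with Python's os.link on any system (an illustration of the concept, not NTFS-specific code):

```python
import os
import tempfile

# Two names for the same data: deleting one name leaves the data
# reachable through the other, just as described above.
with tempfile.TemporaryDirectory() as d:
    one = os.path.join(d, "1.txt")
    two = os.path.join(d, "2.txt")
    with open(one, "w") as f:
        f.write("shared contents")
    os.link(one, two)        # a second, fully equal name for the same file
    os.remove(one)           # delete the first name...
    with open(two) as f:
        content = f.read()   # ...the data survives under the second name
print(content)
```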

        Symbolic Links (NT5)

    A much more practical feature that allows you to create virtual directories - just like the virtual disks created with the subst command in DOS. The applications are quite varied: first of all, simplifying the directory system. If you do not like the Documents and Settings\Administrator\Documents directory, you can link it into the root directory - the system will still work with the directory at its unwieldy path, while you get a much shorter name that is completely equivalent to it. To create such connections you can use the junction program (junction.zip, 15 KB), written by the well-known specialist Mark Russinovich (http://www.sysinternals.com). The program works only in NT5 (Windows 2000), as does the feature itself. To remove a connection you can use the standard rd command. WARNING: attempting to delete a link with Explorer or other file managers that do not understand the virtual nature of a directory (such as FAR) will delete the data the link points to! Be careful.

        Encryption (NT5)

    A useful feature for people who worry about their secrets: each file or directory can also be encrypted, making it impossible to read from another NT installation. Combined with a standard and virtually unbreakable boot password for the system itself, this feature provides sufficient security for the important data you select in most applications.

    The file system is the part of the operating system whose purpose is to organize effective work with data stored in external memory and to provide the user with a convenient interface for working with such data. Organizing information storage on a magnetic disk is not easy. It requires, for example, a good knowledge of the disk controller design and the features of working with its registers. Direct interaction with the disk is the prerogative of a component of the OS input/output system called the disk driver. To free the computer user from the complexities of interacting with the hardware, the clear abstract model of the file system was invented. File write and read operations are conceptually simpler than low-level device operations.

    Let's list the main functions of a file system.

    1. File identification. Associating a file name with the external memory space allocated to it.

    2. Distribution of external memory between files. To work with a specific file, the user does not need to have information about the location of this file on an external storage medium. For example, in order to load a document into the editor from a hard drive, we do not need to know which side of which magnetic disk, on which cylinder and in which sector this document is located.

    3. Ensuring reliability and fault tolerance. The cost of information can be many times higher than the cost of a computer.

    4. Ensuring protection from unauthorized access.

    5. Providing shared access to files, so that the user does not have to make special efforts to ensure synchronized access.

    6. Ensuring high performance.

    It is sometimes said that a file is a named collection of related information recorded in secondary memory. For most users, the file system is the most visible part of the OS. It provides a mechanism for online storage of and access to both data and programs for all users of the system. From the user's point of view, a file is the unit of external memory, that is, data written to disk must be part of some file.

    37. The simplest volume table of contents and its elements

    The file system includes a table of contents and a data area - a collection of blocks on a disk identified by their numbers/addresses. An example of the simplest (abstract) table of contents of a volume (a disk or disk pack), which has different names in different operating systems - VTOC (Volume Table of Contents), FAT (File Allocation Table), FDT (File Definition Table), etc. - is shown in Fig. 1.

    Fig. 1. The simplest volume table of contents

    It consists of the following areas:

    · file area. This is a table that usually has a limited number of rows N (N = 6 in the example; in MS-DOS, for example, N = 500, i.e. no more than 500 files). The number of columns M (M = 5 in the example) is usually chosen so that 85-95% of the files created by users contain no more than M blocks; it depends on the block size, the type of user, and the general level of development of information and software. The first column of the table in each row (the header record) contains data about the file - in this example, the file name;

    · overflow area - an additional table of similar structure in which the block numbers of particularly long files are recorded (File_1 in the example). Organizing the allocation table as a file area plus an overflow area obviously saves on the size of the table as a whole without limiting the possible length of a file;

    · list of free blocks - information necessary for placing created or expanded files. The list is created during initialization, includes all blocks except damaged ones, and is then adjusted as files are created, deleted, or modified;

    · list of bad blocks. This is a table created during initialization (partitioning) of a volume (disk) and replenished by diagnostic programs (a well-known example is NDD - Norton Disk Doctor); it prevents damaged areas of the magnetic medium from being allocated to data files.

    Let us list the features of the situation recorded in Fig. 1 for this simplest (artificial) file system.

    File_1 occupies 6 blocks; this number is greater than the maximum M = 5, so the address of block No. 6 (23) is placed in the overflow table;

    File_2 occupies 2 blocks, which is less than the limit, so all information is concentrated in the file area.

    There are the following conflict situations:

    · File_3 does not contain a single block (hence, the file was deleted, but the header record was preserved);

    · File_4 and File_l refer to block #3. This is an error because each block must be assigned to a single file;

    · the list of free blocks contains block numbers No. 12 (marked as bad) and No. 13 (allocated under File_1).

    38. Logical structure of disk partitions using the example of IBM- and MS-compatible file systems


    (Figure: disk partition layout with logical drives D and E)

    The maximum number of primary partitions is 4. The active partition is where the system boot loader is located.

    MBR - the code and data necessary for the subsequent loading of the operating system, located in the first physical sectors (most often the very first) of a hard drive or other storage device.

    The entry for an extended partition is called the SMBR (Secondary Master Boot Record). The difference is that this record has no bootloader, and its partition table consists of two entries: a primary partition and an extended partition.

    39. FAT file system. FAT volume structure

    40. NTFS file system. NTFS volume structure

    41. Windows OS Registry

    42. Operating systems of the Windows NT family

    43. Some architectural modules of Windows NT

    44. Managing hard disks in Windows NT

    45. Projective operating systems, their principles, advantages, disadvantages

    46. Procedural operating systems, their principles, advantages, disadvantages

    47. History of development and ideology of building the Unix OS

    48. Unix OS structure

    49. Unix User Interfaces

    50. Dispatching processes (tasks) in Unix

    51. Linux OS and its main advantages

    52. Implementation of graphic mode in Linux OS

    53. Basic principles of working in Linux OS

    54. Basic Linux OS configuration files

    55. Working with disk drives in Linux OS

    56. Applications for Linux OS

    Material for review lecture No. 33

    for specialty students

    "Information Technology Software"

    Associate Professor of the Department of Computer Science, Ph.D. Livak E.N.

    FILE MANAGEMENT SYSTEMS

    Basic concepts, facts

    Purpose. Features of the FAT, VFAT, FAT 32, HPFS, and NTFS file systems. The UNIX OS file systems (s5, ufs) and the Linux OS file system Ext2FS. System areas of the disk (partition, volume). Principles of file placement and of storing file-location information. Organization of directories. Restricting access to files and directories.

    Skills and abilities

    Using knowledge of the file system structure to protect and restore computer information (files and directories). Organization of access control to files.

    File systems. File system structure

    Data on disk is stored in the form of files. A file is a named part of a disk.

    File management systems are designed to manage files.

    The ability to deal with data stored in files at the logical level is provided by the file system. It is the file system that determines the way data is organized on any storage medium.

    Thus, a file system is a set of specifications and the corresponding software responsible for creating, destroying, organizing, reading, writing, modifying, and moving file information, as well as for controlling access to files and managing the resources used by files.

    The file management system is the main subsystem in the vast majority of modern operating systems.

    Using a file management system

    · all system processing programs are linked by means of data;

    · problems of centralized distribution of disk space and data management are solved;

    · the user is provided with the ability to perform operations on files (creation, etc.), to exchange data between files and various devices, and to protect files from unauthorized access.

    Some operating systems may have multiple file management systems, giving them the ability to handle multiple file systems.

    Let's try to distinguish between a file system and a file management system.

    The term "file system" defines the principles of access to data organized in files.

    The term "file management system" refers to a specific implementation of a file system, i.e. a set of software modules that provide work with files in a specific OS.

    So, to work with files organized in accordance with some file system, an appropriate file management system must be developed for each OS. This file management system will work only on the OS for which it is designed.

    For the Windows OS family, the main file systems used are: VFAT, FAT 32, NTFS.

    Let's look at the structure of these file systems.

    In the FAT file system, the disk space of any logical drive is divided into two areas:

    · system area and

    · data area.

    The system area is created and initialized during formatting and is subsequently updated when the file structure is manipulated.

    The system area consists of the following components:

    · boot sector containing the boot record (boot record);

    · reserved sectors (they may not exist);

    · file allocation tables (FAT, File Allocation Table);

    · root directory (ROOT).

    These components are located on the disk one after another.

    Data area contains files and directories subordinate to the root one.

    The data area is divided into so-called clusters. A cluster is one or more adjacent sectors of the data area. From another point of view, a cluster is the minimum addressable unit of disk memory allocated to a file; that is, a file or directory occupies an integer number of clusters. To create and write a new file to disk, the operating system allocates several free disk clusters for it. These clusters do not have to follow one another. For each file, a list of all cluster numbers assigned to that file is stored.
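    The cluster bookkeeping described above amounts to simple arithmetic, sketched below. The layout constants (sector size, sectors per cluster, the first sector of the data area) are assumed example values; the convention that FAT numbers data clusters starting from 2 is standard.

```python
import math

# Hypothetical FAT-like volume layout parameters (illustrative only).
BYTES_PER_SECTOR = 512
SECTORS_PER_CLUSTER = 4          # 2 KB clusters
DATA_AREA_FIRST_SECTOR = 100     # sector where the data area begins

def cluster_to_sector(cluster):
    """First sector of a data cluster; FAT numbers data clusters from 2."""
    return DATA_AREA_FIRST_SECTOR + (cluster - 2) * SECTORS_PER_CLUSTER

def clusters_for_file(size_bytes):
    """A file always occupies a whole number of clusters."""
    cluster_size = BYTES_PER_SECTOR * SECTORS_PER_CLUSTER
    return max(1, math.ceil(size_bytes / cluster_size))
```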

    Dividing the data area into clusters instead of using sectors allows you to:

    · reduce the size of the FAT table;

    · reduce file fragmentation;

    · shorten the chains describing files, which speeds up file access.

    However, too large a cluster size leads to inefficient use of the data area, especially when there are many small files (after all, on average half a cluster is lost per file).

    In modern file systems (FAT 32, HPFS, NTFS) this problem is solved by limiting the cluster size (maximum 4 KB)
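    The "half a cluster lost per file" estimate above is easy to verify numerically. A minimal sketch; the file sizes and cluster sizes are arbitrary example values.

```python
import math

def slack(file_sizes, cluster_bytes):
    """Total space wasted when each file occupies a whole number of clusters."""
    waste = 0
    for size in file_sizes:
        clusters = max(1, math.ceil(size / cluster_bytes))
        waste += clusters * cluster_bytes - size
    return waste

# 1000 small files of 300 bytes each: larger clusters waste far more space.
files = [300] * 1000
small = slack(files, 4 * 1024)    # 4 KB clusters
large = slack(files, 32 * 1024)   # 32 KB clusters
```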

    The map of the data area is the file allocation table (FAT). Each element of the FAT table (12, 16, or 32 bits) corresponds to one disk cluster and characterizes its state: free, busy, or bad.

    · If a cluster is allocated to a file (i.e., busy), then the corresponding FAT element contains the number of the next cluster of the file;

    · the last cluster of the file is marked with a number in the range FF8h - FFFh (FFF8h - FFFFh);

    · if the cluster is free, it contains the zero value 000h (0000h);

    · a cluster that is unusable (failed) is marked with the number FF7h (FFF7h).

    Thus, in the FAT table, clusters belonging to the same file are linked into chains.

    The file allocation table is stored immediately after the boot record of the logical disk; its exact location is described in a special field in the boot sector.

    It is stored in two identical copies, which follow each other. If the first copy of the table is destroyed, the second one is used.

    Due to the fact that FAT is used very intensively during disk access, it is usually loaded into the RAM (into I/O buffers or cache) and remains there for as long as possible.

    The main disadvantage of FAT is slow work with files. When a file is created, the rule is that the first free cluster is allocated to it. This leads to disk fragmentation and complex file chains, which in turn slows down work with files.

    To view and edit the FAT table you can use the DiskEditor utility.

    Detailed information about the file itself is stored in another structure called the root directory. Each logical drive has its own root directory (ROOT).

    The root directory describes files and other directories. A directory element is a file descriptor.

    Each file or directory descriptor includes:

    · Name

    · extension

    · date of creation or last modification

    · time of creation or last modification

    · attributes (archive, directory attribute, volume attribute, system, hidden, read-only)

    · file length (for a directory - 0)

    · reserved field that is not used

    · number of the first cluster in the chain of clusters allocated to a file or directory; Having received this number, the operating system, referring to the FAT table, finds out all the other cluster numbers of the file.

    So, the user launches a file for execution. The operating system looks for the file with the desired name by examining the file descriptors in the current directory. When the required element is found, the operating system reads the number of the first cluster of the file and then uses the FAT table to determine the remaining cluster numbers. Data from these clusters is read into RAM and combined into one contiguous section. The operating system transfers control to the file, and the program begins to run.
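    The lookup sequence just described (directory entry, then first cluster, then the FAT chain) can be sketched like this. The dict-based directory and FAT are illustrative models, not the on-disk format; the FAT12 end-of-chain markers FF8h-FFFh are taken from the text.

```python
# Sketch of a FAT lookup: find the file's descriptor in the directory,
# take its first cluster, then follow the cluster chain in the FAT.

FAT_END = 0xFF8  # FAT12 end-of-chain markers are 0xFF8..0xFFF

def cluster_chain(fat, first_cluster):
    """Collect all cluster numbers of a file by walking the FAT."""
    chain, cur = [], first_cluster
    while cur < FAT_END:
        chain.append(cur)
        cur = fat[cur]     # each FAT element points to the next cluster
    return chain

def find_file(directory, fat, name):
    """Directory entry -> first cluster -> full cluster chain."""
    entry = directory[name]              # descriptor holds the first cluster
    return cluster_chain(fat, entry["first_cluster"])

fat = {2: 5, 5: 6, 6: 0xFFF, 7: 0}      # file occupies clusters 2, 5, 6
directory = {"PROG.EXE": {"first_cluster": 2, "size": 6000}}
chain = find_file(directory, fat, "PROG.EXE")
```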

    To view and edit the root directory ROOT you can also use the DiskEditor utility.

    File system VFAT

    The VFAT (virtual FAT) file system first appeared in Windows for Workgroups 3.11 and was designed for protected mode file I/O.

    This file system is used in Windows 95.

    It is also supported in Windows NT 4.

    VFAT is the native 32-bit file system of Windows 95. It is controlled by the VFAT.VXD driver.

    VFAT uses 32-bit code for all file operations and can use 32-bit protected mode drivers.

    However, the file allocation table entries remain 12- or 16-bit, so the same data structure (FAT) is used on disk. That is, the VFAT table format is the same as the FAT format.

    Along with "8.3" names, VFAT supports long file names. (VFAT is often said to be FAT with support for long names.)

    The main disadvantages of VFAT are large clustering losses on large logical disks and restrictions on the size of the logical disk itself.

    File system FAT 32

    This is a new implementation of the idea of using the FAT table.

    FAT 32 is a completely self-contained 32-bit file system.

    It was first used in Windows 95 OSR 2 (OEM Service Release 2).

    Currently, FAT 32 is used in Windows 98 and Windows ME.

    It contains numerous improvements and additions over previous FAT implementations.

    1. It uses disk space much more efficiently because it uses smaller clusters (4 KB) - savings are estimated at up to 15%.

    2. It has an extended boot record that allows copies of critical data structures to be kept, which increases the disk's resistance to damage to its structures.

    3. It can use the backup copy of the FAT instead of the standard one.

    4. It can move the root directory; in other words, the root directory can be in any location. This removes the limitation on the size of the root directory (512 elements, since ROOT previously had to occupy one fixed cluster).

    5. Improved root directory structure

    Additional fields have appeared - for example, creation time, creation date, last access date, and a checksum.

    A long file name is still stored in multiple directory entries (descriptors).

    File system HPFS

    HPFS (High Performance File System) is a high-performance file system.

    HPFS first appeared in OS/2 1.2 and LAN Manager.

    Let's list the main features of HPFS.

    · The main difference lies in the basic principles of placing files on disk and of storing information about their location. Thanks to these principles, HPFS has high performance and fault tolerance; it is a reliable file system.

    · Disk space in HPFS is allocated not in clusters (as in FAT) but in blocks. In the modern implementation the block size is taken equal to one sector, although in principle it could be different. (In effect, a block is a cluster that is always equal to one sector.) Placing files in such small blocks uses disk space more efficiently, since the overhead of lost free space averages only half a sector (256 bytes) per file. Remember that the larger the cluster size, the more disk space is wasted.

    · The HPFS system strives to arrange a file in contiguous blocks or, if that is not possible, to place it on the disk in such a way that the extents (fragments) of the file are physically as close to each other as possible. This approach significantly reduces the positioning time of the hard drive's read/write heads and the wait time (the delay between placing the head on the desired track and reading the needed data). Recall that in FAT the first free cluster is simply allocated.

    Extents are file fragments located in adjacent sectors of the disk. A file has one extent if it is not fragmented, and multiple extents otherwise.

    · The method of balanced binary trees is used to store and search information about the location of files (directories are stored in the center of the disk and, in addition, are automatically sorted), which significantly increases HPFS performance compared with FAT.

    · HPFS provides special extended file attributes that allow access to files and directories to be controlled.

    Extended attributes (EAs) allow additional information about a file to be stored. For example, each file can be associated with a unique graphic (icon), a file description, a comment, information about the file's owner, etc.

    HPFS partition structure


    At the beginning of a partition with HPFS installed there are three control blocks:

    · boot block,

    · additional block (super block) and

    · spare (backup) block (spare block).

    They occupy 18 sectors.

    All remaining HPFS disk space is divided into stripes (bands) of contiguous sectors. Each stripe occupies 8 MB of disk space.

    Each stripe has its own sector-allocation bitmap. The bitmap shows which sectors of the stripe are occupied and which are free. Each sector of a data stripe corresponds to one bit in its bitmap: if the bit is 1, the sector is busy; if 0, it is free.
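    The per-stripe bitmap can be sketched as a plain bit array. This is an illustrative model only: the class and method names are invented; the 8 MB stripe of 512-byte sectors follows the text.

```python
# Sketch of an HPFS-style stripe bitmap: one bit per sector,
# 1 = busy, 0 = free.

SECTORS_PER_STRIPE = 8 * 1024 * 1024 // 512  # 8 MB stripe, 512-byte sectors

class StripeBitmap:
    def __init__(self):
        self.bits = bytearray(SECTORS_PER_STRIPE // 8)  # all sectors free

    def is_busy(self, sector):
        return bool(self.bits[sector // 8] & (1 << (sector % 8)))

    def allocate(self, sector):
        self.bits[sector // 8] |= 1 << (sector % 8)

    def free(self, sector):
        self.bits[sector // 8] &= ~(1 << (sector % 8)) & 0xFF

bm = StripeBitmap()
bm.allocate(100)
```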

    The bitmaps of two adjacent stripes are located side by side on the disk, as are the stripes themselves; that is, the sequence of stripes and bitmaps looks as shown in the figure.

    Compare this with FAT: there is only one "bitmap" (the FAT table) for the entire disk, and to work with it the read/write heads have to move half the disk on average.

    It is precisely to reduce the positioning time of the hard disk's read/write heads that HPFS divides the disk into stripes.

    Let's consider control blocks.

    Boot block (bootblock)

    Contains the volume name, its serial number, BIOS parameter block and boot program.

    The bootstrap program finds the file OS2LDR, reads it into memory, and transfers control to this OS boot program, which in turn loads the OS/2 kernel, OS2KRNL, from disk into memory. OS2KRNL then uses information from the CONFIG.SYS file to load all other necessary program modules and data blocks into memory.

    The boot block is located in sectors 0 to 15.

    Super block (superblock)

    Contains

    · pointer to a list of bitmaps (bitmap block list). This list lists all the blocks on the disk that contain the bitmaps used to detect free sectors;

    · pointer to the list of defective blocks (bad block list). When the system detects a damaged block, it is added to this list and is no longer used to store information;

    · pointer to directory band

    · a pointer to the file node (F-node) of the root directory;

    · date of the last scan of the partition by CHKDSK;

    · information about the stripe size (in the current HPFS implementation - 8 MB).

    Super block is located in sector 16.

    Spare block (spareblock)

    Contains

    · a pointer to the emergency replacement map (hotfix map, or hotfix areas);

    · pointer to the list of free spare blocks (directory emergency free block list);

    · a number of system flags and descriptors.

    This block is located in sector 17 of the disk.

    The backup block provides high fault tolerance to the HPFS file system and allows you to recover damaged data on the disk.

    File placement principle


    To reduce the time it takes to position the read/write heads of a hard disk, the HPFS system strives to

    1) place the file in adjacent blocks;

    2) if this is not possible, place the extents of the fragmented file as close to each other as possible.

    To do this, HPFS uses statistics and also tries to provisionally reserve at least 4 KB of space at the end of files that are growing.

    Principles for storing file location information

    Each file and directory on the disk has its own file node F-Node. This is a structure that contains information about the location of a file and its extended attributes.

    Each F-Node occupies one sector and is always located close to its file or directory (usually immediately before it). The F-Node contains:

    · length,

    · first 15 characters of the file name,

    · special service information,

    · statistics on file access,

    · extended file attributes,

    · a list of access rights (or only part of this list if it is very large); if the extended attributes are too large for the file node, a pointer to them is written in it instead;

    · associative information about the location and subordination of the file, etc.

    If the file is contiguous, its location on disk is described by two 32-bit numbers: a pointer to the first block of the file and the extent length (the number of consecutive blocks belonging to the file).

    If a file is fragmented, then the location of its extents is described in the file node by additional pairs of 32-bit numbers.

    A file node can contain information about up to eight extents of a file. If a file has more extents, then a pointer to an allocation block is written to its file node, which can contain up to 40 pointers to extents or, similar to a directory tree block, to other allocation blocks.
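    Extent-based location data of the kind stored in a file node maps a logical block of the file to a physical block by walking the (start, length) pairs. A minimal sketch with invented example extents.

```python
# Sketch of extent-based mapping: a file's location is a list of
# (first_block, length) pairs, in file order.

def logical_to_physical(extents, logical_block):
    """Map a block number within the file to a physical block on disk."""
    offset = logical_block
    for start, length in extents:
        if offset < length:
            return start + offset   # block falls inside this extent
        offset -= length            # skip past this extent
    raise IndexError("block beyond end of file")

# A file of 10 blocks stored in two fragments: blocks 50..55 and 200..203.
extents = [(50, 6), (200, 4)]
```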

    Directory structure and placement

    A stripe located in the center of the disk is used to store directories.

    This stripe is called the directory band.

    If it is completely full, HPFS starts placing file directories in other stripes.

    Placing this information structure in the middle of the disk significantly reduces the average read/write head positioning time.

    However, a significantly greater contribution to HPFS performance (compared with placing the directory band in the middle of the logical disk) comes from using balanced binary trees to store and retrieve file-location information.

    Recall that in the FAT file system the directory has a linear structure that is not ordered in any special way, so when searching for a file it must be scanned sequentially from the very beginning.

    In HPFS, the directory structure is a balanced tree with entries arranged in alphabetical order.

    Each entry included in the tree contains

    · file attributes,

    · a pointer to the corresponding file node,

    · information about the time and date of file creation, last update, and last access,

    · the length of the data containing extended attributes,

    · a file access counter,

    · the file name length,

    · the name itself,

    · and other information.

    When searching for a file in a directory, HPFS looks only at the necessary branches of the binary tree. This method is many times more efficient than sequentially reading all entries in the directory, as is done in FAT.

    In the current HPFS implementation, the size of each block in terms of which directories are allocated is 2 KB. The size of the entry describing a file depends on the size of the file name. If a name occupies 13 bytes (8.3 format), a 2 KB block can hold up to 40 file descriptors. The blocks are connected to each other in a list.
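    The performance difference between FAT's linear directory scan and HPFS's alphabetically ordered entries can be illustrated with a simple comparison. The entry layout and names here are invented; only the idea of sorted entries versus a sequential scan comes from the text.

```python
import bisect

# Sorted directory entries allow binary search (O(log n) probes),
# whereas a FAT-style directory must be scanned from the start.

def linear_lookup(entries, name):          # FAT-style sequential scan
    steps = 0
    for entry_name, node in entries:
        steps += 1
        if entry_name == name:
            return node, steps
    return None, steps

def sorted_lookup(sorted_entries, name):   # HPFS-style: alphabetical order
    names = [n for n, _ in sorted_entries]
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return sorted_entries[i][1]
    return None

entries = sorted((f"file{i:04}", i) for i in range(1000))
node = sorted_lookup(entries, "file0999")
_, steps = linear_lookup(entries, "file0999")
```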

    Problems

    When files are renamed, so-called tree rebalancing may occur. Creating, renaming, or erasing a file may result in cascading splits of directory blocks; such an operation may require allocating additional blocks on a full disk, so a rename may fail for lack of disk space even though the file itself has not grown. To avoid this disaster, HPFS maintains a small pool of free blocks that can be used in such an emergency; a pointer to this pool is stored in the spare block.

    Principles of placing files and directories on disk in HPFS:

    · information about the location of files is dispersed across the disk, with the records for each specific file located (if possible) in adjacent sectors and close to the file data itself;

    · directories are located in the middle of disk space;

    · Directories are stored as a binary balanced tree with entries arranged in alphabetical order.

    Reliability of data storage in HPFS

    Any file system must have means of correcting errors that occur when writing information to disk. For this, HPFS uses the emergency replacement mechanism (hotfix).

    If the HPFS file system encounters a problem while writing data to disk, it displays an error message. HPFS then stores the information that should have been written to the defective sector in one of the spare sectors reserved in advance for this eventuality. The list of free spare blocks is stored in the HPFS spare block. When an error is detected while writing data to a normal block, HPFS selects one of the free spare blocks and stores the data there. The file system then updates the emergency replacement (hotfix) map in the spare block.

    This map is simply pairs of double words, each of which is a 32-bit sector number.

    The first number indicates the defective sector, and the second indicates the sector among the available spare sectors that was selected to replace it.

    After a defective sector is replaced with a spare one, the emergency replacement map is written to disk, and a pop-up window informs the user that a disk write error has occurred. Every time the system writes or reads a disk sector, it consults the replacement map and substitutes the corresponding spare sector numbers for all bad sector numbers.

    Note that this number translation does not significantly affect system performance, since it is performed only on physical access to the disk, not when reading data from the disk cache.
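    The hotfix translation described above can be sketched as a small lookup table applied on every physical access. The class, its fields, and the failure model are invented for the example.

```python
# Sketch of the hotfix mechanism: a map of (bad sector -> spare sector);
# every physical access translates the sector number through the map.

class HotfixDisk:
    def __init__(self, sectors, spares):
        self.data = {s: b"" for s in range(sectors)}
        self.spares = list(spares)      # reserved spare sector numbers
        self.hotfix = {}                # bad sector -> spare sector
        self.bad = set()                # sectors that fail on write

    def write(self, sector, payload):
        sector = self.hotfix.get(sector, sector)   # translate on access
        if sector in self.bad:
            spare = self.spares.pop(0)             # redirect to a spare
            self.hotfix[sector] = spare
            sector = spare
        self.data[sector] = payload

    def read(self, sector):
        return self.data[self.hotfix.get(sector, sector)]

disk = HotfixDisk(100, spares=[90, 91])
disk.bad = {5}                  # sector 5 develops a write error
disk.write(5, b"payload")       # silently lands in spare sector 90
```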

    File system NTFS

    The NTFS (New Technology File System) file system contains a number of significant improvements and changes that significantly distinguish it from other file systems.

    Note that, with rare exceptions, NTFS partitions can be worked with directly only from Windows NT, although there are file management system implementations for reading files from NTFS volumes for a number of OSes.

    However, there are no full-fledged implementations for working with NTFS outside of Windows NT.

    NTFS is not supported on the widely used Windows 98 and Windows Millennium Edition operating systems.

    Main features of NTFS

    · work with large disks is efficient (much more efficient than in FAT);

    · there are tools to restrict access to files and directories, so NTFS partitions provide local security for both files and directories;

    · a transaction mechanism with logging of file operations has been introduced, significantly increasing reliability;

    · many restrictions on the maximum number of disk sectors and/or clusters have been removed;

    · a file name in NTFS, unlike in the FAT and HPFS file systems, can contain any characters, including the full set of national alphabets, since data is represented in Unicode - a 16-bit representation that gives 65536 different characters. The maximum length of a file name in NTFS is 255 characters;

    · NTFS also has built-in compression capabilities that can be applied to individual files, entire directories, and even volumes (and subsequently undone or reassigned as you wish).

    Volume structure with the NTFS file system

    An NTFS partition is called a volume. The maximum possible volume size (and file size) is 16 EB (2^64 bytes).

    Like other systems, NTFS divides a volume's disk space into clusters—blocks of data that are addressed as data units. NTFS supports cluster sizes from 512 bytes to 64 KB; the standard is a cluster of 2 or 4 KB in size.

    All disk space in NTFS is divided into two unequal parts.


    The first 12% of the disk is allocated to the so-called MFT zone - space that can be occupied by the main service metafile MFT.

    No data can be written to this area. The MFT zone is always kept empty so that the MFT file, as far as possible, does not become fragmented as it grows.

    The remaining 88% of the volume is regular file storage space.

    The MFT (Master File Table) is essentially a directory of all other files on the disk, including itself. It is designed to determine the location of files.

    The MFT consists of fixed-size records. The MFT record size (minimum 1 KB, maximum 4 KB) is determined when the volume is formatted.

    Each entry corresponds to a file.

    The first 16 entries are of a service nature and are not available to the operating system - they are called metafiles, and the very first metafile is the MFT itself.

    These first 16 MFT elements are the only part of the disk that has a strictly fixed position. A copy of these same 16 entries is kept in the middle of the volume for reliability.

    The remaining parts of the MFT file can be located, like any other file, in arbitrary locations on the disk.

    Metafiles are of a service nature - each of them is responsible for some aspect of the system's operation. Metafiles are located in the root directory of the NTFS volume. They all begin with the name character "$", although it is difficult to obtain any information about them by standard means. The table below lists the main metafiles and their purpose.

    Metafile name   Purpose of the metafile
    -------------   -----------------------
    $MFT            The Master File Table itself
    $MFTmirr        A copy of the first 16 MFT entries, placed in the middle of the volume
    $LogFile        Logging support file
    $Volume         Service information: volume label, file system version, etc.
    $AttrDef        List of standard file attributes on the volume
    $.              Root directory
    $Bitmap         Volume free-space map
    $Boot           Boot sector (if the partition is bootable)
    $Quota          Records user rights to disk space usage (effective only since Windows 2000 with NTFS 5.0)
    $Upcase         Table of correspondence between uppercase and lowercase letters in file names. In NTFS, file names are written in Unicode (about 65 thousand different characters), so finding the upper- and lowercase equivalents is a non-trivial task

    The corresponding MFT record stores all information about the file:

    · file name;

    · size;

    · file attributes;

    · positions on the disk of the file's individual fragments, etc.

    If one MFT record is not enough for the information, then several records are used, and not necessarily consecutive ones.

    If the file is small, its data is stored directly in the MFT, in the space remaining after the main data within one MFT record.

    A file on an NTFS volume is identified by a so-called file reference (File Reference), represented as a 64-bit number consisting of:

    · a file number, which corresponds to the record number in the MFT,

    · and a sequence number, which is incremented whenever a given record number in the MFT is reused, allowing the NTFS file system to perform internal integrity checks.
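    The two-part structure of the file reference can be sketched as follows; the 48-bit record number / 16-bit sequence number split used here is the conventional layout of the 64-bit value:

```python
# Sketch of an NTFS file reference (File Reference): a 64-bit number whose
# low 48 bits hold the MFT record number and whose high 16 bits hold the
# sequence number, incremented each time the MFT record is reused.

def pack_file_reference(record_number: int, sequence: int) -> int:
    assert 0 <= record_number < 2**48 and 0 <= sequence < 2**16
    return (sequence << 48) | record_number

def unpack_file_reference(ref: int):
    return ref & (2**48 - 1), ref >> 48

ref = pack_file_reference(record_number=5, sequence=3)
record, seq = unpack_file_reference(ref)
print(record, seq)   # 5 3
```

    A stale reference (one whose sequence number no longer matches the record) can thus be detected even though the record number itself has been reused.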

    Each file in NTFS is represented as a set of streams: a file does not have "just data" as such; instead, it has streams.

    One of the streams is the file data.

    Most file attributes are also streams.

    Thus, a file has only one basic entity - its number in the MFT; everything else, including its streams, is optional.

    This approach can be used effectively - for example, you can “attach” another stream to a file by writing any data to it.

    Standard attributes for files and directories on an NTFS volume have fixed names and type codes.

    A directory in NTFS is a special file that stores links to other files and directories.

    The directory file is divided into blocks, each containing:

    · the file name,

    · basic attributes, and

    · a link to the file's MFT record.

    The root directory of the disk is no different from regular directories, except for a special link to it from the beginning of the MFT metafile.

    The internal directory structure is a B-tree, as in HPFS.

    The number of files in the root and non-root directories is not limited.

    The NTFS file system supports the NT security object model: NTFS treats directories and files as distinct types of objects and maintains separate (albeit overlapping) lists of permissions for each type.

    NTFS provides file-level security: access rights to volumes, directories, and files may depend on the user account and the groups to which the user belongs. Each time a user accesses a file system object, the user's access rights are checked against that object's permission list. If the rights are sufficient, the request is granted; otherwise it is rejected. This security model applies both to local user logons on NT computers and to remote network requests.

    The NTFS system also has certain self-healing capabilities. NTFS supports various mechanisms for verifying system integrity, including transaction logging, which allows file write operations to be replayed against a special system log.

    When logging file operations, the file management system records the changes in a special service file. At the start of an operation that changes the file structure, a corresponding mark is made. If a failure occurs during the operation, the start mark remains flagged as incomplete. When a file system integrity check is performed after the machine reboots, such pending operations are cancelled and the files are restored to their original state. If an operation that changes file data completes normally, it is marked as completed in this same logging service file.
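    The begin/complete marking scheme can be illustrated with a toy journal. This is only a sketch of the principle of transaction logging, not NTFS's actual log format; all names here are invented:

```python
# Toy sketch of transaction logging: each metadata operation is first marked
# "begin" in a log, then applied, then marked "done". After a crash, recovery
# rolls back operations whose "done" mark never reached the log.

class Journal:
    def __init__(self):
        self.log = []          # log records: ("begin", id, key, old) / ("done", id)
        self.fs_state = {}     # the "file system" being protected

    def apply(self, op_id, key, value, crash_before_done=False):
        self.log.append(("begin", op_id, key, self.fs_state.get(key)))
        self.fs_state[key] = value              # the actual change
        if crash_before_done:
            return                              # power failure before commit mark
        self.log.append(("done", op_id))

    def recover(self):
        done = {rec[1] for rec in self.log if rec[0] == "done"}
        for rec in reversed(self.log):
            if rec[0] == "begin" and rec[1] not in done:
                _, _, key, old = rec            # roll back incomplete operation
                if old is None:
                    self.fs_state.pop(key, None)
                else:
                    self.fs_state[key] = old

j = Journal()
j.apply(1, "a.txt", "v1")                        # completes normally
j.apply(2, "b.txt", "v1", crash_before_done=True)
j.recover()
print(j.fs_state)   # {'a.txt': 'v1'} - the incomplete operation was undone
```

    The essential property is that the file system is returned to a consistent state: completed operations survive, interrupted ones are rolled back.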

    The main disadvantage of the NTFS file system is that service data takes up a lot of space (for example, each directory element occupies 2 KB) - on small partitions, service data can occupy up to 25% of the media's capacity.

    ⇒ NTFS cannot be used to format floppy disks. You should not use it to format partitions smaller than 100 MB.

    The UNIX OS file system

    In the UNIX world there are several different types of file systems, each with its own external memory structure. The best known are the traditional UNIX System V file system (s5) and the file system of the UNIX BSD family (ufs).

    Let us consider s5.

    A file in the UNIX system is a collection of characters with random access.

    Any structure a file has is imposed on it by the user.

    The Unix file system is a hierarchical, multi-user file system.

    The file system has a tree structure. The vertices (intermediate nodes) of the tree are directories with links to other directories or files. The leaves of the tree correspond to files or empty directories.

    Comment. Strictly speaking, the UNIX file system is not a tree: the system allows the hierarchy to be violated, since the same file contents can be associated with multiple names.

    Disk structure

    The disk is divided into blocks. The data block size is determined when formatting the file system with the mkfs command and can be set to 512, 1024, 2048, 4096 or 8192 bytes.

    In what follows we assume a block size of 512 bytes (the sector size).

    Disk space is divided into the following areas (see figure):

    · boot block;

    · control superblock;

    · array of i-nodes;

    · area for storing the contents (data) of files;

    · a set of free blocks (linked into a list);

    Boot block | Superblock | i-node | . . . | i-node | file data area | free blocks

    Comment. In the UFS file system, all of this is repeated for each cylinder group (except the boot block), and a special area is allocated to describe each cylinder group.

    Boot block

    The block is located in block No. 0. (Recall that the placement of this block in system device block zero is determined by the hardware, since the hardware boot loader always accesses system device block zero. This is the last component of the file system that is hardware dependent.)

    The boot block contains a bootstrap program used for the initial launch of the UNIX OS. In s5 file systems, only the boot block of the root file system is actually used; in additional file systems this area is present but not used.

    Superblock

    It contains operational information about the state of the file system, as well as data about file system configuration parameters.

    In particular, the superblock contains the following information

    · number of i-nodes (index descriptors);

    · partition size;

    · list of free blocks;

    · list of free i-nodes;

    · and more.

    Note! Free disk space is maintained as a linked list of free blocks, and this list is stored in the superblock.

    The list elements are arrays of 50 entries (with a 512-byte block, each entry is 16 bits):

    · entries 1-48 contain the numbers of free blocks in the file data area;

    · entry 0 contains a pointer to the continuation of the list; and

    · the last entry (49) contains a pointer to the free entry in the array.

    If a process needs a free block to extend a file, the system takes the array entry indicated by the pointer (to the free entry), and the block whose number is stored in that entry is given to the file. If a file shrinks, the freed block numbers are added to the array of free blocks and the pointer to the free entry is adjusted.

    Since the array holds 50 entries, two critical situations are possible:

    1. Blocks are freed, but they no longer fit in the array. In this case one free block is chosen from the file system, and the completely filled array of free blocks is copied into that block; the pointer to the free entry is then reset, and entry 0 of the array in the superblock receives the number of the block chosen to hold the copied contents. At this moment a new element of the list of free blocks (of 50 entries each) has been created.

    2. The entries of the array of free blocks are exhausted. If entry 0 of the array is nonzero, a continuation of the list exists; that continuation is read into the copy of the superblock in RAM. If entry 0 is zero, there are no free blocks left in the file system.
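    The behavior of the 50-entry free-block array, including both critical situations, can be sketched as follows (an illustrative model, not real s5 code; entry 0 is the continuation pointer and the cursor marks the entry to hand out next):

```python
# Sketch of the s5 free-block list: a 50-entry array in the superblock.
# Entry 0 points to a continuation block (0 = no continuation); a cursor
# marks the last valid free entry. `disk` models spilled continuation blocks.

class FreeList:
    def __init__(self, free_blocks, disk):
        self.array = [0] + free_blocks[:49]   # entry 0 = continuation pointer
        self.cursor = len(self.array) - 1     # index of the entry to hand out
        self.disk = disk                      # block number -> stored array

    def alloc(self):
        if self.cursor == 0:                  # array exhausted
            cont = self.array[0]
            if cont == 0:
                raise OSError("no free blocks")
            # read the continuation into the in-memory superblock copy;
            # the continuation block itself becomes the allocated block
            self.array = self.disk.pop(cont)
            self.cursor = len(self.array) - 1
            return cont
        blk = self.array[self.cursor]
        self.cursor -= 1
        return blk

    def free(self, blk):
        if self.cursor == len(self.array) - 1:    # array full: spill to disk
            self.disk[blk] = self.array
            self.array = [blk] + [0] * 49         # new list head, entry 0 -> blk
            self.cursor = 0
        else:
            self.cursor += 1
            self.array[self.cursor] = blk

fl = FreeList(free_blocks=[10, 11, 12], disk={})
print(fl.alloc(), fl.alloc())   # 12 11
```

    Because all of this happens in the RAM copy of the superblock, allocating or releasing a block usually costs no disk I/O at all.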

    Free list of i-nodes. This is a buffer of 100 entries, holding the numbers of 100 i-nodes that are currently free.

    The superblock is always kept in RAM

    ⇒ all operations (releasing and occupying blocks and i-nodes) take place in RAM ⇒ disk exchanges are minimized.

    But! If the contents of the superblock have not been written back to disk when the power is turned off, problems arise (a discrepancy between the real state of the file system and the contents of the superblock). This, however, is a matter of hardware reliability.

    Comment. UFS file systems support multiple copies of the superblock (one copy per cylinder group) to improve stability.

    Inode area

    This is an array of file descriptors called i-nodes; each i-node is 64 bytes long.

    Each index descriptor (i-node) of a file contains:

    · file type (regular file / directory / special file / FIFO / socket);

    · attributes (access rights, 10 bits);

    · file owner identifier;

    · group identifier of the file owner;

    · file creation time;

    · file modification time;

    · time of last access to the file;

    · file length;

    · number of links to this i-node from various directories;

    · addresses of the file's blocks.

    Note! The file name is not stored here.

    Let's take a closer look at how the addressing of the blocks holding a file is organized. The address field contains the numbers of the first 10 blocks of the file.

    If the file occupies more than ten blocks, the following mechanism comes into play: the 11th element of the field contains the number of a block that holds 128 (or 256) links to blocks of the file. If the file is larger still, the 12th element of the field is used: it contains the number of a block that holds 128 (256) block numbers, each of these blocks in turn holding 128 (256) file system block numbers. And if the file is even larger, the 13th element is used, increasing the nesting depth of the list by one more level.

    This way we can get a file of size (10 + 128 + 128^2 + 128^3) * 512 bytes.

    This can be represented as follows:

    address of the 1st block of the file
    address of the 2nd block of the file
    . . .
    address of the 10th block of the file
    address of the single-indirect block (a block holding 128 (256) block addresses)
    address of the double-indirect block (a block holding addresses of blocks that hold addresses)
    address of the triple-indirect block (a block holding addresses of blocks that hold addresses of blocks with addresses)
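    The addressing scheme can be sketched as follows. The function classifies a logical block number of a file by how many levels of indirection are needed to reach it, and the final line computes the maximum file size from the formula above (assuming 512-byte blocks and 128 addresses per indirect block):

```python
# Sketch of s5 block addressing: 10 direct addresses in the i-node, then
# single-, double-, and triple-indirect blocks, each level multiplying the
# reachable range by the number of addresses per block.

BLOCK = 512
APB = 128   # addresses per indirect block (512-byte block, 4-byte addresses)

def addressing_path(logical_block: int):
    """Return (kind, number of indirections) for a file's logical block number."""
    if logical_block < 10:
        return ("direct", 0)
    logical_block -= 10
    if logical_block < APB:
        return ("single-indirect", 1)
    logical_block -= APB
    if logical_block < APB**2:
        return ("double-indirect", 2)
    logical_block -= APB**2
    if logical_block < APB**3:
        return ("triple-indirect", 3)
    raise ValueError("beyond maximum file size")

MAX_FILE = (10 + APB + APB**2 + APB**3) * BLOCK
print(MAX_FILE)   # 1082201088 bytes, a little over 1 GB
```

    Small files are thus reached with no extra disk reads at all, while even the largest files need at most three additional block reads to find any data block.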

    File protection

    Now let's look at the owner and group IDs and security bits.

    In Unix OS it is used three-level user hierarchy:

    The first level is all users.

    The second level is user groups (all users are divided into groups).

    The third level is a specific user (groups consist of real users). Because of this three-level organization of users, each file has the following attributes:

    1) Owner of the file. This attribute is associated with one specific user, who is assigned by the system as the owner of the file. You become the owner by default when you create a file, and there is also a command for changing a file's owner.

    2) File access protection. Access to each file is limited to three categories:

    · the owner's rights (what the owner can do with this file - in general, not necessarily everything);

    · the rights of the group to which the file's owner belongs, excluding the owner himself (for example, a file can be locked against reading for the owner while all other group members can freely read it);

    · the rights of all other users of the system.

    For each of these three categories, three actions are regulated: reading from the file, writing to the file, and executing the file (in the system's mnemonics: R, W, and X, respectively). Thus each file defines, for the three categories, who can read it, who can write it, and who can run it as a process.
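    The nine permission bits can be decoded into the familiar rwxrwxrwx form with a short sketch (the octal mode values are just examples):

```python
# Sketch: decoding the nine UNIX permission bits into the rwxrwxrwx string
# for owner, group, and others.

def mode_to_string(mode: int) -> str:
    out = []
    for shift in (6, 3, 0):                  # owner, group, others
        bits = (mode >> shift) & 0b111
        out.append("r" if bits & 0b100 else "-")
        out.append("w" if bits & 0b010 else "-")
        out.append("x" if bits & 0b001 else "-")
    return "".join(out)

print(mode_to_string(0o754))   # rwxr-xr--
```

    This is exactly the string that `ls -l` displays after the file-type character.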

    Directory organization

    From the OS point of view, a directory is a regular file that contains data about all the files that belong to the directory.

    A directory element consists of two fields:

    1) the i-node number (its ordinal number in the array of i-nodes), and

    2) the file name.

    Each directory contains two special names: '.' - the directory itself; '..' - the parent directory.

    (For the root directory, the parent refers to the same directory.)

    In general, a directory may contain several entries referring to the same i-node, but it cannot contain entries with identical names. In other words, an arbitrary number of names can be associated with the contents of a file; this is called linking. A directory entry that refers to a file is called a link.

    Files exist independently of directory entries, and directory links actually point to physical files. A file "disappears" when the last link pointing to it is deleted.
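    The link-count behavior described above can be modeled in a few lines. This is an illustrative sketch, not kernel code; the names and classes are invented:

```python
# Sketch of hard-link semantics: several directory entries may name the same
# i-node; the file's contents survive until the link count drops to zero.

class Inode:
    def __init__(self, data):
        self.data = data
        self.links = 0

class Directory:
    def __init__(self):
        self.entries = {}                   # name -> Inode

    def link(self, name, inode):
        assert name not in self.entries     # names in one directory are unique
        self.entries[name] = inode
        inode.links += 1

    def unlink(self, name):
        inode = self.entries.pop(name)
        inode.links -= 1
        if inode.links == 0:
            inode.data = None               # "file disappears" with the last link

d = Directory()
ino = Inode(b"hello")
d.link("a", ino)
d.link("b", ino)        # a second name for the same contents
d.unlink("a")
print(ino.data)         # b'hello' - one link still remains
d.unlink("b")
print(ino.data)         # None
```

    This is why the i-node stores a link count rather than a name: the name belongs to the directory entry, not to the file.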

    So, to access a file by name, the operating system:

    1. finds this name in the directory containing the file,

    2. gets the number of the i-node of the file,

    3. by number finds the i-node in the area of ​​i-nodes,

    4. from the i-node receives the addresses of the blocks in which the file data is located,

    5. reads blocks from the data area using block addresses.
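    The five lookup steps above can be sketched with the on-disk structures modeled as plain dictionaries (all block names and i-node numbers here are invented for illustration; i-node 2 as the root is a convention borrowed from real UNIX file systems):

```python
# Sketch of path-name resolution: walk directories, mapping each name to an
# i-node number, then read the file's data blocks via its i-node.

inodes = {                          # i-node number -> (type, block addresses)
    2: ("dir",  ["root_blk"]),
    5: ("dir",  ["home_blk"]),
    9: ("file", ["blk1", "blk2"]),
}
directories = {                     # directory data block -> {name: i-node number}
    "root_blk": {"home": 5},
    "home_blk": {"notes.txt": 9},
}
data_blocks = {"blk1": b"hello ", "blk2": b"world"}

def resolve(path: str, root_ino: int = 2) -> bytes:
    ino = root_ino
    for name in path.strip("/").split("/"):
        kind, blocks = inodes[ino]
        assert kind == "dir"
        ino = directories[blocks[0]][name]       # steps 1-2: name -> i-node number
    _, blocks = inodes[ino]                      # steps 3-4: i-node -> block addresses
    return b"".join(data_blocks[b] for b in blocks)   # step 5: read the data

print(resolve("/home/notes.txt"))   # b'hello world'
```

    Each path component costs at least one directory read, which is why real kernels cache recent name-to-i-node translations.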

    Disk partition structure in EXT2 FS

    The entire partition space is divided into blocks. A block can be 1, 2, or 4 kilobytes in size. A block is an addressable unit of disk space.

    Blocks are combined into block groups. Block groups in a file system, and blocks within a group, are numbered sequentially, starting with 1. The first block on the disk has number 1 and belongs to group number 1. The total number of blocks on the disk (in the disk partition) is the disk's capacity in sectors divided by the block size in sectors. The total number of blocks need not be an exact multiple of the blocks-per-group figure, so the last block group may be incomplete. The start of each block group can be computed as ((group number - 1) * (number of blocks in a group)).
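    The group-start formula quoted above can be expressed directly (the 8192 blocks-per-group figure in the example is only an assumption):

```python
# Sketch of the block-group start formula: groups are numbered from 1, and
# the offset of group g from the start of the partition, measured in blocks,
# is (g - 1) * blocks_per_group.

def group_start_offset(group: int, blocks_per_group: int) -> int:
    assert group >= 1
    return (group - 1) * blocks_per_group

# With 8192 blocks per group, group 3 starts 16384 blocks into the partition.
print(group_start_offset(3, 8192))   # 16384
```

    Since the formula yields an offset, group 1 starts at offset 0, i.e. at the very first block of the partition.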

    Each group of blocks has the same structure. Its structure is presented in the table.

    The first element of this structure (the superblock) is the same for all groups, while all the others are individual to each group. The superblock is stored in the first block of each block group (except group 1, whose first block holds the boot record). The superblock is the starting point of the file system. It is 1024 bytes in size and is always located at an offset of 1024 bytes from the beginning of the file system. Multiple copies of the superblock exist because of the extreme importance of this element of the file system; the duplicates are used when recovering the file system after failures.

    The information stored in the superblock is used to organize access to the rest of the data on the disk. The superblock determines the size of the file system, the maximum number of files in the partition, the amount of free space, and contains information about where to look for unallocated areas. When the OS starts, the superblock is read into memory and all changes to the file system are first reflected in a copy of the superblock located in the OS and are written to disk only periodically. This improves system performance because many users and processes are constantly updating files. On the other hand, when the system is turned off, the superblock must be written to disk, which does not allow turning off the computer by simply turning off the power. Otherwise, the next time you boot, the information recorded in the superblock will not correspond to the real state of the file system.

    Following the superblock is a description of the group of blocks (Group Descriptors). This description contains:

    · the address of the block containing the block bitmap of this group;

    · the address of the block containing the inode bitmap of this group;

    · the address of the block containing the inode table of this group;

    · a counter of free blocks in this group;

    · the number of free inodes in this group;

    · the number of inodes in this group that are directories;

    · and other data.

    The information stored in the group description is used to locate the block and inode bitmaps, as well as the inode table.

    The Ext2 file system is characterized by:

    • hierarchical structure,
    • coordinated processing of data sets,
    • dynamic file extension,
    • protection of information in files,
    • treating peripheral devices (such as terminals and tape devices) as files.

    Internal file representation

    Each file in the Ext2 system has a unique index. The index contains the information any process needs to access the file. Processes access files through a well-defined set of system calls, identifying a file by a character string that serves as its qualified name. Each compound name uniquely identifies a file, so the kernel converts this name into the file's index. The index includes a table of addresses where the file's information is located on disk. Since each block on the disk is addressed by its own number, this table stores a collection of disk block numbers. To increase flexibility, the kernel appends to a file one block at a time, allowing the file's information to be scattered throughout the file system; this layout, however, complicates the search for data. The address table contains a list of the numbers of the blocks holding the file's information.

    File inodes

    Each file on the disk has a corresponding file inode, which is identified by its serial number - the file index. This means that the number of files that can be created in a file system is limited by the number of inodes, which is either explicitly specified when the file system is created or calculated based on the physical size of the disk partition. Inodes exist on disk in static form and the kernel reads them into memory before working with them.

    The file inode contains the following information:

    - the type of, and access rights to, this file;

    - the file owner identifier (Owner Uid);

    - the file size in bytes;

    - the time of last access to the file (Access time);

    - the file creation time;

    - the time of last modification of the file;

    - the file deletion time;

    - the group identifier (GID);

    - the link count;

    - the number of blocks occupied by the file;

    - file flags;

    - a field reserved for the OS;

    - pointers to the blocks in which the file data is written (an example of direct and indirect addressing is shown in Fig. 1);

    - the file version (for NFS);

    - the file ACL;

    - the directory ACL;

    - the fragment address;

    - the fragment number;

    - the fragment size.

    Directories

    Directories are files.

    The kernel stores data in a directory just as it does in a regular file, using an index structure and blocks with direct and indirect levels of addressing. Processes can read data from directories the same way they read regular files; however, exclusive write access to a directory is reserved for the kernel, which ensures that the directory structure remains correct.

    When a process uses a file path, the kernel looks in the directories for the corresponding inode number. After the file name has been converted to an inode number, the inode is placed in memory and then used in subsequent requests.

    Additional features of EXT2 FS

    In addition to standard Unix features, EXT2fs provides some additional features not typically supported by Unix file systems.

    File attributes allow you to change how the kernel reacts when working with sets of files. You can set attributes on a file or directory. In the second case, files created in this directory inherit these attributes.

    Some behavior related to file attributes can be selected at mount time. A mount option allows the administrator to choose how files are created. With BSD-style semantics, files are created with the same group ID as their parent directory. System V semantics are somewhat more complex: if a directory has its setgid bit set, created files inherit the group ID of that directory, and subdirectories inherit the group ID and the setgid bit; otherwise, files and directories are created with the primary group ID of the calling process.

    EXT2fs can use synchronous data modification, similar to the BSD system. A mount option allows the administrator to require that all metadata (inodes, bitmap blocks, indirect blocks, and directory blocks) be written to disk synchronously when modified. This improves data integrity but results in poor performance. In practice this feature is rarely used, because besides degrading performance it can lead to loss of user data that is not detected when the file system is checked.

    EXT2fs allows you to select the logical block size when creating a file system. It can be 1024, 2048 or 4096 bytes in size. Using larger blocks results in faster I/O operations (since fewer disk requests are made), and therefore less head movement. On the other hand, using large blocks leads to wasted disk space. Typically, the last block of a file is not completely used for storing information, so as the block size increases, the amount of wasted disk space increases.

    EXT2fs supports fast symbolic links. With such links no file system data blocks are used: the target file name is stored not in a data block but in the inode itself. This saves disk space and speeds up the processing of symbolic links. The space available in the inode is limited, of course, so not every link can be represented as a fast one; the maximum length of a target name in a fast link is 60 characters. There are plans to extend this scheme to small files as well.

    EXT2fs monitors the state of the file system using a dedicated field in the superblock. If the file system is mounted in read/write mode, its state is set to "Not Clean"; when it is unmounted or remounted read-only, the state is set to "Clean". During system boot, this information is used to decide whether a file system check is necessary. The kernel also records some errors in this field: when it detects an inconsistency, the file system is marked "Erroneous", and the file system checker then examines the system even if its state is formally "Clean".

    Ignoring file system testing for a long time can sometimes lead to some difficulties, so EXT2fs includes two methods for regularly checking the system. The superblock contains the system mount counter. This counter is incremented each time the system is mounted in read/write mode. If its value reaches the maximum (it is also stored in the superblock), then the file system test program starts checking it, even if its state is "Clean". The last check time and the maximum interval between checks are also stored in the superblock. When the maximum interval between scans is reached, the state of the file system is ignored and its scan is started.
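    The two periodic-check triggers, plus the state flag itself, can be combined into a single predicate sketch (the parameter names here are illustrative; the real ext2 superblock stores these values in fields such as s_mnt_count, s_max_mnt_count, s_lastcheck, and s_checkinterval):

```python
# Sketch of the decision to force a file system check: a check runs if the
# state is not "Clean", if the mount counter has reached its maximum, or if
# too much time has passed since the last check.

def needs_check(state, mount_count, max_mount_count,
                last_check, check_interval, now):
    if state != "Clean":
        return True                       # unclean shutdown or recorded error
    if mount_count >= max_mount_count:
        return True                       # mount-counter trigger
    if now - last_check >= check_interval:
        return True                       # elapsed-time trigger
    return False

# Clean, but mounted the maximum number of times -> check anyway.
print(needs_check("Clean", 30, 30, last_check=0, check_interval=100, now=50))  # True
```

    Either trigger alone is enough: a file system that is remounted rarely is still caught by the time interval, and one remounted constantly is caught by the counter.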

    Performance optimization

    The EXT2fs system contains many features that optimize its performance, which leads to increased speed of information exchange when reading and writing files.

    EXT2fs actively uses the disk buffer. When a block needs to be read, the kernel issues an I/O operation request to several adjacent blocks. Thus, the kernel tries to make sure that the next block to be read has already been loaded into the disk buffer. Such operations are usually performed when reading files sequentially.

    The EXT2fs system also contains a large number of optimizations for information placement. Block groups are used to group together corresponding inodes and data blocks. The kernel always tries to place the data blocks of one file in the same group, as well as its descriptor. This is intended to reduce the movement of the drive heads when reading the descriptor and its corresponding data blocks.

    When writing data to a file, EXT2fs pre-allocates up to 8 contiguous blocks when allocating a new block. This method allows you to achieve high performance under heavy system load. This also allows files to be placed in contiguous blocks, which speeds up their subsequent reading.