• Associative computer memory. Development of associative memory. General information and classification of memory devices

    Ways to organize memory

    Functionally, a memory of any type always consists of a storage array that holds the information and of auxiliary, often quite complex blocks that perform search in the array, writing, reading and, where necessary, regeneration.

    The storage array (SM) consists of many identical storage elements (SE). The SEs are organized into cells, each designed to store one unit of information as a binary code whose number of bits is determined by the sample width. The way the memory is organized depends on how information is placed in and retrieved from the SM. On this basis, address, associative and stack memory are distinguished.

    ADDRESS MEMORY

    In memory with an address organization, the placement and retrieval of information in the SM are based on the address of the storage location of a unit of information, which for brevity we will call a word. The address is the number of the SM cell in which the word is stored. When a word is written to (or read from) the SM, the command initiating the operation must specify the address (number) of the cell to write to (or read from).

    Fig. 5.2 shows the generalized structure of an address memory.

    The memory access cycle is initiated by the "Access" signal arriving at the TCU. The common part of the access cycle consists of receiving into RgA, from the address bus (SHA), the address being accessed, and receiving into the TCU the "Operation" control signal, which indicates the type of the requested operation (read or write).

    Reading. The address selection block (BAS) decodes the address and issues a signal selecting the SM cell specified by the address. In general, the BAS may also send to the selected cell signals that set its SEs up for writing or reading. The word stored in the cell is then sensed by the readout amplifiers (BUS) and transferred into RgI. Next, in a memory with destructive readout, the information is regenerated by writing the word from RgI, through the write-amplifier block (BUZ), back into the same SM cell. The read operation is completed by issuing the word from RgI onto the output information bus SHI out.

    Writing. In addition to the common part of the access cycle described above, the word to be written is received from the input bus SHI in into RgI. The write itself generally consists of two operations: clearing the cell and the write proper. To do this, the BAS first selects and clears the cell whose address is in RgA. Clearing (resetting) an SM cell can be done in different ways; in particular, in a memory with destructive readout it can be done by issuing a read signal to the cell while the BUS is blocked (so that the information does not reach RgI). The new word is then written into the selected cell.

    Whether the cell must be cleared before writing, and whether the information must be regenerated during reading, depends on the type of SEs used, the control methods and the electronic structure of the memory LSI, so in semiconductor memories these operations may be absent.

    The TCU generates the sequences of control signals that initiate the operation of the individual memory nodes. It should be borne in mind that the TCU can be a very complex device (a kind of control controller with its own buffer memory), which gives the memory LSI as a whole special consumer properties, such as multiple ports, pipelined output of information, etc.

    ASSOCIATIVE MEMORY

    In this type of memory, information is searched for not by address but by its content. Here the content of information is understood not as the semantic meaning of the word stored in the memory cell, but as the contents of the SEs of the cell, i.e. the bit-by-bit composition of the stored binary word. The associative query (feature) is likewise a binary code with a certain bit composition. The search by associative feature is performed in parallel in time over all SM cells and is an operation of comparing the contents of the bits of the feature register with the contents of the corresponding bits of the memory cells. To organize such a search, all SEs of the SM are equipped with single-bit processors, so in some cases this type of memory is regarded as a multiprocessor system.

    A large fully associative memory is a very expensive device, so to reduce its cost the number of single-bit processors is reduced to one per memory cell. In this case the associative query is compared with the contents of the memory cells bit-serially, but in parallel in time over all SM cells.

    With very large amounts of memory on certain classes of problems, associative search significantly speeds up data processing and reduces the probability of failure in the computer. At the same time, associative memories with blocks of corresponding combinational circuits make it possible to perform fairly complex logical operations in memory: searching for the maximum or minimum number in an array, searching for words enclosed in certain boundaries, sorting an array, etc.

    It should be noted that an associative search can also be implemented in a computer with conventional address memory, by sequentially fetching the words stored in the memory cells into the processor and comparing them with some associative feature (template). With large memory sizes, however, this takes a great deal of time. With associative memory it is possible, without reading words from the memory into the processor, to determine in a single access the number of words matching a given associative query. This makes it possible, in large databases, to answer very quickly a query such as: how many residents of the region have not filed an income declaration, etc.
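    The idea of a masked associative query can be sketched in a few lines of Python (an illustration only, not a model of any real hardware; the names `associative_count`, `memory`, `query` and `mask` are invented for this example). Each stored word is compared with the query only in the bit positions where the mask holds a 1, and the matches are counted in one pass, mimicking the single-access count described above:

```python
def associative_count(cells, query, mask):
    """Count stored words whose unmasked bits equal the query's bits.

    cells, query and mask are integers holding binary words; a 1 bit in
    `mask` means "compare this bit", a 0 bit means "ignore it".
    """
    # (word ^ query) has 0s exactly where word and query agree;
    # masking with `mask` restricts the comparison to unmasked bits.
    return sum(1 for word in cells if (word ^ query) & mask == 0)

# Example: find words whose two low-order bits are 01, ignoring the rest.
memory = [0b1101, 0b0101, 0b0110, 0b1001]
print(associative_count(memory, query=0b0001, mask=0b0011))  # -> 3
```

    In hardware the per-word comparisons happen simultaneously; the loop here is only a sequential stand-in for that parallelism.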

    In some specialized computers, the main memory (OP) or part of it is built so that both associative and address-based information retrieval are possible.

    A simplified block diagram of an associative memory in which all SEs of the SM are equipped with single-bit processors is shown in Fig. 5.3.

    Let us first consider the operation called association control. This operation is common to the read and write operations and also has independent value.

    An n-bit associative query enters RgAP via the input information bus, filling bits 0 to n-1. At the same time, the search mask code enters RgM, with bit n of RgM set to 0. The associative search is performed only over the set of RgAP bits that correspond to 1s in RgM (the unmasked RgAP bits). For words whose bits coincide with the unmasked bits of RgAP, the combinational circuit KS sets 1 in the corresponding bits of RgSv and 0 in the remaining bits.

    The combinational circuit FS, which forms the result of the associative call, generates at least three signals from the word formed in RgSv:

    a0 - the absence in the SM of words satisfying the associative feature;

    a1 - the presence of exactly one such word;

    a2 - the presence of more than one such word.

    Other operations on the contents of RgSv are also possible, for example counting the number of 1s, i.e. counting the words in memory that satisfy the associative query, etc.

    The formation of the contents of RgSv and of a0, a1, a2 from the contents of RgAP, RgM and the SM is usually called the association control operation.

    Reading. First, association control is performed on the feature in RgAP. Then:

    a0 = 1 - reading is canceled because the required information is absent;

    a1 = 1 - the found word is read into RgI and then issued onto SHI out;

    a2 = 1 - the word is read from, for example, the lowest-numbered cell among those marked with 1 in RgSv, and then issued onto SHI out.

    Writing. First a free cell is found (we assume that 0 is written in the busy bit of a free cell). To do this, association control is performed with RgAP = 111...10 and RgM = 000...01, i.e. bit n of RgAP is set to 0 and bit n of RgM is set to 1. The free cells are then marked with 1 in RgSv. For writing, the free cell with, for example, the smallest number is selected, and the word received from SHI in into RgI is written into it.
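    The write procedure can be sketched as follows (a hedged illustration, not a hardware description; `N_BITS`, `BUSY` and `write_word` are invented names, and the 8-bit data width is an assumption for the example). Bit n of each cell serves as the busy bit; a free cell is one whose busy bit is 0, and the lowest-numbered free cell receives the word:

```python
N_BITS = 8            # data width n (assumption for this example)
BUSY = 1 << N_BITS    # service bit n marks an occupied cell

def write_word(cells, word):
    """Write `word` into the lowest-numbered free cell; return its index."""
    for i, cell in enumerate(cells):
        if cell & BUSY == 0:            # busy bit is 0 -> the cell is free
            cells[i] = word | BUSY      # store the word, mark the cell busy
            return i
    raise MemoryError("no free cell")   # no cell matched: a0 = 1

cells = [BUSY | 0x11, 0, BUSY | 0x22, 0]    # cells 1 and 3 are free
idx = write_word(cells, 0x33)
print(idx, hex(cells[1]))   # lowest free cell is cell 1
```

    The sequential scan stands in for the associative search, which in hardware marks all free cells at once and picks the lowest-numbered one.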

    It should be noted that this diagram does not show the blocks BUP, BUS and BUZ that are present in real devices. Note also that building an associative memory requires storage elements that can be read without destroying the stored information.

    STACK MEMORY (STORE)

    Stack memory, like associative memory, is address-free. Stack memory can be organized either in hardware or on a conventional array of address memory.

    In the case of a hardware implementation, the stack memory cells form a one-dimensional array in which neighboring cells are connected to each other by bit chains of word transmission (Fig. 5.4). Two types of devices (a, b) are possible, whose principles of operation differ. Let us first consider the structure in Fig. 5.4a.

    A new word received from SHI in is written into the upper (zero) cell, while all previously written words (including the word in cell 0) are shifted down into the adjacent cells, whose numbers are greater by one. Reading is possible only from the upper (zero) memory cell. The main mode is reading with deletion, in which all other words in the memory are shifted up into the adjacent cells with lower numbers. Such a memory implements the rule: last in, first out. Stacks of this type are called LIFO (Last In, First Out) stacks.
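    The shift-based behavior of Fig. 5.4a can be modeled in a few lines (a sketch only; the class name `HardwareLifo` is invented). A push shifts every stored word down one cell and places the new word in cell 0; a pop reads cell 0 with deletion and shifts the rest back up:

```python
class HardwareLifo:
    """Toy model of the shift-register LIFO stack: cell 0 is the top."""

    def __init__(self, depth):
        self.cells = [None] * depth
        self.count = 0                        # the stack counter ChSt

    def push(self, word):
        if self.count == len(self.cells):
            raise OverflowError("stack full")  # ChSt blocks further writes
        self.cells[1:] = self.cells[:-1]       # shift all words down
        self.cells[0] = word                   # new word enters cell 0
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("stack empty")
        word = self.cells[0]                   # read with deletion from cell 0
        self.cells[:-1] = self.cells[1:]       # shift remaining words up
        self.cells[-1] = None
        self.count -= 1
        return word

s = HardwareLifo(4)
for w in ("a", "b", "c"):
    s.push(w)
print(s.pop(), s.pop())   # last in, first out: c b
```

    Unlike the software stack discussed later, here the data itself moves on every push and pop.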

    In some cases, stack memory devices also provide the operation of simply reading the word from cell 0, without deleting it and without shifting the remaining words. When the stack is used to store the initialization parameters of the controllers of some computer device, it is usually possible to read the contents of any stack cell without deleting it, i.e. not just the contents of cell 0.

    The first word pushed onto the stack is said to be located at the bottom of the stack. The last word pushed (in time) onto the stack is said to be at the top of the stack. Thus, cell N-1 is the bottom of the stack, and cell 0 is the top.

    Typically, a hardware stack is provided with a stack counter ChSt showing the total number of words stored in the memory (ChSt = 0 means the stack is empty). When the stack is full, the counter disables further write operations.

    The stack principle of memory organization can be implemented not only in devices specially designed for this purpose. A stack organization of data is also possible on conventional random-access address memory (a software stack). To organize a LIFO stack in this case, one more memory cell (register) is needed, which always holds the address of the top of the stack and is commonly called the stack pointer. Usually one of the processor's internal registers serves as the stack pointer. In addition, appropriate software is required. The principles of stack organization of data on conventional address memory are illustrated by the diagram in fig. 5.5.

    Unlike a hardware stack, data placed on a software stack are not moved when a new word is written or read. Each new word is written into the memory cell that follows the one whose address is contained in the stack pointer; after the word is written, the stack pointer is incremented by one (see fig. 5.5). Thus, on a software stack it is not the data that moves but the top of the stack. When a word is read from the stack, the process is reversed: the word is read from the cell whose address is in the stack pointer, after which the contents of the stack pointer are decremented by one.
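    A direct (ascending) software stack can be sketched like this (an illustration with invented names; `memory` stands for ordinary address memory and `sp` for the stack-pointer register):

```python
memory = [0] * 16
sp = -1            # empty stack: the pointer sits just below the stack area

def push(word):
    global sp
    sp += 1            # the top of the stack moves up...
    memory[sp] = word  # ...and the word is written at the new top

def pop():
    global sp
    word = memory[sp]  # read the word at the top
    sp -= 1            # the top moves back down; the data stays in place
    return word

push(10); push(20); push(30)
print(pop(), pop())    # 30 20 - LIFO order
```

    Note that `pop` only moves the pointer: the popped values remain in `memory` until overwritten, exactly as in the text above.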

    If the words successively pushed onto the stack are placed in memory cells with sequentially increasing addresses, the stack is called direct; if the addresses decrease sequentially, it is called reversed. In most cases a reversed stack is used, which is due to peculiarities of the hardware implementation of counters inside the processor.

    How convenient is this form of memory organization? Looking ahead, note that any instruction executed by the processor must, in the general case, contain an operation code (opcode), the addresses of the first and second operands, and the address of the result. To save memory and shorten the execution time of a machine instruction, it is desirable to reduce the instruction length. The limit of this reduction is the address-free instruction, i.e. just the opcode. Such instructions are possible precisely with a stack organization of memory, since with the correct arrangement of operands on the stack it is sufficient to fetch them sequentially and perform the corresponding operations on them.
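    A tiny stack-machine interpreter makes the point concrete (a sketch with invented opcode names, supporting only ADD and MUL): with the operands already on the stack, each arithmetic instruction is just an opcode, with no operand or result addresses. Evaluating the postfix form of (2 + 3) * 4 looks like this:

```python
def run(program):
    """Execute a list of immediates and zero-address opcodes."""
    stack = []
    for op in program:
        if isinstance(op, int):
            stack.append(op)            # push an immediate operand
        else:                           # zero-address opcode: pop two, push result
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)   # "MUL" otherwise
    return stack.pop()

print(run([2, 3, "ADD", 4, "MUL"]))   # (2 + 3) * 4 = 20
```

    Real stack machines and the JVM follow the same idea, though with far richer instruction sets.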

    In addition to the stack memory of the LIFO type discussed above, computers use stack memories of another type that implement the rule: first in - first out. Stacks of this type are called FIFO (First In - First Out) stacks. Such stack memory is widely used for organizing various kinds of queues (commands, data, requests, etc.). The generalized structure of the hardware stack of the FIFO type is shown in fig. 5.4b.

    As in the previous case, the stack memory cells form a one-dimensional array in which adjacent cells are connected to each other by bit chains of word transmission. A new word received from SHI in enters the upper (zero) cell, after which it immediately moves down and is written into the last empty cell. If the stack was empty before the write, the word goes straight to cell N-1, i.e. to the bottom of the stack. Reading is possible only from the bottom cell, number N-1 (the bottom of the stack). The main mode is reading with deletion, in which all the remaining words are shifted down into the adjacent cells whose numbers are greater by one. When the stack is full, the counter ChSt prohibits further writes to the stack.

    Thus, unlike the LIFO stack, in the FIFO stack it is not the bottom that moves but the top. Words written into the FIFO stack gradually move from the top to the bottom, from where they are read as needed, the rates of writing and reading being determined by external control signals and independent of each other.
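    The "fall-through" FIFO stack of Fig. 5.4b can be modeled as follows (a sketch only; the class name `FifoStack` is invented, and a deque stands in for the chain of cells). A word written at the top drops to the deepest empty cell; reading always takes the word at the bottom, so words leave in arrival order:

```python
from collections import deque

class FifoStack:
    """Toy model of the hardware FIFO: write at the top, read at the bottom."""

    def __init__(self, depth):
        self.cells = deque()
        self.depth = depth

    def write(self, word):
        if len(self.cells) == self.depth:
            raise OverflowError("stack full")   # the counter blocks the write
        self.cells.append(word)                 # word falls toward the bottom

    def read(self):
        return self.cells.popleft()             # bottom cell: first word in

q = FifoStack(4)
for w in ("a", "b", "c"):
    q.write(w)
print(q.read(), q.read())   # first in, first out: a b
```

    The independent write and read rates mentioned above correspond to `write` and `read` being callable at any time, as long as the structure is neither full nor empty.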

    The software implementation of the FIFO stack is not considered in this section, since it is rarely used in practice.


    The storage device, as a rule, contains many identical storage elements that form a storage array (SM). The array is divided into individual cells; each of them is designed to store a binary code, the number of bits in which is determined by the width of the memory sample (in particular, it can be one, half, or several machine words). The way memory is organized depends on the methods of placing and searching for information in the storage array. On this basis, address, associative and stack (store) memory are distinguished.

    Address memory. In memory with an address organization, the placement and retrieval of information in the SM are based on the address of the storage location of the word (a number, an instruction, etc.); the address is the number of the SM cell in which the word is located.

    When writing (or reading) a word to the SM, the command initiating this operation must indicate the address (cell number) at which the recording (reading) is performed.

    The typical address memory structure shown in fig. 4.2 contains a storage array of N n-bit cells and its hardware framing, including the address register RgA of k (k ≥ log2 N) bits, the information register RgI, the address selection block BAS, the readout amplifier block BUS, the block of bit write-signal amplifier-shapers BUZ, and the memory control block BUP.

    Fig. 4.2. Structure of address memory.

    Based on the address code in RgA, the BAS generates signals for the corresponding memory cell that allow the word in that cell to be read or written.

    The memory access cycle is initiated by the arrival at BUP of the external signal Access. The common part of the access cycle includes receiving into RgA, from the address bus SHA, the address being accessed, and receiving into BUP and decoding the control signal Operation, which indicates the type of the requested operation (read or write).

    When reading, the BAS then decodes the address and sends read signals to the SM cell specified by the address; the code of the word written in the cell is sensed by the readout amplifiers BUS and transferred into RgI. Then, in a memory with destructive readout (where reading sets all storage elements of the cell to the zero state), the information in the cell is regenerated by writing the read word back into it from RgI. The read operation is completed by issuing the word from RgI onto the output information bus SHIout.

    When writing, in addition to the common part of the access cycle described above, the word to be written is received from the input information bus SHIin into RgI. The write itself consists of two operations: clearing the cell (resetting it to 0) and the write proper. For this, the BAS first selects and clears the cell whose address is in RgA. Clearing is performed by the read signals applied to the cell, with the readout amplifiers BUS blocked so that no information enters RgI. The word from RgI is then written into the cell selected by the BAS.

    The control block BUP generates the sequences of control signals that initiate the operation of the individual memory nodes. The control signal transmission chains are shown by thin lines in fig. 4.2.

    Associative memory. In memory of this type, the required information is searched for not by address but by its content (by an associative feature). The search by the associative feature (or serially by individual bits of this feature) proceeds in parallel in time over all cells of the storage array. In many cases associative search can significantly simplify and speed up data processing. This is achieved because in this type of memory the operation of reading information is combined with the execution of a number of logical operations.

    A typical structure of associative memory is shown in fig. 4.3. The storage array contains N (n+1)-bit cells. The service bit n is used to indicate whether the cell is occupied (0 - the cell is free, 1 - a word is written in the cell).

    Via the input information bus SHIin, an n-bit associative query enters bits 0 to n-1 of the associative feature register RgAP, and the search mask code enters the mask register RgM, with bit n of RgM set to 0. The associative search is performed only over the set of RgAP bits that correspond to 1s in RgM (the unmasked RgAP bits). For words whose bits coincide with the unmasked bits of RgAP, the combinational circuit KS sets 1 in the corresponding bits of the match register RgSv and 0 in the remaining bits. Thus the value of bit j of RgSv is defined by the expression

    RgSv[j] = ⋀ i=0..n-1 [ ¬RgM[i] ∨ (RgAP[i] ≡ SM[j, i]) ],

    where RgAP[i] and RgM[i] are the values of bit i of RgAP and RgM, and SM[j, i] is the value of bit i of cell j of the SM; the equivalence RgAP[i] ≡ SM[j, i] equals 1 when the two bits coincide.

    The combinational circuit FS, which forms the result of the associative call, generates from the word formed in RgSv the signals α0, α1, α2, corresponding respectively to the absence in the SM of words satisfying the associative feature, the presence of exactly one such word, and the presence of more than one. For this, FS implements the following Boolean functions:

    α0 = ¬(RgSv[0] ∨ RgSv[1] ∨ … ∨ RgSv[N-1])

    α1 = ⋁ j=0..N-1 ( RgSv[j] ∧ ⋀ k≠j ¬RgSv[k] )

    α2 = ¬α0 ∧ ¬α1

    The formation of the contents of RgSv and of the signals α0, α1, α2 from the contents of RgAP, RgM and the SM is called the association control operation. This operation is an integral part of the read and write operations, although it also has independent meaning.

    When reading, association control is first performed on the associative feature in RgAP. Then, if α0 = 1, reading is canceled because the required information is absent; if α1 = 1, the found word is read into RgI; if α2 = 1, the word from the lowest-numbered cell among those marked with 1 in RgSv is read into RgI. From RgI the read word is issued onto SHIout.
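    The association control operation above can be expressed directly in Python (a sketch, not a hardware description; the function name and the list representation of the storage array are assumptions). It forms the match register and the three result signals, here written a0, a1, a2:

```python
def association_control(sm, rgap, rgm):
    """sm: list of stored words; rgap: query; rgm: mask (1 = compare bit).

    Returns the match register as a bit list and the signals (a0, a1, a2).
    """
    # Bit j of the match register is 1 iff word j agrees with the query
    # in every unmasked bit position.
    rgsv = [int((word ^ rgap) & rgm == 0) for word in sm]
    matches = sum(rgsv)
    a0 = int(matches == 0)   # no matching word
    a1 = int(matches == 1)   # exactly one matching word
    a2 = int(matches > 1)    # more than one matching word
    return rgsv, (a0, a1, a2)

rgsv, flags = association_control([0b101, 0b111, 0b001], rgap=0b101, rgm=0b101)
print(rgsv, flags)   # cells 0 and 1 match on the unmasked bits -> a2 = 1
```

    Here cell 2 differs from the query in an unmasked bit, so its match bit is 0, while the masked middle bit of cell 1 is ignored.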

    Fig. 4.3. Structure of associative memory.

    When writing, a free cell is first sought. To do this, the association control operation is performed with RgAP = 111...10 and RgM = 000...01. The free cells are then marked with 1 in RgSv. For writing, the free cell with the smallest number is selected, and the word received from SHIin into RgI is written into it.

    Fig. 4.4. Stack memory.

    With the help of the association control operation it is possible, without reading words from the memory, to determine from the contents of RgSv how many words in the memory satisfy an associative feature, and thereby to implement queries such as: how many students in a group have an excellent mark in a given discipline. When appropriate combinational circuits are used, quite complex logical operations can be performed in associative memory, such as searching for a greater (smaller) number, searching for words enclosed within certain boundaries, searching for the maximum (minimum) number, etc.

    Note that associative memory requires storage elements that can be read without destroying the information recorded in them. This is because in an associative search reading is performed over the entire SM for all unmasked bits, and there is nowhere to temporarily store information destroyed by reading.

    Stack memory, like associative memory, is address-free. In stack memory (fig. 4.4) the cells form a one-dimensional array in which neighboring cells are connected to each other by bit chains of word transmission. A new word is written into the upper cell (cell 0), while all previously written words (including the word that was in cell 0) are shifted down into the adjacent cells, whose numbers are greater by 1. Reading is possible only from the upper (zero) memory cell; if reading with deletion is performed, all other words in the memory are shifted up into the adjacent cells with lower numbers. In this memory the order of reading follows the rule: last in, first out. In a number of devices of this type, the operation of simply reading the word from cell 0 (without deleting it and without shifting the words in memory) is also provided. Sometimes the stack memory is provided with a stack counter ChSt showing the number of stored words; ChSt = 0 corresponds to an empty stack, ChSt = N to a full one.

    Often stack memory is organized using address memory. Stack memory is widely used when processing nested data structures.

    The following paragraphs of the chapter describe the various types of memory with address organization. Associative memory is used in the equipment for dynamic memory allocation, as well as for building a cache memory.

    In associative storage devices, information is searched for by an associative feature recorded in each memory cell.

    In memory of this type, the required information is searched for not by address but by the content of the information itself (i.e. by an associative feature). The search by the associative feature proceeds in parallel in time over all memory cells. Associative search can significantly simplify and speed up data processing. This is achieved because in such a memory the operation of reading information is combined with the execution of a number of logical operations. For example, operations such as the following can be performed:

    1) search for the maximum or minimum number in the memory;

    2) search for words enclosed in certain boundaries;

    3) search for words closest to the associative feature, both from the larger and from the smaller side, etc.

    The simplest associative memory usually performs a single operation of selecting words whose feature matches the associative feature.

    The storage array (SM) contains N cells of n+1 bits each. The service bit n is used to indicate whether a cell is occupied: if bit n holds 0 the cell is free, if 1 it is occupied.

    An n-bit feature enters the associative feature register RGP via the input SD, and the search mask code enters the mask register RGM; bit n of RGM is set to 0. The associative search is performed only over those bits of the feature that correspond to "1" in the mask register, i.e. over the so-called unmasked bits. Thus, by setting the mask code M, one can arbitrarily choose the feature bits by which the search is conducted.

    For the words in the SM in which all bits coincide with the unmasked bits of RGP, the combinational circuit KS1 sets "1" in the corresponding bits of the match register RGC. Thus, if word j coincides with the feature in the unmasked bits, "1" is written into bit j of RGC, otherwise "0". A "1" in bit j of RGC means that word j matches the feature, i.e. is a word actually being sought in the SM.

    The word written into the mask register enables the query over all or only some bits of the associative feature; using a mask makes it possible to narrow or widen the search area.

    Information is searched in parallel for all cells by comparing the query with the associative feature of each cell.

    The search result is generated by a special combinational circuit that produces signals indicating the absence of words meeting the search condition, the presence of exactly one such word, or the presence of several words with the given associative feature.

    After these signals have been formed and processed, the control circuit reads the required information.

    When writing information, a free cell is found first. To do this, an associative search is performed with a feature containing "0" in all bits and a mask containing "0" in all bits except the service bit n.

    This identifies the SM cells in which bit n holds "0", i.e. the unoccupied cells. The word from the information register RGI is written into the free cell with the smallest number.

    By using additional combinational circuits, associative memory can perform various logical operations: determining the maximum or minimum number, counting the words that share a given associative feature, etc. Figure 1 shows the structure of associative memory. The memory cells of an associative storage device must be static memory elements: in associative memory all cells are accessed simultaneously, and accesses must not be interrupted by refresh cycles. Associative memory is the fastest but very expensive, since each memory cell requires an additional comparison circuit to support the search. Therefore such memory is usually not used in its pure form, and high-speed cache-type memory devices are usually implemented as partially associative.

    In microprocessors, associative memory (content-addressable memory) is used as part of the cache to store the address part of the instructions and operands of the executing program. In this case there is no need to access the RAM for the next instruction or the required operand: it is enough to place the required address in the associative feature register, and if the required information is present in the cache it is issued immediately. RAM is accessed only if the required information is not in the cache. Using the cache in this way reduces the number of RAM accesses and saves time, since a cache access takes roughly ten times less time than a RAM access.

    Stack organization of memory

    If writing and reading are performed through the same register, such a device is called a stack memory operating on the principle "first in, last out" (FILO, First In, Last Out).

    Stack memory, like associative memory, is address-free: it is a collection of cells forming a one-dimensional array in which neighboring cells are connected to each other by bit chains of word transmission. Words are always written into the upper, zero cell, while all previously written words are shifted down by one cell. Reading is performed in the reverse order of writing.

    Stack memory has become widespread. To implement it in RAM, operating system software allocates part of the memory for the stack. In practice, stack memory is often organized on conventional address memory.

    Consider the organization of stack memory as a memory formed from interconnected cells, in which the information is shifted down when a new word is written to the stack (Fig. 2). Information is exchanged only through the upper memory cell. When a word is read from the stack, it may be removed from the stack memory or shifted around in a ring, depending on the stack organization. The reading mode "last in, first out" is called LIFO (Last In, First Out).


    Fig. 2. Organization of stack memory.

    A hardware implementation of such memory is not always appropriate, and the stack is often organized in the computer's main memory by software, which allows the stack size to be changed as needed. When a stack is organized in main memory, a special address register is allocated: the stack pointer. The stack pointer contains the address of the last word pushed onto the stack. When a word is pushed onto the stack, the address of the top of the stack is automatically decremented; when a word is read, it is automatically incremented. Stack memory is usually used to save the state of the current program when an interrupt is handled. After the interrupting program has executed, the state of all registers at the moment of the interruption is restored in the reverse order of saving. Program data can also be kept on the stack, which is convenient because an access to the stack does not need to specify a memory cell address; information is likewise retrieved from the stack without specifying an address.

    In associative memory, elements are selected not by address but by content. Let us explain this notion in more detail. For memory with address organization, the concept of the minimum addressable unit (MAE) was introduced as a piece of data that has an individual address. Let us introduce a similar concept for associative memory; this minimum unit of storage in associative memory will be called an associative memory string (StrAP). Each StrAP contains two fields: a tag field and a data field. A read request to associative memory can be expressed in words as follows: select the string (strings) whose tag equals the given value.

    In particular, one of three results is possible with such a query:

    1. there is exactly one line with the given tag;
    2. there are multiple lines with the given tag;
    3. there is no row with the given tag.

    Searching for a record by attribute is typical of database accesses, and a database search is often an associative search. To perform such a search, all records must be scanned and the given tag compared with the tag of each record. This can also be done with ordinary addressable memory used to store the records (and clearly it will take a long time: time proportional to the number of stored records!). One speaks of associative memory when the associative fetching of data from memory is supported in hardware. When writing to associative memory, a data element is placed into a StrAP together with the tag inherent in this element; any free StrAP can be used for this. Let us consider the varieties of the structural organization of cache memory, i.e. the ways of mapping RAM onto the cache.

    Fully associative cache

    A diagram of a fully associative cache is shown in the figure below.

    Let us describe the operation of a system with cache memory. At the start of operation the cache is empty. When the first instruction is fetched, its code, along with a few adjacent bytes of program code, is transferred (slowly) into one of the cache lines, and at the same time the upper part of the address is written into the corresponding tag. This is how a cache line is filled.

    If subsequent fetches can be satisfied from this section, they are made from the cache (fast): a cache hit. If the required element turns out not to be in the cache, it is a cache miss; in that case RAM is accessed (slowly), and the next cache line is filled at the same time.

    Fully Associative Cache Diagram

    Accessing the cache proceeds as follows. After the execution address is formed, its high bits, which form the tag, are compared in hardware (quickly) and simultaneously with the tags of all cache lines. Only two of the three situations listed above are possible here: either all comparisons give a negative result (a cache miss), or a positive comparison result is recorded for exactly one row (a cache hit).

    When reading, if a cache hit is detected, the low bits of the address determine the position in the cache line from which bytes are selected, and the type of operation determines the number of bytes. Obviously, if the length of a data element exceeds one byte, the element may span (in parts) two or more different cache lines, and fetching it will take longer. This can be counteracted by aligning operands and instructions on cache-line boundaries, which is taken into account in optimizing compilers and in manual code optimization.
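    For a fully associative cache, the address is split only into a tag and an in-line offset. A sketch with assumed parameters (a 32-byte line, so the low 5 bits are the offset):

```python
# Splitting an address for a fully associative cache with 32-byte lines
# (parameters are illustrative). The low 5 bits select a byte within the
# line; the remaining high bits form the tag compared against every line.

LINE_SIZE = 32
OFFSET_BITS = LINE_SIZE.bit_length() - 1   # log2(32) = 5

def split_address(addr):
    offset = addr & (LINE_SIZE - 1)        # position within the cache line
    tag = addr >> OFFSET_BITS              # high bits, matched associatively
    return tag, offset

tag, offset = split_address(0x12345)
assert offset == 0x05                      # low 5 bits of the address
assert tag == 0x12345 >> 5
```

    An element starting at offset 30 with length 4 would cross into the next line, illustrating why aligning operands on line boundaries helps.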

    If a cache miss occurs and there are no free lines in the cache, you must replace one line of the cache with another line.

    The main purpose of the replacement strategy is to keep in cache the lines that are most likely to be accessed in the near future, and replace the lines that will be accessed in a more distant time or not at all. Obviously, the optimal algorithm will be one that replaces the line that will be accessed later in the future than any other cache line.

    Unfortunately, such a prediction is practically unrealizable, and one has to resort to algorithms that are inferior to the optimal one. Regardless of the replacement algorithm used, it must be implemented in hardware to achieve high speed.

    Among the many possible substitution algorithms, four are the most common, considered in decreasing order of their relative efficiency. Any of these can be used in a fully associative cache.

    The most efficient is the replacement algorithm based on the oldest use (LRU, Least Recently Used), which replaces the cache line that has not been accessed for the longest time. Studies have shown that the LRU algorithm, which "looks" backward, performs quite well compared to the optimal algorithm, which "looks" forward.

    The most well-known are two methods of hardware implementation of this algorithm. In the first one, a counter is associated with each cache line. One is added to the contents of all counters at certain intervals. When a string is accessed, its counter is reset to zero. Thus, the largest number will be in the counter of the row that has not been accessed for the longest time, and this row is the first candidate for replacement.

    The second method is implemented using a queue, where references to these lines are entered in the order in which the cache lines are filled. Each time a string is accessed, its reference is moved to the end of the queue. As a result, the first in the queue each time is a reference to the string, which has not been accessed for the longest time. It is this line that is replaced first of all.
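    The queue-based method can be sketched with an ordered structure (an illustrative software model; real caches implement this in hardware):

```python
from collections import OrderedDict

# Sketch of the queue method for LRU: the front of the queue is always the
# line not accessed for the longest time, and is the replacement candidate.

class LRUQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = OrderedDict()            # tag -> data, oldest line first

    def access(self, tag, data):
        if tag in self.queue:
            self.queue.move_to_end(tag)       # hit: move reference to the end
        else:
            if len(self.queue) >= self.capacity:
                self.queue.popitem(last=False)  # replace the front (oldest) line
            self.queue[tag] = data

lru = LRUQueue(capacity=2)
lru.access("A", 1)
lru.access("B", 2)
lru.access("A", 1)        # "A" becomes the most recently used line
lru.access("C", 3)        # evicts "B", the least recently used line
assert list(lru.queue) == ["A", "C"]
```

    The FIFO algorithm described next differs only in that a hit does not move the reference, i.e., the `move_to_end` call would be dropped.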

    Another possible replacement algorithm is first in, first out (FIFO, First In First Out). This replaces the line that has been in the cache the longest. The algorithm is easily implemented using the queue discussed earlier, with the only difference that after an access to a line, the position of the corresponding reference in the queue does not change.

    Another algorithm is to replace the least frequently used string (LFU - Least Frequently Used). The line in the cache that has been accessed the least is replaced. The principle can be put into practice by associating each row with a hit counter, to the contents of which one is added after each hit. The main contender for replacement is the string whose counter contains the smallest number.

    The simplest algorithm is an arbitrary choice of a string to replace. The replacement string is chosen at random. This can be implemented, for example, using a counter, the contents of which are incremented by one with each clock pulse, regardless of whether there was a hit or a miss. The value in the counter determines the string to be replaced.

    In addition to the tag and data bytes, a cache line may contain additional service fields, most notably the validity bit V (from "valid") and the modification bit M (from "modify"). When a cache line is filled, V is set to the "valid" state and M to the "not modified" state. If the contents of the line are changed during program execution, the M bit is flipped, signaling that when this line is replaced its contents must be written back to RAM. If for some reason a copy of an element of this line stored elsewhere (for example, in RAM) has changed, the V bit is flipped. On an access to such a line, a cache miss will be recorded (even though the tag matches), and the access will go to main RAM. The service field may also contain bits supporting the LRU algorithm.

    Estimation of equipment volume

    The typical amount of cache memory in a modern system is 8–1024 KB, and the cache line length is 4–32 bytes. The evaluation below assumes a 256 KB cache and a 32-byte line, typical for systems with Pentium and Pentium Pro processors. The tag length is 27 bits, and the number of lines in the cache is 256K/32 = 8192. That is how many digital comparators of 27-bit codes would be required to implement the structure described above.

    A rough estimate of the cost of equipment for building a digital comparator gives a value of 10 transistors / bit, and the total number of transistors in the comparator block alone will be equal to:

    10 × 27 × 8192 = 2,211,840,

    which is only about one and a half times less than the total number of transistors on a Pentium chip. It is thus clear that the fully associative cache structure described here is feasible only with a small number of cache lines, i.e., with a small cache (in practice, no more than 32–64 lines). A larger cache is built with a different structure.
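    The estimate above can be reproduced directly, using the figures given in the text:

```python
# Reproducing the hardware estimate: a 256 KB fully associative cache
# with 32-byte lines and a 32-bit address.

cache_size = 256 * 1024          # bytes
line_size = 32                   # bytes
address_bits = 32
offset_bits = 5                  # log2(32): byte position within a line

num_lines = cache_size // line_size
tag_bits = address_bits - offset_bits
transistors_per_bit = 10         # rough cost of one comparator bit

assert num_lines == 8192
assert tag_bits == 27
assert transistors_per_bit * tag_bits * num_lines == 2_211_840
```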

    Lookup through a multilevel page table requires several accesses to main memory, so it takes a long time. In some cases such a delay is unacceptable. The problem of speeding up the lookup is solved at the level of computer architecture.

    Due to the property of locality, most programs access a small number of pages for some period of time, so only a small part of the page table is actively used.

    The natural solution to the acceleration problem is to equip the computer with a hardware device that maps virtual pages to physical ones without consulting the page table, that is, with a small, fast cache memory that stores the currently needed part of the page table. This device is called associative memory; the term translation lookaside buffer (TLB) is also used.

    One entry in the associative memory contains information about one virtual page: its attributes and the frame in which it resides. These fields correspond exactly to the fields of the page table.

    Because the associative memory holds only some of the page-table entries, each TLB entry must include a field with the virtual page number. The memory is called associative because it simultaneously compares the number of the virtual page being mapped with the corresponding field in all rows of this small table; this is why memory of this type is quite expensive. In the row whose virtual-page field matches the given value, the page frame number is found. The usual number of entries in a TLB is from 8 to 4096. Any increase in the number of entries in the associative memory must take into account such factors as the size of the main memory cache and the number of memory accesses per instruction.

    Let us consider how the memory manager functions in the presence of associative memory.

    The mapping of a virtual page to a physical one is first looked up in the associative memory. If the required entry is found, everything is fine, except in the case of a privilege violation, when the memory-access request is denied.

    If the desired entry is absent from the associative memory, the mapping is performed through the page table, and the entry found there replaces one of the entries in the associative memory. Here we face the replacement problem traditional for any cache (namely, which of the cached entries to evict). The design of the associative memory must organize the entries in such a way that it can be decided which of the old entries to delete when new ones are added.
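    The lookup path (TLB first, then the page table, then installing the found entry into the TLB) can be sketched as follows. The structures and sizes are illustrative; real hardware performs the TLB comparison in parallel:

```python
# Sketch of address translation with a TLB in front of the page table.

page_table = {0: 7, 1: 3, 2: 9}      # virtual page -> page frame
tlb = {}                             # small associative cache of the same mapping
TLB_CAPACITY = 2

def translate(vpage):
    if vpage in tlb:                 # TLB hit: no page-table access needed
        return tlb[vpage]
    frame = page_table[vpage]        # TLB miss: walk the page table (slow)
    if len(tlb) >= TLB_CAPACITY:     # replacement problem: evict some old entry
        tlb.pop(next(iter(tlb)))
    tlb[vpage] = frame               # install the found entry into the TLB
    return frame

assert translate(0) == 7             # miss: filled from the page table
assert translate(0) == 7             # hit this time
```

    Here eviction simply removes the oldest-inserted entry; a real design would choose the victim according to one of the replacement policies discussed above.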

    The ratio of successful page-number lookups in the associative memory to the total number of lookups is called the hit ratio; the term cache hit percentage is also used. Thus the hit ratio is the fraction of references that can be resolved using the associative memory. Repeated references to the same pages increase the hit ratio, and the higher the hit ratio, the lower the average access time to data in RAM.

    Suppose, for example, that determining an address on a cache miss, through the page table, takes 100 ns, and determining it on a hit in the associative memory takes 20 ns. With a 90% hit ratio, the average address-resolution time is 0.9 × 20 + 0.1 × 100 = 28 ns.
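    The average address-resolution time follows directly from the hit ratio:

```python
# Effective address-resolution time with a TLB, using the figures from the text.

hit_time_ns = 20      # address found in the associative memory
miss_time_ns = 100    # address determined through the page table
hit_ratio = 0.9

average_ns = hit_ratio * hit_time_ns + (1 - hit_ratio) * miss_time_ns
assert abs(average_ns - 28.0) < 1e-9
```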

    The quite acceptable performance of modern operating systems demonstrates the effectiveness of associative memory. The high probability of finding data in the associative memory is tied to objective properties of the data: spatial and temporal locality.

    One further fact deserves attention. On a process context switch, it must be ensured that the new process does not "see" in the associative memory any information belonging to the previous process, for example by flushing it. Thus the use of associative memory increases context-switch time.

    The two-level address-translation scheme considered here (associative memory plus page table) is a prime example of a memory hierarchy based on the principle of locality, as discussed in the introduction to the previous lecture.

    Inverted page table

    Despite the multilevel organization, storing several large page tables remains a challenge. The problem is especially acute for 64-bit architectures, where the number of virtual pages is very large. The solution is to use an inverted page table. This approach is used on PowerPC machines, some Hewlett-Packard workstations, the IBM RT, the IBM AS/400, and several others.

    This table contains one entry for each page frame of physical memory. It is essential that a single table suffices for all processes. Thus a fixed portion of main memory is required to store the mapping function, regardless of the bitness of the architecture and the size and number of processes.

    Despite the RAM savings, the inverted table has a significant disadvantage: its entries (as in associative memory) are not sorted in ascending order of virtual page numbers, which complicates address translation. One way to solve this problem is to use a hash table of virtual addresses: the part of the virtual address that constitutes the page number is mapped into the hash table by a hash function. Each page of physical memory corresponds to one entry in the hash table and in the inverted page table. Virtual addresses with the same hash value are chained together. Typically the chain is no more than two entries long.
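    Lookup in a hashed inverted page table can be sketched as follows (toy sizes, hypothetical names; the modulo operation stands in for a real hash function):

```python
# Sketch of lookup in a hashed inverted page table. Virtual page numbers
# that hash to the same bucket are chained together.

NUM_FRAMES = 8

def bucket_of(vpage):
    return vpage % NUM_FRAMES        # stand-in for a real hash function

# hash table: bucket -> chain of (virtual page, frame) entries;
# pages 5 and 13 collide (both hash to bucket 5) and share a chain
hash_table = {bucket_of(10): [(10, 3)], bucket_of(5): [(5, 0), (13, 6)]}

def lookup(vpage):
    for v, frame in hash_table.get(bucket_of(vpage), []):
        if v == vpage:               # walk the (usually short) collision chain
            return frame
    raise KeyError("page fault")     # no frame holds this virtual page

assert lookup(10) == 3
assert lookup(13) == 6               # found second in its collision chain
```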

    Page Size

    OS developers for existing machines rarely have the ability to influence the page size. However, for newly designed computers the question of the optimal page size is relevant. As one might expect, there is no single best size; rather, a set of factors affects the choice. Typically, the page size is a power of two between 2^9 and 2^14 bytes (512 bytes to 16 KB).