• Search and placement of information in associative memory. Associative memory. Development of associative memory. Small and microcomputers

    A storage device, as a rule, contains many identical storage elements forming a storage array (ZM). The array is divided into individual cells; each cell stores a binary code whose number of bits is determined by the width of the memory access (in particular, it can be one, half, or several machine words). The way memory is organized depends on the methods of placing and searching for information in the storage array. By this feature, address, associative, and stack (magazine-type) memories are distinguished.

    Address memory. In memory with address organization, placement and retrieval of information are based on the use of the address of the storage cell holding the word (number, command, etc.); the address is the number of the ZM cell in which this word is located.

    When writing (or reading) a word into a memory, the command initiating this operation must indicate the address (cell number) at which the recording (reading) is performed.

    A typical address memory structure, shown in Fig. 4.2, contains a storage array of N n-bit cells and its supporting hardware, including the address register RgA with k (k ≥ log2 N) bits, the information register RgI, the address selection block BAV, the sense amplifier block BUS, the block of bit amplifiers-shapers of write signals BUZ, and the memory control unit BUP.

    Fig. 4.2. Address memory structure.

    From the address code in RgA, the block BAV generates signals in the corresponding memory cell that allow a word to be read from or written into that cell.

    The memory access cycle is initiated by the arrival at BUP of an external Access signal. The common part of the access cycle includes the reception into RgA of the address from the address bus SHA, and the reception and decoding by BUP of the Operation control signal, which indicates the type of operation requested (read or write).

    During a read, BAV then decodes the address and sends read signals to the ZM cell specified by the address; the code of the word stored in the cell is sensed by the read amplifiers BUS and transferred to RgI. Then, in memory with destructive read (where reading sets all storage elements of the cell to the zero state), the information is regenerated in the cell by writing the word back into it from RgI. The read operation is completed by issuing the word from RgI to the output information bus SHIVIkh.

    During a write, in addition to the common part of the access cycle described above, the word being written is received from the input information bus SHIVh into RgI. The write itself consists of two operations: clearing the cell (resetting it to 0) and the actual write. To do this, BAV first selects and clears the cell specified by the address in RgA. Clearing is performed by the word-read signals in the cell, but the sense amplifiers are blocked so that no information passes from BUS to RgI. Then the word from RgI is written into the cell selected by BAV.

    Control unit BUP generates the necessary sequences of control signals that initiate the operation of individual memory nodes. The control signal transmission circuits are shown with thin lines in Fig. 4.2.
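    The read/write sequence described above can be sketched as a small behavioral model (illustrative Python, not from the original text; the class and method names are hypothetical). It shows the destructive read followed by regeneration from RgI, and the clear-then-write order of the write operation:

    ```python
    class AddressMemory:
        """Behavioral sketch of an address-organized memory."""

        def __init__(self, n_cells):
            self.cells = [0] * n_cells      # storage array ZM

        def read(self, rga):
            rgi = self.cells[rga]           # BUS senses the addressed cell into RgI
            self.cells[rga] = 0             # destructive read zeroes the cell
            self.cells[rga] = rgi           # regeneration: write back from RgI
            return rgi                      # word issued to the output bus

        def write(self, rga, word):
            self.cells[rga] = 0             # first operation: clear the cell
            self.cells[rga] = word          # second operation: the write itself

    mem = AddressMemory(n_cells=8)
    mem.write(3, 0xABCD)
    assert mem.read(3) == 0xABCD            # the word survives the destructive read
    ```
    
    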

    Associative memory. In this type of memory, the search for the necessary information is carried out not by address, but by its content (by associative characteristics). In this case, the search by an associative characteristic (or sequentially by individual bits of this characteristic) occurs in parallel in time for all cells of the storage array. In many cases, associative search can significantly simplify and speed up data processing. This is achieved due to the fact that in memory of this type, the operation of reading information is combined with the execution of a number of logical operations.

    A typical structure of associative memory is shown in Fig. 4.3. The storage array contains N (n + 1)-bit cells. The nth (service) bit indicates whether the cell is occupied (0 - the cell is free, 1 - a word is written in the cell).

    Via the input information bus SHIVh, bits 0 to n − 1 of the associative attribute register RgAP receive the n-bit associative request, and the mask register RgM receives the search mask code; the nth bit of RgM is set to 0. The associative search is performed only over the set of RgAP bits that correspond to 1s in RgM (the unmasked bits of RgAP). For words whose bits coincide with the unmasked bits of RgAP, the combinational circuit KS sets 1 in the corresponding bits of the match register RgSV and 0 in the remaining bits. Thus the value of the j-th bit of RgSV is determined by the expression

    RgSV(j) = ∧_{i=0}^{n−1} ( ¬RgM[i] ∨ (RgAP[i] ≡ ZM[j, i]) ),

    where RgAP[i], RgM[i] and ZM[j, i] are the values of the i-th bit of RgAP, of RgM and of the j-th cell of ZM, respectively, and ≡ denotes bit equivalence.

    The combinational circuit FS, which forms the result of an associative access, produces from the word formed in RgSV the signals a0, a1, a2, corresponding to the cases of no word in ZM satisfying the associative criterion, exactly one such word, and more than one such word. For this, FS implements the following Boolean functions:

    a0 = ¬( RgSV(0) ∨ RgSV(1) ∨ … ∨ RgSV(N − 1) ),

    a1 = ∨_{j} ( RgSV(j) ∧ ∧_{k ≠ j} ¬RgSV(k) ),

    a2 = ¬a0 ∧ ¬a1.

    The generation of the contents of RgSV and of the signals a0, a1, a2 from the contents of RgAP, RgM and ZM is called the association control operation. This operation is an integral part of the read and write operations, although it also has a meaning of its own.
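    The association control operation can be modeled directly (illustrative Python sketch, not from the original text; the function name is hypothetical). Each cell is compared with the request RgAP on every bit unmasked by RgM, all cells in parallel in the hardware, sequentially here:

    ```python
    def association_check(zm, rgap, rgm):
        """Return (rgsv, a0, a1, a2) for storage array zm, a list of bit-lists."""
        n = len(rgap)
        rgsv = []
        for cell in zm:
            # A cell matches if every unmasked bit (rgm[i] == 1) equals the request bit.
            match = all(rgm[i] == 0 or cell[i] == rgap[i] for i in range(n))
            rgsv.append(1 if match else 0)
        hits = sum(rgsv)
        a0 = 1 if hits == 0 else 0   # no word satisfies the criterion
        a1 = 1 if hits == 1 else 0   # exactly one word satisfies it
        a2 = 1 if hits > 1 else 0    # more than one word satisfies it
        return rgsv, a0, a1, a2

    zm = [[1, 0, 1], [1, 1, 1], [0, 0, 1]]
    # Search on the first two bits only (mask 1,1,0): only cell 0 matches 1,0,-.
    rgsv, a0, a1, a2 = association_check(zm, rgap=[1, 0, 0], rgm=[1, 1, 0])
    assert rgsv == [1, 0, 0] and a1 == 1
    ```
    
    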

    During a read, the association is first checked against the associative attribute in RgAP. Then, if a0 = 1, the read is canceled because the required information is absent; if a1 = 1, the word found is read into RgI; if a2 = 1, the word read into RgI is taken from the cell with the lowest number among the cells marked with 1 in RgSV. From RgI the word read is issued to SHIVIkh.

    Fig. 4.3. Structure of associative memory

    During a write, a free cell is found first. To do this, an association check operation is performed with RgAP = 11…10 and RgM = 00…01; free cells are thereby marked with 1 in RgSV. The free cell with the lowest number is selected for the write, and the word received from SHIVh via RgI is written into it.

    Fig. 4.4. Stack memory

    Using the association control operation, you can, without reading words from memory, determine by content RgSV, how many words are in memory that satisfy an associative criterion, for example, implement queries like how many students in a group have an excellent grade in a given discipline. When using appropriate combinational circuits, quite complex logical operations can be performed in associative memory, such as searching for a larger (smaller) number, searching for words contained within certain boundaries, searching for a maximum (minimum) number, etc.

    Note that associative memory requires storage elements that can be read without destroying the information recorded in them. This is because during an associative search, reading is performed over the entire ZM for all unmasked bits, and there is nowhere to store information temporarily destroyed by the reading.

    Stack memory, like associative memory, is addressless. In stack memory (Fig. 4.4) the cells form a one-dimensional array in which neighboring cells are connected by bit-wise word transfer circuits. A new word is written into the top cell (cell 0), while all previously written words (including the word that was in cell 0) are shifted down into the adjacent cells whose numbers are larger by 1. Reading is possible only from the top (zero) cell; if reading with deletion is performed, all other words in memory are shifted up into the adjacent cells with lower numbers. In this memory the order of reading follows the rule: last in, first out. A number of devices of this type also provide an operation of simply reading the word from cell zero (without deleting it or shifting the words in memory). Sometimes stack memory is supplied with a stack counter SchSt showing the number of words stored in memory. SchSt = 0 corresponds to an empty stack, SchSt = N − 1 to a full stack.

    Stack memory is often organized using address memory. Stack memory is widely used when processing nested data structures.
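    A stack organized on top of address memory, as mentioned above, usually keeps the words in place and moves a pointer instead of physically shifting every word (illustrative Python sketch, not from the original text; the names StackMemory and schst are hypothetical, echoing the SchSt counter):

    ```python
    class StackMemory:
        """LIFO stack built on an addressed cell array with a word counter."""

        def __init__(self, n):
            self.cells = [0] * n
            self.schst = 0                  # stack counter: number of words held

        def push(self, word):
            if self.schst == len(self.cells):
                raise OverflowError("stack full")
            self.cells[self.schst] = word   # write at the current top position
            self.schst += 1

        def pop(self):                      # read with deletion from the top
            if self.schst == 0:
                raise IndexError("stack empty")
            self.schst -= 1
            return self.cells[self.schst]

    s = StackMemory(4)
    s.push(10)
    s.push(20)
    assert s.pop() == 20 and s.pop() == 10  # last in, first out
    ```
    
    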

    The following paragraphs of the chapter describe various types of addressable storage devices. Associative memory is used in equipment for dynamic distribution of OP, as well as for constructing cache memory.

    A multi-level page table requires multiple accesses to main memory, so translation takes a long time. In some cases such a delay is unacceptable. The problem of speeding up the search is solved at the level of computer architecture.

    Due to the locality property, most programs access a small number of pages over a period of time, so only a small portion of the page table is actively used.

    A natural solution to the acceleration problem is to equip the computer with a hardware device that maps virtual pages to physical pages without accessing the page table, that is, with a small, fast cache memory storing the part of the page table needed at the moment. This device is called associative memory; the term translation lookaside buffer (TLB) is also used.

    One record of the associative memory (one entry) contains information about one virtual page: its attributes and the frame in which it is located. These fields correspond exactly to the fields in the page table.

    Because the associative memory contains only some of the page table entries, each TLB entry must include the virtual page number field. The memory is called associative because it simultaneously compares the requested virtual page number with the corresponding field in all rows of this small table; for this reason this type of memory is quite expensive. In the row whose virtual page field matches the requested value, the page frame number is found. The typical number of entries in a TLB is from 8 to 4096. When increasing the number of entries in the associative memory, factors such as the size of the main memory cache and the number of memory accesses per instruction must be taken into account.

    Let us consider how the memory manager functions in the presence of associative memory.

    First, the mapping of the virtual page to a physical one is looked up in the associative memory. If the required entry is found, all is well, except when privileges are violated, in which case the memory access request is rejected.

    If the required entry is missing from the associative memory, the mapping is performed through the page table, and one of the entries in the associative memory is replaced by the entry found in the page table. Here we face the replacement problem traditional for any cache (namely, which entry in the cache should be replaced). The design of the associative memory must organize its records so that a decision can be made about which of the old records to delete when new ones are added.
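    The lookup order described above (TLB first, page-table walk on a miss, then replacement of some TLB entry) can be sketched as follows (illustrative Python, not from the original text; the class name and the FIFO replacement policy are assumptions, real TLBs use various policies):

    ```python
    from collections import OrderedDict

    class TLB:
        """Tiny TLB model with page-table fallback and FIFO replacement."""

        def __init__(self, capacity, page_table):
            self.entries = OrderedDict()        # virtual page -> frame
            self.capacity = capacity
            self.page_table = page_table        # dict standing in for the page table

        def translate(self, vpage):
            if vpage in self.entries:           # TLB hit
                return self.entries[vpage]
            frame = self.page_table[vpage]      # miss: walk the page table
            if len(self.entries) == self.capacity:
                self.entries.popitem(last=False)  # evict the oldest entry
            self.entries[vpage] = frame         # replace it with the new mapping
            return frame

    tlb = TLB(capacity=2, page_table={0: 5, 1: 9, 2: 7})
    assert tlb.translate(0) == 5    # miss, loaded from the page table
    assert tlb.translate(0) == 5    # hit, no page-table access needed
    ```
    
    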

    The ratio of the number of successful page-number lookups in the associative memory to the total number of lookups is called the hit (match) ratio. The term "cache hit rate" is also sometimes used. Thus, the hit ratio is the fraction of references that can be resolved using the associative memory. Repeated accesses to the same pages increase the hit ratio, and the higher the hit ratio, the lower the average access time to data in RAM.

    Suppose, for example, that determining the address through the page table on a cache miss takes 100 ns, and determining it through the associative memory on a cache hit takes 20 ns. With a 90% hit ratio, the average address-determination time is 0.9 × 20 + 0.1 × 100 = 28 ns.
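    The 28 ns figure in this example is just the hit-weighted average; the same calculation for any hit ratio is (illustrative one-liner, not from the original text):

    ```python
    def effective_access_time(hit_ratio, t_hit_ns, t_miss_ns):
        """Average address-determination time for a given TLB hit ratio."""
        return hit_ratio * t_hit_ns + (1 - hit_ratio) * t_miss_ns

    # Reproduces the example above: 0.9 * 20 + 0.1 * 100 = 28 ns.
    assert abs(effective_access_time(0.9, 20, 100) - 28) < 1e-9
    ```
    
    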

    The quite acceptable performance of modern operating systems demonstrates the effectiveness of associative memory. The high probability of finding data in the associative memory stems from objective properties of the data: spatial and temporal locality.

    Note the following fact: when switching process contexts, you must ensure that the new process "does not see" in the associative memory any information related to the previous process, for example by flushing it. Thus, using associative memory increases context switching time.

    The two-level address translation scheme considered here (associative memory + page table) is a clear example of a memory hierarchy based on the principle of locality, as discussed in the introduction to the previous lecture.

    Inverted page table

    Despite the multi-level organization, storing several large page tables is still a problem, especially acute for 64-bit architectures, where the number of virtual pages is very large. A possible solution is to use an inverted page table. This approach is used on PowerPC machines, some Hewlett-Packard workstations, the IBM RT, the IBM AS/400 and several others.

    This table contains one entry for each page frame of physical memory. It is important that one table is sufficient for all processes. Thus, storing the mapping function requires a fixed portion of main memory, regardless of the architecture's width, size, and number of processes.

    Despite the saving of RAM, using an inverted table has a significant disadvantage: its entries (as in associative memory) are not sorted in ascending order of virtual page numbers, which complicates address translation. One way to solve this problem is to use a hash table of virtual addresses. The part of the virtual address that constitutes the page number is mapped into the hash table by a hash function. Each page of physical memory corresponds to one entry in the hash table and in the inverted page table. Virtual addresses with the same hash value are chained together; typically the chain length does not exceed two entries.
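    The hash-chain scheme described above can be sketched as follows (illustrative Python, not from the original text; the class name, the (pid, vpage) key and the bucket layout are assumptions made for the example):

    ```python
    class InvertedPageTable:
        """Per-frame table reached through a hash of the virtual page number."""

        def __init__(self, n_buckets):
            # Each bucket holds a chain of (pid, vpage, frame) entries
            # whose virtual page numbers hash to the same value.
            self.buckets = [[] for _ in range(n_buckets)]
            self.n_buckets = n_buckets

        def map(self, pid, vpage, frame):
            self.buckets[hash((pid, vpage)) % self.n_buckets].append((pid, vpage, frame))

        def lookup(self, pid, vpage):
            for p, v, frame in self.buckets[hash((pid, vpage)) % self.n_buckets]:
                if (p, v) == (pid, vpage):   # walk the (usually short) chain
                    return frame
            raise KeyError("page fault")

    ipt = InvertedPageTable(n_buckets=2)
    ipt.map(pid=1, vpage=100, frame=3)
    assert ipt.lookup(1, 100) == 3
    ```
    
    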

    Page size

    OS developers for existing machines rarely have the ability to influence page size. However, for newly created computers the decision about the optimal page size is relevant. As you might expect, there is no single best size; rather, a set of factors influences it. Typically the page size is a power of two, from 2^9 to 2^14 bytes.

    General information and classification of memory devices

    Lecture 2. Organization of computer memory.

    Minisupercomputer and superminicomputer.

    Small and microcomputers.

    There are a large number of, relatively speaking, "small" applications of computers, such as automation of production control, data processing during experiments, receiving and processing data from a communication line, technological process control, control of machine tools and various digital terminals, and small engineering computations.

    Currently, small and microcomputers are built into various “smart” devices (electricity meters, microwave ovens, washing machines, modems, sensors, etc.).

    The classification does not have clear boundaries between the considered types of computers. Recently, two intermediate types have begun to be distinguished.

    Superminicomputers include high-performance computers containing one or more loosely coupled processors connected to a common backbone (common bus). It is typical for a superminicomputer that the speed of performing its arithmetic operations on floating point numbers is significantly lower than the speed of operation determined by a mixture of commands corresponding to information and logical queries. This type includes the IBM chess computer Deep Blue.

    Minisupercomputers are simplified (particularly due to a shorter word) multiprocessor computers, most often with vector and pipeline processing tools, with high speed of performing operations on floating point numbers. This type includes computers with SMP (Symmetric multiprocessor) architecture.

    Memory devices can be classified according to the following criteria:
    · by type of storage elements: semiconductor, magnetic, capacitor, optoelectronic, holographic, cryogenic;
    · by functional purpose: RAM (OP), SRAM (SOP), external storage (VZU), ROM, PROM;
    · by the method of organizing access: with sequential search, with direct access, with immediate access; or address, associative, stack;
    · by the nature of reading: with destruction of information, without destruction of information;
    · by storage method: static, dynamic;
    · by method of organization: one-coordinate, two-coordinate, three-coordinate, two-three-coordinate.

    Computer memory is the set of devices used for recording, storing and issuing information. The individual devices included in this set are called storage devices, or memories of one type or another.



    The performance and computing capabilities of a computer are largely determined by the composition and characteristics of its memory. As part of a computer, several types of memory are used simultaneously, differing in the principle of operation, characteristics and purpose.

    The main operations in memory are storing information in memory - recording and retrieving information from memory - reading. Both of these operations are called access to memory.

    When accessing memory, a certain unit of data is read or written - different for different types of devices. Such a unit could be, for example, a byte, a machine word, or a data block.

    The most important characteristics of individual memory devices (storage devices) are memory capacity, specific capacity, and performance.

    Memory capacity is determined by the maximum amount of data that can be stored in it.

    Specific capacity is the ratio of the storage capacity to its physical volume.

    Recording density is the ratio of the storage capacity to the area of the storage surface. For example, an HDD with a capacity of up to 10 GB has a recording density of 2 Gbit per square inch.

    Memory performance is determined by the duration of the access operation, i.e. the time spent searching for the desired unit of information in memory and reading it ( read access time), or time to search for a place in memory intended for storing a given unit of information, and to record it in memory (access time when writing).

    The duration of memory access (memory cycle time) when reading is

    t_cy.rd = t_acc + t_rd,

    where t_acc is the access time, determined by the interval between the start of the read operation and the moment when access to the given unit of information becomes possible, and t_rd is the duration of the physical reading process itself, i.e. of detecting and recording the states of the corresponding storage elements or areas of the surface of the information carrier.

    In some memory devices, reading information is accompanied by its destruction (erasure). In this case, the access cycle must contain the operation of restoring (regenerating) the read information in its original place in memory.

    The duration of access (cycle time) when writing is

    t_cy.wr = t_acc.wr + t_prep + t_wr,

    where t_acc.wr is the write access time, i.e. the time from the start of the write access until access becomes possible to the storage elements (or areas of the carrier surface) into which the write is made; t_prep is the preparation time spent bringing the storage elements or areas of the carrier surface to the initial state needed to record the given unit of information (for example, a byte or a word); and t_wr is the time of entering the information, i.e. of changing the states of the storage elements (areas of the carrier surface). In most cases the duration of the memory access cycle is taken to be

    t_cy = max(t_cy.rd, t_cy.wr).
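    These cycle-time components can be combined in a tiny sketch (illustrative Python, not from the original text; the function names and the sample nanosecond values are assumptions). The regeneration term applies only to destructive-read memories, as discussed above:

    ```python
    def read_cycle(t_access, t_read, t_regen=0.0):
        """Read cycle time; t_regen is nonzero only for destructive reads."""
        return t_access + t_read + t_regen

    def write_cycle(t_access_w, t_prep, t_write):
        """Write cycle time: access + cell preparation + actual write."""
        return t_access_w + t_prep + t_write

    # The memory cycle is usually quoted as the longer of the two.
    t_cy = max(read_cycle(30, 25, t_regen=15), write_cycle(30, 10, 25))
    assert t_cy == 70
    ```
    
    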

    Depending on the access operations implemented in memory, they are distinguished: a) memory with random access (reading and writing data into memory is possible); b) memory for reading information only (“permanent” or “one-way”). Information is recorded in permanent memory during the process of its manufacture or configuration.

    These types of memory correspond to the terms RAM (random access memory) and ROM (read only memory).

    According to the method of organizing access, memory devices are distinguished with immediate (random), direct (cyclic) and sequential access.

    In memory with immediate (random) access, the access time, and hence the access cycle, does not depend on the location of the memory section from which information is read or to which it is written. In most cases, immediate access is implemented with electronic (semiconductor) memories, where the access cycle is usually 70 ns or less. The number of bits read or written in parallel during one access operation in such memory is called the sample width.

    The other two types of memory use slower electromechanical processes. In direct access devices, which include disk devices, the continuous rotation of the storage medium means that the opportunity to access a given section of the medium for reading or writing recurs cyclically. In such memory, access time usually ranges from several tens of milliseconds to fractions of a second.

    In memory with sequential access, sections of the information carrier are scanned in sequence until the desired section reaches a certain initial position. A typical example is storage on magnetic tape, the so-called streamers. In unfavorable cases of data placement, access time can reach several minutes.

    A good example of a tape drive is the use of an ARVID adapter with a VHS video recorder. The capacity of this drive is 4GB/180min.

    Storage devices also differ in the functions performed in the computer, depending in particular on the location of the storage device in the computer structure.

    Requirements for memory capacity and speed are contradictory. The higher the performance, the more technically difficult it is to achieve and the more expensive it is to increase memory capacity. The cost of memory makes up a significant portion of the total cost of a computer. Therefore, computer memory is organized in the form of a hierarchical structure of storage devices with different speeds and capacities. In general, a computer contains the following types of memory, in descending order of speed and increasing capacity.

    The hierarchical structure of memory makes it possible to cost-effectively combine the storage of large amounts of information with rapid access to information during processing.

    Table 2.1.

    RAM, or main memory (OP), is a device that stores information (program data, intermediate and final processing results) directly used while operations are performed in the arithmetic logic unit (ALU) and control unit (CU) of the processor.

    In the process of information processing, there is close interaction between the processor and the OP. Program commands and operands are received from the OP to the processor, on which the operations specified by the command are performed, and intermediate and final processing results are sent from the processor to the OP for storage.

    The characteristics of the OP directly affect the main indicators of a computer, above all its speed. Currently, RAM has a capacity from several MB to several GB and an access cycle of about 60 ns or less. OP storage devices are manufactured on integrated circuits with a high degree of integration (semiconductor memories).

    Recently, a number of companies have announced the start of serial production of dynamic memory chips with a capacity of 1GB. The recognized leader is Samsung. The most popular product today can be considered 64 MB chips. In the coming year, 128MB and 256MB chips are expected to be widely used.

    In a number of cases the speed of the OP proves insufficient, and the machine must include an SOP (a buffer, or cache, memory of several hundred or thousand kilobytes with an access cycle of a few nanoseconds). Such SOPs are built on static memory chips. The performance of the cache must match the speed of the arithmetic logic and control devices of the processor. Super-RAM (buffer) memory is used for intermediate storage of program sections and data groups read by the processor from the OP, as working cells of the program and index registers, and for storing service information used in controlling the computing process. It acts as a coordinating link between the high-speed logic devices of the processor and the slower OP.

    High-speed memories with random access and direct access are used as OP and SOP.

    Typically, the capacity of the OP is insufficient to store all the necessary data in the computer. Therefore, the computer contains several memory devices with direct access on disks (the capacity of one memory device on HDD disks is 1 - 30 GB) and several memory devices with sequential access on magnetic tapes (the capacity of one memory device is 4 - 35 GB).

    RAM, together with the SOP and some other specialized processor memories, forms the internal memory of the computer (Fig. 4.1). Electromechanical memory devices form the external memory of the computer, which is why they are called external storage devices (VZU).

    A storage device of any type consists of a storage array that stores information, and blocks that serve to search the array, write and read (and in some cases, regenerate) information.

    A random access memory device, as a rule, contains many identical storage elements forming a storage array (ZM). The array is divided into individual cells; each cell stores a binary code whose number of bits is determined by the width of the memory access (in particular, it can be one, half, or several machine words). The way memory is organized depends on the methods of placing and searching for information in the storage array. By this feature, address, associative, and stack (magazine-type) memories are distinguished.

    Address memory. In memory with an address organization, the placement and search of information in the memory is based on the use of the storage address of a word (number, command, etc.). The address is the number of the ZM cell in which this word is located.

    When writing (or reading) a word into a memory, the command initiating this operation must indicate the address (cell number) at which the recording (reading) is performed.

    A typical address memory structure contains a storage array of N n-bit cells and its supporting hardware, which includes the address register RgA with k (k ≥ log2 N) bits, the information register RgI, the address selection block BAV, the sense amplifier block BUS, the block of bit amplifiers-shapers of write signals BUZ, and the memory control unit BUP.

    From the address code in RgA, the block BAV generates signals in the corresponding memory cell that allow a word to be read from or written into that cell.

    The memory access cycle is initiated by the arrival at BUP of an external Access signal. The common part of the access cycle includes the reception into RgA of the address from the address bus SHA, and the reception and decoding by BUP of the Operation control signal, which indicates the type of operation requested (read or write).

    During a read, BAV then decodes the address and sends read signals to the ZM cell specified by the address; the code of the word stored in the cell is sensed by the BUS read amplifiers and transferred to RgI. The read operation is completed by issuing the word from RgI to the output information bus SHIVYH.

    During a write, in addition to the common part of the access cycle described above, the word being written is received from the input information bus SHIVh into RgI. Then the word from RgI is written into the cell selected by BAV.

    Control unit BUP generates the necessary sequences of control signals that initiate the operation of individual memory nodes.

    Associative memory. In this type of memory, the search for the necessary information is carried out not by address, but by its content (by associative characteristics). In this case, the search by an associative characteristic (or sequentially by individual bits of this characteristic) occurs in parallel in time for all cells of the storage array. In many cases, associative search can significantly simplify and speed up data processing. This is achieved due to the fact that in memory of this type, the operation of reading information is combined with the execution of a number of logical operations.

    A typical structure of associative memory is shown in Fig. 4.3. The storage array contains N (n+1)-bit cells. To indicate the cell's occupancy, the nth service digit is used (0 - the cell is free, 1 - a word is written in the cell).

    Fig. 2.2. Structure of associative memory

    Via the input information bus SHIVh, bits 0…n − 1 of the associative attribute register RgAP receive the n-bit associative request, and the mask register RgM receives the search mask code; the nth bit of RgM is set to 0. The associative search is performed only over the set of RgAP bits that correspond to 1s in RgM (the unmasked bits of RgAP). For words whose bits coincide with the unmasked bits of RgAP, the combinational circuit KS sets 1 in the corresponding bits of the match register RgSV and 0 in the remaining bits. Thus the value of the j-th bit of RgSV is determined by the expression

    RgSV(j) = ∧_{i=0}^{n−1} ( ¬RgM[i] ∨ (RgAP[i] ≡ ZM[j, i]) ),

    where RgAP[i], RgM[i] and ZM[j, i] are the values of the i-th bit of RgAP, of RgM and of the j-th cell of ZM, respectively.

    The combinational circuit FS, which forms the result of an associative access, produces from the word formed in RgSV the signals a0, a1, a2, corresponding to the cases of no word in ZM satisfying the associative criterion, exactly one such word, and more than one such word.

    The generation of the contents of RgSV and of the signals a0, a1, a2 from the contents of RgAP, RgM and ZM is called the association control operation. This operation is an integral part of the read and write operations, although it also has a meaning of its own.

    During a read, the association is first checked against the associative attribute in RgAP. Then, if a0 = 1, the read is canceled because the required information is absent; if a1 = 1, the word found is read into RgI; if a2 = 1, the word read into RgI is taken from the cell with the lowest number among the cells marked with 1 in RgSV. From RgI the word read is issued to SHIVYH.

    During a write, a free cell is found first. To do this, an association check operation is performed with RgAP = 111…10 and RgM = 00…01, whereby free cells are marked with 1 in RgSV. The free cell with the lowest number is selected for the write, and the word received from SHIVh via RgI is written into it.

    Using the association control operation, you can, without reading words from memory, determine by content RgSV, how many words are in memory that satisfy an associative criterion, for example, implement queries like how many students in a group have an excellent grade in a given discipline. When using appropriate combinational circuits, quite complex logical operations can be performed in associative memory, such as searching for a larger (smaller) number, searching for words within certain boundaries, searching for the maximum (minimum) number, etc. Associative memory is used, for example, in hardware dynamic distribution of OP.

    Note that associative memory requires storage elements that can be read without destroying the information recorded in them. This is due to the fact that during associative search, reading is carried out throughout ZM for all unmasked bits and there is no place to store information that is temporarily destroyed by reading.

    Stack memory, like associative memory, is addressless. Stack memory can be considered a collection of cells forming a one-dimensional array in which neighboring cells are connected by bit-wise word transfer circuits. A new word is written into the top cell (cell 0), while all previously written words (including the word that was in cell 0) are shifted down into the adjacent cells whose numbers are larger by 1. Reading is possible only from the top (zero) cell; if reading with deletion is performed, all other words in memory are shifted up into the adjacent cells with lower numbers. In this memory the order of reading follows the rule: last in, first out. A number of devices of this type also provide an operation of simply reading the word from cell zero (without deleting it or shifting the words in memory). Sometimes stack memory is supplied with a stack counter SchSt showing the number of words stored in memory. SchSt = 0 corresponds to an empty stack, SchSt = N − 1 to a full stack.

    In practice, stack memory is usually organized on top of address memory. In this case there is normally no stack counter, since the number of words in memory can be determined from the stack pointer. Stack memory is widely used for processing nested data structures, executing addressless commands, and handling interrupts.
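    The usual realization on address memory can be sketched like this: instead of shifting cells, a stack pointer moves, and the pointer itself gives the number of stored words. All names here are illustrative.

```python
class PointerStack:
    def __init__(self, size):
        self.mem = [0] * size   # ordinary addressable storage
        self.sp = 0             # stack pointer: address of the first free cell

    def push(self, word):
        if self.sp == len(self.mem):
            raise OverflowError("stack full")
        self.mem[self.sp] = word   # only one cell is written;
        self.sp += 1               # nothing moves in memory

    def pop(self):
        if self.sp == 0:
            raise IndexError("stack empty")
        self.sp -= 1
        return self.mem[self.sp]

    def depth(self):
        return self.sp          # word count read directly off the pointer

s = PointerStack(8)
s.push(10); s.push(20)
print(s.depth(), s.pop(), s.pop())   # prints: 2 20 10
```

The contrast with the shifting scheme is the design point: push and pop touch a single cell each, which is why address memory plus a pointer is the common implementation.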

    Architectural organization of a computer processor

    The processor occupies the central place in the computer architecture, managing the interaction of all the main components of the machine. It directly processes information and exercises program control over this processing: it decodes and executes program commands, organizes accesses to random access memory (RAM), initiates input/output operations and the operation of peripheral devices when necessary, and receives and services requests coming both from computer devices and from the external environment (the interrupt system). The execution of each command consists of smaller operations, microinstructions, which perform elementary actions. The set of microinstructions is determined by the command system and the logical structure of the particular computer; thus each computer command is implemented by a corresponding microprogram stored in a read-only memory device (ROM). In some computers (primarily specialized ones), all or part of the commands are implemented in hardware, which increases performance at the cost of some flexibility of the machine's command system. Each method of implementing computer commands has its pros and cons.

    The microprogramming language is designed to describe digital devices operating at the register level. It has simple and clear means of describing machine words, registers, buses and other basic elements of a computer. With this in mind, the hierarchy of languages for describing the computational process can, in the general case, be represented at four levels: (1) Boolean operations (functioning of combinational circuits) => (2) microinstructions (functioning of computer nodes) => (3) commands (functioning of the computer) => (4) operators of a programming language (description of the algorithm of the problem being solved). To fix the timing relationships between microinstructions, a time unit (cycle) is defined during which the longest microinstruction completes. The execution of each computer command is therefore synchronized by clock pulses generated by a special processor device, the clock generator; the clock frequency (measured in MHz) largely determines the speed of the computer. Naturally, for other classes of computers this indicator relates to performance differently, since performance is also determined by additional factors such as:

    memory access width;

    memory access (sampling) time;

    bit depth;

    the architecture of the processor and its coprocessors.

    An enlarged diagram of the central processing unit (CPU) of a generic computer is presented in the figure, which shows only its main blocks: control registers (UR), control unit (CU), ROM, arithmetic-logic unit (ALU), register memory (RP), cache memory and interface unit (IB). Along with these, the CPU contains a number of other blocks (interrupt handling, OP protection, monitoring and diagnostics, etc.) whose structure and purpose are not discussed here. The control unit generates the sequence of control signals that initiate execution of the corresponding sequence of microinstructions (held in ROM) implementing the current command. It also coordinates the functioning of all computer devices by sending control signals: CPU<->OP data exchange, information storage and processing, user interface, testing and diagnostics, etc. It is therefore convenient to regard the control unit as a separate CPU block; in practice, however, most control circuits are distributed throughout the computer, connected by a large number of control lines that carry synchronization signals to all computer devices and return signals about their state. The UR block serves for temporary storage of control information and contains the registers and counters that, together with the CU, control the computing process: the CPU (program) state register (SSP); the program counter (SC), a register holding the OP address of the command being executed (during execution of the current command its contents are updated to the address of the next command); and the command register (RK), which holds the command being executed (its outputs are connected to control circuits that generate the time-distributed signals needed to execute commands).

    The RP block contains a small number of registers of scratchpad memory (faster than the OP), which increase the performance and logical capabilities of the CPU. These registers are referenced in instructions by abbreviated register addressing (only register numbers are given) and serve to store operands and results of operations, and to act as base and index registers, stack pointers, etc. In some CPUs the base and index registers belong to the UR block. As a rule, RP is implemented as high-speed semiconductor integrated memory.

    The ALU block performs arithmetic and logical operations on data coming from the OP and stored in the RP, and operates under the control of the CU. The ALU performs arithmetic operations on binary numbers with fixed and floating point and on decimal numbers, and processes symbolic information in words of fixed and variable length. Logical operations are performed on individual bits, groups of bits, bytes and their sequences. The type of operation performed by the ALU is determined by the current command of the running program; more precisely, the ALU performs whatever operation the CU assigns to it. In general, the information processed by a computer consists of words containing a fixed number n of bits (for example, n = 8, 16, 32, 64, 128), so the ALU must be able to operate on n-bit words. The operands arrive from the OP into the ALU registers, the CU indicates the operation to be performed on them, and the result of each arithmetic-logical operation is held in a special adder register, which is the main register for arithmetic-logical operations.
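    The operation-dispatch idea can be sketched as a toy n-bit ALU: the CU supplies an operation code, and the result is truncated to the word length. The operation names and the discard-the-carry behavior are illustrative assumptions, not a description of any particular CPU.

```python
N = 8                      # word length in bits
MASK = (1 << N) - 1        # keeps results within an n-bit word

def alu(op, a, b):
    """Apply the operation selected by the control unit to two n-bit words."""
    if op == "ADD":
        result = a + b     # a carry out of bit n-1 is simply discarded here
    elif op == "SUB":
        result = a - b     # borrow wraps around modulo 2**N
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    else:
        raise ValueError("unknown operation")
    return result & MASK

print(alu("ADD", 200, 100))          # prints 44: 300 mod 256, 8-bit wraparound
print(alu("AND", 0b1100, 0b1010))    # prints 8
```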

    The adder is connected through gating circuits that perform the necessary operations on its contents and the contents of other registers. Some computers have several adders; when there are more than four, they are organized into a special group of general-purpose registers (RON). Structurally, the ALU is implemented on one or several LSI/VLSI chips; a CPU may have a single universal ALU or several ALUs specialized for particular types of operations. In the latter case, the structural complexity of the CPU grows, but its performance rises owing to the specialization and simplification of the circuits for individual operations. This approach is widely used in modern general-purpose computers and supercomputers to increase performance. Despite the different classes of computers, their ALUs follow the same general principles for performing arithmetic-logical operations; the differences concern the circuit-level organization of the ALU and the techniques used to accelerate the execution of operations.

    The interface unit (IB) supports the exchange of information between the CPU and the OP, protects regions of the OP from unauthorized access by the current program, and provides communication between the CPU and peripheral and other external devices, which may themselves be other processors and computers. In particular, the IB contains two registers that provide communication with the OP: the memory address register (RAP) and the memory data register (RDP). The first stores the address of the OP cell involved in the exchange; the second holds the data being exchanged. The monitoring and diagnostics unit detects faults and failures of CPU nodes, restores the operation of the current program after failures, and localizes faults in the event of breakdowns.

    With the above in mind, the general scheme of program execution on the processor is as follows. Execution of a program located in the OP begins with the address of its first command being placed in the SC; the contents of the SC are sent to the RAP, and a read control signal is sent to the OP. After a time corresponding to the OP access time, the addressed word (in this case, the first command of the program) is extracted from the OP and loaded into the RDP; the contents of the RDP are then transferred to the command register RK. At this point the command is ready to be decoded and executed. If the command specifies an operation to be performed by the ALU, the required operands must be obtained. If an operand resides in the OP (it may also be in the UR), it must be fetched from memory: its address is sent to the RAP and a read cycle begins, after which the operand, now in the RDP, can be transferred to the ALU. Having obtained one or more operands in this way, the ALU performs the required operation and stores its result in one of the RONs. If the result must be stored in the OP, it is sent to the RDP, the address of the destination cell is sent to the RAP, and a write cycle begins. Meanwhile, the contents of the SC are incremented to indicate the next command to be executed, so that as soon as execution of the current command finishes, fetching of the next command of the program can begin immediately.
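    The scheme above can be condensed into a toy fetch-decode-execute loop. SC is the program counter, RAP/RDP the memory address/data registers, RK the command register, RON0 a general-purpose register; the instruction set and its encoding are invented purely for illustration.

```python
memory = {
    0: ("LOAD", 100),    # RON0 <- OP[100]
    1: ("ADD", 101),     # RON0 <- RON0 + OP[101]
    2: ("STORE", 102),   # OP[102] <- RON0
    3: ("HALT", None),
    100: 7, 101: 5,      # data words
}

SC, RON0 = 0, 0
while True:
    RAP = SC                 # address of the next command -> RAP
    RDP = memory[RAP]        # read cycle: the OP word lands in RDP
    RK = RDP                 # fetched command -> RK for decoding
    SC += 1                  # SC now points at the next command
    op, addr = RK
    if op == "LOAD":
        RAP, RDP = addr, memory[addr]   # operand read cycle
        RON0 = RDP
    elif op == "ADD":
        RAP, RDP = addr, memory[addr]
        RON0 += RDP
    elif op == "STORE":
        RAP, RDP = addr, RON0           # result -> RDP, address -> RAP
        memory[RAP] = RDP               # write cycle
    elif op == "HALT":
        break

print(memory[102])   # prints 12: the stored sum 7 + 5
```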

    Besides transferring data between the OP and the CPU, data exchange with external devices must be provided; this is done by machine instructions that control input/output. The natural order of program execution may be disturbed by the arrival of an interrupt signal. An interrupt is a service request that the CPU handles by executing the corresponding interrupt service routine (ISR). Since an interrupt and its processing can change the internal state of the CPU, that state is saved in the OP before the ISR starts: the contents of the SC, the UR registers and certain control information are written to the OP. Once the ISR completes, the CPU state is restored, allowing execution of the interrupted program to continue.


    When writing, in addition to the general part of the access cycle described above, the word to be written is received from the input information bus SHIVh into RGI; the word is then written from RGI into the cell selected by BAV.

    The control unit BUP generates the sequences of control signals that initiate the operation of the individual memory nodes.

    Associative memory. In this type of memory, the necessary information is located not by address but by its content (by associative features). A search by an associative feature (or sequentially by the individual bits of this feature) proceeds in parallel in time over all cells of the storage array. In many cases, associative search significantly simplifies and speeds up data processing; this is achieved because in memory of this type the operation of reading information is combined with the execution of a number of logical operations.


    A typical structure of associative memory is shown in Fig. 4.3. The storage array contains N (n+1)-bit cells. The extra service bit (bit n) indicates whether a cell is occupied: 0 means the cell is free, 1 means a word is written in it.
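    How the service bit can be exploited is easy to sketch: a free cell for writing is itself found by a search on that bit. The cell width, the scan-based search and all names are illustrative assumptions, not details of Fig. 4.3.

```python
N_BITS = 8                       # data bits per cell
BUSY = 1 << N_BITS               # service bit n: 1 = occupied, 0 = free

cells = [0] * 4                  # four (n+1)-bit cells, all initially free

def write_word(word):
    """Write a word into the first free cell, marking it busy.

    In hardware the free cell would be located by an associative search
    on the busy bit; here that search is modeled by a scan.
    """
    for i, cell in enumerate(cells):
        if not cell & BUSY:
            cells[i] = BUSY | (word & (BUSY - 1))
            return i
    raise MemoryError("no free cells")

write_word(0x2A)
write_word(0x17)
print([cell & (BUSY - 1) for cell in cells if cell & BUSY])   # prints [42, 23]
```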

    Finding the frame number corresponding to the desired page in a multi-level page table requires several accesses to main memory and therefore takes considerable time. In some cases such a delay is unacceptable, and the problem of speeding up the search is solved at the level of computer architecture.

    Due to the locality property, most programs access a small number of pages over a period of time, so only a small portion of the page table is actively used.

    A natural solution to the acceleration problem is to equip the computer with a hardware device for mapping virtual pages to physical pages without accessing the page table, that is, to have a small, fast cache memory that stores the part of the page table that is needed at the moment. This device is called associative memory, sometimes also called translation lookaside buffer (TLB).

    One entry in the associative memory contains information about one virtual page: its attributes and the number of the frame in which it resides. These fields correspond exactly to the fields of a page table entry.

    Since the associative memory contains only some of the page table entries, each TLB entry must include a virtual page number field. The memory is called associative because it simultaneously compares the number of the virtual page being mapped with the corresponding field in all rows of this small table; for the same reason, this type of memory is quite expensive. The row whose virtual-page field matches the desired value contains the page frame number. The typical number of TLB entries ranges from 8 to 4096. When increasing the number of associative memory entries, factors such as the size of the main memory cache and the number of memory accesses per instruction must be taken into account.

    Let's consider the functioning of the memory manager in the presence of associative memory.

    First, the mapping of the virtual page to a physical page is looked up in the associative memory. If the required entry is found, everything is fine, except when privileges are violated, in which case the memory access request is rejected.

    If the desired entry is not in the associative memory, the mapping is done through the page table, and one of the entries in the associative memory is replaced by the entry just found. Here we face the replacement problem traditional for any cache: which cache entry to evict. The design of an associative memory must therefore organize its entries so that it is possible to decide which of the old entries should be deleted when new ones are added.
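    The lookup order just described can be sketched as follows. The structures are simplified (an entry maps a virtual page number straight to a frame number, with no attribute bits), and FIFO replacement is chosen purely for illustration; real TLBs use hardware-specific policies.

```python
from collections import OrderedDict

PAGE_TABLE = {0: 5, 1: 9, 2: 1, 3: 7}   # virtual page -> frame (toy data)
TLB_SIZE = 2
tlb = OrderedDict()                      # models the small associative table

def translate(vpn):
    """Return (frame, outcome) for a virtual page number."""
    if vpn in tlb:                       # TLB hit: no page-table access
        return tlb[vpn], "hit"
    frame = PAGE_TABLE[vpn]              # miss: walk the page table
    if len(tlb) == TLB_SIZE:             # replacement problem: evict the
        tlb.popitem(last=False)          # oldest entry (FIFO, for simplicity)
    tlb[vpn] = frame                     # cache the entry just found
    return frame, "miss"

print(translate(0))   # (5, 'miss')  -> entry cached
print(translate(0))   # (5, 'hit')   -> resolved without the page table
print(translate(1))   # (9, 'miss')
print(translate(2))   # (1, 'miss')  -> evicts the entry for page 0
```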

    The ratio of the number of successful searches for a page number in the associative memory to the total number of searches is called the hit ratio; the term "cache hit rate" is also used. Thus, the hit ratio is the fraction of references that can be resolved using the associative memory. Repeated accesses to the same pages increase the hit ratio, and the higher the hit ratio, the lower the average access time to data located in RAM.

    Suppose, for example, that determining the address through the page table on a cache miss takes 100 ns, while determining it through the associative memory on a hit takes 20 ns. With a 90% hit ratio, the average address-determination time is 0.9×20 + 0.1×100 = 28 ns.
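    The estimate generalizes to any hit ratio; a one-line sketch (the 20 ns and 100 ns figures are the ones from the text, the function name is invented):

```python
def effective_access_ns(hit_percent, hit_ns=20, miss_ns=100):
    """Average address-determination time, weighted by the hit ratio."""
    return (hit_percent * hit_ns + (100 - hit_percent) * miss_ns) / 100

print(effective_access_ns(90))   # prints 28.0, matching the example above
print(effective_access_ns(98))   # a better hit ratio lowers the average
```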

    The quite acceptable performance of modern operating systems proves the effectiveness of using associative memory. The high probability of finding data in associative memory is associated with the presence of objective properties of data: spatial and temporal locality.

    It is necessary to pay attention to the following fact. When switching process contexts, the new process must not "see" information in the associative memory that belongs to the previous process; for example, the associative memory must be flushed. Thus, the use of associative memory increases context-switch time.

    The considered two-level (associative memory + page table) address conversion scheme is a striking example of a memory hierarchy based on the use of the principle of locality, as discussed in the introduction to the previous lecture.