• An OODB is an object-oriented database.

    Multi-level client-server systems

    A multi-tier client-server architecture is a variety of the client-server architecture in which data processing is carried out on one or more separate servers. This allows the functions of storing, processing and presenting data to be separated, so that the capabilities of servers and clients are used more efficiently.

    Among multi-tier client-server architectures, the most common is the three-tier architecture, which assumes the presence of the following application components: a client application (usually called a "thin client" or terminal) connected to an application server, which in turn is connected to a database server (Fig. 5.4).



    Fig. 5.4. Representation of a multi-tier client-server architecture

    • A terminal is the interface (usually graphical) component that constitutes the first tier: the actual application for the end user. The first tier should have no direct connections to the database (for security reasons), should not carry the core business logic (for scalability) and should not store application state (for reliability). Only the simplest business logic can be, and usually is, placed at the first tier: the authorization interface, encryption algorithms, validity and format checks on entered values, and simple operations (sorting, grouping, counting) on data already loaded onto the terminal.
    • The application server is located at the second tier. The second tier carries most of the business logic. Outside of it remain only the fragments exported to the terminals, as well as the stored procedures and triggers submerged in the third tier.
    • The database server provides data storage and forms the third tier. Typically this is a standard relational or object-oriented DBMS. If the third tier is a database together with stored procedures, triggers and a schema describing the application in terms of the relational model, then the second tier is built as a programming interface linking the client components with the application logic of the database.

    In the simplest configuration, the application server can physically be combined with the database server on a single computer, to which one or more terminals are connected via the network.

    In the "correct" configuration (in terms of security, reliability and scaling), the database server is located on a dedicated computer (or cluster), to which one or more application servers are connected via the network; the terminals, in turn, are connected to the application servers.

    The advantages of this architecture are:

    • client software does not require administration;
    • scalability;
    • configurability – isolation of levels from each other allows you to quickly and easily reconfigure the system when failures occur or during scheduled maintenance at one of the levels;
    • high security;
    • high reliability;
    • low requirements for channel (network) speed between terminals and application server;
    • low requirements for the performance and technical characteristics of terminals, which reduces their cost.

    The disadvantages of the architecture are:

    • the complexity of the server part grows, and with it the cost of administration and maintenance;
    • higher complexity of application development;
    • the system is harder to deploy and administer;
    • high performance requirements for the application servers and the database server, and hence a high cost of server equipment;
    • high requirements for the speed of the channel (network) between the database server and the application servers.
    The multi-tier client-server architecture can also be described as five layers (Fig. 5.5):

    1. Presentation;
    2. Presentation layer;
    3. Logic layer;
    4. Data access layer;
    5. Data.


    Fig. 5.5. Five layers of multi-tier client-server architecture

    The presentation includes all information directly displayed to the user: generated HTML pages, style sheets, images.

    The presentation layer covers everything related to the user's interaction with the system. Its main functions are displaying information, interpreting user input and converting it into the corresponding operations in the context of the logic and data layers.

    The logic layer contains the core functions of the system, those intended to achieve its purpose. These functions include calculations on input and stored data, validation of all data elements, processing of commands from the presentation layer, and passing information to the data layer.

    The data access layer is a subset of functions that provide interaction with third-party systems that perform tasks for the benefit of the application.

    System data is usually stored in a database.

    5.1.6. Distributed Systems Architecture

    This type of system is more complex in its organization. The essence of distributed systems is the storage of local copies of important data on each workstation.

    Schematically, such an architecture can be represented as shown in Fig. 5.6.


    Fig. 5.6.

    More than 95% of the data used in enterprise management can be placed on a single personal computer, allowing it to operate independently. The stream of corrections and additions generated on that computer is negligible compared to the amount of data used. Therefore, if continuously used data is stored on the computers themselves and an exchange of corrections and additions to the stored data is organized between them, the total transmitted traffic drops sharply. This reduces the requirements for the communication channels between computers, permits more frequent use of asynchronous communication, and thereby makes it possible to build reliably functioning distributed information systems over individually unstable links such as the Internet, mobile communications or commercial satellite channels. Minimizing the traffic between elements also makes the cost of operating such links quite affordable. Of course, implementing such a system is not trivial and requires solving a number of problems, one of which is timely data synchronization.

    Each workstation is independent, containing only the information it needs to work with, and the relevance of data throughout the system is ensured through continuous exchange of messages with other workstations. Message exchange between workstations can be implemented in various ways, from sending data via email to transmitting data over networks.

    Client-server architecture is the concept of an information network in which the bulk of its resources is concentrated in servers serving their clients. The architecture in question defines two types of components: servers and clients.

    A server is an object that provides service to other network objects at their request. Service is the process of serving clients.

    Fig. Client-server architecture

    The server works on orders from clients and manages the execution of their jobs. After each job is completed, the server sends the results to the client that submitted it.

    The service function in the client-server architecture is described by a set of application programs, in accordance with which various application processes are performed.

    The process that invokes a service function by means of certain operations is called a client. It can be a program or a user. Clients are workstations that use server resources and provide convenient user interfaces. User interfaces are the procedures by which a user interacts with a system or network.

    Fig. Client-server model

    The client is the initiator and uses email or other server services. In this process, the client requests a service, establishes a session, gets the results it wants, and reports completion.

    In networks with a dedicated file server, a server network operating system is installed on a dedicated standalone PC; this PC becomes the server. Software installed on a workstation allows it to exchange data with the server. The most common network operating systems are:

    In addition to the network operating system, network applications are needed to take advantage of the network's benefits.

    Server-based networks offer better characteristics and increased reliability. The server owns the main network resources, which the other workstations access.

    In modern client-server architecture there are four groups of objects: clients, servers, data and network services. Clients reside on user workstations. Data is stored mainly on servers. Network services share servers and data; in addition, services manage data-processing procedures.

    Networks with client-server architecture have the following advantages:

    They allow networks with a large number of workstations to be organized;

    They provide centralized management of user accounts, security and access, which simplifies network administration;

    They provide efficient access to network resources;

    The user needs a single password to log into the network and to gain access to all resources covered by the user's rights.

    Along with the advantages of the client-server network, there are also a number of disadvantages:

    A server malfunction can render the network inoperable or, at a minimum, cause the loss of network resources;

    They require qualified personnel for administration;

    The cost of the network and network equipment is higher.


    26. Optimizing database operation in concurrent (multi-user) mode.

    Concurrency control. Concurrency control is where DBMSs differ: it is what distinguishes a DBMS from a file system and one DBMS from another. It is important to a programmer that his database application work correctly under concurrent access, and this is something people constantly forget to check. Techniques that work well under sequential access work much worse when used simultaneously from several sessions. If you do not know thoroughly how concurrent access control is implemented in a particular DBMS, then: data integrity will be violated; the application will run slower than intended, even with a small number of users; and the ability to scale to a large number of users will be lost.

    Concurrency problems are the hardest to identify; the difficulty is comparable to debugging a multi-threaded program. A program may work fine in the controlled, artificial environment of a debugger yet constantly crash in the "real world". For example, under heavy access two threads may end up modifying the same data structure simultaneously. Errors of this kind are very difficult to find and correct.

    Implementation of locking. The DBMS uses locks so that at any given moment particular data can be changed by only one transaction. Simply put, locks are the mechanism for ensuring concurrent access: without a locking model there is nothing to prevent simultaneous, conflicting changes.

    Locking principles in the Oracle DBMS. Oracle locks data at the row level and only on modification; locks are never escalated to the block or table level. Oracle never locks data for reading: during a normal read no locks are placed on rows, so a session writing data does not block sessions reading it. To repeat: read operations are not blocked by write operations. This is fundamentally different from almost all other DBMSs, where reads are blocked by writes. A writing session is blocked only when another writing session has already locked the row it intends to modify; a reading session never blocks a writing session.

    By locking the resource we are trying to reserve, we guarantee that no other session changes that resource's usage plan at the same time: it will have to wait until our transaction is committed, after which it will see the reservation made in it. The possibility of overlapping reservations is thus eliminated. The developer must understand that in a multi-user environment it is sometimes necessary to use the same techniques as in multi-threaded programming. In this case the SELECT ... FOR UPDATE construct works like a semaphore: it serializes access to a specific row of the resource table, guaranteeing that two sessions do not reserve the same resource simultaneously. The approach still allows a high degree of concurrency, because there may be thousands of resources to reserve and we merely ensure that sessions modify any particular resource one at a time. This is one of the few cases where data that is not itself going to be changed must be locked manually; you need to recognize the situations where this is necessary and, just as important, where it is not. Moreover, the technique does not block other sessions from reading the resource, as it might in other DBMSs, which preserves high scalability.
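    To make the semaphore technique concrete, here is a minimal sketch in Oracle SQL; the resources table, its columns and the id value are hypothetical, invented for this illustration:

        -- Session A: reserve resource 42; the selected row stays locked
        -- until this transaction commits or rolls back
        SELECT id, status
          FROM resources
         WHERE id = 42
           FOR UPDATE;

        UPDATE resources
           SET status = 'RESERVED'
         WHERE id = 42;

        COMMIT;

        -- Session B: an identical SELECT ... FOR UPDATE on id = 42 issued
        -- before session A commits simply waits; after the COMMIT it sees
        -- status = 'RESERVED'. Plain SELECTs without FOR UPDATE are never
        -- blocked at any point.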
    Multi-versioning is the mechanism by which the Oracle DBMS provides read consistency for queries (queries produce results consistent with the moment they began executing) and non-blocking queries (queries are not blocked by sessions that are changing data, as happens in other DBMSs). These are two very important concepts in the Oracle DBMS. The term multi-versioning comes from the fact that Oracle can maintain several versions of the same data in the database simultaneously. Once you understand multi-versioning, you can always make sense of the results obtained from the database.

    For example, suppose we created a test table T, filled it with data and opened a cursor on the table without fetching from it. Remember that opening a cursor does not make the Oracle server "answer" the query: nothing is copied anywhere at open time (imagine how long opening a cursor on a table with a billion rows would otherwise take). The cursor simply opens and produces query results as the data is accessed; in other words, it reads data from the table as you fetch through it. In the same (or another) session we then delete all the data from the table and even commit (COMMIT) the deletion. There are no more rows - or are there? In fact they can still be retrieved through the cursor: the result set returned by the OPEN command was fixed at the moment the cursor was opened. We did not read a single block of table data at open time, yet the result is firmly fixed; we will not know it until we fetch the data, but from the cursor's point of view it is immutable. It is not that Oracle copied all the data elsewhere when the cursor was opened; rather, the DELETE statement preserved the old data by placing it in a data area called the undo segment.

    This is what read consistency is all about, and if you do not understand how Oracle's multi-versioning scheme works and what its consequences are, you will neither be able to take full advantage of the Oracle DBMS nor build correct Oracle applications that guarantee data integrity. If you are used to how consistency and concurrency are implemented in other DBMSs, or have simply never met these concepts, you can now see how important understanding them is for your work. To exploit Oracle's potential fully, you must understand these problems and how they are solved specifically in Oracle, not in other DBMSs.
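    A minimal sketch of the cursor experiment described above; the table name t and the use of ALL_USERS to populate it are assumptions made for the illustration:

        CREATE TABLE t AS SELECT username FROM all_users;

        DECLARE
            CURSOR c IS SELECT username FROM t;
            l_name t.username%TYPE;
        BEGIN
            OPEN c;                   -- the result set is fixed at this moment
            DELETE FROM t;            -- remove every row...
            COMMIT;                   -- ...and commit the deletion
            LOOP
                FETCH c INTO l_name;  -- still returns the "deleted" rows,
                EXIT WHEN c%NOTFOUND; -- reconstructed from the undo segment
                DBMS_OUTPUT.PUT_LINE(l_name);
            END LOOP;
            CLOSE c;
        END;
        /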

    30. Application server. Application server structure. Registering the application server

    The application server encapsulates most of a distributed application's business logic and supports client access to the database. The core of the application server is the remote data module. First, like a regular data module, it is a platform for hosting non-visual data access components and provider components. The connection components, transaction components and dataset-encapsulating components placed on it give the three-tier application its link to the database server. These can be component sets for the ADO, BDE, InterBase Express or dbExpress technologies. Second, the remote data module implements the main functions of the application server by providing clients with the IAppServer interface or a descendant of it. To do this, the remote data module must contain the required number of TDataSetProvider provider components. These components pass data packets to the client application (more precisely, to its TClientDataSet components) and also provide access to the interface methods.
    Delphi includes remote data modules. To create them, use the Multitier, WebSnap and WebServices pages of the Delphi Repository.
    Remote Data Module - a remote data module encapsulating the Automation server. Used to organize connections via DCOM, HTTP, sockets (see Chapter 21).
    Transactional Data Module - a remote data module that encapsulates an MTS (Microsoft Transaction Server) server.
    SOAP Server Data Module - a remote data module that encapsulates a SOAP (Simple Object Access Protocol) server.
    WebSnap Data Module is a remote data module that uses Web services and a Web browser as a server.
    Each component that encapsulates a set of data to be passed to a client must have a provider component associated with it in the data module.

    34. High-level features of SQL. Categories of SQL statements.

    SQL is not a procedural language; strictly speaking, it is not a programming language at all. PL/SQL is a procedural, step-by-step programming language that encapsulates SQL. The result is a well-developed third-generation programming language (3GL) similar to C++, Pascal, etc. At its core PL/SQL is block-oriented. PL/SQL has strict variable-scoping rules, supports parameterized procedure and function calls, and inherited the package construct from the Ada language. PL/SQL provides strict type checking: all type-incompatibility errors are detected at compilation or execution time. Explicit and implicit type conversion are also supported. PL/SQL supports complex data structures, and subprogram overloading is provided to create a flexible application programming environment. The language has an exception-handler facility for synchronous error handling during the execution of PL/SQL code. Strictly speaking, PL/SQL is not object-oriented, although it has some tools for creating and working with database objects at the level of object-oriented programming languages. PL/SQL is a machine-independent programming language.

    An operator is a symbol indicating an action performed on one or more expressions. Categories of operators used by SQL Server: arithmetic operators, logical operators, the assignment operator, the scope-resolution operator, bitwise operators, set operators, comparison operators, the string-concatenation operator, compound operators, unary operators.
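    A minimal sketch of a PL/SQL block showing the block structure and the exception handler mentioned above (the variables and the ZERO_DIVIDE case are chosen purely for illustration):

        DECLARE
            v_numerator   NUMBER := 10;
            v_denominator NUMBER := 0;
            v_result      NUMBER;
        BEGIN
            -- strict type checking: assigning, say, a string here would not compile
            v_result := v_numerator / v_denominator;
            DBMS_OUTPUT.PUT_LINE('Result: ' || v_result);
        EXCEPTION
            -- synchronous error handling at execution time
            WHEN ZERO_DIVIDE THEN
                DBMS_OUTPUT.PUT_LINE('Division by zero intercepted');
        END;
        /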

    36. SQL logical operators. Efficient query building

    Logical operators AND, OR and NOT. The AND and OR operators are used to combine search conditions in WHERE clauses; the NOT operator reverses the value of a search condition. The AND operator connects two conditions and returns TRUE only if both are true. For example, this query returns only one row, in which the customer ID (BusinessEntityID) starts with the digit 1 and the store name starts with "Bicycle":

    SELECT BusinessEntityID, Name
    FROM AdventureWorks2008R2.Sales.Store
    WHERE BusinessEntityID LIKE '1%' AND Name LIKE N'Bicycle%';

    The OR operator also connects two conditions, but returns TRUE if at least one of them is true. The following query returns 349 rows, in which either the customer ID starts with 1 or the store name starts with "Bicycle":

    SELECT BusinessEntityID, Name
    FROM AdventureWorks2008R2.Sales.Store
    WHERE BusinessEntityID LIKE '1%' OR Name LIKE N'Bicycle%';

    Optimizing SQL queries. More and more applications use databases, and more and more data has to be stored and processed. If an application is slow, programmers, users and administrators first blame poor network performance, bad server hardware and each other, and forget about optimization. This continues until the application is subjected to serious performance analysis. One way to increase the speed of an application is to optimize its SQL queries. This method is good because you do not have to dive into the jungle of SQL server tuning: it is easier to avoid inefficient SQL queries in the first place. But if they have already crept in, look for a way out of the unpleasant situation.

    General optimization. Each SQL operation has a so-called "utility coefficient", the level of efficiency of the operation. The higher the score, the more "useful" the operation, which means the SQL query executes faster. Almost any condition consists of two operands and an operation sign between them.

    Examples. To better understand the tables, consider an example of calculating the score of a query:

    ... WHERE smallint_column = 12345

    5 points for the field on the left (smallint_column), 2 points for the exact numeric operand type, 10 points for the comparison operation (=) and 10 points for the value on the right (12345). In total we get 27 points. Now a more complex example:

    ... WHERE char_column >= varchar_column || 'x'

    5 points for the field on the left (char_column), 0 points for the character operand type, 5 points for the greater-than-or-equal operation (>=), 3 points for the expression (varchar_column || 'x'), 0 points for the character operand (varchar_column). The result is 13 points. Naturally, such calculations need not be carried out for every query, but when the question of the speed of a particular query's conditions arises, these two tables can help answer it. The speed of a query is also affected by the amount of data selected and by additional directives, which we consider below. Also keep in mind that the "utility coefficient" calculation is not a universal optimization method: everything depends on the specific situation.

    The main law when optimizing queries is the law of transformation: it does not matter how we express a condition, as long as the result stays the same. Consider an example. There is a query: ... WHERE column1 < column2 AND column2 = column3 AND column1 = 5. Using substitution, you get the query: ... WHERE 5 < column2 AND column2 = column3 AND column1 = 5.
    The result of the query is the same, but the performance differs, because using the exact value (5) affects performance. If you have studied C or C++, you know that the expression x=1+1-1-1 becomes x=0 at compile time. Surprisingly, only some databases can perform such simplifications: when executing the query, the database will carry out the additions and subtractions and waste your precious time. Therefore it is always better to compute such expressions in advance where possible: not ... WHERE a - 3 = 5, but ... WHERE a = 8.

    Another way to optimize a query is to stick to the general form of conditions in SQL. In other words, a condition should look like: <column> <operation> <expression>. For example, the query "... WHERE column1 - 3 = -column2" is better converted to: ... WHERE column1 = -column2 + 3. These optimization techniques work almost always and everywhere.

    Optimizing conditions. Now it is time to optimize the conditional SQL statements themselves. Most queries use the WHERE clause, so by optimizing conditions you can achieve a significant gain in query performance. At the same time, for some reason, only a small portion of database applications use condition optimization.

    AND. Obviously, in a series of several AND conditions, the conditions should be arranged in order of increasing probability of truth: then, as soon as one condition fails, the database does not have to check the rest. These recommendations do not apply to the Oracle database, where conditions are checked from the end; there the order should be reversed, in decreasing probability of truth.

    OR. The situation with this operator is exactly the opposite: conditions must be arranged in decreasing probability of truth. Microsoft strongly recommends this method when building queries, although many do not even know about it or at least pay no attention to it. But again, this does not apply to the Oracle database, where conditions should be arranged in increasing probability of truth. A further optimization is the fact that if identical columns are placed next to each other, the query executes faster. For example, the query "... WHERE column1 = 1 OR column2 = 3 OR column1 = 2" will be slower than "... WHERE column1 = 1 OR column1 = 2 OR column2 = 3", even if the probability of column2 = 3 being true is higher than that of column1 = 2.

    AND + OR. Back in school you were taught the distributive law: A AND (B OR C) is the same as (A AND B) OR (A AND C). It has been established experimentally that a query like "... WHERE column1 = 1 AND (column2 = 'A' OR column2 = 'B')" is somewhat faster than "... WHERE (column1 = 1 AND column2 = 'A') OR (column1 = 1 AND column2 = 'B')". Some databases can optimize queries of this type themselves, but it is better to be safe.

    NOT. This operation should always be reduced to a more "readable" form (within reason, of course). Thus the query "... WHERE NOT (column1 > 5)" is converted to "... WHERE column1 <= 5". More complex conditions can be transformed using De Morgan's laws, which you should also have learned in school: NOT (A AND B) = (NOT A) OR (NOT B) and NOT (A OR B) = (NOT A) AND (NOT B). For example, the condition "... WHERE NOT (column1 > 5 OR column2 = 7)" is converted to the simpler form: ... WHERE column1 <= 5 AND column2 <> 7.
    IN. Many people naively believe that the query "... WHERE column1 = 5 OR column1 = 6" is equivalent to "... WHERE column1 IN (5, 6)". Actually it is not: the IN operation is much faster than a series of ORs. Therefore you should always replace OR with IN where possible, although some databases perform this optimization themselves. Where a series of consecutive numbers is used, IN should be replaced with BETWEEN. For example, "... WHERE column1 IN (1, 3, 4, 5)" is optimized to: ... WHERE column1 BETWEEN 1 AND 5 AND column1 <> 2. And this query really is faster.

    LIKE. This operation should be used only when absolutely necessary, because a search based on full-text indexes is better and faster. Unfortunately, for information about full-text search I have to direct you to the World Wide Web.

    CASE. This function can be used to raise the speed of a query that has more than one call to a slow function in its condition. For example, to avoid a repeated call of slow_function() in the query "... WHERE slow_function(column1) = 3 OR slow_function(column1) = 5", you can use CASE:

    ... WHERE 1 = CASE slow_function(column1)
                  WHEN 3 THEN 1
                  WHEN 5 THEN 1
                  END

    Sorting. ORDER BY is used for sorting, which is known to take time: the larger the data volume, the longer the sort, so it needs optimizing. Three factors affect the sorting speed of a query:
      number of selected records; number of columns after the ORDER BY operator; length and type of columns specified after the ORDER BY statement.
    The most resource-intensive sorting is string sorting. Although text fields have a fixed declared length, the length of the contents of these fields can vary (within the size of the field). It is therefore not surprising that sorting a VARCHAR(100) column is slower than sorting a VARCHAR(10) column, even if the data are the same: when sorting, the database allocates memory for its operations according to the maximum field size, regardless of the contents. So when declaring fields, always use the size actually needed and do not allocate extra bytes in reserve. On Windows computers INTEGER fields occupy 32 bits and SMALLINT fields 16 bits, so it would be logical to assume that sorting SMALLINT fields is faster. In fact, INTEGER sorting is faster than SMALLINT sorting; INTEGER sorting is also faster than CHAR. Character sorting has its own nuances, whose description would take more than one article: it can be fast and incorrect, or slow but with fewer errors. Sort optimization is carried out for a specific situation, so no universal recommendations can be given.

    Grouping. The GROUP BY operation is used to determine a subset of the query result and to apply aggregate functions to that subset. Let's look at some of the most effective ways to optimize grouping. The first thing to remember is to use as few columns as possible for grouping, and to avoid superfluous conditions. For example, in the query

    SELECT secondary_key_column, primary_key_column, COUNT(*)
    FROM Table1
    GROUP BY secondary_key_column, primary_key_column

    the secondary_key_column column is completely unnecessary. The reason is simple: secondary_key_column is a unique field that contains no NULL values, which means that no data can simply be lost. But if you remove secondary_key_column from the GROUP BY clause, some databases may raise an error saying that the field cannot appear in the select list unless it is listed in GROUP BY. To solve the problem, the query can be written as:

    SELECT MIN(secondary_key_column), primary_key_column, COUNT(*)
    FROM Table1
    GROUP BY primary_key_column

    This query is faster and more "correct" from the point of view of query construction.

    In most databases the WHERE and HAVING operations are not equivalent and are not performed in the same way, which means the following two queries are logically identical but run at different speeds:

    SELECT column1 FROM Table1
    WHERE column2 = 5
    GROUP BY column1
    HAVING column1 > 6

    SELECT column1 FROM Table1
    WHERE column2 = 5 AND column1 > 6
    GROUP BY column1

    The second query is faster than the first. HAVING should be used in the rare cases when the condition (in the example, column1 > 6) is difficult to express without sacrificing performance. If grouping is required but without aggregate functions (COUNT(), MIN(), MAX(), etc.), it is reasonable to use DISTINCT: instead of SELECT column1 FROM Table1 GROUP BY column1, better use SELECT DISTINCT column1 FROM Table1. When using MIN() and MAX(), keep in mind that these functions work better separately: they are best used in separate queries or in queries using UNION. When using the SUM() function, better performance is achieved with SUM(x + y) rather than SUM(x) + SUM(y); for subtraction the opposite holds: SUM(x) - SUM(y) is faster than SUM(x - y).
    Table joins (JOIN). Here it is hard to say anything definite about optimization, because the speed of join operations depends largely on the organization of the tables themselves: the use of foreign keys and primary keys, the number of nested joins, and so on. Sometimes better performance can be achieved with nested loops directly in the program; sometimes JOINs work faster. There is no definitive advice on which methods of joining tables to use; it all depends on the specific case and the database architecture.

    Subqueries (SUBQUERY). Previously not every database could boast subquery support, but now almost any modern database can; even MySQL, which spent several years implementing them, has finally acquired subquery support. The main problem when optimizing subqueries is not optimizing the query code itself but choosing the right way to implement the query. Problems solved with subqueries can also be solved with nested loops or JOINs. When you use a JOIN, you let the database choose the mechanism by which the tables are joined; when you use subqueries, you explicitly prescribe nested loops. What to choose? Below are the arguments in favor of each method; choose for yourself depending on the situation. Advantages of JOIN:
      If a query contains a WHERE clause, the built-in database optimizer will optimize the query as a whole, while if subqueries are used, queries will be optimized separately. Some databases work more efficiently with JOINs than with subqueries (for example, Oracle). After a JOIN, the information will appear in the general “list”, which cannot be said about subqueries.
    Advantages of SUBQUERIES:
      Subqueries allow freer conditions. Subqueries can contain GROUP BY and HAVING, which are much harder to implement in JOINs. Subqueries can be used with UPDATE, which is not possible with JOINs. Recently the optimization of subqueries by the databases themselves (their built-in optimizers) has noticeably improved.
    The main advantage of JOINs is that you do not need to tell the database exactly how to perform the operation. The main advantage of subqueries is that the subquery loop can have several iterations (repetitions), which in turn can significantly increase performance.

    Conclusion. This article has shown the most common ways to improve the performance of SQL queries, but there are many more tips and tricks for optimizing queries. Query optimization is more art than science. Every database has its own built-in optimizer, which can help in this difficult task, but no one will do all the work for you. As an old physics teacher used to say: "To solve problems, you have to solve them."

    It is not recommended to use ORDER BY together with operations such as DISTINCT or GROUP BY, because these operators can create side effects for sorting: you may end up with an incorrectly sorted data set, which can be critical in some situations. This consequence is not about optimization, but it should not be forgotten either.

    Key points: before upgrading the network and expanding server hardware, try optimization. Any SQL operation has a "utility coefficient"; the higher the coefficient, the more "useful" the operation and the faster the query completes. Unlike compilers, not all databases can simplify expressions like x=1+1-1-1 to x=0; they waste valuable time on unnecessary work, so optimize such expressions in advance. When using the SUM() function you can achieve better performance with SUM(x + y) rather than SUM(x) + SUM(y); but if subtraction is required, use the opposite: SUM(x) - SUM(y) is faster, SUM(x - y) is slower. Every database has its own built-in optimizer, but optimizers are far from perfect, so optimize in advance.

    38. Physical organization of the database in InterBase.

    The InterBase database consists of sequentially numbered pages, starting from 0. Page zero is a service page and contains the information needed to connect to the database. The page size is 1 (the default), 2, 4 or 8 KB. It is set when the database is created but can be changed when the database is backed up and restored. One page is read by the server in one logical access to the database. The size of the I/O buffer for read-write operations is specified as a number of pages (75 by default). If the database will mostly be read, the buffer size should be increased; if it will mostly be written to, the buffer size can be reduced.

    InterBase supports a multi-version mode for records. When a record is changed by a transaction, a new version of the record is created, in which, besides the data, the transaction number and a pointer to the previous version of the record are written. The old version is marked as changed, and its pointer to the next version refers to the newly created version. Each newly started transaction works with the latest version of the record whose changes have been committed. Thus transactions working with the database in parallel always use their own versions of records, which removes locks for client applications working simultaneously with the same data. When a record is deleted, it is likewise not physically removed from disk but marked as deleted until all active transactions using the record have completed.

    InterBase places all versions of one database record on one page. After records are deleted, "holes" form on the page. When a new record is added, the size of the largest "hole" is analyzed, and if it is smaller than the length of the record being added, the page is compacted and the "holes" are merged. If the freed space is still not enough to hold the new record, it is written to a new page. Page utilization is considered normal if "holes" occupy no more than 20% of the page.

    Page allocation is not optimized in any way. The numbers of all free pages are stored on a separate service page of the database. When pages are allocated, no attempt is made to allocate consecutive pages for the records of one table; the first page in the free list is taken. If there is no free page, a new one is appended to the end of the database; only in this case does the database grow.

    The record structure supporting multi-versioning and the non-optimal allocation of pages lead to high fragmentation of the database and, consequently, to slower work with it. The database therefore needs periodic defragmentation. A defragmented database is characterized by table records arranged on consecutive pages and by the absence of "garbage". Garbage means versions of records no longer needed by any active transaction. To remove garbage, the database is backed up to disk and then restored from the backup using the IBConsole utility (in earlier versions of InterBase, the InterBase Server Manager utility). This process guarantees removal of all garbage, since at the moment the database is backed up there should be no active connections from other users and therefore no active transactions. In addition, InterBase provides automatic garbage collection in the background of active transactions. The default interval at which garbage is automatically removed is 20,000 transactions.
This value can be changed using the IBConsole (InterBase Server Manager) utility. Automatic purge deletes only those versions of records for which there are no active transactions. As a result, not all old versions may be removed.
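    As an illustration, a sketch of creating an InterBase database with a non-default page size; the file name and the SYSDBA/masterkey credentials are conventional examples, not requirements:

        /* the page size is fixed at creation time; changing it later
           requires a backup/restore cycle (IBConsole or gbak) */
        CREATE DATABASE 'C:\data\shop.gdb'
            USER 'SYSDBA' PASSWORD 'masterkey'
            PAGE_SIZE 8192;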

    SQL Basics

    1. Create Database.

    In various DBMSs the procedure of creating databases is usually reserved to the database administrator. In single-user systems a default database may be created during installation and configuration of the DBMS itself. The SQL standard does not define how databases are created, so each SQL dialect typically takes its own approach. In the SQL Server system, creating a database is a two-stage process: first the database itself is organized, and then the transaction log belonging to it. Information is placed in files with the *.mdf extension (for the database) and *.ldf (for the transaction log). The database file records information about the main objects (tables, indexes, views, etc.), and the transaction log file records the process of working with transactions (monitoring data integrity, the state of the database before and after transactions).

    A database is created in SQL Server with the CREATE DATABASE command (creating a database in SQL Server requires server administrator rights):

    <database_definition> ::=
    CREATE DATABASE database_name
    [ ON [ PRIMARY ] [ <file_definition> [,...n] ]
      [, <filegroup_definition> [,...n] ] ]
    [ LOG ON ( <file_definition> [,...n] ) ]
    [ FOR LOAD | FOR ATTACH ]

    The ON parameter specifies the list of disk files for storing the information held in the database. The PRIMARY parameter designates the primary file; if it is omitted, the first file in the list is primary. The LOG ON parameter specifies the list of disk files for the transaction log. The file name for the transaction log is generated from the database name with the characters _log appended to the end.
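    A minimal sketch of the command just described; the database name, file paths and sizes are invented for the example:

        CREATE DATABASE Shop
        ON PRIMARY
            ( NAME = Shop_data,
              FILENAME = 'C:\data\Shop.mdf',      -- primary data file
              SIZE = 10MB,
              FILEGROWTH = 5MB )
        LOG ON
            ( NAME = Shop_log,
              FILENAME = 'C:\data\Shop_log.ldf',  -- transaction log file
              SIZE = 5MB,
              FILEGROWTH = 5MB );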

    SQL Basics.

    3. Creation of domains (Create Domain). Create Table

    CREATE TABLE table (<field_description> [, <field_description> | <constraint> ...]);

    where table is the name of the table being created, <field_description> describes a field, and <constraint> describes restrictions and/or keys (square brackets denote optional parts, the vertical bar | means "or"). A field description consists of the field name and the field type:

    <field_description> = col {datatype | COMPUTED BY (<expr>) | domain} [DEFAULT <value>] [NOT NULL] [COLLATE collation]

    Here col is the field name; datatype is any valid SQL server type (character types may carry a CHARACTER SET, a character set that determines the country's language; for Russian, set the character set WIN1251); COMPUTED BY (<expr>) defines a field computed at server level, where <expr> is a valid SQL expression returning a single value; domain is the name of a domain (generic type) defined in the database; DEFAULT is a construct defining the field's default value; NOT NULL indicates that the field cannot be empty; COLLATE is a clause defining the sort order for the chosen character set. Example:

    CREATE TABLE lsn_team (
        id lsn_dintkey,
        name lsn_dname UNIQUE,
        founded lsn_dfounded,
        PRIMARY KEY (id)
    )

    Creating a domain. While studying a subject area, a database developer often finds that a built-in type is too "broad" to store an attribute of the entity in question. For example, you need to store an age, and the INTEGER and SMALLINT data types provide ranges that are too wide. The server lets us create our own data type with the necessary restrictions imposed on it. Such a data type in SQL is called a domain, and the command used to create it is CREATE DOMAIN:

    CREATE DOMAIN dage AS INTEGER
        DEFAULT 0
        CHECK (VALUE >= 0 AND VALUE <= 120)

    Let's examine this command. We asked the server to create a domain (CREATE DOMAIN) named dage based on the integer type (AS INTEGER); if the user does not specify an age, the default value 0 is used (DEFAULT 0), and the field value must lie between 0 and 120 (CHECK(VALUE >= 0 AND VALUE <= 120)). We could also have declared the field mandatory (NOT NULL), but there is no need, since a NULL value would not pass the CHECK constraint anyway.
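    A short sketch of how the dage domain might then be used in a table; the person table and its columns are hypothetical:

        /* the domain is used like an ordinary data type */
        CREATE TABLE person (
            id   INTEGER NOT NULL,
            name VARCHAR(60) NOT NULL,
            age  dage,  /* inherits DEFAULT 0 and the CHECK constraint */
            PRIMARY KEY (id)
        );

        /* this INSERT is rejected by the server: 150 violates the domain CHECK */
        INSERT INTO person (id, name, age) VALUES (1, 'Ivanov', 150);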

    SQL Basics

    5. Parent and child tables. Ensuring data integrity

    Declarative integrity constraints are specified at the level of table-creation statements. When a table is described, its name is given; the name is an identifier in the host DBMS language and must comply with that language's object-naming rules. Besides the table name, the statement specifies a list of table elements, each of which either defines a column or defines an integrity constraint for the table being created. At least one column definition is required: a table without a single column cannot be defined. The number of columns in one table is not limited by the standard, but specific DBMSs usually impose limits on the number of attributes; for example, in MS SQL Server 6.5 the maximum number of columns in a table was 250, while in MS SQL Server 7.0 it was raised to 1024.

    When a uniqueness constraint is set, the column is defined as a possible key, which assumes that each value entered in the column is unique. With this constraint in place, the DBMS automatically checks that there are no duplicate values of the column in the entire table.

    If a reference constraint is specified for a column in the integrity-constraints section, the corresponding referential constraint definition is generated for the table: FOREIGN KEY (<column_name>) <reference_specification>, which means that the values of the column must be drawn from the corresponding column of the parent table. Here the parent table is a table related to the given table by a one-to-many (1:M) relationship, in which each row of the parent table may be associated with several rows of the table being defined. SQL statements are translated in interpretation mode, so it is important that the parent table be described first and the subordinate tables related to it after it; otherwise the translator will find a reference to an undefined object. First describe all the main tables, then the subordinate ones.
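    A minimal sketch of a parent table and a subordinate table with a referential constraint; the department/employee schema is invented for the example:

        /* parent table: described first */
        CREATE TABLE department (
            dept_id INTEGER NOT NULL,
            title   VARCHAR(60) NOT NULL,
            PRIMARY KEY (dept_id)
        );

        /* child table: each employee row references one department row (1:M) */
        CREATE TABLE employee (
            emp_id  INTEGER NOT NULL,
            name    VARCHAR(60) NOT NULL,
            dept_id INTEGER NOT NULL,
            PRIMARY KEY (emp_id),
            FOREIGN KEY (dept_id) REFERENCES department (dept_id)
        );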

    SQL Basics

    8. Queries. Subqueries and nesting

    A subquery is a very powerful feature of the SQL language. It allows you to build complex hierarchies of queries that are executed repeatedly during the process of constructing a result set or executing one of the data change operators (DELETE, INSERT, UPDATE). Conventionally, subqueries are sometimes divided into three types, each of which is a narrowing of the previous one:
      a table subquery that returns a set of rows and columns; a row subquery that returns only one row, but possibly multiple columns (such subqueries are often used in embedded SQL); a scalar subquery that returns the value of one column in one row.
    A nested subquery is a subquery enclosed in parentheses and nested within the WHERE (HAVING) clause of a SELECT statement or of other statements that use a WHERE clause. A nested subquery may itself contain another nested subquery in its WHERE (HAVING) clause, and so on. It is easy to guess that the nested subquery exists so that, when selecting the rows of the table produced by the main query, data from other tables can be used (for example, when selecting dishes for a menu, using data on the availability of products in the pantry of the boarding house):

    SELECT * FROM tbl1 WHERE f2 = (SELECT f2 FROM tbl2 WHERE f1 = 1);

    Correlated subqueries. In a SELECT statement, an inner subquery may reference columns of the outer query listed in the SELECT clause. Such a subquery is executed for each row of the table, determining the condition for its inclusion in the generated result set. For example:

    SELECT * FROM tbl1 t1 WHERE f2 IN (SELECT f2 FROM tbl2 t2 WHERE t1.f3 = t2.f3);

    Here, for each row of table tbl1 the condition is checked that the value of field f2 matches the value of a row of table tbl2 in which the value of field f3 equals the value of field f3 of the outer table (tbl1). This is the simplest example of a correlated subquery.
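    To illustrate the scalar subquery from the classification above, a short sketch using the same hypothetical tbl1 and tbl2 tables as the examples in this section:

        /* scalar subquery: returns one value of one column in one row,
           so it can be used anywhere a single value is expected */
        SELECT f1,
               f2,
               (SELECT MAX(f2) FROM tbl2) AS max_f2
        FROM tbl1;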


    Classic client-server architecture

    The term "client-server" refers to an architecture of a software package in which its functional parts interact according to a request-response scheme. If we consider two interacting parts of this complex, one of them (the client) performs the active function, that is, initiates requests, while the other (the server) passively responds to them. As the system evolves, the roles may change; for example, some software block may simultaneously act as a server with respect to one block and as a client with respect to another.

    Note that any information system must have at least three main functional parts - modules for data storage, data processing and user interface. Each of these parts can be implemented independently of the other two. For example, without changing the programs used to store and process data, you can change the user interface so that the same data is displayed in the form of tables, graphs, or histograms. Without changing the data presentation and storage programs, you can change the processing programs, for example, by changing the full-text search algorithm. Finally, without changing the programs for presenting and processing data, you can change the software for storing data, moving, for example, to a different file system.

    In a classic client-server architecture, the three main parts of the application must be distributed across two physical modules. Typically, data storage software is located on a server (for example, a database server), the user interface is on the client side, but data processing must be distributed between the client and server parts. This is the main drawback of the two-tier architecture, which results in several unpleasant features that greatly complicate the development of client-server systems.

    When data processing algorithms are split, the behavior of both parts of the system must be synchronized. All developers must have complete information about the latest changes made to the system and must understand them. This creates great difficulties in developing, installing and maintaining client-server systems, since significant effort must be spent coordinating the actions of different groups of specialists. Contradictions often arise in the developers' actions, which slows the development of the system and forces ready-made, proven elements to be changed.

    To avoid inconsistency between different elements of the architecture, data processing is performed on one of the two physical parts: either on the client side ("thick client") or on the server ("thin client", an architecture also called a "2.5-tier client-server"). Each approach has its drawbacks. In the first case the network is unjustifiably overloaded, because unprocessed, and therefore redundant, data is transmitted over it. Moreover, supporting and changing the system becomes more complicated, since replacing a calculation algorithm or fixing an error requires the simultaneous complete replacement of all the interface programs; otherwise errors or data inconsistency may arise. If all information processing is performed on the server (where this is even possible), the problem of describing and debugging stored procedures arises. The point is that the language for describing stored procedures is usually declarative and therefore in principle does not allow step-by-step debugging. In addition, a system with information processing on the server is absolutely impossible to port to another platform, which is a serious drawback.

    Most modern rapid application development (RAD) tools that work with various databases implement the first strategy: a thick client provides an interface to the database server via embedded SQL. Besides the disadvantages listed above, this variant of a "thick"-client system usually provides an unacceptably low level of security. In banking systems, for instance, every transaction operator has to be granted write access to the main table of the accounting system. In addition, such a system is almost impossible to move to Web technology, since specialized client software is used to access the database server.

    So, the models discussed above have the following disadvantages.

    1. "Thick" client:
    # complexity of administration;
    # updating the software becomes more difficult, since it must be replaced simultaneously across the entire system;
    # the assignment of permissions becomes more complicated, since access is restricted not by actions but by tables;
    # the network is overloaded due to the transmission of unprocessed data;
    # weak data protection, since it is difficult to assign permissions correctly.

    2. "Fat" server:
    # implementation becomes more complicated, since languages ​​like PL/SQL are not suitable for developing such software and there is no good funds debugging;
    # the performance of programs written in languages ​​like PL/SQL is significantly lower than those created in other languages, which is important for complex systems;
    # programs written in DBMS languages ​​usually do not work reliably; an error in them can lead to failure of the entire database server;
    # The resulting programs are completely unportable to other systems and platforms.

    To solve these problems, multi-level (three or more levels) client-server architectures are used.

    Multi-tier client-server architectures

    Such architectures distribute the data processing modules more sensibly, running them on one or more separate servers. These software modules act as a server for the user interfaces and as a client for the database servers. In addition, different application servers can communicate with each other in order to divide the system more precisely into functional units with specific roles. For example, one can single out a personnel-management server that performs all the functions needed for personnel management. By associating a separate database with it, all implementation details of this server can be hidden from users, who are allowed to access only its public functions. Moreover, such a system is very easy to adapt to the Web, since it is easier to develop HTML forms for user access to specific database functions than to all the data.

    In a three-tier architecture the thin client is not burdened with data-processing functions; it performs its main role of presenting the information coming from the application server. Such an interface can be implemented with standard Web technology: a browser, CGI and Java. This reduces the volume of data transferred between the client and the application server, so client computers can connect even over slow lines such as telephone lines. Moreover, the client side can be so simple that in most cases it is implemented with a general-purpose browser, and if it does have to be changed, the change is quick and painless. A three-tier client-server architecture also allows user permissions to be assigned more precisely, since users receive access rights not to the database itself but to particular functions of the application server. This improves the security of the system (compared with the conventional two-tier architecture) not only against deliberate attacks but also against erroneous actions by personnel.

    As an example, consider a system whose various parts run on several servers remote from one another. Suppose the developer has released a new version of the system; installing it under a two-tier architecture requires changing all system modules simultaneously. If this is not done, the interaction of old clients with new servers can lead to unpredictable consequences, since developers usually do not design for such a combination. In a three-tier architecture the situation is simpler: by changing the application server and the data-storage server (easy to do at the same time, since both are usually located close together), we immediately change the set of available services. The likelihood of an error caused by a version mismatch between the server and client parts is thus sharply reduced. If some service has disappeared in the new version, the interface elements that served it in the old system simply stop working; if a service's algorithm has changed, it will still work correctly even with the old interface.

    Multi-tier client-server systems can be transferred to Web technology fairly easily: it suffices to replace the client part with a general-purpose or specialized browser and to supplement the application server with a Web server and small programs that call server procedures. Both the Common Gateway Interface (CGI) and the more modern Java technology can be used to develop these programs.
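    For illustration, a minimal CGI program of the kind just mentioned might look like the Python sketch below. It only relays a request from the browser to an application server and wraps the answer in HTML; the local endpoint it calls is an assumption of the example, not part of the source.

    #!/usr/bin/env python3
    # A tiny CGI program: the Web server runs it once per request.
    import os
    import urllib.request

    query = os.environ.get("QUERY_STRING", "")   # parameters from the browser

    # Call a procedure on the application server (assumed here to be a local
    # HTTP endpoint) and relay its answer back to the browser as HTML.
    with urllib.request.urlopen("http://localhost:8080/employees") as resp:
        data = resp.read().decode()

    print("Content-Type: text/html\n")
    print(f"<html><body><pre>{data}</pre><!-- query: {query} --></body></html>")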

    It should also be noted that in a three-tier system quite a lot of information passes over the communication channel between the application server and the database. This does not slow the computation down, however, because faster lines can be used to connect these elements, and doing so requires minimal effort since both servers are usually located in the same premises. The total performance of the system therefore increases: two different servers now work on one task, and the communication between them runs over the fastest lines at minimal cost. There remains, it is true, the problem of keeping joint computations consistent, which transaction managers, new elements of multi-tier systems, are called upon to solve.

    Translation from English: Chernobay Yu. A.

    Development of client-server systems

    The architecture of computer systems has evolved together with the ability of the hardware to run the applications it hosts. The simplest (and earliest) of all was the "mainframe architecture," in which all operations and processing are carried out on the server (or "host") computer. Users interacted with the server through "dumb" terminals, which captured keystrokes, passed the instructions to the server, and displayed the results to the user. Such applications were character-based and, despite the relatively large computing power of the server computers, were generally slow and awkward to use, because every keystroke had to be transmitted to the server.

    The introduction and widespread adoption of the PC, with its own computing power and graphical user interface, allowed applications to become more complex, and the expansion of network systems led to the second major type of system architecture, "File Sharing." In this architecture the PC (or "workstation") downloads files from a dedicated "file server" and then runs the application (including the data) locally. This works well when shared use of the data, updates to it, and the volumes to be transferred are all small. However, it soon became clear that file sharing clogged the network more and more as applications grew in complexity and demanded that ever larger amounts of data be transmitted in both directions.

    The problems of applications processing data over files shared across a network led to the development of the client-server architecture in the early 1980s. In this approach the file server is replaced by a database server which, rather than simply storing files and transmitting them to the connected workstations (clients), receives and actually executes requests for data, returning only the result the client asked for. By transmitting only the requested data rather than an entire file, this architecture significantly reduces network load. It made it possible to build systems in which multiple users update data through graphical user interfaces (GUIs) connected to a single shared database.

    Typically, either Structured Query Language (SQL) or remote procedure calls (RPCs) are used to exchange data between the client and the server; a minimal sketch of the RPC style is given below. Several basic ways of organizing a client-server architecture are then described.
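    The sketch assumes Python's standard-library XML-RPC machinery as a convenient stand-in for the RPC mechanisms of the period; the procedure name and its reply are invented for the example.

    from xmlrpc.server import SimpleXMLRPCServer

    def get_balance(account_id):
        # In a real system this procedure would query the database server;
        # the client never sends SQL, only the procedure name and arguments.
        return {"account": account_id, "balance": 100.0}

    server = SimpleXMLRPCServer(("localhost", 9000), allow_none=True)
    server.register_function(get_balance)
    server.serve_forever()

    # A client elsewhere on the network would call:
    #   from xmlrpc.client import ServerProxy
    #   print(ServerProxy("http://localhost:9000").get_balance(1))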

    In a two-tier architecture the load is divided between the server (which hosts the database) and the client (which hosts the user interface). As a rule they are located on different physical machines, but this is not mandatory: provided the tiers are logically separated, they can be placed (for development and testing, for example) on the same computer (Figure 1).

    Figure 1: Two-tier architecture

    The distribution of application logic and data processing in this model was, and remains, problematic. If the client is "smart" and carries out the bulk of the data processing, then problems of distributing, installing and maintaining the application arise, since each client needs its own local copy of the software. If the client is "dumb," the application logic and processing must be implemented in the database, and the application therefore becomes completely dependent on the specific DBMS used. In either case, each user must be registered and, depending on the access rights granted, may perform only certain functions. The two-tier client-server architecture was a good solution while the number of users remained relatively small (up to about 100 concurrent users), but as that number grew, several limitations of the architecture became apparent.

    Performance: as the number of users increases, performance deteriorates. The degradation is directly proportional to the number of users, each of whom holds a dedicated connection to the server, so the server must maintain all of these connections (with "keep-alive" messages) even when no work is being done against the database.

    Security: each user must have individual access to the database and be granted the rights needed to operate the application, so the access rights of every user have to be stored in the database. Whenever functionality is added to the application, the rights of the users must be updated as well (see the sketch after this list).

    Functionality: Regardless of what type of client is used, most of the data processing must reside in the database, meaning that it is entirely dependent on the capabilities provided in the database by the manufacturer. This can severely limit the functionality of your application because different databases support different features, use different programming languages, and even implement basic features like triggers differently.

    Portability: the two-tier architecture is so dependent on the specific database implementation that porting existing applications to a different DBMS becomes a serious problem. This is especially true for applications in vertical markets, where the choice of DBMS is not determined by the vendor.
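    To make the security point concrete, the sketch below grants table-level rights to each operator one by one, as a two-tier system forces you to do. It assumes a PostgreSQL database reached through the psycopg2 driver; the database, table, and user names are invented.

    import psycopg2  # assumed driver; any DB-API driver would look similar

    conn = psycopg2.connect("dbname=bank user=admin")
    cur = conn.cursor()

    # Every operator needs direct table rights, so each new feature that
    # touches a table means re-granting rights for every single user.
    operators = ["ivanov", "petrova", "sidorov"]
    for user in operators:
        cur.execute(f"GRANT SELECT, INSERT ON transactions TO {user}")
    conn.commit()

    In a three-tier system only the application server's single database account would need these grants; users would instead receive rights to call its functions.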

    Despite all this, the two-tier architecture has found new life in the Internet era. It can work well in a disconnected environment where the UI is dumb (e.g., a browser), although in many respects such an implementation is a return to the original mainframe architecture.

    In an effort to overcome the limitations of the two-tier architecture outlined above, an additional tier was introduced, producing the standard three-tier client-server model. The purpose of the additional tier (commonly called the "middle" or "rules" tier) is to manage application execution and database access. As with the two-tier model, the tiers can be located on different computers (Figure 2) or on a single computer in a test configuration.

    Figure 2: Three-tier architecture

    Introducing the middle tier removed most of the limitations of the two-tier architecture and produced a much more flexible and scalable system. Since clients now connect only to the application server and not directly to the data server, the burden of maintaining client connections is lifted from the database, as is the need to implement application logic inside it. The database can now confine itself to storing and retrieving data, while receiving and processing requests becomes the job of the middle tier. The development of operating systems, which added elements such as connection pooling, queuing, and distributed transaction processing, strengthened (and simplified) the development of the middle tier.
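    As a small sketch of the connection-pooling idea just mentioned (not of any particular product), the Python fragment below shares a fixed set of database connections among many callers instead of holding one connection per user; sqlite3 again stands in for the DBMS.

    import sqlite3
    from contextlib import contextmanager
    from queue import Queue

    POOL_SIZE = 4
    pool = Queue()
    for _ in range(POOL_SIZE):
        pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def pooled_connection():
        conn = pool.get()       # blocks if all connections are busy
        try:
            yield conn
        finally:
            pool.put(conn)      # hand the connection to the next caller

    with pooled_connection() as conn:
        print(conn.execute("SELECT 1").fetchone())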

    Note that in this model the application server does not control the user interface, nor does the user query the database directly. Instead, the application server lets multiple clients share business logic, computation, and access to the data services. The main advantage is that the client needs less software and no longer requires a direct connection to the database, which improves security. The application is consequently more scalable, and the cost of supporting and installing it on a single server is far lower than the cost of maintaining it directly on each client computer, or even under a two-tier architecture.

    There are many variations on the basic three-tier model, designed for different tasks. These include distributed transaction processing (where several DBMSs are updated within a single transaction), message-based applications (where the applications do not communicate in real time), and cross-platform interoperability (Object Request Broker, or "ORB," applications).

    Multi-tier architecture or N-tier architecture

    With the growth of Internet applications and the general rise in the number of users, the basic three-tier client-server model has been extended with additional tiers. Such architectures are called "multi-tier" and typically consist of four tiers (Figure 3), where a Web server handles the connection between the browser client and the application server. The benefit is that several Web servers can connect to a single application server, thereby serving a larger number of simultaneously connected users.

    Figure 3: N-tier architecture

    Tiers vs. Layers

    These terms are (unfortunately) often confused, yet there is a substantial difference between them and each has a definite meaning. The key distinction is that tiers are physical and layers are logical: a tier can, in principle, be deployed independently on a separate computer, while a layer is a logical division inside a tier (Figure 4). The typical three-tier model described above usually contains at least seven layers, spread across the three tiers.

    The main thing to remember about a layered architecture is that requests and responses each travel in one direction through all the layers, and that layers can never be skipped. Thus, in the model shown in Figure 4, the only layer that may access layer "E" (the data access layer) is layer "D" (the rules layer); likewise, layer "C" (the application validation layer) may respond only to requests from layer "B" (the error handling layer).

    Figure 4: Tiers divided into logical layers
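    The toy Python sketch below, written purely to illustrate this rule (the layer names follow Figure 4; the request handling is invented), gives each layer a reference only to the layer directly beneath it, so no layer can be skipped.

    class DataAccess:                    # layer "E"
        def handle(self, request):
            return f"rows for {request!r}"

    class Rules:                         # layer "D": the only caller of "E"
        def __init__(self):
            self.below = DataAccess()
        def handle(self, request):
            return self.below.handle(request)

    class Validation:                    # layer "C": talks only to "D"
        def __init__(self):
            self.below = Rules()
        def handle(self, request):
            if not request:
                raise ValueError("rejected before reaching the data")
            return self.below.handle(request)

    # A request from the top traverses C -> D -> E and back.
    print(Validation().handle("list employees"))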