
    FEDERAL AGENCY FOR EDUCATION

    State educational institution of higher professional education

    RUSSIAN STATE HUMANITIES UNIVERSITY

    Branch of the Russian State University for the Humanities in Kaliningrad

    Department of Economic, Management and Legal Disciplines

    Test on “MANAGEMENT INFORMATION TECHNOLOGIES”

    "Distributed information systems »

    Afanasyev Oleg Alexandrovich

    3rd year correspondence course

    specialty 080507

    "Organization Management"

    Supervisor:

    Ph.D., Associate Professor

    Kornev K.P.

    Kaliningrad 2010

    • Introduction
    • 1. Distributed data processing
    • 2. The concept of a distributed database
    • 3. Methods for maintaining the integrity of a distributed database
    • 4. Standardization of digital representation of documentary information
    • 5. Standardization of document architecture definition and processing processes (ODA/ODIF)
    • 6. Standards for knowledge presentation
    • Conclusion
    • List of used literature

    Introduction

    Distributed databases cannot be considered outside the context of the more general and more significant topic of distributed information systems. The processes of decentralization and information integration taking place throughout the world must sooner or later affect our country as well. Russia, due to its geographical location and size, is “doomed” to the predominant use of distributed systems. Our work is devoted to studying the architecture of distributed data processing, methods for maintaining the integrity of a distributed database, and the standardization of document architecture definition and processing, and also considers different standards for data presentation.

    1. Distributed data processing.

    Distributed data processing is a technique for executing application programs by a group of systems. In this case, the user is able to work with network services and application processes located in several interconnected subscriber systems.

    Distributed data processing is carried out by distributed data processing systems, which are built as client-server systems.

    Fig. 1. Data processing in the client-server architecture

    So, in the simplest case, a client-server information system consists of three main components:

    · a database server that manages data storage, access and security, performs backups, maintains data integrity in accordance with business rules and, most importantly, fulfills client requests;

    · a client that provides a user interface, executes application logic, checks the validity of data, sends requests to the server and receives responses from it;

    · network and communication software that provides interaction between the client and the server via network protocols.

    Main features of the client-server architecture

    One of the models of interaction between computers on a network is called “client-server” (Fig. 2). Each of the elements that make up this architecture plays its role: the server owns and manages the information resources of the system, the client has the opportunity to use them.

    Fig. 2. Client-server architecture

    The database server is a multi-user version of the DBMS that processes queries received from all workstations in parallel. Its task is to implement transaction processing logic using the necessary synchronization techniques: supporting resource locking protocols and preventing and/or resolving deadlock situations.

    In response to a user request, the workstation receives not “raw material” for further processing, but finished results. With this architecture, the workstation software plays only the role of an external interface (front end) to the centralized data management system. This makes it possible to significantly reduce network traffic, reduce the time spent waiting for locked data resources in multi-user mode, relieve the workstations and, given a sufficiently powerful central machine, use cheaper equipment for them.

    Typically, the client and server are geographically separated from each other, in which case they are part of or form a distributed data processing system.

    For modern DBMSs, the client-server architecture has become the de facto standard. If the information system being designed is assumed to have a client-server architecture, this means that the application programs implemented within it will be distributed in nature, i.e. some of the application functions will be implemented in the client program and others in the server program. The basic principle of client-server technology is to divide the functions of a standard interactive application into four groups:

    · data input and display functions;

    · applied functions specific to the subject area;

    · fundamental functions of storing and managing resources (databases);

    · service functions.

    2. The concept of a distributed database

    A distributed database (DDB) is a collection of logically interconnected databases distributed across a computer network. A distributed database management system is defined as a software system that allows a distributed database to be managed in such a way that its distribution is transparent to users. This definition highlights two distinctive architectural features. The first is that the system consists of a (possibly empty) set of query nodes and a non-empty set of data nodes. Data nodes have the means to store data, while query nodes do not: query nodes only run programs that implement the user interface for accessing the data stored in the data nodes. The second feature is that the nodes are logically independent computers. Each such node therefore has its own main and external memory, its own operating system (which may or may not be the same on all nodes), and the ability to run applications. The nodes are connected by a computer network and are not part of a multiprocessor configuration. It is important to emphasize the loose coupling of the processors, which have their own operating systems and function independently.

    3. Methods for maintaining the integrity of a distributed database

    Integrity support in the classical relational data model includes three methods:

    The first method is support for structural integrity, which means that a relational DBMS should only allow work with homogeneous data structures of the “relation” type. At the same time, the concept of a relation must satisfy all the restrictions imposed on it in classical relational database theory (no duplicate tuples and, accordingly, a mandatory primary key, and no notion of tuple ordering).

    The second method is support for language integrity, which means that a relational DBMS must provide data definition and data manipulation languages no lower than the SQL standard. Other low-level means of data manipulation that do not comply with the standard should not be available.

    That is why access to information stored in the database and any changes to this information can only be done using SQL statements.

    The third method is support for referential integrity (Declarative Referential Integrity, DRI), which means enforcing one of the following principles of relationship between tuples of interconnected relations:

    · tuples of a subordinate relation are destroyed when the tuple of the main relation associated with them is deleted.

    · tuples of the subordinate relation are modified when the tuple of the main relation associated with them is deleted, and an undefined NULL value is placed in the foreign key that referenced the parent relation.

    Referential integrity ensures that the database remains in a consistent state when data is modified by insert or delete operations.
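    A minimal sketch of these two principles in standard SQL (all table and column names here are hypothetical and serve only as an illustration): the choice between them is expressed by the declarative ON DELETE options of a foreign key.

    -- Main (parent) relation with a mandatory primary key.
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name VARCHAR(100) NOT NULL
    );

    -- Subordinate relation, option 1: its tuples are destroyed together
    -- with the parent tuple they reference.
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name VARCHAR(100),
        dept_id  INTEGER REFERENCES department (dept_id) ON DELETE CASCADE
    );

    -- Subordinate relation, option 2: its tuples survive, but the foreign
    -- key is replaced by an undefined NULL value.
    CREATE TABLE project (
        proj_id  INTEGER PRIMARY KEY,
        dept_id  INTEGER REFERENCES department (dept_id) ON DELETE SET NULL
    );

    Deleting a row from department then either cascades to employee or nullifies the corresponding dept_id values in project, so no tuple ever references a parent tuple that no longer exists; the PRIMARY KEY constraints also illustrate the structural integrity requirement described above.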

    In addition to the integrity constraints indicated above, which in general do not define the semantics of the database, the concept of semantic integrity support is introduced.

    Structural, linguistic and referential integrity determine the rules for how a DBMS works with relational data structures. The requirements to support these three types of integrity mean that every DBMS must be able to do this, and developers must take this into account when building databases using the relational model.

    4. Standardization of digital representation of documentary information

    Legal regulation of work with electronic documents involves enshrining in regulatory legal acts, first of all, the concept of an “electronic document” and the possibility of using electronic documents on an equal basis with traditional documents in various fields of activity, especially in public administration. The use of electronic documents requires legislative support of their legal force, that is, establishing the procedure for their certification (the composition and methods of registering details), as well as protection against distortion in the process of electronic exchange. In this regard, modern legislation attempts to create the conditions for using electronic signature technology for these purposes.

    Technical regulation (standardization) in the field of electronic document management is aimed at the development, adoption, application and implementation of requirements for electronic documents and various technological processes related to the use of electronic documents, for example, the processes of generating and verifying an electronic signature, procedures for storing, transporting and operating media used in data processing and storing information in electronic form.

    Standardization of metadata arrays about information resources, the development of systems for their classification and cataloging should become the basis for the creation of effective means of navigation in the Russian information space, as well as ongoing monitoring of information resources (primarily government ones) and information activities.

    In connection with the above, in 1999 the national standard GOST R 51353 was adopted, defining the composition and content of metadata for electronic maps in geographic information systems, and in 2003 the interstate standard GOST 7.70 was adopted directly as a national standard of the Russian Federation, establishing the composition, content and presentation of details for describing electronic information resources, that is, databases and machine-readable information arrays. The GOST 7.70 standard is recommended both for registration authorities compiling catalogs of information resources and for developers and distributors of electronic information resources (on removable media, in global and local networks).

    National standards included in the System of Standards for Information, Librarianship and Publishing (SIBID) normatively streamline information processes that provide access to information collections. For example, the rules for the bibliographic description of electronic publications are established by the state standard GOST 7.82, according to which the description of electronic resources is as close as possible to the description of traditional documents enshrined in GOST 7.1.

    The creation and operation of computer systems and documentation processing technologies are carried out on the basis of the rules set out in the set of standards for automated systems and other standards of the “Information Technology” series. Issues of information protection and the use of electronic digital signatures have also been standardized at the state level.

    In particular, the GOST 34.601 standard identifies eight stages in the creation of automated systems (AS) used in various fields of activity, including management: formation of requirements for the AS, development of the AS concept, technical specifications, preliminary design, technical design, working documentation, commissioning, support of AS.

    The GOST 34.602 standard establishes the composition, content, rules for drawing up the document “Technical specifications for the creation (development or modernization) of a system,” as well as the procedure for its development, coordination and approval. This standard notes that the requirements included in the technical specifications must “not be inferior to similar requirements for the best modern domestic and foreign analogues.”

    The national standard GOST R 52294, developed in 2004, defines the basic provisions for the creation, implementation, operation and maintenance of electronic regulations for the administrative and official activities of organizations. It applies to automated information processing and management systems of institutions, enterprises and organizations regardless of their form of ownership and subordination. The provisions of this standard should be taken into account when creating new or improving existing organizational management technologies. The GOST R 52294 standard defines the terms “regulations” (“a set of rules establishing the procedure for carrying out work or carrying out activities”), “work process” (“a set of interrelated or interacting activities that transform inputs into outputs and are implemented within the organization”) and “operation (work)” (“the part of the work process that creates a reproducible result within the work process”).

    National standards also set out requirements for document management, unification and use of unified documentation systems, preparation of organizational and administrative documentation, terminology in the field of office work and archiving, terminology in the field of electronic information exchange.

    It should be noted that the industry terminology standard for office work and archiving GOST R 51141, in force since January 1, 1999, does not fully reflect the new international terminology and does not take into account the new technical terminology that has arisen in connection with the use of computer information technology in the field of working with information and documentation. It requires updating based on the use of ISO standards and domestic experience in working with documentation.

    The national standardization system consists, in addition to national standards, of all-Russian classifiers of technical, economic and social information and other classifications applied in the prescribed manner. All-Russian classifiers are normative documents that distribute technical, economic and social information in accordance with its classification (classes, groups, types, etc.). Unlike national standards, which are applied on a voluntary basis, all-Russian classifiers are mandatory for use when creating state information systems and information resources, as well as during interdepartmental information exchange.

    The “Regulations on the development, adoption, implementation, maintenance and application of all-Russian classifiers of technical, economic and social information in the socio-economic field” approved by the Government of the Russian Federation contains a list of all-Russian classifiers, as well as the executive authorities responsible for the development, maintenance and application of each of the classifiers.

    The unified forms of documents themselves are approved by the ministries (departments) of the Russian Federation - the developers of unified documentation systems. For example, the federal body of state statistics, within the framework of primary accounting documentation, maintains a subsystem of documentation for recording labor and its payment, develops and approves albums of unified forms of primary accounting documentation and their electronic versions.

    It should be noted that in 2007 Rosarkhiv and VNIIDAD developed and put into operation the Unified Classifier of Documentary Information of the Archive Fund of the Russian Federation. This classifier establishes and consolidates a systematized list of names and indices of classification objects that is uniform for all state and municipal archives of the Russian Federation, which creates a solid basis for the formation of a unified archival information space of our country.

    Requirements for electronic documents may also be contained in legislative and other regulatory legal acts that determine the status of various legal entities or their activities in a certain area. For example, in accordance with the Federal Law “On individual (personalized) accounting in the state pension insurance system”, information may be provided to the Pension Fund of the Russian Federation both in the form of written documents and in electronic form (on magnetic media or via communication channels).

    5. Standardization of document architecture definition and processing processes (ODA/ODIF)

    ODA/ODIF stands for Office Document Architecture / Office Document Interchange Format (the architecture of office documents and the format for exchanging them).

    This is an open standard for document architecture and interchange format; it allows the exchange of compound documents, that is, documents that simultaneously contain several different types of content, such as character text, raster graphics and geometric (computer) graphics.

    An information environment has formed in the world, whose infrastructure is based on computers and telecommunications. This environment, built using client-server technology, provides the ability to integrate diverse technical and software solutions. Along with the already well-known word Internet, the concept of the intranet appeared several years ago, meaning the use in corporate networks and systems of the tools and standards developed for global networks. For organizing electronic document management, these systems have their own standards and software. Developers of electronic document transfer systems in libraries should certainly take into account the ISO 8613 (parts 1-6) “Office Document Architecture (ODA) and Interchange Format (ODIF)” standard. This standard specifies a method for describing electronic documents and the description of structured information itself in a form convenient for machine processing and automated exchange.

    Further, documents or their copies provided to the user can be of different kinds: electronic graphic copies, electronic files in one of the text formats, photocopies or originals issued through interlibrary loan. Accordingly, all the necessary standards describing their formats and encodings must be taken into account (TIFF, GIF, JPEG, PDF, PostScript, the CCITT Group 3/Group 4 Facsimile Standard, ISO 2022 “Information Processing - 7-bit/8-bit character sets”, ISO 4873 “8-bit code for Information Interchange - Structure and Rules for Implementation”, ISO 6937 “Coded characters for Text Communication”, ISO 8859 “8-bit single byte coded graphic character sets”, CP Windows-1251, etc.). The ability to present documents in HTML (HyperText Mark-up Language) and SGML (ISO 8879 “Information Processing - Text and Office Systems - Standard Generalized Mark-up Language”) with the corresponding code tables (UNICODE, UTF-8, ISO 10646) must also be taken into account and used. The technological and functional support of electronic document delivery programs should provide not only various formats but also various methods of transporting documents, such as transferring data by email, via an FTP server, possibly using other network protocols, sending documents by fax or regular mail, etc.

    6. Standards for knowledge presentation

    One of the challenges in knowledge representation is how to store and process knowledge in information systems in a formal way so that mechanisms can use it to achieve their goals. Examples of applications here include expert systems, machine translation, computer-assisted maintenance, and information retrieval systems (including database user interfaces).

    Semantic networks can be used to represent knowledge. Each node in such a network represents a concept, and arcs are used to define relationships between concepts. One of the most expressive and detailed knowledge representation paradigms based on semantic networks is MultiNet (an acronym for Multilayered Extended Semantic Networks).
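    Purely as an illustration of the node-and-arc structure described above (this sketch is not part of MultiNet or of any standard; the table names, relation types and sample concept are invented), a semantic network can be stored relationally and its transitive "is-a" chain retrieved with a recursive query:

    -- Nodes of the semantic network: one row per concept.
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        name       VARCHAR(100) NOT NULL
    );

    -- Arcs: typed relationships between concepts ('is-a', 'has-part', ...).
    CREATE TABLE relation (
        from_id  INTEGER REFERENCES concept (concept_id),
        to_id    INTEGER REFERENCES concept (concept_id),
        rel_type VARCHAR(20) NOT NULL
    );

    -- All concepts reachable from 'golden eagle' along 'is-a' arcs
    -- (WITH RECURSIVE has been part of standard SQL since SQL:1999).
    WITH RECURSIVE ancestors (concept_id) AS (
        SELECT r.to_id
        FROM relation r
        JOIN concept c ON c.concept_id = r.from_id
        WHERE c.name = 'golden eagle' AND r.rel_type = 'is-a'
        UNION
        SELECT r.to_id
        FROM relation r
        JOIN ancestors a ON r.from_id = a.concept_id
        WHERE r.rel_type = 'is-a'
    )
    SELECT c.name
    FROM concept c
    JOIN ancestors a ON a.concept_id = c.concept_id;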

    Since the 1960s, the concept of knowledge frame or simply frame has been used. Each frame has its own name and a set of attributes, or slots, that contain values; for example, a house frame could contain slots for color, number of floors, and so on.

    The use of frames in expert systems is an example of object-oriented programming, with property inheritance described by the "is-a" relationship. However, there has been considerable controversy regarding the use of the is-a relationship: Ronald Brachman wrote a paper entitled "What IS-A Is and Isn't" in which he found 29 different semantics of the is-a relationship in projects whose knowledge representation schemes included the "is-a" connection. Other connections include, for example, "has-part".

    Frame structures are well suited for representing knowledge expressed in the form of schemas and stereotypical cognitive patterns. The elements of such patterns have different weights, with larger weights assigned to those elements that correspond to the currently active cognitive schema. The pattern is activated under certain conditions: if a person sees a large bird while his “sea schema” is active and his “land schema” is not, he classifies it as a sea eagle rather than a land golden eagle.

    Frame representations are object-centered in the same sense as semantic networks: all the facts and properties associated with one concept are located in one place, so there is no need to waste resources on database searches.

    A script is a type of frame that describes a sequence of events over time; a typical example is a description of a trip to a restaurant. Events here include waiting to be seated, reading the menu, placing an order, and so on.

    In computer science (mainly in the field of artificial intelligence), several ways of representing knowledge have been proposed for structuring information and for organizing knowledge bases and expert systems. One of them is the representation of data and information within a logical model of knowledge bases, based on the logic programming language Prolog.

    The term “knowledge representation” most often refers to methods of representing knowledge oriented towards automatic processing by modern computers, and in particular to representations consisting of explicit objects (“the class of all elephants”, or “Clyde”, a specific instance) and of judgments or statements about them (“Clyde is an elephant”, “all elephants are gray”). Representing knowledge in such an explicit form allows computers to draw deductive inferences from previously stored knowledge (“Clyde is gray”).

    In the 1970s and early 1980s, numerous knowledge representation methods were proposed and tested with varying degrees of success, such as heuristic question-answering systems, neural networks, theorem proving, and expert systems. Their main areas of application at that time were medical diagnostics (e.g. MYCIN) and games (e.g. chess).

    In the 1980s, formal computer languages for knowledge representation appeared. The main projects of that time tried to encode (enter into their knowledge bases) huge amounts of universal knowledge. For example, in the Cyc project a large encyclopedia was processed, and it was not the information stored in it that was encoded, but the knowledge a reader would need in order to understand this encyclopedia: naive physics, concepts of time, causality and motivation, typical objects and their classes. The Cyc project is developed by Cycorp, Inc.; most (but not all) of their database is freely available.

    This work has led to a more accurate estimate of the complexity of the knowledge representation task. At the same time, in mathematical linguistics, much larger databases of linguistic information were created, and these, together with the huge increase in speed and memory capacity of computers, made a deeper representation of knowledge more realistic.

    Several programming languages focused on knowledge representation have been developed. Prolog, developed in 1972 but not popularized until much later, describes propositions and basic logic and can draw conclusions from known premises. The KL-ONE language (1980s) is even more specifically aimed at representing knowledge.

    In the field of electronic documents, languages have been developed that explicitly express the structure of stored documents, such as SGML and subsequently XML. They have facilitated the tasks of searching and retrieving information, which recently have become increasingly associated with the task of knowledge representation. The Web community is highly interested in the Semantic Web, in which XML-based knowledge representation languages such as RDF, Topic Maps and others are used to make information stored on the Web more accessible to computer systems.

    Hyperlinks are widely used today, but the related concept of semantic linking has not yet come into widespread use. Mathematical tables have been used since Babylonian times. Later such tables were used to represent the results of logical operations; for example, truth tables were used to study and model Boolean logic. Spreadsheets are another example of tabular knowledge representation. Other knowledge representation methods include trees, which can be used to show the connections between fundamental concepts and their derivatives.

    Conclusion

    At the moment, many organizations with a geographically distributed structure are faced with the acute problem of integrating data and applications within a single information space. Everyone is tired of carrying floppy disks on a trolleybus and suffering while converting data from the format of one application to the format of another.

    There is a strong desire to focus on one's own work, that is, to have the right information at the right point at the right time, rather than to argue with exhausted programmers about which information failed to reach which destination, or why data entered in one application refuses to appear in another.

    In this case, distributed information systems technology is needed. This technology allows a company's information systems to be developed and implemented within a single information space in a limited time frame, and also saves significant funds on their maintenance.

    List of used literature

    1. Khramtsov P.B., Brik S.A., Rusak A.M., Surgin A.I. Basics of Web Technologies / Edited by P.B. Khramtsov. - M.: INTUIT.RU "Internet University of Information Technologies", 2003. - 512 p.

    2. Glukhov V.A., Lavrik O.L. Electronic Delivery of Documents. - M.: INION RAS, 1999. - 132 p.

    3. Friedland A.Ya., Khanamirova L.S. Informatics and Computer Technologies. - M.: Astrel, 2003. - 204 p.

    4. Sakharov A.A. The concept of construction and implementation of information systems focused on data analysis // DBMS. - 1996. - No. 4. - P. 55-70.

    5. Korovkin S.D., Levenets I.A., Ratmanova I.D., Starykh V.A., Shchavelev L.V. Solving the problem of complex operational analysis of information from data warehouses // DBMS. - 1997. - No. 5-6. - P. 47-51.

    6. Ensor D., Stevenson J. Oracle. Database Design: Translated from English. - K.: BHV Publishing Group, 1999. - 560 p.


      1. Distributed IS

    A distributed information system is a set of databases that are remote from each other and have a number of common parameters. They operate according to general rules that are defined centrally and simultaneously for all databases included in the information system. Information is exchanged according to rules that are also defined centrally.

    The organization of a distributed information system is necessary for enterprises engaged in various types of activities when there is a need to quickly obtain information from the databases of remotely located units. The need to implement such a system may also arise when it is necessary to consolidate, in a common database, information contained in the databases of the legal entities that are part of the enterprise structure. This is done for the purpose of further data analysis and generation of reports from a single database, both for the enterprise as a whole and separately for each legal entity.

    Such an information system is also implemented when it is necessary to make centralized changes to the structure, configuration and operating rules of the database for all remote departments and legal entities. At the same time, changing certain rules directly from remote units may be prohibited.

    Implementation is also carried out when it is necessary to ensure control over changes to data in remotely located divisions of the organization.

    The procedure for organizing a distributed information system consists of two stages. At the first stage, preparatory work is carried out: the structure of the information system is determined, along with the rules for migrating information between the databases that make up the distributed information system and the rules restricting changes in those databases.

    The second stage covers the actual deployment of the distributed information system. At this stage, the most suitable software is selected, with the help of which a distributed information base is built that works according to the rules defined during the preparatory work. The selected software is then configured in order to organize and effectively manage the distributed information system.

    The task of studying the foundations and principles of creating and operating distributed information systems was first posed by the well-known database specialist C. Date within the framework of the System R project, which in the late 1970s and early 1980s gave rise to a separate project to create the first distributed system (the System R* project). The developers of the Ingres system also played a major role in studying the principles of creating and operating distributed databases.

    Actually, distributed AIS are based on two main ideas:

    Many organizationally and physically distributed users simultaneously working with common data - a common database (users with different names, including those located on different computing installations, with different powers and tasks);

    Logically and physically distributed data that nevertheless composes and forms a single, mutually consistent whole - a common database (individual tables, records and even fields can be located on different computing installations or included in various local databases).

    Chris Date also formulated the basic principles of creating and operating distributed databases. These include:

    Transparency of data location for the user (in other words, for the user, a distributed database should appear and look exactly the same as a non-distributed one);

    Isolation of users from each other (the user should not “feel” or “see” the work of other users at the moment when he changes, updates or deletes data);

    Synchronization and consistency (consistency) of the data state at any time.

    A number of additional principles follow from the main ones:

    Local autonomy (no computing installation should depend on any other installation for its successful functioning);

    Lack of central installation (consequence of the previous point);

    Location independence (the user does not care where the data is physically located; he works as if it were on his local installation);

    Continuity of operation (no planned shutdowns of the system as a whole, for example, to connect a new installation or update the DBMS version);

    Independence from data fragmentation (both from horizontal fragmentation, when different groups of records of one table are located on different installations or in different local databases, and from vertical fragmentation, when different columns of one table are located on different installations; a sketch of both kinds of fragmentation is given after this list);

    Independence from data replication (duplication) (when any database table, or part of it, can be physically represented by several copies located on different installations, and “transparently” to the user);

    Distributed query processing (query optimization should be distributed in nature - first global optimization, and then local optimization on each of the involved installations);

    Distributed transaction management (in a distributed system, a single transaction may require actions to be performed on different installations; the transaction is considered completed if it is completed successfully on all involved installations);

    Independence from hardware (it is desirable that the system can function on installations that include computers of different types);

    Independence from the type of operating system (the system must function regardless of possible differences in the OS on different computing installations);

    Independence from the communication network (ability to operate in different communication environments);

    Independence from the DBMS (different types of DBMS can operate on different installations; in practice, they are limited to the range of DBMSs that support SQL).
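    The sketch promised in the fragmentation item above (the table names and the split between installations are hypothetical): from the user's point of view a fragmented relation can still be presented as a single whole, for example through views that map onto the physical fragments. In a real distributed DBMS this mapping is maintained by the system catalog rather than by hand-written views, but the queries issued against the global relation look the same.

    -- Horizontal fragmentation: different groups of rows of one table are
    -- stored on different installations; the global relation is their union.
    CREATE VIEW customer AS
        SELECT * FROM customer_moscow       -- fragment at installation 1
        UNION ALL
        SELECT * FROM customer_kaliningrad; -- fragment at installation 2

    -- Vertical fragmentation: different columns of one table are stored on
    -- different installations and are rejoined by the primary key.
    CREATE VIEW worker AS
        SELECT p.worker_id, p.full_name, s.salary
        FROM worker_personal p              -- fragment at installation 1
        JOIN worker_payroll s               -- fragment at installation 2
          ON s.worker_id = p.worker_id;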

    In everyday usage, the DBMSs on the basis of which distributed information systems are created are also referred to by the term “distributed DBMS”, and, accordingly, the term “distributed databases” is used.

    The most important role in the technology of creating and operating distributed databases is played by the “Views” technique.

    A view is an authorized global request to retrieve data stored in the database. Authorization means that such a request can be launched only by a specifically named user of the system. Globality means that data can be retrieved from the entire database, including data located on other computing installations. Recall that the result of a selection query is a data set representing a temporary table for the session of the open query, which can be worked with further just like ordinary relational data tables. As a result of such global authorized requests, a certain virtual database is created for a specific user, with its own list of tables, relationships, etc., that is, with “its own” schema and “its own” data. In principle, from the point of view of information tasks, in most cases the user does not care where and in what form the data itself is located. The data must be logically organized in such a way that the required information tasks can be solved and the assigned functions performed.

    The idea of the representation technique is illustrated schematically in the figure.

    Fig. 1.1. The main idea of the representation technique
    When a user logs into a distributed system, the DBMS kernel, having identified the user, runs the view queries previously defined for him and stored in the database, and forms for him “his own” view of the database, which the user perceives as a regular (local) database. Since this view of the database is virtual, the “real” data remains physically located where it was before the view was formed. When the user manipulates data, the kernel of the distributed DBMS itself determines, through the database system catalog, where the data is located, and develops an action strategy, i.e. determines where and on which installations it is more expedient to perform the operations, what data needs to be moved from other installations or local databases, and checks compliance with data integrity constraints. Most of these operations are transparent (i.e. invisible) to the user, who perceives work with the distributed database as work with a regular local database.

    Technologically, in relational DBMSs the representation technique is implemented by introducing SQL constructs that allow, similarly to the “event-rule-procedure” technique, the creation of named view queries:

    CREATE VIEW ViewName AS
    SELECT ...
    FROM ...
    ...;

    In these constructions, after the name of the view and the keyword AS, a request for retrieving data is placed, which actually forms the corresponding view of a database object.

    Authorization of views is carried out using GRANT commands (directives), present in the basic list of SQL language instructions (see section 4.1) and granting powers and privileges to users:

    GRANT SELECT ON ViewName TO UserName1, UserName2, ...;

    Accordingly, the REVOKE directive cancels previously set privileges.
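    Putting these constructs together, a minimal hypothetical example (the table, view and user names are invented purely for illustration) might look like this:

    -- A named view restricting a hypothetical orders table to one branch.
    CREATE VIEW kaliningrad_orders AS
        SELECT order_id, customer_id, order_date, total
        FROM orders
        WHERE branch = 'Kaliningrad';

    -- Authorize two users to read data only through this view.
    GRANT SELECT ON kaliningrad_orders TO ivanov, petrov;

    -- Later, withdraw the privilege from one of the users.
    REVOKE SELECT ON kaliningrad_orders FROM petrov;

    Such users see only the rows and columns exposed by the view, which is also the data-protection use of representations mentioned later in this section.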

    Despite the simplicity and certain elegance of the idea of ​​“representations”, the practical implementation of such technology for constructing and operating distributed systems encounters a number of serious problems. The first of them is related to the placement of the database system catalog, because when creating a “representation” of a distributed database for the user, the DBMS kernel must first of all “find out” where and in what form the data actually resides. The requirement that there is no central installation leads to the conclusion that any local installation must have a system directory. But then the problem of updates arises. If any user has changed data or its structure in the system, then these changes must be reflected in all copies of the system catalog. However, propagation of system catalog updates may encounter difficulties in the form of unavailability (busy) of system catalogs on other installations at the time of distribution of updates. As a result, the continuity of the consistent state of the data may not be ensured, and a number of other problems may arise.

    The solution to such problems and the practical implementation of distributed information systems is carried out through a deviation from some of the principles discussed above for the creation and operation of distributed systems. Depending on what principle is sacrificed (lack of a central installation, continuity of operation, consistent state of data, etc.), several independent directions in distributed systems technologies have emerged - Client-Server technologies, replication technologies, object linking technologies.

    Real distributed information systems are, as a rule, built on a combination of all three technologies, but from a methodological point of view it is advisable to consider them separately. It should also be noted that the representation technique has proven extremely fruitful in another area as well: DBMS data protection. The authorized nature of the requests that form views makes it possible to provide a specific user with exactly the data, and in exactly the form, that he needs for his immediate tasks, excluding the possibility of accessing, viewing and changing other data.


      1. File-server architecture

    With the advent of networks, data began to be stored on a file server; this was the first type of multi-user system. In this case, the search and processing of the data take place at the workstations. With this approach, not only the data needed by the end user is sent to the workstation, but also data used only to complete the query (for example, fragments of index files, or data that will be discarded when the query is executed). Thus, the amount of “unnecessary” information transferred often exceeds the amount of “necessary” information.

    Fig. 1.2. File-server architecture diagram
    The response time to a user request is the sum of the time it takes to transfer data from the file server to the workstation and the time it takes to complete the request on the workstation. For the response time of such a system to be acceptable, it is necessary to speed up data exchange with the disk and increase the amount of RAM for caching data from disk. It is also advisable to use a powerful computer as the workstation. The bottleneck may be the network environment, so the throughput of the network bus is also an important indicator. If the number of simultaneously working users and the volume of stored information grow, the amount of transmitted information grows as well, i.e. network traffic increases, and as a result the system's response time degrades significantly. This technology also implies that each workstation has its own copy of the DBMS working with the same data, and the interaction of these DBMS copies, synchronized through an intermediate link in the form of the file server, leads to additional overhead.

    The real spread of the client-server architecture became possible thanks to the development and widespread adoption of the concept of open systems. So we will start with a brief introduction to open systems.
    The main point of the open systems approach is to simplify the integration of computing systems through international and national standardization of hardware and software interfaces. The main motivation for the development of the concept of open systems was the widespread transition to local computer networks and the problems of integrating hardware and software that this transition caused. Due to the rapid development of global communications technologies, open systems are becoming even more important and widespread.

    One of the main principles of open systems aimed at users is independence from a specific supplier. By focusing on the products of companies that adhere to open systems standards, a consumer who purchases a product from such a company is not locked in to that vendor. He can continue to expand the capacity of his system by purchasing products from any other company that complies with the standards. Moreover, this applies to both hardware and software and is not an unfounded declaration: the real possibility of independence from the supplier has been tested in domestic conditions.

    The practical foundation of system and application software for open systems is a standardized operating system; currently, such a system is UNIX. As a result of long work, the suppliers of various versions of the UNIX OS managed to agree on the basic standards of this operating system. Now all common versions of UNIX are basically compatible in terms of the interfaces provided to application and, in most cases, system programmers. According to many experts, despite the emergence of Windows NT, which claims to be a standard, it is possible that UNIX will remain the basis of open systems in the coming years.

    Open systems technologies and standards provide a real and practice-tested opportunity to produce system and application software with the properties of portability and interoperability. The mobility property means the comparative ease of transferring a software system across a wide range of hardware and software that meets the standards. Interoperability means simplifying the integration of new software systems based on the use of ready-made components with standard interfaces.

    Using an open systems approach benefits both manufacturers and users. First of all, open systems provide a natural solution to the problem of hardware and software generations. Manufacturers of such products do not solve all problems anew. They can, at least temporarily, continue to integrate systems using existing components. It should be noted that this creates a new level of competition. All manufacturers are required to provide some standard environment, but are forced to achieve the best possible implementation. Of course, after some time, existing standards will begin to act as a constraint on progress, and then they will have to be revised.

    The advantage for users is that they can gradually replace old system components with more advanced ones without losing the functionality of the system. In particular, this is the solution to the problem of gradually increasing the computing, information and other capacities of a computer system.

    The widespread use of computer networks is based on the well-known idea of resource sharing. The high bandwidth of local networks provides efficient access from one network node to resources located in other nodes.

    The development of this idea leads to a functional separation of network components: it is reasonable not only to have access to the resources of a remote computer, but also to receive from that computer a set of services specific to its resources. At the same time, it is impractical to duplicate the software that supports these services in several network nodes. This is how we come to distinguish between workstations and network servers.

    A workstation is intended for the direct work of a user or category of users and has resources corresponding to the local needs of that user. Specific features of a workstation may include:

    Amount of RAM (not all categories of users need large RAM);

    Availability and volume of disk memory (diskless workstations that use the external memory of a disk server are quite popular);

    Processor and monitor characteristics (some users need a powerful processor, others are more interested in the resolution of the monitor, still others require powerful tools for working with graphics, etc.).

    If necessary, you can use the resources and/or services provided by the server.

    A computer network server must have resources corresponding to its functional purpose and the needs of the network. Note that due to the focus on the open systems approach, it is more correct to talk about logical servers (meaning a set of resources and software that provide services over these resources), which are not necessarily located on different computers. The peculiarity of a logical server in an open system is that if, for reasons of efficiency, it is advisable to move the server to a separate computer, then this can be done without the need for any alteration, either of itself or of the application programs using it.

    Examples of servers include:


    • a telecommunications server that provides services for connecting this local network with the outside world;

    • a computing or functional server that makes it possible to perform calculations that cannot be performed on workstations;

    • a disk server that has extended external memory resources and makes them available for use by workstations and, possibly, other servers;

    • a file server that maintains common file storage for all workstations;

    • the database server is actually a regular DBMS that accepts queries over the local network and returns results;

    • and others.
    A computer network server provides resources (services) to workstations and/or other servers. It is customary to call a component of a computer network that requests a service from some server a client, and a component that provides services to clients a server.

      1. Client-server architecture

    In relation to database systems, the client-server architecture is interesting and relevant mainly because it provides a simple and relatively cheap solution to the problem of collective access to databases on a local network. In some ways, database systems based on a client-server architecture are an approximation of distributed database systems: a greatly simplified approximation, of course, but one that does not require solving the core set of problems of truly distributed databases.

    The general purpose of database systems is, of course, to support the development and execution of database applications. Therefore, at a high level, a database system can be viewed as a system with a very simple structure consisting of two parts - a server (a database machine called a database server) and a set of clients (or front-end).


    • The server is the DBMS itself. It supports all the basic functions of a DBMS: data definition, data processing, data protection and integrity, etc. In particular, it provides complete support at the external, conceptual and internal levels. Therefore, "server" in this context is just another name for the DBMS.

    • The client is the various applications that run "on top" of the DBMS: applications written by users and embedded applications provided by DBMS vendors or some third-party software vendors. Of course, from the users' point of view, there is no difference between built-in applications and applications written by the user - they all use the same server interface, namely the front-end interface.
    The exceptions are special "utility" applications called utilities. Such applications can sometimes only work directly at the internal level of the system. Utilities refer to direct components of the DBMS rather than to applications in the usual sense. The following section of this lecture discusses utilities in more detail. Applications, in turn, can be classified into several clearly defined categories.

    1. Applications written by users. These are mainly professional application programs written either in a common programming language such as C or PASCAL, or in some original languages such as FOCUS, although in both cases these languages must somehow communicate with the corresponding data sublanguage.

    2. Applications provided by vendors (often called tools). In general, the purpose of such tools is to assist in the process of creating and running other applications, i.e. applications made for some specific task (although the applications created may not look like applications in the generally accepted sense). Indeed, this category of tools allows users, especially end users, to create applications without writing traditional programs. For example, one of the vendor-provided tools may be a query language processor that allows the end user to issue ad hoc (unscheduled) queries to the system. Each such query is, in essence, nothing more than a special application (for example, the ISQL utility of MS SQL Server) designed to perform some specific functions.

    The supplied tools, in turn, are divided into several independent classes:


    • query language processors;

    • report generators;

    • graphical business subsystems;

    • spreadsheets;

    • conventional language processors;

    • copy controls;

    • application generators;

    • other application development tools, including CASE products (CASE or Computer-Aided Software Engineering - software development automation), etc.
    The details of these applications are beyond the scope of this course, but it should be noted that the main purpose of a database system is to support the creation and execution of applications, so the quality of the available client tools should be a primary concern in database selection (i.e. the process of selecting a suitable system for a given customer). In other words, the DBMS itself is not the only, and not necessarily the most important, factor that needs to be taken into account.

    It should be noted that since the system as a whole can be clearly divided into two parts (server and clients), it becomes possible to run these two parts on different machines. In other words, there is the possibility of distributed processing.

    Distributed processing assumes that individual machines can be connected by some kind of communication network in such a way that a particular data processing task can be distributed across multiple machines in the network. In fact, this capability is so compelling for various reasons, mostly practical, that the term "client/server" has come to be used exclusively when the server and clients are actually on different machines. This use of the term is careless, but very common. Technology that supports distributed data processing should provide the client with access to a distributed database in the same way as access to a centralized database. In this case, the data can be stored on a local node, on a remote node, or both nodes - their location must remain transparent to both the end user and the program.

    The client-server architecture is characterized by the presence of a single DBMS for all users, which is located on the server.

    With this technology, the user (client) program generates a data selection request and sends it to the server. The server selects the data matching the request and sends it to the client program (application). The client program processes the received data and presents it to the user. In this case, the volume of transmitted information, and therefore network traffic, is significantly lower than when using a file server. It would be logical to expect the overall response time to decrease.

    However, the response time in such a system consists of the request transmission time, the waiting time for resources on the server (for example, the processor or a disk operation), the request execution time, and the time for transmitting the results to the workstation, i.e. to the client program. Moreover, the waiting time on the server can eat up the lion's share of the total request execution time. When developing programs that use client-server technology, it is necessary to take this into account and not contact the server for each individual record, but read the data in batches.
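    To illustrate this recommendation (the table and cursor names are hypothetical), the client should issue one set-oriented request instead of a separate request per record; where rows have to be consumed incrementally, a cursor fetched in batches keeps the number of round trips to the server low. The batch FETCH syntax below is PostgreSQL-style and varies between DBMSs.

    -- Preferable: a single request, the server returns the whole result set.
    SELECT customer_id, name, balance
    FROM customers
    WHERE region = 'Kaliningrad';

    -- If rows must be processed incrementally, fetch them in batches of 100
    -- rather than one at a time (PostgreSQL-style cursor inside a transaction).
    BEGIN;
    DECLARE cust_cur CURSOR FOR
        SELECT customer_id, name, balance
        FROM customers
        WHERE region = 'Kaliningrad';
    FETCH 100 FROM cust_cur;  -- first batch
    FETCH 100 FROM cust_cur;  -- next batch
    CLOSE cust_cur;
    COMMIT;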

    Although network traffic decreases, the computer acting as the server becomes the bottleneck, and the requirements placed on it are very high. A powerful computer must be chosen as the server, whereas there is no need to increase the power of the workstations.

    Currently, there are two client/server architecture models: two-tier and three-tier. The two-tier model is characterized by a situation where the database consists of local database tables located on one node, the database server runs on that same node, and the application programs are executed on the client nodes.


    Two-tier client/server architecture model


    Three-tier client/server architecture model


    The three-tier model is characterized by a situation where the distributed database consists of local database tables located on one node, the data access programs and some application programs are located on another node (possibly an application server), and the client applications run on the client nodes (possibly containing only the external interface).

    For both models, a situation is possible where the database consists of local databases located on remote (possibly geographically remote) nodes; such a database is called distributed, and a database server is required for it.

    A database system can be considered as a system in which execution is distributed according to the principle of interaction between two software processes: one of them in this model is called the "client", and the other, which serves the client, is the "server" (running on the machine that stores the databases). The client process requests certain services, and the server process provides them. It is assumed that one server process can serve many client processes.

    Structure of the database system with separation of clients and server

    In the simplest case, the server is the DBMS itself. It supports all the basic functions of a DBMS and provides complete support at the external, conceptual and internal levels.

    Clients are various applications that run on top of the DBMS.

    Typically, the application has the following groups of functions:


    • data input and display functions;

    • application functions that define the main algorithms for solving application problems;

    • data processing functions within the application;

    • information resource management functions;

    • service functions that play the role of links between the functions of the first four groups.

        1. Two-tier client/server architecture model

    If all five groups of application functions are distributed between just two processes running on two platforms - the client and the server - then the model is called two-tier. It has several main varieties; let us look at them.

    The file server model is called the remote data management model. This model assumes the following distribution of functions - almost all parts of the application are located on the client: the presentation part of the application, application functions, as well as functions for managing information resources. The file server contains the files necessary for the operation of applications and the DBMS itself and supports access to files.

    File server model


    Because file transfer is a time-consuming process, this approach involves significant network traffic, which can lead to poor system performance.

    In addition to this disadvantage, using a file server also has others:


    • each workstation must have a full copy of the DBMS;

    • managing concurrency, recovery, and integrity becomes more complex, since multiple DBMS instances can access the same files at once;

    • a narrow range of data manipulation operations, which is determined only by file commands;

    • Data protection is carried out only at the file system level.
    The main advantage of this model is that it already divides the application, previously executed as a single whole, into two interacting processes. In this case, the server can serve many clients that contact it with requests.

    In the remote access model, the database is also stored on the server. The server also contains the DBMS kernel. The client hosts the parts of the application that support data input, display, and application functions.

    The client contacts the server with requests in the SQL language. The structure of the remote access model is shown in the figure.

    Remote access model


    The server accepts and processes requests from clients, verifies user credentials, ensures integrity constraints are met, performs data updates, executes queries and returns results to the client, maintains the system catalog, and provides concurrent database access and recovery. In addition, the network load is sharply reduced, since it is not file commands that are transmitted from clients to the server, but SQL queries, and their volume is significantly smaller. In response to requests, the client receives only the data that matches the request, rather than blocks of files, as in the file server model.

    However, this technology also has a number of disadvantages:


    • SQL queries can significantly load the network during intensive work of client applications;

    • The presentation and application functions of the application must be repeated for each client application;

    • the server in this model plays a passive role, so information resource management functions must be performed on the client.
    Client-server technology is supported by most modern DBMSs: Informix, Ingres, Sybase, Oracle, MS SQL Server. The mechanism of stored procedures and the mechanism of triggers have been added to the foundation of this model.

    The stored procedure mechanism allows you to create routines that run on the server and control its processes.

    Thus, hosting stored procedures on the server means that the application functions are divided between the client and the server, and the information exchange traffic between them decreases sharply.
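
    As a hedged sketch of such a routine (Oracle PL/SQL-style syntax; the Orders and OrderLog tables, the procedure name and its logic are assumptions introduced only for illustration):

        -- Hypothetical stored procedure: the whole multi-step operation runs on
        -- the server, so only the call and its result cross the network.
        CREATE OR REPLACE PROCEDURE close_order (p_order_id IN NUMBER) AS
        BEGIN
            UPDATE Orders
               SET status = 'CLOSED',
                   closed_on = SYSDATE
             WHERE order_id = p_order_id;

            INSERT INTO OrderLog (order_id, event, logged_on)
            VALUES (p_order_id, 'CLOSED', SYSDATE);
        END;
        /

        -- Client side: a single short call instead of several separate statements.
        -- EXECUTE close_order(1024);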

    Centralized control of database integrity in the database server model is performed using a trigger mechanism. Triggers are also part of the database.

    A trigger is a special type of stored procedure that responds to the occurrence of a specific event in the database. It is activated when you try to change data - during adding, updating and deleting operations. Triggers are defined for specific database tables.

    Introducing triggers has little impact on server performance, and they are often used to enhance applications that perform multi-stage database operations.
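
    A minimal sketch of such a trigger (Oracle PL/SQL-style syntax; the Orders and OrderAudit tables and the trigger name are assumptions for illustration only):

        -- Hypothetical trigger: fires on every update of Orders and writes an
        -- audit record, so the rule is enforced on the server regardless of
        -- which client application performed the update.
        CREATE OR REPLACE TRIGGER trg_orders_audit
        AFTER UPDATE ON Orders
        FOR EACH ROW
        BEGIN
            INSERT INTO OrderAudit (order_id, old_status, new_status, changed_on)
            VALUES (:OLD.order_id, :OLD.status, :NEW.status, SYSDATE);
        END;
        /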

    In this model the server is active, because not only the client but also the server itself, using the trigger mechanism, can initiate data processing in the database. Since the client is relieved by the transfer of part of the application functions to the server, such a client is called "thin".

    Despite all the positive qualities of this model, it still has one drawback - a very large server load.




    Database server model


        1. Three-tier client/server architecture model

    The three-tier model is an extension of the two-tier model and introduces an additional intermediate layer between the client and server. The architecture of the three-level model is shown in the figure.



    Fig. Three-tier model architecture
    This architecture assumes that the client hosts: data input and display functions, including a graphical user interface, local editors, communication functions that provide client access to a local or global network.

    Database servers in this model deal exclusively with the functions of managing database information resources: providing functions for creating and maintaining a database, maintaining database integrity, performing functions for creating database backups and restoring databases after failures, managing transaction execution, and so on.

    The middle layer, which may contain one or more application servers, hosts the functions that are common to clients and do not have to be installed on them: the most common client application functions, functions that support the operating environment of the network domain, data directories, and functions providing messaging and query support.

    The benefits of the three-tier model are most noticeable when clients are performing complex analytics on the database.


    1. Principles of building distributed databases

      1. Basic Concepts
    A distributed database is a set of logically interconnected sets of shared data (and their descriptions) that are physically distributed in a computer network.

    A distributed DBMS is a software package designed for managing distributed databases and providing users with transparent access to distributed information.

    A distributed database management system (distributed DBMS) manages a single logical database divided into a number of fragments. Each fragment of the database is stored on one or more computers running separate DBMSs and interconnected by a communication network. Any node is capable of independently processing user requests that require access to locally stored data (that is, each node has a certain degree of autonomy), and is also capable of processing data stored on other computers on the network.

    Users interact with the distributed database through applications. Applications can be divided into those that do not require access to data on other nodes (local applications) and those that do require such access (global applications). A distributed DBMS must support at least one global application, and therefore must have the following characteristics:

    1. There is a set of logically related shared data.

    2. The saved data is divided into a number of fragments.

    3. Replication of data fragments may be provided.

    4. Fragments and their copies are distributed across different nodes.

    5. Nodes are interconnected by network connections.

    6. Access to data on each node is controlled by the DBMS.

    7. The DBMS on each node is capable of supporting the autonomous operation of local applications.

    8. The DBMS of each node supports at least one global application.


    Fig. 2.1 – Distributed DBMS topology

    From the definition of a distributed DBMS it follows that it must make data distribution transparent (invisible) to the end user. In other words, the fact that the distributed database consists of several fragments that can be located on different computers must be completely hidden from users. The goal of transparency is to make a distributed system appear externally like a centralized one. This requirement is sometimes called the basic principle of creating distributed DBMSs.

    It is very important to understand the differences between distributed DBMSs and distributed data processing tools.

    Distributed data processing is processing using a centralized database that can be accessed from different computers on the network.

    The key point in defining a distributed DBMS is that the system operates on data physically distributed over a network. If data is stored centrally, then even if access to it is provided for any user over the network, this system simply supports distributed processing, but cannot be considered as a distributed DBMS.


      1. Advantages and disadvantages of distributed DBMSs
    Distributed database systems have additional advantages over traditional centralized database systems. Unfortunately, this technology is not without some disadvantages.

        1. Advantages

          1. Reflection of the structure of the organization.
    Large organizations, as a rule, have many branches, which may be located in different parts of the country and even beyond its borders.

          1. High degree of data shareability and local autonomy.
    The geographic distribution of an organization can be reflected in the distribution of its data, with users at one site able to access data stored at other sites. The data can be placed on the node where the users who most often work with this data are registered. As a result, interested users gain local control over the data they require and can set or regulate local restrictions on its use. The global database administrator is responsible for the system as a whole. Typically, some of this responsibility is delegated to the local level, giving the local database administrator the ability to manage the local DBMS.

          1. Increased data availability.
    In centralized DBMSs, the failure of the central computer causes the entire DBMS to stop functioning. However, the failure of one of the nodes of a distributed DBMS or the communication line between nodes leads to the fact that only some nodes become unavailable, while the entire system as a whole remains operational. Distributed DBMSs are designed in such a way that they can function despite such failures. If one of the nodes fails, the system will be able to redirect requests addressed to the failed node to another node.

          1. Increased reliability.
    If data replication is organized, as a result of which data and their copies will be placed on several nodes, the failure of an individual node or the communication line between nodes will not lead to the termination of access to data in the system.

          1. Increased productivity.
    Since the data is located at the node where it is in greatest demand, and given the parallelism inherent in distributed DBMSs, the speed of database access may be better than for access to a remote centralized DBMS. Moreover, since each node works with only part of the database, the load on the central processor and the I/O services may be lower than in the case of a centralized DBMS.

          1. Economic benefits.
    In the 1960s, computing power increased in proportion to the square of the cost of its hardware, so that a system that cost three times as much as a given one was nine times more powerful. This relationship is called Grosch's law. However, it is now generally accepted that it is much cheaper to assemble a system from small computers whose combined power is equivalent to that of a single large computer. It turns out to be much more profitable to install separate low-power computers in the departments of an organization; in addition, it is much cheaper to add new workstations to a network than to upgrade a mainframe system.

    A second potential source of savings occurs when databases are geographically dispersed and applications require access to distributed data. In this case, due to the relatively high cost of transmitting data over the network (compared to the cost of processing it locally), it may be cost effective to split the application into its appropriate parts and perform the necessary processing locally on each node.


          1. Modularity of the system.
    In a distributed environment, expanding an existing system is much easier. Adding a new node to the network does not affect the functioning of existing ones. This flexibility allows the organization to expand easily. Overloads caused by a growing database size are usually resolved by adding new computing power and external memory devices to the network. In centralized DBMSs, expanding the database may require replacing the hardware (with a more powerful system) and the software used (with a more powerful or more flexible DBMS).

        1. Disadvantages

          1. Increasing difficulty.
    Distributed DBMSs, capable of hiding the distributed nature of the data they use from end users and providing the necessary level of performance, reliability and availability, are certainly more complex software systems than centralized DBMSs. The fact that data can be copied also creates an additional prerequisite for increasing the complexity of distributed DBMS software. If data replication is not maintained at the required level, the system will have lower levels of data availability, reliability and performance than centralized systems, and all the advantages outlined above will turn into disadvantages.

          1. Increase in cost.
    An increase in complexity also means an increase in the costs of acquiring and maintaining a distributed DBMS (compared to conventional centralized DBMSs). In addition, the deployment of a distributed DBMS requires additional hardware necessary to establish network connections between nodes. We should also expect an increase in the cost of paying for communication channels caused by the growth of network traffic. In addition, labor costs for personnel required to maintain local DBMS and network connections will increase.

          1. Protection problems.
    In centralized systems, access to data is easily controlled. However, in distributed systems it will be necessary to organize access control not only to copied data located at several production sites, but also to protect the network connections themselves. Previously, networks were viewed as unsecured communications infrastructures. Although this is partly true today, significant progress has been made in securing network connections.

          1. Complicating data integrity control.
    Database integrity refers to the correctness and consistency of the data stored in it. Integrity requirements are usually formulated in the form of certain constraints, the enforcement of which guarantees the protection of information in the database from destruction. Implementing integrity constraints typically requires access to a large amount of data used to perform the checks, although it does not itself involve update operations. In distributed DBMSs, the increased cost of data transfer and processing can hinder the organization of effective protection against data integrity violations.

          1. Lack of standards.
    Although it is quite obvious that the functioning of distributed DBMSs depends on the efficiency of the communication channels used, only recently have the outlines of standards for communication channels and data access protocols begun to emerge. The lack of standards significantly limits the potential capabilities of distributed DBMSs. In addition, there are no tools or methodologies that can help users transform centralized systems into distributed systems.

          1. Lack of experience.
    At the moment, general-purpose distributed systems have not yet become widespread. Accordingly, the necessary experience in the industrial operation of distributed systems, comparable to the experience in the operation of centralized systems, has not yet been accumulated. This state of affairs is a serious deterrent for many potential supporters of this technology.

          1. Complicating the database development procedure.
    Designing distributed databases, in addition to the usual challenges associated with the centralized database design process, requires decisions about data fragmentation, distribution of fragments across individual nodes, and data replication. Such complexities add to the already challenging process of database design.

      1. Homogeneous and heterogeneous distributed DBMSs
    Distributed DBMSs are divided into homogeneous and heterogeneous. In homogeneous systems, all nodes use the same type of DBMS. In heterogeneous systems, different types of DBMS can operate on nodes using different data models, i.e. a heterogeneous system may include nodes with relational, network, hierarchical or object-oriented DBMSs.

    Homogeneous systems are much easier to design and maintain. In addition, this approach allows you to gradually increase the size of the system, sequentially adding new nodes to an existing distributed system. It also becomes possible to increase system performance by organizing parallel processing of information on different nodes.

    Heterogeneous systems usually arise when independent nodes, each already running its own database system, are eventually integrated into a newly created distributed system. In heterogeneous systems, in order to organize interaction between different types of DBMS, it is necessary to provide conversion of the transmitted messages. To ensure transparency regarding the type of DBMS used, users of each node should be able to issue queries in the language of the DBMS used on their local node. The system must undertake the search for the required data and perform all necessary transformations of the transmitted messages. In the general case, the node from which data is requested may differ from the requesting node in the type of hardware used, in the type of DBMS used, or in both the hardware and the DBMS.

    If a different type of equipment is used, but the nodes use the same DBMS, methods for performing transformations are quite obvious and include replacing codes and changing the length of the machine word. If the types of DBMS used on the nodes are different, the conversion procedure is complicated by the fact that it is necessary to convert the data structures of one data model into equivalent structures of another data model.

    A typical solution used in some relational systems is for separate parts of heterogeneous distributed systems to use gateways designed to convert the language and data model of each type of DBMS used into the language and data model of the relational system. However, the gateway approach has some serious limitations. Firstly, gateways do not allow organizing a transaction management system even for individual pairs of systems. In other words, the gateway between two systems is nothing more than a request translator. For example, gateways do not allow the system to coordinate the management of concurrent execution and recovery procedures for transactions that involve updating data in both databases. Secondly, the use of gateways allows us to solve only the problem of translating queries from the language of one DBMS to the language of another. Therefore, they usually do not solve the problem of creating a homogeneous structure and eliminating differences between data representations in different systems.


      1. Ensuring transparency in a distributed DBMS
    Transparency refers to hiding information about a specific system implementation from users. Distributed DBMSs can provide different levels of transparency. However, in any case, the same goal is pursued: to make working with a distributed database completely similar to working with a conventional centralized DBMS. There are four main types of transparency that can occur in a distributed database system.

    1. Transparency of placement.

    2. Transparency of transactions.

    3. Transparency of execution.

    4. Transparency in the use of the DBMS.

    It should be noted that full transparency is not always accepted as one of the main goals. Full transparency makes managing distributed data an extremely difficult task. Additionally, applications written with full access visibility across a geographically distributed database typically suffer from poor manageability, modularity, and message processing performance. It should also be noted that it is rare to find all the types of transparency discussed here in one system.


        1. Transparency of placement
    Placement transparency allows end users to perceive the database as a single logical whole. If a distributed DBMS provides placement transparency, then the user does not need to consider data fragmentation or location.

          1. Transparency of fragmentation.
    Fragmentation transparency is the highest level of placement transparency. If a distributed DBMS provides transparent fragmentation, then the user does not need to know exactly how the data is fragmented. In this case, access is based on the global schema and the user does not need to specify the names of the fragments or the location of the data.

          1. Location transparency.
    Location transparency represents the intermediate level of placement transparency. In this case, the user must know how the data is fragmented in the system, but does not need information about the location of the data.

          1. Replication transparency.
    Very closely related to location transparency is another type of transparency: replication transparency. It means that the user does not need to know about existing copies of fragments. Replication transparency refers to the transparency of the location of copies.

          1. Local display transparency.
    This is the lowest level of placement transparency. If the system provides only local mapping transparency, the user must specify both the names of the fragments used and the location of the corresponding data elements, taking into account the presence of all necessary copies.

          1. Naming transparency.
    A direct consequence of the placement transparency options discussed above is the requirement for naming transparency. As with a centralized database, each element of a distributed database must have a unique name. Therefore, a distributed DBMS must ensure that no two system nodes can create a database object that has the same name. One solution to this problem is to create a central name server, which will be responsible for the complete uniqueness of all names existing in the system. However, this approach has the following disadvantages:

    1. Loss of a certain part of local autonomy.

    2. Performance problems arise (as the central node becomes the bottleneck of the entire system).

    3. Reduced availability - if the central node becomes unavailable for any reason, all other nodes in the system will not be able to create new database objects.

    An alternative solution is to use prefixes placed on object names to identify the node that created the object. However, this approach leads to a loss of placement transparency.

    An approach that overcomes the disadvantages inherent in both of these methods is to use aliases (synonyms) created for each of the database objects. The task of converting aliases into names of corresponding database objects is assigned to the distributed DBMS.
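
    As a hedged illustration in Oracle-style SQL (the link name sales_link, the schema scott and the table emp are assumptions used only for this sketch), an alias can hide both the node and the owner of the object from applications:

        -- The synonym hides the fact that EMP actually lives on a remote node
        -- reached through a database link; applications simply refer to "emp".
        CREATE PUBLIC SYNONYM emp FOR scott.emp@sales_link;

        -- After that, a query needs no node- or owner-qualified name:
        SELECT ename, sal FROM emp WHERE deptno = 10;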


        1. Transaction transparency
    Transaction transparency in a distributed DBMS environment means that when performing any distributed transactions, the integrity and consistency of the distributed database is guaranteed to be maintained. A distributed transaction accesses data stored in multiple locations. Each transaction is divided into several subtransactions - one for each node whose data is accessed.
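
    From the application's point of view this can be sketched as follows (Oracle-style SQL is assumed; the tables dep and emp and the link sales_link are illustrative only): one logical transaction touches a local and a remote node, and the distributed DBMS splits it into subtransactions and commits them jointly:

        -- One logical (distributed) transaction spanning two nodes:
        UPDATE dep            SET budget = budget - 1000 WHERE dep_id = 7;   -- local node
        UPDATE emp@sales_link SET sal    = sal + 100     WHERE emp_id = 42;  -- remote node
        COMMIT;  -- both subtransactions are committed atomically,
                 -- typically via a two-phase commit protocol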

        1. Execution Transparency
    Execution transparency requires that work in a distributed DBMS environment be performed in exactly the same way as in a centralized DBMS environment. In a distributed environment, system operation should not exhibit performance degradation associated with its distributed architecture, such as slow network connections. Execution transparency also requires that the distributed DBMS be able to discover the most efficient query execution strategies.

    In a centralized DBMS, the query processor must evaluate each query that requires access to data and find the optimal execution strategy, which is an ordered sequence of database operations. In a distributed environment, a distributed query processor transforms a query that requires access to data into an ordered sequence of operations on local databases. In this case, additional complexity arises due to the need to take into account the presence of fragmentation, replication and a certain data layout. The distributed query processor must figure out:

    1. Which fragment should be accessed.

    2. Which copy of the fragment should be used, if the fragment is replicated.

    3. Which data storage location should be contacted.

    The distributed query processor develops an execution strategy that is optimal in terms of some cost function. Typically distributed queries are evaluated using the following metrics:

    1. Access time including physical access to the data on the disk.

    2. CPU time spent processing data in RAM.

    3. Time required to transfer data over network connections.

    The first two factors are similar to those taken into account in centralized systems. However, in a distributed DBMS environment it is also necessary to take into account the cost of data transfer, which in many cases turns out to be dominant. In such situations, I/O costs and CPU time can be ignored during optimization. However, in local networks the data transfer speed is comparable to the disk access speed, and in this case optimization must take all three cost components into account.
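
    In a compact and deliberately simplified form, such a cost function can be written as a sum of the three components listed above; the symbols and weights below are an assumption introduced for illustration, not a formula taken from any particular optimizer:

        Cost(Q) = w_{io} \cdot T_{io}(Q) + w_{cpu} \cdot T_{cpu}(Q) + w_{net} \cdot T_{net}(Q)

    where T_{io}, T_{cpu} and T_{net} are the disk access, CPU and network transfer times for query Q, and the weights w reflect which component dominates: in geographically distributed systems usually w_{net}, whereas in local networks all three components matter.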


        1. Transparency of DBMS usage
    Transparency in the use of a DBMS makes it possible to hide from the user of a distributed DBMS the fact that different local DBMSs can operate on different nodes. Therefore, this type of transparency is only applicable in the case of heterogeneous distributed systems. This is typically one of the most difficult types of transparency to implement.

      1. Date's Twelve Rules
    Basic principle. From the end user's point of view, a distributed system should look exactly the same as a regular non-distributed system.

    Rule 1. Local autonomy.

    Nodes in a distributed system must be autonomous. In this context, autonomy means the following:

    1. Local data is locally owned and maintained locally.

    2. All local operations remain strictly local.

    3. All operations on a given node are controlled only by this node.

    Rule 2. No dependence on the central node.

    There should not be a single node in the system without which it could not function. This means that there should be no central servers for services such as transaction management, deadlock detection, query optimization, and global system catalog management.

    Rule 3. Continuous operation.

    Ideally, a system should never require a planned shutdown to perform the following operations:

    1. Adding and removing a node from the system.

    2. Dynamic creation and removal of fragments from one or more nodes.

    Rule 4. Location independence.

    Location independence is equivalent to location transparency. The user must access the database from any of the nodes. Moreover, the user must access any data as if it were stored on his node, regardless of where it is physically located.

    Rule 5. Independence from fragmentation.

    The user should be able to access the data regardless of how it is fragmented.

    Rule 6. Independence from replication.

    The user should not need to know whether there are copies of the data. This means that the user does not have to access a specific copy of the data element directly and worry about updating all existing copies of the data element.

    Rule 7. Processing distributed requests.

    The system must support processing queries that reference data located on multiple nodes.

    Rule 8. Processing of distributed transactions.

    The system must support the execution of a transaction as a unit of recovery. The system must ensure that global and local transactions are carried out while maintaining the four basic properties of transactions: atomicity, consistency, isolation and durability.

    Rule 9. Independence from the type of equipment.

    A distributed DBMS must operate on hardware with different computing platforms.

    Rule 10. Operating system independence.

    A direct consequence of the previous rule is the requirement that a distributed DBMS must operate under different operating systems.

    Rule 11. Independence from network architecture.

    A distributed DBMS must function in a wide variety of communication networks.

    Rule 12. DBMS independence.

    It should be possible to create a distributed DBMS based on local DBMSs of various types, whose functioning may even be based on support for different data models. In other words, a distributed DBMS must support a heterogeneous architecture.

    The last four rules are still only an unattainable ideal. Since their wording is too general and there are no standards for computer and network architecture, in the foreseeable future we can only count on partial compliance with the requirements of the last four rules by developers of distributed DBMSs.


    1. Oracle DBMS capabilities for building distributed databases

      1. Oracle distributed DBMS

    Oracle was the first to introduce distributed databases back in the early 1980s in response to demands for access to data across multiple platforms.

    Like many other commercial distributed DBMSs, Oracle does not support the type of fragmentation mechanism described above, but the database administrator can manually distribute data to achieve a similar effect. But in this case, the end user must know how the table is fragmented and take this information into account when using the application. In other words, the distributed Oracle DBMS does not support fragmentation transparency. However, this DBMS supports location transparency.

    Distributed Oracle databases can be combined by establishing links between individual databases. By using views, procedures, and synonyms, the programmer can make access to these distributed data sources transparent with respect to the actual location of the data.
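
    A hedged sketch of this technique (the table emp, the link sales_link and the view name emp_all are assumptions for illustration): a view can assemble locally and remotely stored rows so that applications see a single table and never mention where its data resides:

        -- Applications query EMP_ALL as an ordinary table; the fact that part of
        -- the data comes from a remote node via a database link stays hidden.
        CREATE OR REPLACE VIEW emp_all AS
            SELECT * FROM emp                  -- rows stored locally
            UNION ALL
            SELECT * FROM emp@sales_link;      -- rows stored on the remote node

        SELECT COUNT(*) FROM emp_all;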

    To execute queries, distributed Oracle databases are linked by establishing and maintaining database links. Most often, a database link is created by the database administrator, who is responsible for managing access to distributed data sources.

    When you create a database link, you use the name of the database you want to link to and the name of the server or domain that hosts it.

    Typically, links are created as private or public, depending on the permissions granted for creating such links and on the type of access that should be given to users. Private links restrict the circle of users, while public links can be used by any user. By default, the user names and passwords used by the link correspond to the settings of the source database. Links can also be created with explicitly specified user names and passwords; such links are called authenticated links and can be used for establishing both private and public connections.

    (The figure shows an application working with the local table DEP and with the remote table EMP reached through a database link, issuing statements such as insert into EMP@SALES … , delete from DEP … , and select … from EMP@SALES … .)

    A database link is a kind of pointer that defines a one-way link between two database servers.

    Fig. 3.1 – Database link operation scheme


    The database link specifies the following information:

    1. The network protocol (for example, TCP/IP) used for the connection.

    2. The name of the remote host (computer on the network) on which the remote database is located.

    3. Database name on the remote host.

    4. Name of the account for accessing the remote database.

    5. Password for this account.

    Fig. 3.2 – Diagram of the connection of a local database to a remote database

    A database link can be created using the following command:

    create database link <link_name>
      connect to { current_user | <user> identified by <password> }
      using '(DESCRIPTION = …)';
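
    For example (a sketch only: the link name sales_link, the credentials and the connect identifier 'SALES' are assumptions, not values from the source text), a private authenticated link and a query through it could look like this:

        CREATE DATABASE LINK sales_link
            CONNECT TO scott IDENTIFIED BY tiger
            USING 'SALES';   -- a TNS alias; a full (DESCRIPTION = ...) connect
                             -- descriptor may be written here instead

        SELECT ename, sal FROM emp@sales_link;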

    Distributed information systems.

    Internet/Intranet based architecture with migrating programs

    Distributed system

    In the literature you can find various definitions of distributed systems, none of which is fully satisfactory or consistent with the others.

    For our purposes, a fairly free description will suffice.

    A distributed system is a set of independent computers that appears to its users as a single unified system.

    This definition makes two points. The first relates to the hardware: all machines are autonomous.

    The second concerns software: users think they are dealing with a single system. Both points are important. We will return to them later, but first we will cover some basic issues related to both hardware and software.

    Characteristics of distributed systems:

    1. The differences between computers and methods of communication between them are hidden from users. The same applies to the external organization of distributed systems.

    2. Users and applications interact with a distributed system in a uniform way, regardless of where and when the interaction takes place.

    Distributed systems should also be relatively easy to expand, or scale. This characteristic is a direct consequence of having independent computers, but at the same time does not indicate how these computers are actually combined into a single system.

    Distributed systems usually exist permanently, but some parts of them may temporarily fail. Users and applications should not notice that parts of the system have been replaced or repaired, or that new parts have been added to serve additional users.

    To maintain a unified view of the system, distributed systems often include an additional layer of software located between the upper layer, which contains users and applications, and the lower layer, which consists of the operating systems.

    Fig. 1.1 – A distributed system organized as a middleware service

    Accordingly, such a distributed system is usually called a middleware system. Note that the middleware layer is distributed among many computers.

    Typically, a system in which more than one database server operates is considered distributed. This is done to reduce the load on a single server and to support geographically remote departments. The varying complexity of creation, modification, maintenance and integration with other systems makes it possible to divide information systems into classes of small, medium and large distributed systems. Small information systems have a short life cycle, an orientation towards mass use and a low price, cannot be modified without the participation of the developers, use mainly desktop database management systems (DBMSs), and run on homogeneous hardware and software without security features. Large corporate information systems, federal-level systems and others have a long life cycle, involve migration of legacy systems, a diversity of hardware and software, large scale and complexity of the tasks being solved, the intersection of many subject areas, analytical data processing, and territorial distribution of components.

    Distributed databases (RDB) are a set of logically interconnected databases distributed over a computer network.

    The RDB consists of a set of nodes connected by a communication network in which:

    each node is a full-fledged DBMS in itself;

    nodes interact with each other in such a way that a user of any of them can access any data on the network as if it were on his own node.

    Each node is itself a database system. Any user can perform operations on data on his local node in the same way as if this node was not part of the distributed system at all. A distributed database system can be thought of as a partnership between separate local DBMSs on separate local nodes.

    Fundamental principle for creating distributed databases (“Rule 0”): To the user, a distributed system should look the same as a non-distributed system.

    The fundamental principle entails certain additional rules, or goals. There are twelve such goals in total:

    Local independence. Nodes in a distributed system must be independent, or autonomous. Local independence means that all operations on a node are controlled by that node.

    Lack of dependence on a central node. Local independence implies that all nodes in a distributed system should be treated as equals. Therefore, there should be no reliance on a "central" or "master" node in order to obtain some centralized service.

    Continuous operation. Distributed systems should provide a higher degree of reliability and availability.

    Location independent. Users should not know where exactly the data is physically stored and should act as if all the data was stored on their own local node.

    Fragmentation independent. A system supports fragmentation independence if a given relation variable can be divided into parts or fragments when organizing its physical storage. In this case, data can be stored in the place where it is most often used, which allows localization of most operations and reduced network traffic.

    Replication independent. A system supports data replication if a given stored relation variable - or in general a given fragment of a given stored relation variable - can be represented by several separate copies or replicas that are stored on several separate nodes.

    Processing distributed requests. The point is that a request may need to contact multiple nodes. In such a system, there may be many possible ways to forward data to satisfy the request in question.

    Distributed transaction management. There are 2 main aspects of transaction management: recovery management and concurrency management. With regard to recovery management, to ensure the atomicity of a transaction in a distributed environment, the system must ensure that the entire set of agents related to a given transaction (an agent is a process that runs for a given transaction on a separate node) has either committed its results or performed a rollback. As for concurrency control, in most distributed systems it is based on a blocking mechanism, just like in non-distributed systems.

    Hardware independence. It is desirable to be able to run the same DBMS on different hardware platforms and, moreover, to ensure that different machines participate in the operation of a distributed system as equal partners.

    Operating system independent. Ability to operate the DBMS under various operating systems.

    Network independence. The ability to support many fundamentally different nodes, differing in hardware and operating systems, as well as a number of different types of communication networks.

    Independence from the type of DBMS. It is necessary that the DBMS instances on different nodes all support the same interface, and it is not at all necessary that these are copies of the same version of the DBMS.