• Main classes of DBMS. Relational databases


    Databases and DBMS

    Information systems

    One of the most important conditions for the effective functioning of any organization is the presence of a developed information system. An information system implements automated collection, processing and manipulation of data; it comprises technical data-processing facilities, software and maintenance personnel.

    The modern form of information systems are data banks, which include a computer system, one or more databases (DB), a database management system (DBMS) and a set of application programs (AP). The main functions of data banks are:

    Data storage and protection;

    Changing (updating, adding and deleting) stored data;

    Search and selection of data based on user requests;

    Data processing and output of results.

    A database provides storage of information and is a named collection of data organized according to certain rules, including general principles for describing, storing and manipulating data.

    A database management system is a package of application programs and a set of language tools designed for creating, maintaining and using databases.

    Application programs (applications) as part of data banks are used for data processing, calculations and generation of output documents in a given form.

    An application is a program or set of programs that uses a database and automates information processing in a certain subject area. Applications can be created both in the DBMS environment and outside the DBMS, using a programming system such as Delphi or C++Builder together with database access tools.

    To work with a database, in many cases the DBMS's own tools are sufficient, for example for creating queries and reports. Applications are developed mainly when it is necessary to make working with the database convenient for unskilled users, or when the DBMS interface does not suit the user.

    The most important advantage of using databases in information systems is the independence of data from application programs: the programmer no longer needs to deal with issues of data placement in memory, access methods, and so on.

    Such independence is achieved by the multi-level presentation of data in the database at the logical (user) and physical levels supported by the DBMS.

    As a rule, the main criterion for the optimal functioning of a database is the time characteristics of the implementation of user requests by application programs.

    Database creation tools

    File systems

    Development of basic concepts of data representation

    Any computational process is a mapping of some input data into output data.

    The ratio of the complexity of the representation of the processed data and the calculation algorithm determines two classes of problems:

    - computational tasks - a fairly simple presentation of data and a complex calculation process;

    - data processing tasks (non-computational tasks) – a simple processing algorithm and a complex representation of the processed data.

    In accordance with this, it is necessary to pay attention to both the development of an algorithm for solving the problem and the methods of presenting the processed data.

    Since the late 1960s, computers have been intensively used to solve so-called non-computational problems associated with processing various kinds of documents. When file systems are used, data is stored in a file designed only for one specific task, and a description of the data is included in the application program; changing the file record format requires changing the application program. Thus, the software system that solves the problem defines and manages its own data.

    Disadvantages of file systems

    1. The record structure of a file is known only to the program in which it was created. Changing the structure requires changing the programs that use this data file. Thus, programs are data-dependent.

    2. Problems with access authorization. OS tools can be used to restrict access, but this solution is inconvenient; centralized methods of controlling access to information are needed.

    3. Problems with organizing multi-user access. File management systems provide a multi-user mode, but have features that make them difficult to use for databases. There are no problems when multiple users read data; making changes, however, requires synchronizing user actions. Typically, when opening a file, the mode (read/write) is indicated. If at that moment the file is already open by another process in modification mode, the OS either reports that the file cannot be opened or blocks the action until the other process closes the file. In any case, either several users cannot modify the database at the same time, or processing is slow.

    In an application program that uses one or more separate files to solve a problem, the programmer working on the task was responsible for the safety and reliability of the data. Using a database involves working with it from several application programs that solve the problems of different users.

    Naturally, a programmer solving one of the applied problems can no longer be responsible for the safety and reliability of integrated data. In addition, expanding the range of tasks solved using a database can lead to the emergence of new types of records and relationships between them. Such a change in the database structure should not lead to changes in the many previously developed and successfully functioning application software systems that work with the database. On the other hand, a possible change in any of the application programs, in turn, should not lead to a change in the data structure. All of the above determines the need to separate data from application programs.

    Database management systems

    The role of the interface between application programs and the database, ensuring their independence, is played by a software package - a database management system (DBMS).

    A DBMS is a software package for supporting an integrated set of data, designed for creating, maintaining and using a database by many users (application programs).

    Basic functions of a database management system.

    1. Determining the structure of the database to be created, its initialization and initial loading

    2. Providing users with the ability to manipulate data (selecting the necessary data, performing calculations, developing an input/output interface, visualization).

    3. Ensuring the independence of application programs and data (logical and physical independence).

    4. Protecting the logical integrity of the database.

    5. Protecting physical integrity.

    6. Managing user permissions to access the database.

    7. Synchronizing the work of multiple users.

    8. Storage environment resource management.

    9. Supporting the activities of system personnel.

    1. Determining the structure of the database to be created, its initialization and initial loading. In most modern DBMSs, the database is represented as a collection of tables.
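
    For illustration, in a SQL-based DBMS such a structure might be defined as follows (a minimal sketch; the clients table and its fields are hypothetical):

        CREATE TABLE clients (
            client_id INTEGER        NOT NULL PRIMARY KEY,  -- unique client number
            full_name VARCHAR(100)   NOT NULL,
            balance   DECIMAL(12, 2) DEFAULT 0              -- current account balance
        );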

    2. Providing users with the ability to manipulate data (selecting the necessary data, performing calculations, developing an input/output interface, visualization). Such capabilities in a DBMS are presented either through the use of a special programming language included in the DBMS, or using a graphical interface.
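
    In SQL-based systems, such manipulation is expressed with statements like INSERT, SELECT, UPDATE and DELETE; a sketch continuing the hypothetical clients table above:

        -- enter a new row
        INSERT INTO clients (client_id, full_name, balance)
        VALUES (1, 'Ivanov I. I.', 1000.00);

        -- select the necessary data
        SELECT full_name, balance
        FROM clients
        WHERE balance > 500;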

    3. Ensuring the independence of application programs and data (logical and physical independence). The most important property of a DBMS is the ability to support two independent views of the database - the “user’s view”, embodied in the logical representation of data, and its reflection in application programs; and “system view” - the physical representation of data in computer memory. Ensuring logical data independence provides the ability to change (within certain limits) the logical representation of the database without having to change the physical data storage structures. Thus, changing the logical representation of data in application programs does not lead to changes in data storage structures. Ensuring physical independence of data makes it possible to change (within certain limits) ways of organizing a database in computer memory without causing the need to change the “logical” representation of data. Thus, changing the way the database is organized does not change the application programs.
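
    Logical independence is commonly supported by views: an application can query a view while the underlying tables are reorganized. A sketch, reusing the hypothetical clients table:

        -- the application sees only this "user's view" of the data
        CREATE VIEW active_clients AS
            SELECT client_id, full_name
            FROM clients
            WHERE balance > 0;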

    4. Protecting the logical integrity of the database.

    The main purpose of this function is to increase the reliability of the data in the database. Data reliability may be compromised when data is entered into the database, or through unlawful actions of data processing procedures that receive and enter incorrect data. To increase data reliability, so-called integrity constraints are declared in the system, which in certain cases "catch" incorrect data. Thus, all modern DBMSs check that entered data matches the type described when the structure was created: the system will not allow a character to be entered in a numeric field, will not allow an invalid date, and so on. In developed systems, integrity constraints are described by the programmer based on the substantive meaning of the problem, and they are checked every time the data is updated.
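
    In SQL, such integrity constraints are declared together with the table structure; for example (a sketch with hypothetical fields):

        CREATE TABLE orders (
            order_id   INTEGER NOT NULL PRIMARY KEY,
            order_date DATE    NOT NULL,                   -- an invalid date is rejected
            amount     DECIMAL(12, 2) CHECK (amount > 0)   -- a non-positive amount is rejected
        );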

    5. Protecting physical integrity. When a computer is operating, malfunctions (for example, due to a power outage) and damage to storage media are possible. In this case, connections between data may be broken, making further work impossible. Developed DBMSs have database recovery tools. The most important concept used here is the "transaction" - a unit of actions performed on the database. A transaction can contain multiple database modification statements, but either all of them are executed or none are. In addition to maintaining the database itself, the DBMS also maintains a transaction log.

    We will illustrate the need to use transactions in databases using a simplified example. Let's assume that the database is used in a certain bank and one of the clients wants to transfer money to the account of another client of the bank. The database stores information about the amount of money each client has. We need to make two changes in the database - reduce the amount of money in the account of one of the clients and, accordingly, increase the amount of money in the other account. Of course, actually transferring money at a bank is a much more complex process involving many tables and possibly many databases. However, the essence remains the same - you need to either perform all the actions (increase the account of one client and decrease the account of another), or not perform any of these actions. You cannot reduce the amount of money in one account but not increase the amount of money in another.

    Let's also assume that after the first action (decreasing the amount of money in the first client's account), a failure occurred. For example, the connection between the client computer and the database may have been interrupted, or a system failure may have occurred on the client computer, causing the operating system to restart. What happens to the database in this case? The command to decrease the money in the first client's account was executed, but the command to increase the money in the other account was not, which would leave the database in an inconsistent state.

    Using the transaction mechanism provides a solution in this and similar cases. Before the first action is performed, a transaction start command is issued. The transaction includes both the operation of decreasing the amount in one account and the operation of increasing the amount in the other. The transaction completion statement is usually called COMMIT. Since the transaction is not completed after the first action alone, no changes are made to the database: changes are committed only after the transaction completes, and until this statement is issued, no data is saved in the database. In our example, since the transaction commit statement was not issued, the database will "roll back" to its original state - in other words, the amounts in the customer accounts will remain as they were before the transaction began. The database administrator can monitor the status of transactions and, if necessary, manually roll back transactions.
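
    In SQL, the transfer described above might look like this (a sketch; the accounts table is hypothetical, and the exact transaction-control statements vary between DBMSs - Oracle, for example, starts a transaction implicitly):

        BEGIN TRANSACTION;

        UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

        COMMIT;  -- only here do both changes become permanent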

    In addition, in obvious cases, the DBMS independently decides to “roll back” the transaction.

    Transactions do not have to be short. There are transactions that last several hours or even several days. Increasing the number of actions within one transaction requires an increase in the occupied system resources. Therefore, it is advisable to keep transactions as short as possible. All transactions are recorded in the transaction log - both committed and those completed by “rollback”. Maintaining a transaction log together with creating database backups allows you to achieve high database reliability.

    Let's assume that the database was corrupted as a result of a hardware failure of the computer on which the DBMS server was installed. In this case, you need to use the last backup copy of the database and the transaction log; moreover, you only need to apply to the database those transactions that were committed after the backup was created. Most modern DBMSs allow the administrator to recreate the database from a backup and the transaction log. In such systems, at some point the database is copied to backup media, and all access to the database is recorded programmatically in the change log. If the database is destroyed, a recovery procedure is launched, during which all changes recorded in the change log are applied to the backup copy.

    6. Managing user permissions to access a database.

    Different users may have different permissions to work with data (some data must be inaccessible; certain users are not allowed to update data, etc.). The DBMS provides mechanisms for delimiting access powers, based either on the principles of passwords or on a description of powers.
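
    In SQL-based DBMSs, such powers are typically described with GRANT and REVOKE statements, for example (user and table names are hypothetical):

        GRANT SELECT ON clients TO clerk;            -- read-only access
        GRANT SELECT, UPDATE ON orders TO manager;   -- reading and updating
        REVOKE UPDATE ON orders FROM manager;        -- withdraw a granted right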

    7. Synchronizing the work of multiple users.

    Quite often a situation arises in which several users simultaneously perform an update operation on the same data. Such collisions can violate the logical integrity of the data, so the system must provide measures that prevent other users from updating data until the user working with it has completely finished. The main concept used here is "locking". Locks are necessary to prevent different users from simultaneously modifying the same data, since this can lead to errors.

    To implement this prohibition, the DBMS sets a lock on the objects that the transaction uses. There are different types of locks - table, page, row and others, which differ from each other in the number of locked records.

    Row locking is used most often - when a transaction accesses one row, only this row is locked, the remaining rows remain available for change.

    Thus, the process of making changes to the database consists of the following sequence of actions: a transaction start statement is issued; a data change statement is issued; the DBMS analyzes the statement and tries to acquire the locks necessary for its execution; if the locks are acquired successfully, the statement is executed; then the process is repeated for the next statement of the transaction. After all statements within a transaction have completed successfully, the transaction commit statement is executed: the DBMS commits the changes made by the transaction and releases the locks. If any statement fails, the transaction is "rolled back", the data regains its previous values, and the locks are released.
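
    Many DBMSs (Oracle, PostgreSQL and others) also let a transaction request row locks explicitly; a sketch reusing the hypothetical accounts table:

        BEGIN TRANSACTION;

        -- the selected row is locked until COMMIT or ROLLBACK
        SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;

        UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

        COMMIT;  -- the changes are committed and the lock is released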

    8. Storage Environment Resource Management.

    The database is located in the external memory of the computer. When working in the database, new data is entered (memory is occupied) and data is deleted (memory is freed). The DBMS allocates memory resources for new data, redistributes freed memory, organizes a queue of requests to external memory, etc.

    9. Supporting the activities of system personnel.

    When operating a database, there may be a need to change DBMS parameters, select new access methods, change (within certain limits) the structure of stored data, as well as perform a number of other system-wide actions. The DBMS provides the ability to perform these and other actions to support the activities of the database to the system personnel serving the database, called the database administrator.

    DBMS classification

    DBMSs are usually divided according to the data model used (like databases) into the following types: hierarchical, network, relational and object-oriented.

    Based on the nature of use, DBMSs are divided into personal and multi-user.

    Personal DBMSs include Visual FoxPro, Paradox, Clipper, dBase, Access, etc. Multi-user DBMSs include, for example, Oracle and Informix. A multi-user DBMS includes a database server and a client part and can operate in a heterogeneous computing environment - different types of computers and different operating systems are allowed. Therefore, on the basis of such a DBMS it is possible to create an information system using client-server technology. The versatility of multi-user DBMSs is reflected in their high price and in the computer resources required to support them.


    Personal DBMS provide the ability to create personal databases and inexpensive applications that work with them, and, if necessary, create applications that work with the database server.

    The control component of many DBMSs is the kernel, which performs the following functions:

    - data management in external memory;

    - RAM buffer management (workspaces into which data is read from the database to increase processing speed);

    - transaction management.

    A transaction is a sequence of operations on a database that the DBMS treats as a single whole: an action on the database that transfers it from one consistent state to another. The action is expressed as changes to the data in the database tables.

    If one of the changes made to the database as part of a transaction fails, a rollback must be made to the state of the database that occurred before the transaction began. Consequently, all changes made to the database as part of a transaction are either simultaneously confirmed, or none of them are confirmed.

    When executed, a transaction either completes successfully, and the DBMS records the changes made in external memory, or fails (for example, due to a PC hardware failure), and none of the changes are reflected in the database. The concept of a transaction is necessary to maintain the logical integrity of the database.

    Ensuring database integrity is a necessary condition for the successful functioning of the database. Database integrity is a property of a database meaning that it contains complete and consistent information, necessary and sufficient for the correct functioning of applications. To ensure integrity, integrity constraints are imposed in the form of conditions that the data stored in the database must satisfy. Examples of such conditions are restrictions on the ranges of possible values of object attributes, or the absence of duplicate records in the tables of a relational database.

    Security is achieved in a DBMS by encrypting application programs and data, by password protection, and by supporting access levels for the database as a whole and for individual tables.

    Expanding the capabilities of the DBMS user is achieved by connecting systems for constructing graphs and diagrams, as well as by connecting modules written in programming languages.

    Network support is provided by:

    means of controlling user access to shared data, i.e. means of locking files (tables), records and fields, which are implemented to varying degrees in different DBMSs;

    means of a transaction mechanism that ensures the integrity of the database when operating on the network.

    Support for interaction with Windows applications allows the DBMS to embed in a report information stored in files created by other applications, for example in a Word document or an Excel workbook, including graphics and sound. For this purpose, the DBMS supports mechanisms developed for the Windows environment, such as DDE (Dynamic Data Exchange) and OLE (Object Linking and Embedding).

    Data presentation levels

    Modern approaches to creating databases involve a three-level organization. This method of organizing a database was proposed by the American National Standards Institute (ANSI) and is used everywhere.

    At the topmost (outer) level there can be many models. This level defines the viewpoint of individual users (applications) on the database. Each application sees and processes only the data that it needs.

    At the conceptual level, the database is presented in the most general form, which unites all external representations of the subject area. At the conceptual level, we have a generalized model of the subject area for which the database was created. There is only one conceptual representation. When developing a conceptual model, efforts are aimed at structuring data and identifying relationships, without considering the specifics of implementation and development efficiency.

    The internal (physical) level is the actual data located on external storage media. The internal model determines the placement of data, access methods, and indexing techniques.

    The three-level organization of the database allows for logical and physical independence when working with data. Logical independence implies the ability to change one application without adjusting other applications working with the same database.

    Physical independence implies the ability to transfer stored information from one media to another while maintaining the functionality of all applications that use this database.

    Classification of data models

    A data model is a set of rules by which data is organized.

    This very simple definition can be refined. A data model is an abstraction that, when applied to specific data, allows users and developers to treat it as information, that is, information containing not only data, but also the relationships between them.

    It is customary to distinguish three groups of data models: infological, datalogical and physical.

    Fig. 1. Data models

    The infological (semantic) model is a generalized description of the subject area that is not tied to any computer or DBMS. This description, using natural language, mathematical formulas, tables, graphs and other tools, combines the users' understanding of the contents of the database with the developers' understanding of the data that may be needed in future applications.

    This human-centric model is completely independent of the physical parameters of the data storage environment. Therefore, the information model should not be changed as long as it adequately reflects the subject area, that is, until changes have occurred in the subject area.

    Datalogical models are computer-oriented and are supported by specific DBMSs. With their help, the DBMS allows users to access stored data without worrying about its physical location. Since data access is carried out through a specific DBMS, datalogical models are described in the data description language of that DBMS.

    The necessary data is found by the DBMS on external storage devices using physical data models. The physical model operates in categories related to the organization of external memory and data storage structures that are used in a given operating environment.

    Datalogical models

    This group includes such well-known models as hierarchical, network, relational and object-oriented.

    The classification of models and their descriptions appeared after the development of the relational model. Before this, databases were built using whatever technologies existed; only much later were the existing databases analyzed and their theoretical descriptions completed.

    Graph-theoretical models reflect a collection of real-world objects in the form of a graph. Depending on the type of graph, hierarchical and network models are distinguished. Hierarchical and network data models began to be used in DBMS in the early 60s of the 20th century. Currently, they are used less frequently than the relational data model.

    To deal with complex data sets, mathematicians have developed a hierarchical data model. This model appeared earlier than other datalogical models. It is this data model that was used in the first officially recognized industrial DBMS from IBM.

    The hierarchical model involves storing data in a form similar to the organization of directories in MS DOS: all directories start from the root and branch like a tree. There is only one path to each file, that is, one directory name corresponds to the file.

    In the real world, some objects are inherently hierarchical structures: some objects are parents, others are children. Hierarchy is simple and natural for showing relationships between objects. Suffice it to recall the numerous classifications used in different fields of knowledge, for example, the above classification of data models. Another example is the data structure of an enterprise.

    In a hierarchical database, all records branch from one root. A record always has only one parent and can itself also be a parent for another record.

    The main advantage of the hierarchical model is speed. Since all relationships between tables are predefined and static, searches and other operations on the data set are performed very quickly.

    The most significant drawback is inflexibility. Because relationships are stored within each record, the data only makes sense in a specific context. Another disadvantage is the difficulty of transferring data from computer to computer. The third disadvantage is that global changes to the data are almost impossible. A change requires that each record, including parent and child records, be modified individually.

    Working with this data model requires a significant amount of knowledge. Most databases that use a hierarchical model require specially trained personnel to ensure proper functioning.

    The network model was proposed to provide greater flexibility in data management. The development of this model was greatly influenced by the American scientist Charles Bachman.

    The basic principles of the network data model were formulated in the mid-60s. The reference version of the network data model was described in the reports of the CODASYL (COnference on DAta SYstems Languages) working group in the mid-70s.

    The network model differs from the hierarchical model in that it allows more than one group relationship to be defined for a record. This model consists of many records that can be owners or members of group relationships. The network model allows searching in various structures and supports a one-to-many relationship for records.

    As in a hierarchical database, information about relationships is stored in records and must be predefined. Therefore, the network data model has the same limitations as the hierarchical one.

    Relational data model

    Basic concepts and definitions of the relational model

    Relational model

    In 1970, E. F. Codd introduced the relational database model. The concept of this model is based on the idea that the organization of data in the database should be flexible, dynamic and easy to use: the user should work only with the logical representation of the data, while the database management system takes care of the physical data structure. Codd formulated the basic principles of relational databases.

    The relational model uses tables and is based on two statements:

    · the database must consist of tables and only tables. Only the contents of tables determine database operations;

    · description of data and manipulation of it must be independent of the method of storing data at the lower level. In other words, relational database management systems (RDBMSs) must provide their own management system based only on the logical representation of data.

    In his paper, Codd described a language for operating on relational structures. Over time, this language evolved into what is now called the structured query language, SQL (Structured Query Language).

    Codd developed a set of basic rules that a relational model DBMS must comply with. There are 12 of them in total. Really existing databases do not fully satisfy all Codd's rules. Each developer implements the relational model differently. As a result, the properties of relational databases vary greatly.

    Codd's rules can be divided into 4 categories:

    1) basic capabilities – data description and programming language;

    2) access to data – rules of access, storage and search;

    3) flexibility – rules for changing (modifying) data;

    4) integrity – rules to ensure data quality and security.

    When using a relational DBMS model, the user works with a logical data structure. To move to the lower (physical) level, Codd proposed the concept of a data dictionary.

    A data dictionary is a central table and repository of information about a database; it contains information about the location of data, field names, data types, relationship maps. The data dictionary works with the operating system and links tables (logical data) with files (physical data).

    Being a mathematician by training, E. Codd proposed using the apparatus of set theory (union, intersection, difference, Cartesian product) for data processing. He showed that any representation of data reduces to a set of two-dimensional tables of a special type, known in mathematics as relations. The smallest unit of data in the relational model is a separate atomic (indecomposable) data value for a given model. Thus, in one subject area the surname, first name and patronymic may be considered a single value, and in another - three different values.

    A domain is a set of atomic values of the same type. The meaning of domains is as follows: if the values of two attributes come from the same domain, then comparisons between those attributes are probably meaningful; if they are taken from different domains, comparisons between them are probably meaningless.

    A relation on domains D1, D2, ..., Dn (not necessarily all of them different) consists of a header and a body.

    The heading consists of a fixed set of attributes A1, A2, ..., An such that there is a one-to-one correspondence between the attributes Ai and the domains Di (i = 1, 2, ..., n) that define them.

    The body consists of a time-varying set of tuples, where each tuple consists of a set of attribute-value pairs (Ai:Vi), (i = 1, 2, ..., n), one such pair for each attribute Ai in the heading. For any given attribute-value pair (Ai:Vi), Vi is a value from the single domain Di associated with the attribute Ai.

    The degree of a relation is the number of its attributes. A relation of degree one is called unary, of degree two binary, of degree three ternary, ..., and of degree n n-ary.

    The cardinal number, or cardinality, of a relation is the number of its tuples. The cardinality of a relation changes over time, unlike its degree.

    Since a relation is a set, and sets by definition do not contain matching elements, no two tuples of a relation can be duplicates of each other at any given point in time. Let R be a relation with attributes A1, A2, ..., An. The set of attributes K=(Ai, Aj, ..., Ak) of a relation R is said to be a possible key of R if and only if two time-independent conditions are satisfied:

    1. Uniqueness: at any given time, no two different tuples R have the same value for Ai, Aj, ..., Ak.
    2. Minimality: none of the attributes Ai, Aj, ..., Ak can be excluded from K without violating uniqueness.

    Every relation has at least one possible key, because at least the combination of all its attributes satisfies the uniqueness condition. One of the possible keys (chosen arbitrarily) is taken as the primary key. The remaining possible keys, if any, are called alternative keys.

    The above-mentioned and some other mathematical concepts provided the theoretical basis for the creation of relational DBMSs, the development of appropriate language tools and software systems that ensure their high performance, and the creation of the foundations of the theory of database design. However, for the mass user of relational DBMSs, informal equivalents of these concepts can be successfully used:

    Relation – Table (sometimes File),
    Tuple – String (sometimes Record),
    Attribute – Column, Field.

    It is assumed that "record" means "an instance of a record" and "field" means "the name and type of the field."

    1. Each table consists of rows of the same type and has a unique name.

    2. Rows have a fixed number of fields (columns) and values (multiple fields and repeating groups are not allowed). In other words, at each position in the table, at the intersection of a row and a column, there is always exactly one value or nothing.

    3. The rows of the table must differ from each other by at least one value, which makes it possible to uniquely identify any row of such a table.

    4. The columns of the table are uniquely assigned names, and each of them contains homogeneous data values (dates, last names, integers or monetary amounts).

    5. The complete information content of the database is represented as explicit data values, and this method of representation is the only one. In particular, there are no special "links" or pointers connecting one table to another.

    6. When performing operations on a table, its rows and columns can be processed in any order, regardless of their information content. This is facilitated by the presence of names for tables and their columns, as well as the ability to select any row or any set of rows with the specified characteristics.

    Keys

    Relational theory requires that data be uniquely identified by three criteria:

    · the table where this data element is stored;

    · the name of the field in this table;

    · the primary key value for the record.

    A primary key is a field or group of fields that guarantees the uniqueness of a record.

    When designing a table, include in the primary key as many fields as are needed to guarantee that each row in the table is unique. Some tables contain a single field that uniquely identifies each record; other tables may require a composite key, that is, a primary key consisting of a combination of fields. Even if a table has a composite primary key, it can have only one primary key.

    Constructing a primary key is mandatory. Data often has a natural key. For example, a Social Security number identifies any U.S. taxpayer; banks issue account numbers to their clients; hospitals assign numbers to patients on file. All of these - social security number, bank account number, file number - are good candidates for a primary key because they uniquely identify taxpayers, clients and patients, respectively.

    When choosing a key, you must be careful, since some data only appears to be unique. For example, first and last name, company name and order date.

    If the data does not contain a natural primary key, then one must be created. There are two schools of thought offering different ways to create such an artificial key.

    The first school argues that the key should be as close to the data as possible. For example, records in Paradox tables are by default automatically sorted and displayed in the order determined by the primary key. If you build a key from four letters of the last name plus two letters of the first name plus a sequentially assigned number, then sorting will show the records in alphabetical order. But such a key also has drawbacks: for example, if a last name changes, the links will have to be updated.

    The second school believes that the key should have nothing in common with the data - the so-called surrogate key.

    The primary key should be as short as possible. A long key results in more data entry errors. Since a relational database uses primary keys to organize relationships between tables, the appearance of erroneous relationships worsens data security. If the natural primary key turns out to be too long, then it is recommended to switch to using a surrogate key. This approach is often used in practice and is known as unique identifier generation.
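
    A surrogate key is usually a system-generated number; in standard SQL this can be sketched with an identity column (syntax and support vary between DBMSs; the table is hypothetical):

        CREATE TABLE patients (
            patient_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
            full_name  VARCHAR(100) NOT NULL
        );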

    A key data element is an element from which the values of other data elements can be determined. Two or more data elements may together uniquely identify an object. Key data elements should be selected carefully, because the correct selection contributes to a valid conceptual data model.

    A primary key is an attribute or group of attributes that uniquely identifies each row in a table.

    An alternative (secondary) key is an attribute or group of attributes that does not match the primary key and uniquely identifies an instance of an object.
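
    These kinds of keys can be declared directly in SQL; a sketch with hypothetical tables:

        -- simple primary key plus an alternative key
        CREATE TABLE persons (
            person_id   INTEGER PRIMARY KEY,
            passport_no VARCHAR(20) NOT NULL UNIQUE  -- alternative (secondary) key
        );

        -- composite primary key
        CREATE TABLE order_items (
            order_id   INTEGER,
            product_id INTEGER,
            quantity   INTEGER NOT NULL,
            PRIMARY KEY (order_id, product_id)
        );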

    Indexes

    Indexes are an integral part of the database structure and are designed to speed up searching for information in a table.

    An index is a structure associated with a table or view, designed to speed up searching for information in it. An index is defined on one or more columns, called indexed columns. It contains the sorted values of the indexed column or columns, with references to the corresponding rows in the source table or view. The improved performance is achieved by the sorting of the data. Using indexes can significantly improve search performance, but storing indexes requires additional database space.

    As an example of searching in a table, imagine a telephone directory in which all subscribers are listed alphabetically. Obviously, it is very easy to find a phone number in such a directory if the subscriber's last name is known. On the other hand, it is extremely difficult to find a subscriber's last name by phone number, because the directory is not ordered by phone numbers: the desired number can only be found by exhaustive search. Thus, the ordering of information greatly facilitates searching. This principle forms the basis of the index mechanism.

    The figure shows a telephone directory with entries not ordered by phone number, and an index generated for this directory. The index is an array of integers in which the directory entry numbers are placed in ascending order of phone number. Thanks to this, the records become ordered by phone number, and instead of a brute-force search you can use bisection or a binary tree.


    Fig. 3. An example of an index on the "phone number" field.
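
    An index like the one in the figure could be created in SQL as follows (a sketch; the directory table is hypothetical):

        CREATE TABLE directory (
            last_name VARCHAR(50),
            phone     VARCHAR(12)
        );

        -- speeds up searching by phone number
        CREATE INDEX idx_directory_phone ON directory (phone);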

    Relationships

    A relationship is a functional dependency between entities. If there is a relationship between two entities, then facts from one entity refer to, or are in some way related to, facts from the other. Maintaining the consistency of functional dependencies between entities is called referential integrity. Since relationships are "inside" the relational model, referential integrity can be enforced both by the application and by the DBMS itself (using declarative referential integrity mechanisms and triggers).

    When describing relationships, a connection between records of different tables is meant. For example, a one-to-many relationship means that one record of one table is related to many records of another table; in no case does it mean a connection of one table with many tables.

    The simplest relationship between table records is one-to-one. This type of relationship occurs when the tables being linked have the same primary key. Most often, this type of relationship is used when there is a table with a large number of fields, some of which are secondary (not so significant). For example, a record about a person in the personnel department may consist of last name, first name, patronymic, passport details, autobiography, etc. The autobiography can be classified as secondary information and placed in an additional table with a one-to-one type of connection.

    The most common type of relationship is one-to-many. For example, customers and orders: one customer can place many orders. The fields through which relationships are made are not free, that is, they cannot have arbitrary values: an order must mention a customer who is present in the "Customers" table. From the point of view of the "Customers" table, the "Customer Full Name" field can be arbitrary, since it does not depend on the fields of other tables.

    If all the key fields of one table and part of the key fields of another table are related, then the type of relationship can only be one-to-many.

    The many-to-many relationship type occurs when fields that are partially included in the primary key of one and another table are related. For example, the “Product Name” field in the “Orders” table and the “Product Name” field in the “Deductions” table. A product can be ordered by several customers, and deductions for the product go to different specialists for each sale of the product (if the “Deductions” table has two fields in the primary key - product name and specialist or product name and manager).

    The above discussed ways of linking tables using fields included in the primary key. However, there is another way: on one side, fields not included in the primary key may participate in the link, and on the other side, fields included in the primary key. This is done using secondary, or foreign, keys. A secondary key is built on fields that are not included in the primary key.

    Thus, when defining a relationship, one table carries out the relationship using the fields included in the primary key, and the other table can use all the fields of the primary key, part of them, or fields not included in the primary key.

    Unlike relationships based only on a primary key, relationships built using a secondary key are called potential relationships. The database developer himself decides whether to use such binding or not.

    A many-to-one relationship is essentially an inverted one-to-many relationship. The values in the relationship fields must be defined by a table in which the fields used are unique, that is, in which one record can define many others.

    Referential integrity is ensuring that the foreign key value of a child entity instance matches the primary key values ​​in the parent entity. Referential integrity can be checked for all operations that modify data.
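
    In SQL, referential integrity is declared with a FOREIGN KEY constraint; a sketch for the customers-and-orders example (names are hypothetical):

        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            full_name   VARCHAR(100) NOT NULL
        );

        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            -- every order must refer to an existing customer
            FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
        );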

    A stored procedure is a program that combines queries, procedural logic (assignment operators, branching, etc.) and data stored in the database. This mechanism allows quite complex programs to be kept together with the data, performing a large amount of data processing without transferring data over the network and without interacting with the client. The database can thus act as a functionally independent application layer that communicates with other layers to receive queries and update data.
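
    As an illustration, a stored procedure in Oracle-style PL/SQL might look like this (a sketch; the procedure, table and column names are hypothetical, and other DBMSs use different procedural dialects):

        CREATE OR REPLACE PROCEDURE transfer_money(
            p_from   IN INTEGER,
            p_to     IN INTEGER,
            p_amount IN NUMBER
        ) AS
        BEGIN
            -- both updates run on the server, close to the data
            UPDATE accounts SET balance = balance - p_amount WHERE account_id = p_from;
            UPDATE accounts SET balance = balance + p_amount WHERE account_id = p_to;
        END;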

    Rules allow specified actions to be triggered when data in the database is changed or added, and thereby control the correctness of the data placed in it. Typically the action is a call to a specific procedure or function. Rules can be associated with a field or record and, accordingly, are triggered when data in a specific field or table record changes. Rules cannot be used when deleting data. Unlike constraints, which only control relatively simple conditions on the correctness of data entry, rules allow arbitrarily complex relationships between data elements in the database to be checked and maintained.

    A trigger is a predefined action or sequence of actions that is automatically performed when data is updated, added or deleted. A trigger is a powerful tool for controlling changes to data in the database and helps the programmer automate the operations that must be performed in such cases. A trigger is executed after the data update rules have been checked. Neither the user nor the application can invoke a trigger directly; it runs automatically when the user or application performs certain actions on the database. A trigger includes the following components:

    * restrictions for the implementation of which a trigger will be created;

    * an event that will characterize the occurrence of a situation requiring verification of restrictions. Events are most often associated with a change in the state of the database (for example, adding a record to a table), but additional conditions can also be taken into account (for example, adding a record only with a negative value);
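
    As an illustration, an Oracle-style trigger checking such a restriction might look like this (a sketch; the names are hypothetical, and trigger syntax differs between DBMSs):

        CREATE OR REPLACE TRIGGER trg_orders_amount
        BEFORE INSERT OR UPDATE ON orders
        FOR EACH ROW
        BEGIN
            -- reject rows that violate the restriction
            IF :NEW.amount <= 0 THEN
                RAISE_APPLICATION_ERROR(-20001, 'Order amount must be positive');
            END IF;
        END;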

    Using triggers when designing a database provides the following advantages:

    * triggers are always executed when the corresponding actions are performed. The developer thinks about triggers when designing the database and no longer needs to think about them when developing an application for accessing the data;

    * if necessary, triggers can be changed centrally directly in the database. User programs working with this database will not require modernization;

    * A data processing system that uses triggers has better portability to a client-server architecture due to fewer modifications required.

    Normalization of relations is the process of building an optimal structure of tables and relationships in a relational database. The normalization process groups data elements into tables that represent objects and their relationships. Normalization theory is based on the idea that a certain set of tables has better properties for including, modifying, and deleting data than all other sets of tables that can be used to represent the same data.
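
    For example, a flat table orders_flat(order_id, customer_name, customer_phone, product, amount) repeats customer details in every order row; normalization splits it into two related tables (a sketch; names are hypothetical):

        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        VARCHAR(100),
            phone       VARCHAR(20)    -- stored once per customer, not per order
        );

        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers (customer_id),
            product     VARCHAR(100),
            amount      DECIMAL(12, 2)
        );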

    [Diagram: main functions of a DBMS - users and application software access the database through the DBMS, which relies on the operating system and hardware; the DBMS provides software access to the database and manages its physical storage.]

    The software components of a DBMS include the kernel and service tools (utilities).

    - The DBMS kernel is a set of software modules necessary and sufficient to create and maintain a database, that is, the universal part that solves the standard tasks of information service for users.

    - Service programs provide users with a number of additional capabilities and services, depending on the subject area being described and the needs of a particular user.

    A database management system is a software system designed to create a common database on a computer for many applications, keep it up to date, and ensure effective user access to the data contained in it within the framework of the powers granted to them.

    Classification of DBMS. Based on the degree of universality, DBMSs are divided into two classes:

    1. general-purpose DBMSs;

    2. specialized DBMSs.

    Specialized DBMSs are created when none of the existing general-purpose DBMSs can satisfactorily solve the problems facing the developers. There may be several reasons: the required data processing speed is not achieved; the DBMS must operate under severe hardware limitations; support for specific data processing functions is required. Specialized DBMSs are designed to solve a specific problem, and acceptable parameters for this solution are achieved:

    1. through knowledge of the features of a specific subject area;

    2. by reducing the functional completeness of the system.

    Based on the methods of organizing data storage and processing, DBMSs are divided into:

    - centralized;

    - distributed.

    The former work with a database that is physically stored in one place (on one computer). This does not mean that a user can work with the database only on that computer: access can be remote (in client-server mode). Most centralized DBMSs delegate the organization of remote access to data to the network support, performing only their standard functions, which become more complicated because many users access the data simultaneously. Based on the data model, DBMSs are divided into hierarchical, network, relational, object-relational and object-oriented.

    Requirements for a relational DBMS (according to Codd)

    1. Explicit presentation of data (The Information Rule). Information must be presented in the form of data stored in cells; data stored in cells must be atomic; the order of rows in a relational table must not affect the meaning of the data.

    2. Guaranteed access (Guaranteed Access Rule). Each data element must be guaranteed to be accessible using a combination of the table name, the primary key of the row, and the column name.

    3. Systematic treatment of null values (Systematic Treatment of Null Values). Unknown values (NULL), distinct from any known value, must be supported for all data types in all operations.

    4. Access to the data dictionary in terms of the relational model (Dynamic On-Line Catalog Based on the Relational Model). The data dictionary must be stored in the form of relational tables, and the DBMS must support access to it using standard language tools.

    5. Completeness of the language subset (Comprehensive Data Sublanguage Rule). A relational DBMS must support a single query language that allows all data operations to be performed: data definition, data manipulation, data access control, and transaction management.

    6. Support for updatable views (View Updating Rule). An updatable view must support all the data manipulation operations that relational tables support: selecting, inserting, modifying and deleting data.

    7. Availability of high-level data management operations (High-Level Insert, Update, and Delete). Insert, update and delete operations must be supported not only for a single row of a relational table, but for any set of rows.

    8. Physical data independence (Physical Data Independence). Applications must not depend on the methods used for storing data on media or on the hardware of the computers on which the relational database is located.

    9. Logical data independence (Logical Data Independence). The presentation of data in an application must not depend on the structure of the relational tables.

    10. Integrity independence (Integrity Independence). All information necessary to maintain integrity must be contained in the data dictionary. The DBMS must check the specified integrity constraints and automatically maintain data integrity.

    11. Distribution independence (Distribution Independence). The database can be distributed and located on several computers, and this must not affect applications.

    12. Coordination of language levels (Non-Subversion Rule). There must be no means of accessing data other than the standard data language. If a low-level data access language is used, it must not bypass the security and integrity rules supported by the higher-level language.

    Requirements for the composition and functions of a DBMS

    1. Storage, retrieval and updating of data.

    2. A catalog (system data dictionary, SSD) accessible to end users. Typically, the system catalog stores the following information: names, types and sizes of data items; names of relationships; integrity constraints imposed on the data; names of users who are granted access rights to the data; external, conceptual and internal schemas and the mappings between them; statistics, such as transaction rates and counts of accesses to database objects.

    Advantages of having a system catalog (SSD):

    - information about data can be collected and stored centrally, which allows access to this data to be controlled;

    - the meaning of the data can be defined, which helps other users understand its purpose;

    - communication is simplified, because precise definitions of the meaning of the data exist;

    - the system catalog can also indicate one or more users who own the data or have the right to access it;

    - thanks to centralized storage, redundancies and inconsistencies in the description of individual data elements are easily detected;

    - changes made to the database can be logged;

    - the consequences of any change can be determined before it is made, since the system catalog records all existing data elements, the relationships established between them, and all their users;

    - security measures can be further strengthened;

    - new possibilities for supporting data integrity emerge;

    - an audit of stored information can be performed.

    Oracle system data dictionary. It stores all information about the structure, information objects and relationships in a particular database. The data dictionary is a set of tables and auxiliary objects (indexes, clusters, synonyms, views, sequences), information about which is also stored in dictionary tables. Logically, the data dictionary is divided into:

    - base tables;

    - views of the base tables;

    - dynamic tables and their views.

    In total, the data dictionary includes more than 100 base tables, which are located in the SYSTEM tablespace and nowhere else. Their names include the "$" character (so it is not recommended to use it in the names of non-dictionary objects), for example:

    AUD$ – audit information table
    FILE$ – file table
    USER$ – user table
    IND$ – index table
    OBJ$ – object table
    SEG$ – segment table
    SYN$ – synonym table
    TAB$ – table of tables
    TS$ – tablespace table
    VIEW$ – view table

    Working with the system dictionary. To obtain information from the data dictionary, users are provided with views of the base tables. They are divided into three groups:

    - DBA – views intended for users who are database administrators, that is, who hold the DBA role. These views provide the most complete information from the data dictionary;

    - USER – views through which each user receives information about the objects he owns;

    - ALL – views that give each user all the information about the objects to which he is allowed access.

    For example:

    DBA/ALL/USER_INDEXES – all/accessible/user indexes
    DBA/ALL/USER_IND_COLUMNS – all/accessible/user index columns
    DBA/ALL/USER_OBJECTS – all/accessible/user objects
    DBA/ALL/USER_SYNONYMS – all/accessible/user synonyms
    DBA/ALL/USER_TABLES – all/accessible/user tables
    DBA/ALL/USER_TAB_COLUMNS – all/accessible/user table columns
    DBA/ALL/USER_TAB_PRIVS – all/accessible/user privileges on tables
    DBA/ALL/USER_VIEWS – all/accessible/user views

    Some views (by the meaning of their application) are present in only one or two groups. This is most typical of DBA views, for example:

    DBA_DATA_FILES – data about the physical files of the database and logs
    DBA/USER_FREE_SPACE – free memory in tablespaces (all, and available to a specific user)
    DBA_PROFILES – list of options for the "cost" of system resources
    DBA_ROLES – list of roles defined in the database

    Examples of retrieving data from the SSD:

        select table_name from user_tables;
        select * from all_views;
        select view_name from dba_views;

    The DICT synonym for the DICTIONARY view is important. It is used to select the names of the tables, views and synonyms of the data dictionary, with descriptions, if any are in the database. Here is a small fragment:

        select * from dict;

    ALL_CATALOG – all tables, views, synonyms and sequences available to the user
    ALL_DB_LINKS – database links available to the user
    DBA_OBJECTS – all objects in the database
    DBA_ROLES – all roles that exist in the database
    USER_EXTENTS – extents owned by the user
    USER_VIEWS – definitions of views owned by the user
    DUAL – a special table containing one column, DUMMY, and one row
    DICT – synonym for DICTIONARY
    TABS – synonym for USER_TABLES

Working with the system dictionary. For the DBA, access to the base tables themselves is open, but working at this level, except in EMERGENCY cases, is NEVER recommended: all data dictionary information is available through the views of the base tables; data in the base tables is stored without duplication, according to the rules of the internal system ordering, without decoding; the number, names and sizes of the table columns were chosen without regard for clarity; accidental, intentional, or any other CORRECTION of the contents of the base tables (even in apparently obvious cases, for example, data on long-deleted tablespaces) as a rule leads to DAMAGE of the data dictionary, that is, to LOSS of the entire database. The rare exception is AUD$ (the audit information table), from which unnecessary entries should be deleted periodically, since when audit mode is enabled this table fills up quickly and can overflow the SYSTEM tablespace.
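A hedged sketch of such a periodic cleanup. The 90-day cutoff and the timestamp column name are assumptions (the column is TIMESTAMP# in older Oracle releases; verify the layout in your version before running):

    -- Purge old audit records from the only base table that may safely be edited
    DELETE FROM sys.aud$
    WHERE  timestamp# < SYSDATE - 90;   -- column name varies by Oracle version
    COMMIT;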

Requirements for the composition and functions of the DBMS:
3. Transaction support.
4. Concurrency (parallel work) control services.
5. Recovery services.
6. Data access control services.
7. Data integrity services.
8. Data independence support services.
9. Support services (utilities).

Support services. These are typically designed to help the DBA administer the database effectively. Some examples of such utilities: import utilities for loading a database from flat files, and export utilities for unloading a database into flat files; monitoring tools designed to track the performance and usage of the database; statistical analysis programs that make it possible to evaluate the performance or utilization of the database; index reorganization tools designed to rebuild indexes when they overflow; garbage collection and memory reallocation tools that physically remove deleted records from storage devices, consolidate the freed space and reallocate memory where needed.

Main software components of the DBMS. Query processor: converts queries into a sequence of low-level commands for the database manager. Database manager: accepts requests and examines the external and conceptual schemas to determine the conceptual records needed to satisfy the request, then calls the file manager to fulfil it. File manager: manipulates the files intended for data storage and is responsible for allocating available disk space; it creates and maintains the list of structures and indexes defined in the internal schema. If hashed files are used, it is also responsible for calling the hashing functions to generate record addresses.

Main software components of the DBMS. DML preprocessor: converts DML statements embedded in application programs into calls to standard functions of the base language; to generate the appropriate code, the DML preprocessor must communicate with the query processor. DDL compiler: converts DDL commands into a set of tables containing metadata; these tables are then stored in the system catalog, and the control information in the headers of the data files. Dictionary manager: manages access to the system catalog and ensures work with it; the system catalog is accessible to most DBMS components.

Main software components of the DBMS. Access rights control module: checks whether the given user has the authority to perform the requested operation. Command processor: after the user's authority to perform the requested operation has been verified, control is transferred to the command processor. Integrity checker: for operations that change the contents of the database, checks whether the requested operation satisfies all data integrity constraints (for example, requirements imposed on keys). Query optimizer: determines the optimal query execution strategy.

Main software components of the DBMS. Transaction manager: performs the required processing of the operations received during the execution of transactions. Scheduler: responsible for conflict-free execution of parallel operations on the database; it controls the relative order in which the operations requested by individual transactions are performed. Recovery manager: ensures that the database is restored to a consistent state when failures occur; in particular, it is responsible for committing and undoing the results of transactions. Buffer manager: responsible for transferring data between RAM and secondary storage, such as a hard drive or magnetic tape. The recovery manager and the buffer manager are sometimes collectively called the data manager, and the buffer manager itself is sometimes called the cache manager.

Basic Oracle objects. Database (DATABASE): at the highest level of the physical organization of Oracle data there is an object called the database. The database consists of the data dictionary, the data itself, and various auxiliary objects: an initialization parameter file, a control file, a rollback segment file and two transaction log files (this list can be extended, for example, with copies of the control file). A database can be created automatically during installation of the Oracle DBMS or manually with the CREATE DATABASE command. Tablespace (TABLESPACE): a memory area designed to store database objects. A tablespace has a name and occupies one or more operating system files. Created with the CREATE TABLESPACE command.
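For illustration, a minimal sketch of the latter command; the file name and size are assumptions:

    -- A tablespace occupying one operating system file
    CREATE TABLESPACE users
        DATAFILE '/u01/oradata/orcl/users01.dbf' SIZE 50M;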

Basic Oracle objects. User (USER): an object that can create and use other Oracle objects and request server functions. Such functions include opening a session, changing the state of the server and database, creating other database objects, requests to execute SQL statements, etc. In Oracle, the user name coincides with the schema name. Created with the CREATE USER command. Each database object belongs to the user who created it and resides in that user's schema. The full name of any database object (except databases, tablespaces and users) consists of the name of the schema in which it was created and the object's own name, for example: scott.emp. Here scott is the user (schema) name, emp is the object name (the "Employees" table), and the dot is the so-called qualified-name separator between the levels of definition.
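A hedged sketch (names and the password are illustrative; the user also needs privileges before connecting):

    CREATE USER scott IDENTIFIED BY tiger
        DEFAULT TABLESPACE users;
    GRANT CONNECT, RESOURCE TO scott;

    -- From another schema the table is addressed by its qualified name
    SELECT * FROM scott.emp;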

Basic Oracle objects. Cluster (CLUSTER): an object that defines a shared way of storing data from several tables containing information that is usually processed together. Clustering tables reduces retrieval time. Created with the CREATE CLUSTER command; contains data tables. Table (TABLE): the basic structure of the relational model. As is well known, all information in a database is stored in tables. Tables consist of a set of named columns, or attributes. The set of values of a column is defined by integrity constraints, that is, a limited form of the domain concept (the set of valid values) is supported. A table may be empty or consist of one or more rows of attribute values. The rows of attribute values are also called records or tuples. Created with the CREATE TABLE command; can be created in a cluster.
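A minimal sketch of a table definition with integrity constraints limiting the column values (all names are illustrative):

    CREATE TABLE emp (
        empno  NUMBER(4)     PRIMARY KEY,        -- unique row identifier
        ename  VARCHAR2(30)  NOT NULL,           -- value required
        sal    NUMBER(8,2)   CHECK (sal >= 0),   -- restricted domain of values
        deptno NUMBER(2)
    );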

Basic Oracle objects. Index (INDEX): a database object created to improve data retrieval performance. An index is built on one or more columns of a table and provides faster access to that table's data by ordering the column values. Created with the CREATE INDEX command. Clusters, tables and indexes are called memory objects because they store factual data. When created, they are allocated a certain amount of memory (one or more extents), which grows as data is added to them. An extent is a contiguous area of memory within a tablespace. All the extents belonging to one object form a segment.
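For example, a sketch of an index on the emp table above; the explicit STORAGE clause controlling the initial extents is optional:

    CREATE INDEX emp_deptno_idx ON emp (deptno)
        STORAGE (INITIAL 64K NEXT 64K);   -- explicit extent sizing (optional)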

Basic Oracle objects. View (VIEW): a named selection of data from one or more tables, maintained dynamically by the server. At the heart of a view is a SELECT statement called the base query of the view; the base query defines the data visible to the user. A view can also limit the data that the user may modify. The data of a view is not stored: the server evaluates the view each time it is accessed (this is sometimes called materializing the view). Using views, a security administrator can limit the portion of the database accessible to users to only the data they actually need for their work. Created with the CREATE VIEW command. Sequence (SEQUENCE): an object that generates unique numbers under multi-user asynchronous access. Sequence elements are typically used as unique identification numbers when inserting rows into database tables. Created with the CREATE SEQUENCE command.
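A hedged sketch of both objects over the emp table used above:

    -- The base query defines the only data visible through the view
    CREATE VIEW dept10_emp AS
        SELECT empno, ename, sal FROM emp WHERE deptno = 10;

    -- Unique identifiers for new rows
    CREATE SEQUENCE emp_seq START WITH 1 INCREMENT BY 1;
    INSERT INTO emp (empno, ename) VALUES (emp_seq.NEXTVAL, 'SMITH');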

Basic Oracle objects. Synonym (SYNONYM): an alternative name, or alias, for an Oracle object, through which database users can access that object. A synonym can be private or public. A public synonym allows all database users to refer to the corresponding object by its alternative name; in this case, the schema name does not have to be specified to access the object, even when connecting not as the owner of the object but from another schema. Created with the CREATE SYNONYM command. Role (ROLE): a named set of privileges that can be granted to users or to other roles; used for effective management of data access restrictions. Oracle supports several standard, predefined roles (DBA, CONNECT, RESOURCE, etc.). Created with the CREATE ROLE command.
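A sketch with illustrative names (adams is a hypothetical user):

    -- All users may now write simply emp instead of scott.emp
    CREATE PUBLIC SYNONYM emp FOR scott.emp;

    -- A role bundles privileges, which are then granted in one step
    CREATE ROLE clerk;
    GRANT SELECT, INSERT ON scott.emp TO clerk;
    GRANT clerk TO adams;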

Basic Oracle objects. Specific to distributed systems are Oracle objects such as the snapshot and the database link. Snapshot (SNAPSHOT): a local copy of a table of a remote database, used either to replicate (copy) all or part of a table, or to replicate the result of a query over several tables. Snapshots may be modifiable or read-only. Read-only snapshots can be refreshed periodically to reflect changes in the master table. Changes made to a modifiable snapshot are propagated to the master table and to other copies. Created with the CREATE SNAPSHOT command. Database link (DATABASE LINK): a database object that allows access to objects of a remote database. The name of a database link can be thought of as a reference to the parameters of the remote-database access mechanism (host name, protocol, etc.); using a single name simplifies work with remote database objects. Created with the CREATE DATABASE LINK command.
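A hedged sketch; the connection alias 'remote' is an assumption and must exist in the network configuration:

    CREATE DATABASE LINK sales_link
        CONNECT TO scott IDENTIFIED BY tiger
        USING 'remote';

    -- Remote objects are then addressed through the link name
    SELECT * FROM emp@sales_link;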

Basic Oracle objects. To program data processing algorithms and support complex data integrity rules, Oracle uses procedural objects. Procedure (PROCEDURE): a PL/SQL subroutine designed to solve a specific data processing task. Created with the CREATE PROCEDURE command. Function (FUNCTION): a PL/SQL subroutine designed to solve a specific task and return a value. Created with the CREATE FUNCTION command. Package (PACKAGE): a named, structured set of variables, procedures and functions united by a single functional purpose. A package consists of a specification and a package body. The specification contains declarations of external variables, constants, types and subroutines; the package body contains the implementation of the subroutines and declarations of internal variables, constants and types that are available only within the package. A package specification is created with the CREATE PACKAGE command, and the package body with the CREATE PACKAGE BODY command. Trigger (TRIGGER): a stored procedure that is executed automatically when the event associated with the trigger occurs. Typically the events are the execution of an INSERT, UPDATE or DELETE statement on a table. Created with the CREATE TRIGGER command.
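As an illustration, a minimal PL/SQL trigger enforcing a simple integrity rule on the emp table from the earlier sketches:

    CREATE OR REPLACE TRIGGER emp_sal_check
        BEFORE INSERT OR UPDATE OF sal ON emp
        FOR EACH ROW
    BEGIN
        -- reject the operation if the new salary is negative
        IF :NEW.sal < 0 THEN
            RAISE_APPLICATION_ERROR(-20001, 'Salary must not be negative');
        END IF;
    END;
    /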

Physical structure of the Oracle database. Environment parameters: $ORACLE_HOME – name of the Oracle home directory; $ORACLE_SID – name of the Oracle database. The Oracle database includes:

control files (ctrl1$ORACLE_SID.ctl, ctrl2$ORACLE_SID.ctl, ...);
the instance startup parameter file init$ORACLE_SID.ora;
the database configuration parameter file config$ORACLE_SID.ora;
change log files (log1$ORACLE_SID.dbf, log2$ORACLE_SID.dbf, ...);
the system tablespace (SYSTEM, system$ORACLE_SID.dbf);
the temporary tablespace (TEMP, temp$ORACLE_SID.dbf);
the tablespace for user data (USER, user$ORACLE_SID.dbf).

Oracle RAM structures. The SGA (System Global Area) is the memory shared by all processes of an instance; there is only one SGA per instance. Changes made to SGA elements by one process are immediately visible to all processes running within that instance. Created when an Oracle instance starts, the SGA has a fixed size and exists until the instance is shut down, the operating system is rebooted, or Oracle itself crashes. The main internal structures of the SGA are: the database buffer cache, that is, the set of free, read and modified data blocks holding information from the database; the transaction log buffer (Redo Log Buffer); and the shared buffer pool (Shared Buffer Pool).
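The sizes of these components can be inspected through the dynamic performance views; a small sketch (output layout varies between Oracle versions):

    SELECT name, value FROM v$sga;   -- overall SGA breakdown
    SELECT * FROM v$sgastat;         -- finer-grained component statistics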

Oracle RAM structures. SGA. The data buffer cache contains two lists: a list of the least recently used blocks (LRU – least recently used), which includes blocks read from disk but not yet modified, as well as free data buffers; and a list of modified ("dirty") blocks that have not yet been written to disk. Please note: disk-memory exchange is always performed in whole blocks, regardless of how full they are with data records and how many records were changed during processing; when accessing data, Oracle first checks whether the required data is in the buffer cache, and goes to disk only if it is not; data blocks read from disk go to the head of the LRU list, and if they are then modified, Oracle moves them to the list of "dirty" blocks for subsequent writing to disk; if there are not enough free buffers in the cache to execute the next request, Oracle evicts blocks from the "tail" of the LRU list, as the least actively used, and reads the required data blocks from disk into their place.

Oracle RAM structures. SGA. The change log buffer is a circular buffer that receives all changes occurring in the database to user, system and service data. Since the change log is intended to restore the state of the database after emergencies, log entries contain the "old" and "new" values of the changed elements, in particular entire data records after insert or delete operations. If data processing is so intensive that the log buffer fills up, that is, if the LGWR process (the log writer) cannot keep up with moving data from the buffer to disk, Oracle begins to throttle user processes. The shared buffer pool includes: 1. the dictionary cache, which stores the information from the system data dictionary most frequently used in current work, namely: names of tables and views, column names and data types, user privileges and roles, access rights to database objects, etc.; 2. the shared SQL and PL/SQL area, also known as the "library cache", which contains a set of cursors, that is, memory structures storing the results of parsing and the execution plans of SQL statements and PL/SQL blocks.

Oracle RAM structures. The PGA is an area of RAM allocated to support the functioning of an individual process. Each instance process has exactly one PGA, fully dedicated to that process and independent of the others. The size of the PGA can grow dynamically during operation. The PGA is often referred to as the Process Global Area. When an Oracle process exits normally, all its PGA memory is returned to the operating system. The PGA of an Oracle server process includes: a stack area containing variables and service information about the session; a private SQL area, sometimes called the User Global Area (UGA), in which SQL statements and PL/SQL blocks are parsed; this area is physically located in the SGA (in the MTS architecture) or in the PGA (in the dedicated server architecture). Importantly, recursive sessions do not receive their own UGAs but share the UGA of the session that spawned them. Finally, there is an optional sort area (of size sort_area_size), used as temporary memory for intermediate sorting results; if the allocated memory is insufficient to perform the sort, the process uses a temporary segment of the corresponding tablespace.

Oracle instance processes. The set of background processes working with the database, together with the SGA (System Global Area), spawned when the instance starts, constitutes the Oracle instance. All processes of an instance run on a single software core of the DBMS ($ORACLE_HOME/bin/oracle). Typically, instance processes are classified as background (servicing, auxiliary, additional) and server (meaningful processing of requests). The minimum set of background processes required for Oracle to function consists of the following four:

ora_pmon_$ORACLE_SID – the process monitor;
ora_dbwr_$ORACLE_SID – the process that writes data to the Oracle database;
ora_lgwr_$ORACLE_SID – the process that writes to the change log;
ora_smon_$ORACLE_SID – the system monitor.

Processes of an Oracle instance. 1. pmon – the background process monitor. It monitors the state of processes in the system (in particular, it watches user connections to the server and starts server processes); detects failures and deadlocks of server processes; releases resources, that is, removes locks, completes transactions and removes processes from the active list; and restores the state of the database (rollback) after abnormal termination of user processes. 2. dbwr – the background process that writes data blocks to the database from the list of modified blocks in the SGA. dbwr "wakes up" when: the length of the list of modified blocks exceeds a threshold value; the list of free buffers has too little memory to read new blocks; another 3-second interval has expired; or the background log-writing process lgwr signals the start of the next checkpoint.

Processes of an Oracle instance. 3. lgwr – the background process that writes to the database change log. Transaction logging works as follows. As a transaction progresses, small records called redo entries are created; they contain enough information to reconstruct the changes made by the transaction. Redo entries are temporarily stored in the redo log buffer. When a transaction requests a commit, the lgwr process reads the transaction's redo entries from the log buffer and writes them to the database transaction log. The transaction is considered complete when lgwr has written its redo entries to the transaction log and recorded its completion there. Data from the SGA log buffer is transferred to disk when: a COMMIT has been issued for the next transaction; another 3-second interval has expired; the log buffer in the SGA is one-third full; or the dbwr process has written another portion of modified buffers to disk.

Processes of an Oracle instance. 4. smon – a mandatory system monitoring process. It performs: automatic recovery (roll forward) of the database if its previous run ended abnormally; freeing temporary segments of data that is no longer needed; and coalescing adjacent free extents of tablespaces into larger contiguous areas. 5. arch – an optional background process for archiving the online change log files. The destination of the copies (disk, tape, ...) is determined by the log_archive_dest parameter in the init.ora file. If the arch process has not managed to archive the next log file (for example, the file system is full and there is no room for the archive file) and a switch to that file is required, Oracle suspends operation, executing only work not related to logging.

Processes of an Oracle instance. 6. ckpt – an optional auxiliary process for writing checkpoints to the online change log. Checkpoints are usually written by lgwr; the ckpt process (checkpoint_process = true in the init.ora file) merely relieves lgwr of this function. 7. reco – a (semi-)mandatory process responsible for connections with remote databases. The reco process can be omitted (distributed_transactions = 0 in init.ora), but then the instance cannot use any links between databases. 8. snpX – from one to ten processes for automatically refreshing snapshots from the local database. Their number is set in init.ora by the snapshot_refresh_processes parameter, and snapshot_refresh_interval determines how regularly they run. The snpX processes can be classified as server processes, since, by communicating with other (or the same) databases, they work with user information in the database.

Processes of an Oracle instance. 9. dbXX – additional processes for writing to the database. If the database performance bottleneck is I/O and the database physically resides on several disks, it is recommended to run several additional writer processes (on average, one per individual disk). The number of additional dbXX processes is determined by the db_writers parameter. 10. dXXX – dispatcher processes in the MTS (shared server) architecture. The number of dispatchers functioning at any moment depends on the Oracle workload but does not exceed the number specified by the mts_max_dispatchers parameter. Each dispatcher serves only one specific network or internal protocol, for example:
mts_dispatchers="tcp, 1"
mts_dispatchers="ipc, 1"

Processes of an Oracle instance. 11. sXXX – server processes in the MTS (shared server) architecture. The number of servers functioning at any moment depends on the intensity of Oracle's workload but does not exceed the number specified by the mts_max_servers parameter. At startup Oracle launches several (mts_servers) server processes and then, as the load rises or falls, starts or terminates additional ones. 12. oracle – a dedicated server process that individually serves a particular user process (in a particular case, an snp process), quite possibly running on another machine. 13. lckX – from one to ten lock processes that provide mutual resource management in a parallel server environment.

Oracle server architectures. The single-user variant (example environment: MS DOS) is characterized by the following: the user process, the server process and the background processes are combined into one operating system task; only one database and one Oracle instance can be run; it cannot function as a server in a distributed database. The multi-user variant (example environment: UNIX) is characterized by the following: user, server and background processes are separate operating system tasks; several databases and Oracle instances can be run; it can function as a server in a distributed database.

Oracle server architectures. The single-task variant (example environment: NetWare) is characterized by the following: the user process and the server process form a single operating system task, called the user task; only one user task can execute on the server at any given time; many users can access the database via Net8 (SQL*Net). The two-task variant (example environment: UNIX) is characterized by the following: the user process and the serving server process are completely independent operating system processes, to the point that they can run on different machines and platforms (client-server architecture); at any given time many user and server processes can function on the server; many users can access local databases through Net8 (SQL*Net), and local users can access remote databases.

Oracle server architectures. Single-threaded architecture, or the variant with dedicated servers: each user process is rigidly assigned a server process that executes its requests, and only its requests, to the database. Parallel server (environment: cluster systems, for example RM-1000): each cluster processor runs its own Oracle instance, which includes a separate SGA and a set of system processes; each instance maintains its own change logs; the database and control files are common to all instances; each instance can serve many users; each instance is individually addressable and can operate independently as part of a distributed system.

Oracle server architectures. The multi-threaded architecture (MTS – Multi-Threaded Server), the variant with shared servers, is characterized by: the presence of dispatcher processes, which receive requests from user processes and return to them the results of requests executed by server processes; the presence in the SGA of one input queue for all server processes, into which the dispatchers place users' service requests, and of several output queues, one per dispatcher, into which the servers place results and from which the dispatchers pass the results of database queries back to users; the relocation into the SGA of the private SQL areas previously located in the PGAs of server processes; dynamic variation, depending on the current system load, of the number of functioning dispatchers and server processes; no dispatcher or server being assigned to any particular user process: requests are serviced as they arrive; and the possibility of dedicated and shared servers functioning simultaneously.

    A data bank is understood as a set of databases, as well as software, language and other tools intended for the centralized accumulation of data and their use using electronic computers.

    The data bank includes one or more databases, a database directory, a database management system (DBMS), as well as libraries of queries and application programs.

    The data bank is designed to store large amounts of information and quickly find the necessary information and documents.

A data bank is created in a subscriber system of any capacity, from a personal computer to a supercomputer. But even the largest data bank is limited in its capabilities, so data banks specialize, collecting information in particular areas of science, technology and products. The core of the bank is its databases and knowledge bases. A database is an organized structure designed to store information. Data and information are interrelated but not identical concepts, so there is some inconsistency in this definition. Today most database management systems (DBMS) allow not only data but also methods (that is, program code) to be placed in their structures, through which interaction with the consumer or with other software and hardware systems takes place. Thus we can say that modern databases store not only data but also information.

A data bank (BnD) includes special tools that make it easier for users to work with data (the DBMS).

    Centralized data management has advantages over a conventional file system:

    — reduction of data storage redundancy;

    — reducing the labor intensity of development, operation and modernization of information systems;

— ensuring convenient access to data for users, both data professionals and end users.

    Basic requirements for BnD:

— adequacy of representation of the subject area (completeness, integrity and consistency of data, relevance of information);

    — the ability to interact between users of different categories, high efficiency of data access;

    — friendly interfaces, short training time;

    — ensuring secrecy and limiting access to data for different users;

    — reliability of storage and data protection.

    The definition of the term Data Bank is given in the Temporary Regulations on State Accounting and Registration of Databases and Data Banks, approved by Decree of the Government of the Russian Federation dated February 28, 1996 No. 226, clause 2 (SZ RF, 1996, No. 12, Art. 1114)

Initially (in the early 60s) a file storage system was used. To solve primarily engineering problems, characterized by a small amount of data and a significant amount of calculation, the data was stored directly in the program. A sequential method of data organization was applied; there was high redundancy, identical logical and physical structures, and complete dependence of the data on the program. With the advent of economic and managerial tasks (management information systems, MIS), characterized by large volumes of data and a small share of calculation, this organization of data proved ineffective. Ordering of data was required, which, as it turned out, could be done according to two criteria: use (information arrays) or storage (databases). Initially information arrays were used, but the superiority of databases soon became clear. The use of files for storing only data was proposed by McGee in 1959. Access methods (including random access) to such files were developed; the physical and logical structures were now distinct, and the physical location of the data could be changed without changing its logical representation.

In 1963 C. Bachman built the first industrial DBMS, IDS, with a network data model, which was still characterized by data redundancy and by use for only one application. Data was accessed through the appropriate software. In 1969 a group was formed that created the CODASYL set of standards for the network data model.

In fact, the modern database architecture began to be used. Architecture here means a type (generalization) of structure in which any element can be replaced by another element whose input and output characteristics are identical to those of the first. A significant leap in the development of database technology came with the paradigm of the relational data model proposed by E. Codd in 1970. A paradigm is understood as a scientific theory embodied in a system of concepts reflecting the essential features of reality. Now different logical structures could be obtained from the same physical data, that is, the same physical data could be accessed by different applications through different paths. It became possible to ensure data integrity and independence.

At the end of the 70s modern DBMSs appeared, providing physical and logical independence and data security, and possessing developed database languages. The last decade has been characterized by the emergence of distributed and object-oriented databases, whose characteristics are determined by the use of computer-aided design tools and the intellectualization of databases.

Closely related to the concept of a database is the concept of a database management system: a set of software tools designed to create the structure of a new database, fill it with content, edit the content and visualize the information. Visualization of database information means selecting the displayed data according to a given criterion, ordering it, formatting it and then outputting it to an output device or transmitting it over communication channels.

    There are many database management systems in the world. Although they may work differently with different objects and provide the user with various functions and tools, most DBMSs rely on a single, well-established set of basic concepts. This gives us the opportunity to consider one system and generalize its concepts, techniques and methods to the entire class of DBMS.

    The DBMS organizes the storage of information in such a way that it is convenient:

    browse,

    replenish,

    change,

    look for the information you need,

make any selections,

    sort in any order.

    Database classification:

    a) according to the nature of the stored information:

    Factual (card indexes),

    Documentary (archives)

    b) according to the method of data storage:

    Centralized (stored on one computer),

    Distributed (used in local and global computer networks).

    c) according to the data organization structure:

    Tabular (relational),

Hierarchical,

Network.

    Modern DBMSs make it possible to include not only text and graphic information, but also sound fragments and even video clips.

    The ease of use of the DBMS allows you to create new databases without resorting to programming, but using only built-in functions. DBMS ensure the correctness, completeness and consistency of data, as well as convenient access to them.

    Popular DBMSs are FoxPro, Access for Windows, Paradox. For less complex applications, information retrieval systems (IRS) are used instead of DBMSs, which perform the following functions:

    storing a large amount of information;

    quick search for required information;

    adding, deleting and changing stored information;

    output it in a form convenient for humans.

    Information in databases is structured into individual records, which are a group of related data elements. The nature of the relationship between records determines two main types of database organization: hierarchical and relational.

    If there is no data in the database (empty database), then it is still a full-fledged database. This fact has methodological significance. Although there is no data in the database, there is still information in it - this is the structure of the database. It defines methods for entering data and storing it in the database. The simplest “non-computer” version of a database is a business diary, in which each calendar day is allocated a page. Even if not a single line is written in it, it does not cease to be a diary, since it has a structure that clearly distinguishes it from notebooks, workbooks and other stationery products.

    Databases can contain various objects, but the main objects of any database are its tables. The simplest database has at least one table. Accordingly, the structure of the simplest database is identical to the structure of its table.

    Currently, there is a rapid growth in the number of electronic commerce systems (ECS). E-commerce has a number of distinctive features that sharply distinguish it from all previously known methods of classical commerce due to the exceptional communication characteristics of the Internet

E-commerce systems must be able to coordinate business transactions across multiple business applications, extract individual pieces of information from various sources, and deliver the necessary information to the customer in a timely and seamless manner, all on the basis of a single user Web request.

An ECS has a set of specific properties that distinguish it from classical commerce systems (regular stores, supermarkets, stock exchanges, etc.). These properties must be taken into account when constructing and analyzing process models in an ECS, since the classical formulation of the problem of optimal control of a discrete system is not suitable here. The properties of an ECS are as follows. The operating time is unlimited, unlike classical systems with their strictly regulated work schedule; one may say that the flow of visitors is distributed evenly over time. Unlike classical systems, visitors to an ECS (this is especially typical of B2C-class systems) come not only to make purchases but also to obtain information: to get acquainted with the assortment, prices, terms of payment and delivery of goods.

Classical systems, by contrast, are characterized by the feature that visitors are very likely to become buyers. It is therefore possible to consider various models and methods for assessing the efficiency of ECS functioning: the ratio of the number of buyers to the number of visitors, and the impact of ECS operation and feedback on the incoming flow of requests.

It is typical for an ECS that many visitors come several times to obtain information, and only when all the conditions satisfy them do they make a purchase.

An ECS can serve a fairly large number of visitors simultaneously; this characteristic is limited only by the software and hardware capabilities of the ECS. That is, from the user's point of view there are no queues waiting for service. This is especially true for fully or partially automated ECSs.

In an ECS it is possible for a visitor who has placed products in a virtual cart to leave the system without making a purchase (naturally, all the products remain in the system, since it is simply impossible to steal them). Drawing an analogy with classical shopping systems, it is hard to imagine a visitor who, upon entering a store, first loads a full cart with goods and then unloads everything and leaves. In an ECS this case is possible if the set of control factors is not optimal (or is suboptimal).

    Database management systems allow you to combine large amounts of information and process them, sort them, make selections according to certain criteria, etc.


    1.2 Relational databases

A relational DBMS (RDBMS, Relational Database Management System) is a DBMS that manages relational databases.

    The concept of relational is associated with the developments of the famous English specialist in the field of database systems, Edgar Codd.

    These models are characterized by simplicity of data structure, user-friendly tabular representation and the ability to use the formal apparatus of relational algebra and relational calculus for data processing.

    The relational model focuses on organizing data in the form of two-dimensional tables. Each relational table is a two-dimensional array and has the following properties:

    – each table element is one data element;

    – all columns in the table are homogeneous, that is, all elements in the column have the same type (numeric, character, etc.);

    – each column has a unique name;

    – there are no identical rows in the table;

    – the order of rows and columns can be arbitrary.

The basic concepts of relational DBMSs are: 1) attribute; 2) relation; 3) tuple.

A database, then, is nothing more than a collection of tables. RDBMSs and record-oriented systems are organized on the basis of the B-Tree standard or the Indexed Sequential Access Method (ISAM) and are the standard systems used in most modern software products. To provide combinations of tables for defining relationships between data, which are almost completely absent in most B-Tree and ISAM software implementations, languages such as SQL (IBM), Quel (Ingres) and RDO (Digital Equipment) are used; the industry standard is now SQL, supported by all relational DBMS vendors.

    The original version of SQL is an interpreted language designed to perform operations on databases. The SQL language was created in the early 70s as an interface for interacting with databases based on the then new relational theory. Real applications are usually written in other languages ​​that generate SQL code and pass it to the DBMS as ASCII text. It should also be noted that almost all real relational (and not only relational) systems, in addition to implementing the ANSI SQL standard, now known in the latest edition under the name SQL2 (or SQL-92), include additional extensions, for example, support for client-server architecture or application development tools.

    The table rows are made up of fields that are known in advance to the database. Most systems cannot add new data types. Each row in the table corresponds to one record. The position of a given row can change as new rows are deleted or inserted.

    To uniquely identify an element, it must be associated with a field or set of fields that guarantees the element's uniqueness within the table. This field or fields are called the table's primary key and are often numbers. If one table contains the primary key of another, this allows you to organize a relationship between elements of different tables. This field is called a foreign key.
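A minimal sketch of this key mechanism (table and column names are illustrative, not taken from the text):

    -- Each department row is identified by the primary key deptno
    CREATE TABLE dept (
        deptno NUMBER(2)    PRIMARY KEY,
        dname  VARCHAR2(30)
    );

    -- emp carries dept's primary key as a foreign key, linking the two tables
    CREATE TABLE emp (
        empno  NUMBER(4)    PRIMARY KEY,
        ename  VARCHAR2(30),
        deptno NUMBER(2)    REFERENCES dept (deptno)
    );

    -- The relationship is then traversed with a join
    SELECT e.ename, d.dname
    FROM emp e, dept d
    WHERE e.deptno = d.deptno;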

Since every row of a table must contain a fixed number of fields of predefined types, additional tables have to be created to account for the individual characteristics of elements, using foreign keys. This approach greatly complicates the creation of any complex relationships in the database. Another major disadvantage of relational databases is the high complexity of manipulating information and changing the relationships.

Despite the disadvantages considered above, relational databases have a number of advantages:

the ability to share tables among different programs;

extended "return codes" for errors;

high speed of query processing (the SQL SELECT statement; the result of a selection is a table containing the fields that satisfy the specified criterion);

the very concept of object databases is, by contrast, quite complex and requires serious and lengthy training of programmers;

relatively high speed when working with large volumes of data.

    In addition, significant amounts of money have already been invested in relational DBMSs around the world. Many organizations are not confident that the costs associated with migrating to object databases will be worth it.

    Therefore, many users are interested in a combined approach that would allow them to take advantage of the benefits of object databases without completely abandoning their relational databases. Such solutions do exist. If the transition from a relational database to an object database is too expensive, then using the latter as an extension and addition to relational DBMS is often a more cost-effective alternative. Compromise solutions allow you to maintain a balance between objects and relational tables.

Object-relational adapters. This method involves using a so-called object-relational adapter, which automatically extracts program objects and stores them in relational databases. An object-oriented application works with the adapter like an ordinary DBMS user. Although there is some performance penalty, this option allows programmers to concentrate entirely on object-oriented development. In addition, all applications in the enterprise can still access the data stored in relational form.

    Some object DBMSs, such as GemStone from GemStone Systems, can themselves act as a powerful object-relational adapter, allowing object-oriented applications to access relational databases.

    Object-relational adapters, such as Hewlett-Packard's Odapter for Oracle, can be useful in many applications, such as middleware that integrates object-oriented applications with relational databases.

Object-relational gateways. With this method, the user interacts with the database in the OODBMS language, and the gateway replaces all the object-oriented elements of this language with their relational components. The price, again, is performance. For example, the gateway must transform objects into a set of relations, generate object identifiers (OIDs) for them, and pass all this to the relational database; it must then, each time the RDBMS interface is used, convert the OIDs found in the database back into the corresponding objects.

Performance in the two approaches considered depends on the method of access to the relational database. Every RDBMS consists of two layers: the data manager layer and the storage manager layer. The first processes SQL statements, and the second maps the data onto the database. A gateway or adapter can interact either with the data layer (that is, access the RDBMS via SQL) or with the storage layer (low-level procedure calls). Performance in the first case is much lower (for example, the Hewlett-Packard OpenODB system, which can act as a gateway, supports access only at the high level).

Hybrid DBMSs. Another solution is to create hybrid object-relational DBMSs capable of storing both traditional tabular data and objects. Many analysts believe that the future lies with such hybrid databases. Leading relational DBMS vendors are beginning (or planning) to add object-oriented capabilities to their products. In particular, Sybase and Informix plan to introduce object support in future versions of their DBMSs. Independent companies also intend to conduct similar developments; for example, the Shores company is preparing to equip the Oracle8 DBMS with object-oriented tools, scheduled for release at the end of 1996.

    On the other hand, object DBMS vendors such as Object Design are aware that object-oriented databases will not replace relational DBMSs in the foreseeable future. This forces them to create gateways to support relational and hierarchical databases or various kinds of interfaces, a typical example of which is the Ontos Integration Server object-relational interface from Ontos, used in combination with its Ontos/DB OODB.

    1.3 Multidimensional databases

A multidimensional database is a powerful database with a special storage organization, cubes, which allows users to analyze large volumes of data. It provides high-speed work with data stored as a collection of facts, dimensions and pre-computed aggregates.

    In specialized DBMSs based on a multidimensional representation of data, data is organized not in the form of relational tables, but in the form of ordered multidimensional arrays:

Hypercubes – all cells stored in the database must have the same dimensionality, that is, be defined on the most complete basis of dimensions.

    Polycubes - each variable is stored with its own set of dimensions, and all associated processing complexities are transferred to the internal mechanisms of the system.

    The use of multidimensional databases in online analytical processing systems has the following advantages:

    high performance. Products belonging to this class usually have a multidimensional database server. Data in the analysis process is selected exclusively from a multidimensional structure, and in this case, searching and retrieving data is much faster than with a multidimensional conceptual view of a relational database, since the multidimensional database is denormalized, contains pre-aggregated indicators and provides optimized access to the requested cells

    searching and retrieving data is carried out much faster than with a multidimensional conceptual view of a relational database - the average response time to an ad hoc query when using a multidimensional DBMS is usually one to two orders of magnitude less than in the case of a relational DBMS with a normalized data schema

the structure and interfaces best match the structure of analytical queries. This approach is closer to the analyst's mental model, since the analyst is accustomed to operating with flat tables: by cutting the cube with a two-dimensional plane in one direction or another, it is easy to obtain the interdependence of any pair of quantities relative to the chosen measure, for example, how the manufacturing cost of a product (measure) changed over time (dimension) broken down by sections, workshops and production facilities (another dimension); a relational approximation of such a slice is sketched after this list

    multidimensional DBMSs easily cope with the tasks of including various built-in functions in the information model, while objectively existing limitations of the SQL language make performing these tasks based on relational DBMSs quite difficult and sometimes impossible.
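As a point of comparison, the cube slice described above can only be emulated in a relational DBMS through explicit aggregation. A hedged sketch over a hypothetical fact table (all names are illustrative):

    -- Cost is the measure; workshop and period are the dimensions
    SELECT workshop, period, SUM(cost) AS total_cost
    FROM   production_costs           -- hypothetical fact table
    GROUP  BY workshop, period;       -- one cube cell per (workshop, period) pair

A multidimensional DBMS stores such aggregates pre-computed, which is where the one-to-two orders of magnitude difference in response time mentioned above comes from.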

    MOLAP can only work with its own multidimensional databases and is based on proprietary technologies for multidimensional DBMSs, therefore it is the most expensive. These systems provide a full cycle of OLAP processing and either include, in addition to the server component, their own integrated client interface, or use external spreadsheet programs to communicate with the user. To maintain such systems, a special staff of employees is required to install, maintain the system, and create data views for end users.

    Other disadvantages of MOLAP models include:

they do not allow working with large databases; today the practical limit is 10-20 gigabytes. Moreover, because of denormalization and pre-computed aggregation, 20 gigabytes in a multidimensional database correspond, by Codd's estimate, to a 2.5-100 times smaller volume of original detailed data, that is, at best a few gigabytes.

compared with relational databases, they use external memory very inefficiently. Hypercube cells are stored in them as logically ordered arrays (fixed-length blocks), and such a block is the minimum indexed unit. Although multidimensional DBMSs do not store blocks containing no defined values, this only partially solves the problem. Because the data is stored in ordered form, undefined values cannot always be removed completely, and then only when the sort order allows the data to be organized into the largest possible contiguous groups. But the sort order most often used in queries may not be the order in which the data should be sorted to eliminate the maximum number of non-existent values. Thus, when designing a multidimensional database, one often has to sacrifice either performance (which is one of the primary advantages of, and the main reason for choosing, a multidimensional DBMS) or external memory (although, as noted, the maximum size of multidimensional databases is limited)

    there are no uniform standards for the interface, languages ​​for describing and manipulating data

    do not support data replication, which is often used as a loading mechanism. Therefore, the use of multidimensional DBMS is justified only under the following conditions:

the volume of the source data for analysis is not too large (no more than a few gigabytes), that is, the level of data aggregation is quite high;

the set of information dimensions is stable (since any change in their structure almost always requires a complete rebuild of the hypercube);

the system's response time to ad hoc queries is the most critical parameter;

extensive use of complex built-in functions is required to perform cross-dimensional calculations on hypercube cells, including the ability to write custom formulas and functions.

    2. Practical part

    2.1 Problem statement

    2.1.1 Purpose of solving the problem

The management of the company Stroy-design LLC, which performs work on the repair of premises, wants to automate the calculation of the cost of the work performed in order to promptly issue an invoice to the client. This will reduce settlement time, avoid human error and increase customer satisfaction with the services provided. It was therefore decided to calculate the cost of the work performed and create an invoice for payment, which should contain the name of the work, the volume of work performed, the price per unit of production, and the cost of the work. The task, to be solved monthly in the MS Excel environment, is called "Calculating the cost of work performed".

    The goal of solving this problem is to timely calculate the cost of work for the prompt provision of a detailed invoice to clients.

    2.1.2 Problem condition

As input operational information, the document "Calculation of the cost of work performed" is used, containing the details: name of the work, volume of work performed, price per unit of production (rub.), cost of work (rub.); the last two details are obtained by lookup and by calculation, respectively. Based on it, the following screen form is created:

Name of work | Unit of measurement | Volume of work performed (Qi) | Price per unit of production, rub. (Ci) | Cost of work, rub. (Si)

The price list of the organization serves as conditionally permanent (reference) information, containing the following details (conditional form): name of the work, price per unit of production (rub.). Based on it, the following screen form is created:

Price list

Name of work | Price per unit of production, rub.

    Latin letters in the table indicate the elements of the corresponding calculation formulas.

As a result, you should receive an invoice containing the following details: name of the work, price per unit of production (rub.), volume of work performed, cost of work (rub.), and invoice number (filled in automatically). The client's name and the date are entered manually. The information is presented in the following document:

Structure of the resulting document "Invoice"

Stroyservis LLC

INVOICE No. ____                         Date ________ 20__
Client's full name _________________________________________

No. | Name of work        | Unit of measurement | Volume of work performed | Price per unit of production, rub. | Cost of work, rub.
    | Replacing batteries | pcs.                |                          |                                    |
    | Wallpaper sticker   | m2                  |                          |                                    |
    | Pipe replacement    |                     |                          |                                    |
    | Parquet flooring    | m2                  |                          |                                    |
    | TOTAL:              |                     |                          |                                    | ΣSi
    | VAT:                |                     |                          |                                    | N
    | AMOUNT WITH VAT:    |                     |                          |                                    | SN

Ch. accountant ____________

In addition, for analysis the information contained in the tables must be presented in the form of charts.

In the technology, organize inter-table links for the automatic generation of the "Invoice" document using the VLOOKUP or LOOKUP functions.
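For reference, the inter-table link built step by step in the technology below reduces, for the "Price per unit of production" column, to a formula of roughly this form (assuming the price list range is named Price_list, as in step 10 below):

    =VLOOKUP(A3, Price_list, 2, 0)

Here A3 holds the name of the work, 2 selects the price column of the named range, and 0 requests an exact match.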

    2.2. Computer model for solving the problem

    2.2.1. Information model for solving a problem

The information model reflecting the relationship between the source and resulting documents is shown in Fig. 2.


    2.2.2. Analytical model for solving the problem

To obtain the document "Calculation of the cost of work performed", the following indicators must be calculated:

      cost of work, rub.;

      VAT, rub.;

amount including VAT, rub.

      Calculations are performed using the following formulas:

Si = Ci ∙ Qi,

N = ΣSi ∙ 0.18,

SN = ΣSi + N,

where Si is the cost of the i-th work; Ci is the price per unit of production for the i-th work; Qi is the volume of the i-th work performed; N is the VAT; SN is the amount including VAT.
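In worksheet terms, and assuming (as in the technology below) that the costs Si occupy cells D3:D6, these indicators could be computed with formulas of roughly this form:

    =SUM(D3:D6)          ΣSi, the invoice TOTAL
    =SUM(D3:D6)*0.18     N, VAT at the 18% rate used here
    =SUM(D3:D6)*1.18     SN, the amount including VAT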

2.2.3. Technology for solving the problem in MS Excel

      Solving the problem using MS Excel

      Call Excel:

      click the "Start" button;

      select the “Programs” command in the main menu;

      From the Microsoft Office menu, select MS Excel.

      Rename “Sheet 1” to “Price List”:

open the context menu of the "Sheet 1" tab and select the "Rename" command with a left mouse click;

type "Price List" and press the "Enter" key.

      Enter the table title "Price List":

      type “Price list” on the keyboard;

      4. Format the title:


Fig. 2. Example of selecting a group of cells

      On the toolbar in the “Home” tab, select the “Alignment” section and click the button.

      5. Format cells A2:B2 for entering long headings:

      select cells A2:B2;

execute the "Format Cells" command in the "Alignment" section of the "Home" tab on the ribbon;

      select the “Alignment” tab;

      in the “Display” option group, check the “word wrap” option (Fig. 3);


Fig. 3. Setting word wrapping when entering long headers into a cell

      Click OK.

      6. Enter in cells A2:B2 the information presented in Fig. 4.


Fig. 4. Names of fields in the "Price list" table

      7. Format cells A3:A8 to enter text characters:

      select cells A3:A8;

on the "Home" tab, in the "Cells" group, select "Format" and then the "Format Cells" command;

      select the “Number” tab;

      select the “Text” format (Fig. 5);

      Click OK.


Fig. 5. Selecting the cell format

8. Repeat step 7 for the range of cells B3:B8, selecting the "Number" format.

      9. Enter the initial data (Fig. 6).


Fig. 6. View of the "Price list" table

      10. Give the group of cells a name:

      select cells A3:B8;

select the "Define Name" command in the "Defined Names" group of the "Formulas" tab (Fig. 7);


Fig. 7. View of the "New Name" window

      Click the "OK" button.

11. Rename "Sheet 2" to "Calculation of the cost of work" (similar to step 2).

12. Create the table "Calculation of the cost of work performed" (similar to steps 3-8) (Fig. 8).


Fig. 8. View of the "Calculation of the cost of work" table

13. Fill in the columns "Name of work" and "Price per unit of production, rub.":

      make cell A3 active;

on the "Data" tab, select the "Data Validation" command, and in its "Allow" field select "List";

      enter the value in the “Source” field, highlighting the range A3:A8 in the “Price List” (Fig. 9);


      Fig. 9. Setting up the list of job names

      click "OK" button;

      to be able to select a job name from the list in each cell of column A ("Job title"), make cell A3 active, place the cursor on the fill handle in the lower right corner of the cell, press the left mouse button and drag it to cell A6 (Fig. 10);


      Fig. 10. View of the "Calculation of the cost of work" sheet when setting up the list

      make cell C3 active and select the "Insert Function" command on the "Formulas" tab; in the "Select a function" field, click "VLOOKUP" (Fig. 11);


      Fig. 11. View of the first window of the function wizard

      click "OK" button;

      enter the name of the work in the "Lookup_value" field by clicking on cell A3;

      press "Enter";

      enter information in the "Table_array" field:

      use the "Use in Formula" command on the "Formulas" tab, selecting "Paste Names";

      select the name "Price_list" (Fig. 12);


      Fig. 12. Entering an array name as a formula argument

      click "OK" button;

      press "Enter";

      enter the number 2 in the "Col_index_num" field;

      enter the number 0 in the "Range_lookup" field (Fig. 13);


      Fig. 13. View of the second window of the function wizard

      Click "OK" button;

      14. Fill in the column “Scope of work performed.”

      15. Enter the names of the works in cells A4:A6:

      Make cell A4 active;

      click the button next to cell A4 and select the name of the work from the proposed list – Battery replacement, pcs.; cell C4 ("Price per unit of production, rub.") will be filled in automatically (Fig. 14);


      Fig. 14. Automatic filling in of the price per unit of production from the job name

      similarly fill in cells A5:A6, cells C5:C6 will also be filled in automatically.

      16. Fill in the column “Cost of work, rubles”
      table “Calculation of the cost of work performed”.
      To do this:

      enter the formula =B3*C3 into cell D3;

      copy the formula entered in cell D3 to the remaining cells D4:D6 of this column (using the autofill handle).

      In effect, this repeats the calculation for each row of the table, with the row number acting as the loop's control parameter.

      17. The completed table looks like this (Fig. 15).


      Fig. 15. Result of filling in the table "Calculation of the cost of work"

      18. Rename "Sheet 3" to " Check "(similar to steps in paragraph 2).

      19. On the "Invoice" worksheet, create the necessary table, following the previous steps.

      20. Use the LOOKUP function to create the inter-table links. Before doing this, sort the table "Calculation of the cost of work performed" in ascending order by the "Job title" column. To do this:

      select the range of cells A2:D6;

      select "Sort & Filter" on the "Home" tab, and then "Custom Sort";

      in the window that appears, in the "Sort by" field select "Job title";

      Click OK.

      use the "Insert Function" command on the "Formulas" tab;

      in the "Select a function" field, click "LOOKUP";

      click "OK" button;

      enter the name of the work in the "Lookup_value" field by clicking on cell C9;

      press "Enter";

      enter the range 'Calculation of the cost of work'!$A$3:$A$6 in the "Lookup_vector" field;

      press "Enter";

      enter the range 'Calculation of the cost of work'!$C$3:$C$6 in the "Result_vector" field;

      press “Enter” (Fig. 16);


      Fig. 16. View of the second window of the LOOKUP function wizard

      click "OK" button;

      22. Repeat steps similar to step 20 for cells D9:D12 and E9:E12.

      23. Fill in the "TOTAL" row of the table as follows:

      enter the formula =SUM(F9:F12) in cell F13.

      24. Fill in the "VAT" row. To do this, enter the formula =F13*0.18 in cell F14.

      25. Fill in the "AMOUNT WITH VAT" row. To do this, enter the formula =F13+F14 in cell F15.

      26. As a result, you should get the table shown in Fig. 17.


      Fig. 17. Form of the invoice for payment for the work performed

      27. To analyze information about the cost of each type of work for a received order:

      make the "Invoice" sheet active;

      select range C9:F12;

      select the “Histogram” command in the “Charts” section of the “Insert” menu;

      select the required histogram type;

      give the histogram the title "Cost of each type of work" (Fig. 18).


      Fig. 18. Histogram "Cost of each type of work"

      2.3. Results of a computer experiment and their analysis

      2.3.1. Results of a computer experiment

      To test the correctness of the problem solution, fill out the input documents and then calculate the results.

      Price list

      Job title                          Price per unit of production, rub.
      Bathtub replacement, pcs.
      Replacement of pipes, m
      Wallpaper sticker, m2
      Parquet flooring, m2
      Whitewashing of the ceiling, m2

      Calculation of the cost of work performed

      Job title                    Scope of work performed    Price per unit of production, rub.    Cost of work, rub.
      Battery replacement, pcs.                                                                     1000
      Replacement of pipes, m
      Wallpaper sticker, m2                                                                         1400
      Parquet flooring, m2                                                                          1200

      Stroy-design LLC

      ACCOUNT No. ____                  Date __.__.20__

      Client's full name ________________________

      No.   Job title                    Scope of work performed    Price per unit of production, rub.    Cost of work, rub.
            Battery replacement, pcs.                                                                     1000
            Wallpaper sticker, m2                                                                         1400
            Replacement of pipes, m
            Parquet flooring, m2                                                                          1200

      TOTAL:                4560
      VAT:                  820.8
      AMOUNT WITH VAT:      5380.8

      As a result of solving the problem, the values obtained on the computer coincide with the test calculations.

      2.3.2. Analysis of the results obtained

      Thus, generating the resulting document (table) "Invoice" solves the stated task: it reduces the time needed to calculate the cost of work, eliminates errors caused by the human factor, and increases customer satisfaction. Building various charts (histograms, graphs) from table data in MS Excel not only presents the results of information processing visually for analysis and decision-making, but also makes it easy to adjust the way they are displayed to the parameters specified by the user (analyst).

      The main ideas of modern information technology are based on the concept that data should be organized into databases in order to adequately reflect the changing real world and meet the information needs of users. These databases are created and operate under the control of special software systems called database management systems (DBMS).

      The increase in the volume and structural complexity of stored data and the expansion of the circle of users of information systems have led to the widespread use of the most convenient and relatively easy-to-understand relational (tabular) DBMS. To ensure simultaneous access to data by many users, often located quite far from each other and from the place where databases are stored, network multi-user versions of databases based on a relational structure have been created. In one way or another, they solve specific problems of parallel processes, integrity (correctness) and security of data, as well as access authorization.

      The DBMS must provide access to data to any users, including those who have practically no idea of (and/or do not want to know about): the physical placement of data and its descriptions in memory; the mechanisms for searching for the requested data; the problems that arise when many users (application programs) request the same data simultaneously; the ways of protecting data from incorrect updates and (or) unauthorized access; the means of keeping databases up to date; and many other DBMS functions.

      Today, relational databases remain the most common due to their simplicity and clarity both during the creation process and at the user level.

      The main advantage of relational databases is compatibility with the most popular query language SQL. With a single query in this language, you can join several tables into a temporary table and cut out the required rows and columns from it (selection and projection). Since the tabular structure of a relational database is intuitive for users, the SQL language is simple and easy to learn. The relational model has a solid theoretical foundation on which the evolution and implementation of relational databases were based. Riding the wave of popularity generated by the success of the relational model, SQL became the primary language for relational databases.
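
      As an illustration, a single SQL query of the following kind joins two tables, selects rows, and projects columns (the table and column names here are illustrative):

      SELECT c.CompanyName, o.OrderDate              -- projection: only the needed columns
      FROM Customers c
      JOIN Orders o ON o.CustomerID = c.CustomerID   -- joining the two tables
      WHERE o.OrderDate >= '2020-01-01';             -- selection: only the needed rows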

      The analysis above also revealed the following disadvantages of the considered database model: since all records of one table must contain a fixed number of fields of predefined types, additional tables linked by foreign keys have to be created to account for the individual characteristics of elements; this approach makes it very difficult to create complex relationships in the database and entails a high complexity of manipulating information and changing connections.

      In the practical part, the task was solved using MS Excel 2010 for a fictitious enterprise – the company Stroy-design LLC, which performs renovation work. Tables were built from the data given in the assignment. The cost of work for the received order was calculated, and the calculation data was entered into the table. Inter-table links were organized using the VLOOKUP and LOOKUP functions to automatically generate the invoice issued to the client for payment for the work performed. The document "Invoice for payment for work performed" was generated and filled in. The results of calculating the cost of each type of work for the received order are presented in graphical form.



    With this article we begin a new series devoted to databases, modern technologies for accessing and processing data. During this cycle, we plan to consider the most popular desktop and server database management systems (DBMS), data access mechanisms (OLD DB, ADO, BDE, etc.) and utilities for working with databases (administration tools, report generators, tools graphical presentation of data). In addition, we plan to pay attention to methods of publishing data on the Internet, as well as such popular methods of processing and storing data as OLAP (On-Line Analytical Processing) and creating data warehouses (Data Warehousing).

    In this article we will look at the basic concepts and principles underlying database management systems. We will discuss the relational data model, the concept of referential integrity and principles of data normalization, as well as data design tools. Then we will explain what DBMSs are, what objects can be contained in databases, and how queries are made against these objects.

    Basic Relational Database Concepts

    Let's start with the basic concepts of DBMSs and a brief introduction to the theory of relational databases – the most popular way of storing data today.

    Relational data model

    The relational data model was proposed by Dr. E. F. Codd, a renowned database researcher, in 1969, while he was an IBM employee. The basic concepts of the model were first published in 1970 in the paper "A Relational Model of Data for Large Shared Data Banks" (CACM, 1970, 13, No. 6).

    A relational database is a data warehouse containing a set of two-dimensional tables. A set of tools for managing such storage is called relational database management system (RDBMS). An RDBMS may contain utilities, applications, services, libraries, application creation tools, and other components.

    Any relational database table consists of rows (also called records) and columns (also called fields). In this series we will use both pairs of terms.

    The rows of the table contain information about the facts (or documents, or people – in short, objects of the same type) presented in it. At the intersection of a column and a row are the specific data values contained in the table.

    The data in the tables meets the following principles:

    1. Each value contained at the intersection of a row and a column must be atomic (that is, not divisible into several values).
    2. The data values in one column must belong to the same type, available for use in the given DBMS.
    3. Each record in the table is unique, that is, there are no two records in the table with a completely matching set of values ​​for its fields.
    4. Each field has a unique name.
    5. The sequence of fields in the table is not important.
    6. The sequence of entries is also immaterial.

    Despite the fact that table rows are considered unordered, any database management system allows you to sort rows and columns in selections from it in the way the user needs.

    Since the sequence of columns in a table is not important, they are referred to by name, and these names are unique for a given table (but do not have to be unique for the entire database).

    So now we know that relational databases are made up of tables. To illustrate some theoretical points and to create examples, we need to select some kind of database. In order not to “reinvent the wheel,” we will use the NorthWind database included with Microsoft SQL Server and Microsoft Access.

    Now let's look at the relationships between tables.

    Keys and connections

    Let's take a look at a snippet of the Customers table from the NorthWind database (we've removed fields from it that are not essential to illustrating the relationships between the tables).

    Because the rows in a table are unordered, we need a column (or set of columns) to uniquely identify each row. Such a column (or set of columns) is called the primary key. The primary key of any table must contain unique, non-empty values for each row.

    If the primary key consists of more than one column, it is called a composite primary key.

    A typical database usually consists of several related tables. Below is a fragment of the Orders table.

    The CustomerID field of this table contains the ID of the customer who placed the order. If we want to know the name of the company that placed an order, we must look for the same customer ID value in the CustomerID field of the Customers table and read the value of the CompanyName field in the row found. In other words, we need to link the two tables, Customers and Orders, by the CustomerID field. A column that refers to a record in another table associated with the given record is called a foreign key. As you can see, in the Orders table the foreign key is the CustomerID column (Fig. 1).

    In other words, a foreign key is a column or set of columns whose values ​​match the existing values ​​of the primary key of another table.

    Such a link between tables is called a relationship. A relationship between two tables is established by assigning the values of the foreign key of one table to the values of the primary key of the other.
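
    As a sketch, the key definitions for these two tables could be declared in SQL as follows (the column lists are abbreviated and the types are illustrative):

    CREATE TABLE Customers (
        CustomerID  CHAR(5)     NOT NULL PRIMARY KEY,  -- primary key: unique and non-empty
        CompanyName VARCHAR(40) NOT NULL
    );

    CREATE TABLE Orders (
        OrderID    INT     NOT NULL PRIMARY KEY,
        CustomerID CHAR(5) NOT NULL
            REFERENCES Customers (CustomerID)          -- foreign key referring to the master table
    );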

    If each customer in the Customers table can place only one order, the two tables are said to be in a one-to-one relationship. If each customer in the Customers table can place zero, one, or many orders, the two tables are said to be in a one-to-many relationship, also called a master-detail relationship; such relationships between tables are the most common. In this case, the table containing the foreign key is called the detail table, and the table whose primary key defines the possible values of the foreign key is called the master table.

    A group of related tables is called a database schema. Information about tables, their columns (names, data types, field lengths), primary and foreign keys, and other database objects is called metadata.

    Any manipulation of data in a database – selecting, inserting, deleting or updating data, changing or selecting metadata – is called a query to the database. Typically, queries are formulated in some language, which can be either standard across different DBMSs or specific to a particular DBMS.

    Referential integrity

    We already said above that the primary key of any table must contain unique, non-empty values for each row. This statement is one of the rules of referential integrity. Some (but not all) DBMSs can enforce the uniqueness of primary keys: if you try to assign a primary key value that already exists in another record, the DBMS generates a diagnostic message, usually containing the phrase primary key violation. This message can later be passed to the application through which the end user manipulates the data.

    If two tables are related as master-detail, the foreign key of the detail table must contain only values that exist among the primary key values of the master table. If the correctness of foreign key values is not controlled by the DBMS, referential integrity can be violated: deleting a record from the Customers table that has at least one associated detail record in the Orders table would leave the Orders table with records of orders placed by someone unknown. If the DBMS does control the correctness of foreign key values, then any attempt to assign a foreign key value that is not among the primary key values of the master table, or to delete or modify master-table records in a way that breaks referential integrity, causes the DBMS to generate a diagnostic message, usually containing the phrase foreign key violation, which can then be passed to the user application.

    Most modern DBMSs, such as Microsoft Access 97, Microsoft Access 2000, and Microsoft SQL Server 7.0, can monitor compliance with referential integrity rules, if any are described in the database. For this purpose, such DBMSs use various database objects (we will discuss them a little later). In this case, all attempts to violate referential integrity rules are suppressed, with the simultaneous generation of diagnostic messages or exceptions (database exceptions).
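
    For example, with the tables declared as in the previous sketch, a DBMS that enforces referential integrity would reject both of the following statements (the exact diagnostic text depends on the DBMS; the customer IDs are illustrative):

    INSERT INTO Orders (OrderID, CustomerID)
    VALUES (10001, 'XXXXX');     -- fails: no such primary key in Customers (foreign key violation)

    DELETE FROM Customers
    WHERE CustomerID = 'ALFKI';  -- fails while at least one order refers to this customer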

    Introduction to Data Normalization

    The data design process is the definition of metadata in accordance with the objectives of the information system in which the future database will be used. Details on how to perform domain analysis and create entity-relationship diagrams (ERD) and data models are beyond the scope of this series. Those interested in these issues can refer, for example, to the book by C. J. Date, "An Introduction to Database Systems" (Dialektika, Kyiv, 1998).

    In this article we will discuss only one of the basic principles of data design – the principle of normalization.

    Normalization is the process of reorganizing data by eliminating repeating groups and other inconsistencies in data storage, in order to bring tables to a form that allows consistent and correct editing of the data.

    Normalization theory is based on the concept of normal forms. A table is said to be in a given normal form if it satisfies a certain set of requirements. In theory there are five normal forms, but in practice only the first three are usually used. Moreover, the first two normal forms are essentially intermediate steps to bring the database into third normal form.

    First normal form

    Let's illustrate the normalization process with an example using data from the NorthWind database. Let's assume we record all ordered products in the following table. The structure of this table is as follows (Fig. 2).

    For a table to comply with first normal form, all its field values must be atomic, and all its records unique. Therefore, any relational table, including the OrderedProducts table, is by definition already in first normal form.

    However, this table contains redundant data; for example, the same customer information is repeated in the record for each product ordered. Data redundancy results in data modification anomalies – problems that occur when records are added, changed, or deleted. For example, when editing data in the OrderedProducts table, the following problems may occur:

    • A specific customer's address can only be contained in the database when the customer has ordered at least one product.
    • When you delete a record of an ordered product, information about the order itself and about the customer who placed it is deleted at the same time.
    • If, God forbid, the customer changes his address, all records about the products he ordered will have to be updated.

    Some of these problems can be solved by bringing the database to second normal form.

    Second normal form

    A relational table is said to be in second normal form if it is in first normal form and its non-key fields are fully dependent on the entire primary key.

    The OrderedProducts table is in first normal form, but not second normal form, because the CustomerID, Address, and OrderDate fields depend only on the OrderID field, which is part of the composite primary key (OrderID, ProductID).

    To move from first normal form to second normal form, you need to follow these steps:

    1. Determine the parts into which the primary key can be split so that some of the non-key fields depend on one of these parts (these parts do not have to consist of a single column!).
    2. Create a new table for each such part of the key and the group of fields that depend on it and move them to this table. Part of the former primary key will become the primary key of the new table.
    3. Remove fields from the source table that have been moved to other tables, except those that will become foreign keys.

    For example, to bring the OrderedProducts table to second normal form, you need to move the fields CustomerID, Address and OrderDate to a new table (let's call it OrdersInfo), and the OrderID field will become the primary key of the new table (Fig. 3).

    As a result, the new tables will look like this. However, tables that are in second normal form but not in third normal form still contain data modification anomalies. Here they are, for example, for the OrdersInfo table:

    • A specific customer's address can still only be contained in the database when the customer has ordered at least one product.
    • Deleting an order entry in the OrdersInfo table will result in deleting the entry for the customer itself.
    • If the customer has changed the address, several records will have to be updated (although, as a rule, there are fewer of them than in the previous case).

    These anomalies can be eliminated by moving to third normal form.

    Third normal form

    A relational table is said to be in third normal form if it is in second normal form and all of its non-key fields depend only on the primary key.

    The OrderDetails table is already in third normal form: the non-key field Quantity is entirely dependent on the composite primary key (OrderID, ProductID). However, the OrdersInfo table is not in third normal form, since it contains a dependency between non-key fields (called a transitive dependency): the Address field depends on the CustomerID field.

    To move from second normal form to third normal form, you need to follow these steps:

    • Define all fields (or groups of fields) on which other fields depend.
    • Create a new table for each such field (or group of fields) and the group of fields that depend on it and move them to this table. The field (or group of fields) on which all other moved fields depend will become the primary key of the new table.
    • Remove the moved fields from the original table, leaving only those that will become foreign keys.

    To bring the OrdersInfo table to third normal form, create a new Customers table and move the CustomerID and Address fields to it. We will remove the Address field from the source table, and leave the CustomerID field - now it is a foreign key (Fig. 4).

    So, after bringing the original table to third normal form, there were three tables - Customers, Orders and OrderDetails.
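
    In SQL terms, the resulting decomposition could be sketched as follows (field types are illustrative):

    CREATE TABLE Customers (
        CustomerID CHAR(5) NOT NULL PRIMARY KEY,
        Address    VARCHAR(60)
    );

    CREATE TABLE Orders (
        OrderID    INT     NOT NULL PRIMARY KEY,
        CustomerID CHAR(5) NOT NULL REFERENCES Customers (CustomerID),
        OrderDate  DATE
    );

    CREATE TABLE OrderDetails (
        OrderID   INT NOT NULL REFERENCES Orders (OrderID),
        ProductID INT NOT NULL,
        Quantity  INT NOT NULL,
        PRIMARY KEY (OrderID, ProductID)   -- composite primary key
    );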

    Benefits of Normalization

    Normalization eliminates data redundancy, which allows you to reduce the amount of stored data and get rid of the data change anomalies described above. For example, after reducing the database discussed above to third normal form, the following improvements are evident:

    • Customer address information can be stored in the database, even if it is only a potential customer who has not yet placed an order.
    • You can delete information about an ordered product without fear of deleting customer and order information.

    • Changing a customer's address or order registration date now requires changing only one record.

    How databases are designed

    Typically, modern DBMSs contain tools that allow you to create tables and keys. There are also utilities supplied separately from the DBMS (and even servicing several different DBMSs simultaneously) that allow you to create tables, keys and relationships.

    Another way to create tables, keys and relationships in a database is to write a so-called DDL script (DDL - Data Definition Language; we'll talk about it a little later).

    Finally, there is another way that is becoming more and more popular: the use of special tools called CASE tools (CASE stands for Computer-Aided System Engineering). There are several types of CASE tools, but for creating databases the most commonly used are entity-relationship (E/R) diagram tools. With their help, a so-called logical data model is created, describing the facts and objects to be registered (in such models, table prototypes are called entities, and fields are called their attributes). After establishing relationships between entities, defining attributes and performing normalization, a so-called physical data model is created for a specific DBMS, in which all tables, fields and other database objects are defined. After this, you can generate either the database itself or a DDL script for creating it.

    List of currently most popular CASE tools.

    Tables and fields

    Tables are supported by all relational DBMSs, and their fields can store data of different types. The most common data types.

    Indexes

    Earlier we discussed the role of primary and foreign keys. In most relational DBMSs, keys are implemented using objects called indexes, which can be defined as lists of record numbers indicating the order in which records should be retrieved.

    We already know that records in relational tables are unordered. However, any record at a particular point in time has a very specific physical location in the database file, although this may change during the process of editing data or as a result of the “internal activities” of the DBMS itself.

    Suppose at some point in time the records in the Customers table were stored in this order.

    Let's say we need to get this data ordered by the CustomerID field. Omitting the technical details, we can say that the index on this field is the sequence of record numbers in accordance with which they need to be displayed, that is:

    1,6,4,2,5,3

    If we want to sort records by the Address field, the sequence of record numbers will be different:

    5,4,1,6,2,3

    Storing indexes requires significantly less space than storing differently sorted versions of the table itself.

    If we need to find data about customers whose CustomerID begins with the characters "BO", we can use the index to find the location of these records (in this case records 2 and 5; in the index their numbers are adjacent) and then read exactly the second and fifth records instead of scanning the entire table. Thus, the use of indexes reduces data retrieval time.

    We have already said that the physical location of records can change when users edit data, as well as a result of manipulations with database files carried out by the DBMS itself (for example, data compaction or garbage collection). If the index reflects such changes, it is called a maintained index; such indexes are used in most modern DBMSs. Maintaining them means that any change of data in a table entails a change in the indexes associated with it, and this increases the time the DBMS needs for such operations. Therefore, when using such DBMSs, you should create only those indexes that are really needed, guided by which queries will occur most often.
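
    In SQL, such an index could be created explicitly, for example (the index name is illustrative):

    CREATE INDEX IX_Customers_CustomerID ON Customers (CustomerID);  -- speeds up searching and sorting by CustomerID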

    Restrictions and rules

    Most modern server DBMSs contain special objects called constraints, or rules. These objects hold information about the restrictions placed on possible field values. For example, using such an object you can set the maximum or minimum value for a given field, after which the DBMS will not allow a record that does not satisfy this condition to be saved in the database.

    In addition to constraints that set the range of permissible values, there are also referential constraints (for example, the master-detail relationship between the Customers and Orders tables can be implemented as a constraint requiring that the value of the CustomerID field (the foreign key) in the Orders table equal one of the existing values of the CustomerID field in the Customers table).
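
    A sketch of both kinds of constraints in SQL (the constraint names and the limit value are illustrative):

    -- range constraint: the quantity must be positive
    ALTER TABLE OrderDetails
        ADD CONSTRAINT CK_OrderDetails_Quantity CHECK (Quantity > 0);

    -- referential constraint implementing the master-detail relationship
    ALTER TABLE Orders
        ADD CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID);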

    Note that not all DBMSs support constraints. In that case, you can either use other objects (for example, triggers) to implement similar functionality, or store these rules in the client applications that work with the database.

    Views

    Almost all relational DBMSs support views. This object is a virtual table that provides data from one or more real tables. In reality, it does not contain any data, but only describes their source.

    Often such objects are created to store complex queries in databases: in fact, a view is a stored query.
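
    For example, a view storing the query that joins customers with their orders might be defined like this (a sketch; the view name is illustrative):

    CREATE VIEW CustomerOrders AS
    SELECT c.CompanyName, o.OrderID, o.OrderDate
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID;

    After that, SELECT * FROM CustomerOrders returns the joined data as if it were an ordinary table.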

    The creation of views in most modern DBMSs is carried out by special visual tools that allow you to display the necessary tables on the screen, establish connections between them, select displayed fields, introduce restrictions on records, etc.

    Often these objects are used to ensure data security, for example by allowing data to be viewed through them without granting access to the underlying tables directly. In addition, some view objects can return different data depending, for example, on the user name, which allows each user to receive only the data that interests them.

    Triggers and stored procedures

    Triggers and stored procedures, supported in most modern server DBMSs, are used to store executable code.

    A stored procedure is a special type of procedure that is executed by the database server. Stored procedures are written in a procedural language that depends on the specific DBMS. They can call each other, read and modify data in tables, and be called from a client application working with the database.

    Stored procedures are typically used to perform frequently occurring tasks (for example, reconciling a balance sheet). They can have arguments, return values, error codes, and sometimes sets of rows and columns (this data set is sometimes called a dataset). However, the latter type of procedures is not supported by all DBMSs.

    Triggers also contain executable code, but unlike procedures, they cannot be called from a client application or stored procedure. A trigger is always associated with a specific table and is executed when the event to which it is associated (for example, inserting, deleting, or updating a record) occurs while editing that table.

    In most DBMSs that support triggers, you can define multiple triggers that execute when the same event occurs, and determine the order of execution.
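
    As a sketch in the Transact-SQL dialect (syntax differs between DBMSs; the procedure and trigger names are illustrative):

    -- a stored procedure with one argument, callable from a client application
    CREATE PROCEDURE GetCustomerOrders @CustomerID CHAR(5) AS
        SELECT OrderID, OrderDate FROM Orders WHERE CustomerID = @CustomerID;

    -- a trigger tied to the Orders table; it fires on every insert into that table
    CREATE TRIGGER trg_Orders_Insert ON Orders AFTER INSERT AS
        PRINT 'An order record has been added';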

    Objects for generating primary keys

    Very often, primary keys are generated by the DBMS itself. This is more convenient than generating them in the client application, since in multi-user work generating keys by the DBMS is the only way to avoid duplicate keys and obtain consistent key values.

    Different DBMSs use different objects to generate keys. Some of these objects store an integer value and the rules by which the next value is obtained; the generation itself is usually performed using triggers. Such objects are supported, for example, in Oracle (where they are called sequences) and in InterBase (where they are called generators).

    Some DBMSs support special field types for primary keys. When adding records, such fields are filled automatically with sequential values ​​(usually integers). In the case of Microsoft Access and Microsoft SQL Server, such fields are called Identity fields, and in the case of Corel Paradox, they are called Autoincrement fields.

    Users and roles

    Preventing unauthorized access to data is a serious problem that can be solved in different ways. The simplest is password protection of either the entire table or some of its fields (this mechanism is supported, for example, in Corel Paradox).

    Currently, another method of protecting data is more popular - creating a list of users with user names and passwords. In this case, any database object is owned by a specific user, and that user grants permission to other users to read or modify data from that object, or to modify the object itself. This method is used in all server and some desktop DBMSs (for example, Microsoft Access).

    Some DBMSs, mainly server ones, support not only lists of users but also roles. A role is a set of privileges; when a specific user is granted one or more roles, he receives with them all the privileges defined for those roles.

    Database queries

    Modification and selection of data, changing metadata and some other operations are carried out using queries. Most modern DBMSs (and some application development tools) contain tools for generating such queries.

    One way to manipulate data is called query by example (QBE). QBE is a tool for visually linking tables and selecting the fields to display in the query result.

    In most DBMSs (with the exception of some desktop ones), visual construction of a query using QBE leads to the generation of query text using a special query language SQL (Structured Query Language). You can also write the query directly in SQL.

    Cursors

    Often the result of a query is a set of rows and columns (dataset). Unlike a relational table, the rows in such a set are ordered, and their order is determined by the original query (and sometimes by the presence of indexes). Therefore, we can define the current row in such a set and a pointer to it, which is called a cursor.

    Most modern DBMSs support so-called bi-directional cursors, which allow you to navigate through the resulting data set both forward and backward. However, some DBMSs only support unidirectional cursors, which allow only forward navigation through a data set.
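
    A sketch of cursor use in the Transact-SQL dialect (the cursor name is illustrative):

    DECLARE customer_cursor CURSOR FOR      -- define the cursor over a query result
        SELECT CompanyName FROM Customers ORDER BY CompanyName;
    OPEN customer_cursor;
    FETCH NEXT FROM customer_cursor;        -- move the pointer to the next row and return it
    CLOSE customer_cursor;
    DEALLOCATE customer_cursor;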

    SQL language

    Structured Query Language (SQL) is a non-procedural language used to formulate database queries in most modern DBMSs; it is currently the industry standard.

    The non-procedural nature of the language means that it specifies what needs to be done with the database, not how to do it: the algorithms for processing SQL queries are generated by the DBMS itself and do not depend on the user. The SQL language consists of a set of statements that can be divided into several categories:

    • Data Definition Language (DDL) – data definition statements that allow you to create, delete and modify objects in databases;
    • Data Manipulation Language (DML) – data manipulation statements that allow you to modify, add and delete data in existing database objects;
    • Data Control Language (DCL) – statements used to control user privileges;
    • Transaction Control Language (TCL) – statements for managing changes made by groups of statements;
    • Cursor Control Language (CCL) – statements for defining a cursor, preparing SQL statements for execution, and some other operations.
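
    For example, typical statements of each category look like this (a sketch; the object names are illustrative, and "..." stands for omitted details):

    CREATE TABLE Customers (...);                      -- DDL: creating an object
    UPDATE Customers SET Address = '...'
        WHERE CustomerID = 'ALFKI';                    -- DML: changing data
    GRANT SELECT ON Customers TO PUBLIC;               -- DCL: granting a privilege
    COMMIT;                                            -- TCL: committing a group of changes
    DECLARE c CURSOR FOR SELECT * FROM Customers;      -- CCL: defining a cursor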

    We will tell you more about the SQL language in one of the following articles in this series.

    User Defined Functions

    Some DBMSs allow the use of user-defined functions (UDF – User-Defined Functions). These functions are typically stored in external libraries and must be registered in the database before they can be used in queries, triggers, and stored procedures.

    Because user-defined functions are contained in libraries, they can be created using any development tool that allows you to create libraries for the platform on which the DBMS runs.

    Transactions

    A transaction is a group of operations on data that are either all performed together or all cancelled together.

    Committing a transaction means that all operations included in the transaction have been successfully completed and the result of their work has been saved in the database.

    Rollback of a transaction means that all already completed operations that were part of the transaction are canceled and all database objects affected by these operations are returned to their original state. To implement the ability to roll back transactions, many DBMSs support writing to log files, which allow you to restore the original data during a rollback.
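
    A sketch of a transaction in SQL (the Accounts table and the transferred amount are illustrative; the exact statements vary by DBMS):

    BEGIN TRANSACTION;
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT;   -- both updates are saved together
    -- if an error had occurred instead, ROLLBACK; would undo both updates together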

    A transaction can consist of several nested transactions.

    Some DBMSs support two-phase commit, a process that allows transactions to be carried out on multiple databases belonging to the same DBMS.

    To support distributed transactions (that is, transactions over databases managed by different DBMSs), there are special tools called transaction monitors.

    Conclusion

    In this article, we discussed the basic concepts of building relational DBMSs, the basic principles of data design, and also talked about what objects can be created in databases.

    In the next article we will introduce our readers to the most popular desktop DBMSs: dBase, Paradox, Access, Visual FoxPro, Works and discuss their main capabilities.

    ComputerPress 3"2000

    DBMS functions.

    DBMS functions are divided into high-level and low-level functions.

    High level functions:

    1. Data definition – this function determines what information will be stored in the database (the types and properties of the data, and how the data items are related to each other).

    2. Data processing. Information can be processed in different ways: sampling, filtering, sorting, combining one information with another, calculating totals.

    3. Data management. Using this function, you specify who is allowed to view the data, correct it or add new information, and also define the rules for collective access.

    Low level functions:

    1. Managing data in external memory;

    2. Managing RAM buffers;

    3. Transaction management;

    4. Maintaining a log of changes to the database;

    5. Ensuring the integrity and security of the database.

    A transaction is an indivisible sequence of operations that is monitored by the DBMS from beginning to completion; if any one operation is not completed, the entire sequence is cancelled.

    DBMS log – a special database or part of the main database, inaccessible to the user and used to record information about all changes to the database.

    The DBMS log is intended to ensure reliable storage of data in the database in the presence of hardware failures and crashes, as well as software errors.

    Database integrity is a property of a database meaning that it contains complete and consistent information that adequately reflects the subject area.

    Classification of DBMS.

    DBMS can be classified:

    1. By type of program:

    a. Database servers (for example, MS SQL Server, InterBase (Borland)) – intended for organizing data processing centers in computer networks; they implement the database management functions requested by client programs using SQL statements (i.e. they are programs that respond to requests);

    b. Database clients – programs that request data. Fully functional DBMSs, spreadsheets, word processors and e-mail programs can act as client programs;

    c. Fully functional DBMSs (MS Access, MS FoxPro) – programs with a developed interface that allow you to create and modify tables, enter data, create and format queries, and develop and print reports.

    2. According to the DBMS data model (as well as the database):

    a. Hierarchical – based on a tree structure for storing information and reminiscent of a computer file system; the main disadvantage is the inability to implement the many-to-many relationship;

    b. Network – replaced hierarchical ones and did not last long, because their main drawback was the difficulty of developing serious applications. The main difference between a network structure and a hierarchical one is that in a hierarchical structure a descendant record has exactly one ancestor, while in a network structure a descendant can have any number of ancestors;

    c. Relational – the data of which is placed in tables, between which there are certain connections;

    d. Object-oriented – they store data in the form of objects and the main advantage when working with them is that an object-oriented approach can be applied to them;

    e. Hybrid, i.e. object-relational – combine the capabilities of relational and object-oriented databases. An example of such a database is Oracle (previously it was relational).

    3. Depending on the location of the individual parts of the DBMS, they are distinguished:

    a. local – all parts of which are located on one computer;

    b. network.

    Networks include:

    - with file-server organization;

    With this organization, all data is located on one computer, called a file server, which is connected to the network. When the necessary information is searched for, the entire file is transferred over the network, including a lot of redundant information; the required record is found only after a local copy has been created.

    - with a client-server organization;

    The database server receives a request from the client, searches the data for the required record and transfers it to the client. The request to the server is formulated in the structured query language SQL, which is why database servers are also called SQL servers.

    - distributed DBMSs may contain tens or hundreds of servers located over a large area.

    Basic provisions of the relational database model.

    Relational database is a database in which all data is organized in the form of tables, and all operations on this data are reduced to operations on tables.

    Features of relational databases:

    1. Data is stored in tables consisting of columns and rows;

    2. At the intersection of each column and row there is one value;

    3. Each column (field) has a unique name, which serves as its attribute, and all values in one column have the same type;

    4. Columns are arranged in a specific order, which is set when the table is created, unlike rows, which are stored in arbitrary order. A table may contain no rows at all, but it must have at least one column.

    Relational database terminology:

    Relational database element    Presentation form
    1. Database                    Set of tables
    2. Database schema             Set of table headers
    3. Relation                    Table
    4. Relation schema             Row of table column headers
    5. Entity                      Description of the properties of an object
    6. Attribute                   Column header
    7. Domain                      Set of valid attribute values
    8. Primary key                 Unique identifier that uniquely identifies each record in the table
    9. Data type                   Type of the element values in the table
    10. Tuple                      Row (record)
    11. Cardinality                Number of rows in the table
    12. Degree of relation         Number of fields
    13. Body of relation           Set of tuples of the relation

    When designing a relational database, data is placed in several tables. Relationships are established between tables using keys. When linking tables, a main and an additional (subordinate) table are distinguished.

    There are the following types of relationships between tables:

    1. A 1:1 (one-to-one) relationship means that each record in the main table corresponds to one record in the additional table and, conversely, each record in the additional table corresponds to one record in the main table.

    2. A 1:M (one-to-many) relationship means that each record in the main table can correspond to several records in the additional table, while each record in the additional table corresponds to only one record in the main table.

    3. An M:1 (many-to-one) relationship means that one or more records in the main table correspond to only one record in the additional table.

    4. An M:M (many-to-many) relationship means that several records in the main table can correspond to several records in the additional table, and vice versa. The implementation of such a relationship is sketched below.
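
    An M:M relationship cannot be expressed directly by a single foreign key; in practice it is implemented through a third (junction) table carrying two 1:M relationships. A sketch in SQL (the table and column names are illustrative):

    CREATE TABLE Students (StudentID INT NOT NULL PRIMARY KEY);
    CREATE TABLE Courses  (CourseID  INT NOT NULL PRIMARY KEY);

    -- each row links one student to one course;
    -- together, the two 1:M relationships give M:M
    CREATE TABLE Enrollment (
        StudentID INT NOT NULL REFERENCES Students (StudentID),
        CourseID  INT NOT NULL REFERENCES Courses (CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );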

    5. Basic components of MS Access.

    The main components (objects) of MS Access are:

    1. Tables;

    2. Queries;

    3. Forms;

    4. Reports;

    5. Macros;

    6. Modules.

    A table is an object designed to store data in the form of records (rows) and fields (columns). Each field contains one part of the record, and each table stores information on one specific subject.

    A query is a question about the data stored in the tables, or an instruction for selecting records to be changed.

    A form is an object in which you can place controls intended for entering, displaying and changing data in table fields.

    A report is an object that allows you to present user-defined information in a specific form, view it and print it.

    A macro is one or more macro commands that can be used to automate a specific task. A macro command is the basic building block of a macro: a stand-alone instruction that can be combined with other macro commands to automate the execution of a task.

    A module is a set of declarations, statements and procedures stored under one name. MS Access has three types of modules: form modules, report modules and general modules. Form and report modules contain local programs for their forms and reports.

    6. Tables in MS Access.

    MS Access has the following methods for creating tables:

    1. Datasheet view;

    2. Design view;

    3. Table Wizard;

    4. Importing tables;

    5. Linking to tables.

    In Datasheet view, data is entered into an empty table; a table with 30 fields is provided for data entry. After the table is saved, MS Access itself decides what data type to assign to each field.

    Design view provides the ability to create fields yourself, select data types for fields, set field sizes and define field properties.

    When defining a field in Design view, the following are specified:

    1. Field name – each field in a table must have a unique name: a combination of letters, numbers, spaces and special characters (except ".", "!", "`", "[", "]"). The maximum name length is 64 characters.

    2. Data type defines the type and range of valid values, as well as the amount of memory allocated for this field.

    MS Access Data Types

    Data type      Description
    Text           Text and numbers, such as names and addresses, telephone numbers, postal codes (up to 255 characters).
    Memo           Long text and numbers, such as comments and explanations (up to 64,000 characters).
    Number         General-purpose data type for numeric data that allows mathematical calculations, except monetary ones.
    Date/Time      Date and time values. The user can choose standard formats or create a custom format.
    Currency       Monetary values. Numeric data types are not recommended for monetary calculations because they may be rounded; currency values are always output with the specified number of decimal places.
    AutoNumber     Automatically assigned sequential numbers, starting from 1. An AutoNumber field is convenient for creating a key; it is compatible with a Number field whose Field Size property is set to Long Integer.
    Yes/No         Logical values: Yes/No, True/False, On/Off – one of two possible values.
    OLE Object     Objects created in other programs that support the OLE protocol.

    3. The most important field properties:

    - Field Size specifies the maximum size of the data stored in the field.

    - Format determines how data of the given type is presented when displayed on the screen or printed.

    - Caption sets the text that is displayed for the field in tables, forms and reports.

    - Validation Rule allows input to be controlled: it sets restrictions on the entered values and, if the condition is violated, prohibits input and displays the text specified by the Validation Text property.

    - Validation Text sets the text of the message displayed on the screen when the restriction specified by the Validation Rule is violated.

    Display Control is a property set on the Lookup tab in the table designer window. It determines how the field is displayed in the table: as a text box or as a combo box.

    The unique (primary) key of a table can be simple or composite, comprising several fields.

    To define a key, select the fields that make up the key and click the Primary Key button on the toolbar, or execute the Edit/Primary Key command.

