• Relational databases for dummies. Relational Database - Basic Concepts

    In relational databases, data is stored in tables made up of rows and columns. Each table has its own predefined set of named fields. The columns of a relational table hold scalar data of a fixed type, such as numbers, strings, or dates. Tables can be related one-to-one or one-to-many. The number of rows (records) in a table is not limited, and each record describes a separate object.

    Relational databases now occupy a dominant position. Hierarchical and network database structures have largely given way to the relational model, on which most modern DBMSs are built (MS SQL Server, MS Access, InterBase, FoxPro, PostgreSQL, Paradox and others).

    Details

    The relational model focuses on organizing data in the form of two-dimensional tables. Each relational table is a two-dimensional array and has the following properties:

    • Each table element is one data element
    • Each column has its own unique name
    • There are no identical rows in the table
    • All columns in the table are homogeneous, that is, all elements in the column are of the same type
    • The order of rows and columns can be arbitrary

    Relational DBMSs are designed primarily for operational (transactional) data processing, and they are less efficient at analytical processing than multidimensional databases. One reason is the fairly strict restrictions imposed by existing implementations of the SQL language. An example is the assumption that data in a relational database is unordered (more precisely, stored in no guaranteed order), so any required ordering costs additional sorting time on every access. In analytical systems, data is loaded and retrieved in large batches, and once loaded it stays unchanged for a long time. Here it is more effective to store data in partially denormalized tables that hold not only detailed values but also pre-calculated aggregates, which improves performance. For navigation and selection, specialized addressing and indexing methods can be used that rely on the data changing rarely. This way of organizing data is sometimes called pre-computed, to emphasize its difference from the normalized relational approach, which computes aggregates dynamically and establishes connections between attributes of different tables at query time (join operations).
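    The contrast between dynamic aggregation and pre-computed aggregates can be sketched with Python's built-in sqlite3 module. This is only an illustration: the products, sales and category_totals tables and their columns are hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
    CREATE TABLE sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        amount     INTEGER NOT NULL
    );
    INSERT INTO products VALUES (1, 'books'), (2, 'books'), (3, 'toys');
    INSERT INTO sales (product_id, amount) VALUES (1, 10), (2, 15), (3, 7), (1, 5);
""")

# Normalized approach: join the tables and aggregate on every query.
# Rows have no guaranteed order, so ORDER BY is requested explicitly.
dynamic_totals = conn.execute("""
    SELECT p.category, SUM(s.amount)
    FROM sales AS s JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Pre-computed approach: store the aggregate once, then read it directly.
conn.execute("""
    CREATE TABLE category_totals AS
    SELECT p.category AS category, SUM(s.amount) AS total
    FROM sales AS s JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.category
""")
precomputed_totals = conn.execute(
    "SELECT category, total FROM category_totals ORDER BY category"
).fetchall()
```

    The stored totals are cheap to read but go stale as soon as sales changes, which is why this layout suits data that is loaded in batches and rarely modified.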

    Main disadvantages

    Besides the low efficiency in analytical tasks mentioned earlier, traditional relational DBMSs have another disadvantage: the main, and often the only, mechanism for quickly finding and selecting individual rows in a table (or in tables linked through foreign keys) is some variant of a B-tree index. This solution is effective only when queries touch small groups of records and the data is modified frequently.
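    As a rough illustration of index-based row lookup, the sketch below uses SQLite (whose indexes are B-trees) through Python's sqlite3 module and compares the query plan before and after an index is created. The students table, its columns and the index name are assumptions made for the example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "student_id INTEGER PRIMARY KEY, full_name TEXT, group_code TEXT)"
)
conn.executemany(
    "INSERT INTO students (full_name, group_code) VALUES (?, ?)",
    [(f"Student {i}", f"IN-{i % 50}") for i in range(1000)],
)

query = "SELECT student_id, full_name FROM students WHERE group_code = ?"


def plan(sql, params):
    """Return the human-readable query plan (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query, ("IN-41",))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_students_group ON students (group_code)")
plan_after = plan(query, ("IN-41",))   # the plan now names the index

matching = conn.execute(query, ("IN-41",)).fetchall()
```

    The index pays off here because the query selects a small group of rows (20 out of 1000); a query that reads most of the table gains little from it.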

    Relational database management systems may never go away, but their days of dominance are certainly numbered, says Paul Krill in an article published in InfoWorld in September 2011. He quotes analyst Robin Bloor, who argues that the architecture of relational DBMSs is obsolete, since it was created in a bygone era and does not meet modern requirements.

    Relational DBMSs still dominate financial transaction processing, but companies are increasingly adopting DBMSs built on the new NoSQL architecture: horizontally scalable, distributed and developed as open source. Examples of such systems include Hadoop, MapReduce and VoltDB. According to Forrester analysts, about 75% of enterprise data is either semi-structured (XML, e-mail and EDI) or unstructured (text, images, audio and video). Only 5% of this data is stored in relational DBMSs; the rest is kept in other types of databases or as files and is not processed by relational systems.

    According to Bloor, relational DBMSs “can die without anyone noticing” - for example, if Oracle simply replaces the SQL engine in its DBMS with a NoSQL one. The analyst believes that one of the existing columnar DBMSs could become such an engine.

    The arrival of computer technology has brought an information revolution to every sphere of human activity. To keep all this information from turning into useless clutter on the global Internet, database systems were invented, in which material is sorted and systematized so that it is easy to find and pass on for further processing. There are three main types: relational, hierarchical, and network databases.

    Fundamental Models

    The emergence of databases was a fairly complex process that began with the development of programmable information-processing equipment. It is therefore not surprising that more than 50 database models now exist, but the main ones are hierarchical, relational and network, which are still widely used in practice. What are they?

    The hierarchical model has a tree structure made up of data at different levels with connections between them. The network model is more complex: its structure resembles the hierarchical one but extends and improves the scheme. The difference is that a descendant in the hierarchical model can be linked to only one ancestor, while in the network model it can have several. The structure of a relational database is more involved still, so it deserves a closer look.
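    The one-ancestor versus several-ancestors distinction can be sketched with two small Python classes. TreeNode and NetworkNode are hypothetical names used only for this illustration; they do not reproduce how any particular hierarchical or network DBMS stores its data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TreeNode:
    """Hierarchical model: a node may have at most one parent."""
    name: str
    parent: Optional["TreeNode"] = None
    children: List["TreeNode"] = field(default_factory=list)

    def add_child(self, child: "TreeNode") -> None:
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)


@dataclass(eq=False)
class NetworkNode:
    """Network model: a node may have several parents."""
    name: str
    parents: List["NetworkNode"] = field(default_factory=list)
    children: List["NetworkNode"] = field(default_factory=list)

    def add_child(self, child: "NetworkNode") -> None:
        child.parents.append(self)
        self.children.append(child)


# Hierarchical: a second parent for the same node is rejected.
faculty = TreeNode("Faculty")
group = TreeNode("Group IN-41")
faculty.add_child(group)
try:
    TreeNode("Other faculty").add_child(group)
    second_parent_rejected = False
except ValueError:
    second_parent_rejected = True

# Network: the same node can hang under two parents.
databases = NetworkNode("Databases course")
algebra = NetworkNode("Algebra course")
student = NetworkNode("Student")
databases.add_child(student)
algebra.add_child(student)
parent_names = [p.name for p in student.parents]
```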

    Basic concept of a relational database

    This model was developed in the 1970s by Dr. Edgar Codd. It is a logically structured table with fields that describes the data, their relationships with each other, the operations performed on them and, most importantly, the rules that guarantee their integrity. Why is the model called relational? Because it is based on relationships (from the Latin relatio) between data. There are many definitions of this type of database. Relational tables are much easier to systematize and process than data in a network or hierarchical model. To do so, it is enough to know the features and structure of the model and the properties of relational tables.

    The process of modeling and compiling basic elements

    To create your own database, choose a modeling tool, decide what information you need to work with, design the tables and the one-to-one and one-to-many relationships between the data, fill in the entity cells, and set the primary and foreign keys.

    Modeling tables and designing relational databases is done using free tools, such as Workbench, PhpMyAdmin, Case Studio, dbForge Studio. After the detailed design is complete, save the finished graphical relational model and translate it into SQL code. At this stage you can begin sorting, processing and systematizing the data.

    Features, structure and terms associated with the relational model

    Each source names these elements in its own way, so to reduce confusion here is a short list of equivalent terms:

    • relational table = entity;
    • layout = attributes = field names = entity column headers;
    • entity instance = tuple = record = table row;
    • attribute value = entity cell = field.

    To move on to the properties of a relational database, you should know what basic components it consists of and what they are intended for.

    1. Entity. A relational database can consist of one table or of a whole set of tables that describe objects through the data stored in them. Tables have a fixed number of fields and a variable number of records. A table in the relational model is made up of rows, attributes and a layout.
    2. Record. A table has a variable number of rows, each holding the data that characterizes one described object. Records are numbered automatically by the system.
    3. Attributes are the data that describe the columns of an entity.
    4. Field. Represents an entity column. The number of fields is fixed and is set when the table is created or modified.

    Now, knowing the constituent elements of the table, you can move on to the properties of the relational database model:

    • Relational database entities are two-dimensional. Thanks to this property, logical and mathematical operations are easy to perform on them.
    • The order of attribute values and records in a relational table can be arbitrary.
    • Each column within a relational table must have its own unique name.
    • All data in an entity column has a fixed length and the same type.
    • Each record is treated as a single data item.
    • Each row is unique: there are no identical rows in a relational entity.

    It follows from these properties that the attribute values in a column must be of the same type and length. Let's look at the features of attribute values.

    Main characteristics of relational database fields

    Field names must be unique within one entity. The attribute (field) type describes what category of data is stored in the entity's fields. A relational database field must have a fixed size, measured in characters. The parameters and format of attribute values determine how the data in them is corrected. There is also the notion of a “mask” or “input template”, which defines the pattern that data entered into an attribute must follow. An error message must be issued if something incorrect is entered in a field. Restrictions can also be imposed on field values: conditions that check the entered data for accuracy and freedom from errors. Some attributes are required and must always be filled in, while others may be left empty and hold NULL values. Just as the system issues error notifications automatically, it can also fill in values automatically - these are the default values. An indexed field is designed to speed up data search.
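    Several of these field characteristics (required values, NULL-able values, validation conditions, defaults and an indexed field) can be sketched with Python's sqlite3 module. The table, the column names, the GPA range and the default date are assumptions chosen for illustration. SQLite does not enforce a fixed field size in characters, so that property is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL,                      -- required value
        phone      TEXT,                               -- may be left empty (NULL)
        gpa        REAL CHECK (gpa BETWEEN 0 AND 5),   -- validation condition
        enrolled   TEXT NOT NULL DEFAULT '2024-09-01'  -- default value
    )
""")
# Indexed field: speeds up searches by name.
conn.execute("CREATE INDEX idx_students_name ON students (full_name)")

# phone and enrolled are omitted: phone stays NULL, enrolled takes its default.
conn.execute("INSERT INTO students (full_name, gpa) VALUES ('Ivanov I. I.', 4.5)")
row = conn.execute("SELECT phone, enrolled FROM students").fetchone()

# Incorrect input is rejected with an error instead of being stored.
errors = []
for sql in (
    "INSERT INTO students (full_name, gpa) VALUES (NULL, 4.0)",            # missing name
    "INSERT INTO students (full_name, gpa) VALUES ('Petrov P. P.', 9.9)",  # GPA out of range
):
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

stored_rows = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```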

    Diagram of a two-dimensional relational database table

    To understand the model in detail using SQL, it is best to look at a diagram with an example. We already know what a relational database is. A record in each table is one data element. To prevent data redundancy, the tables must be normalized.

    Basic rules for normalizing a relational entity

    1. Every field value in a relational table must be single and indivisible (first normal form, 1NF).

    2. In a table that is already in 1NF, every non-key column must depend on the table's unique identifier (2NF).

    3. In a table that is already in 2NF, no non-key field may depend on another non-key field (3NF).

    Databases: relational relationships between tables

    There are two main types of relationships between relational tables, plus a third that is rarely needed:

    • “One-to-many”. Occurs when one key record of table No. 1 corresponds to several instances of the second entity. A key icon at one end of a drawn line indicates that the entity is on the “one” side; the other end of the line is often marked with an infinity symbol.

    • “Many-to-many”. Formed when several rows of one entity are logically related to several records of another table.
    • “One-to-one”. If the key identifier of one table is also present in the other entity, the relationship is one-to-one, and one of the tables is redundant and can be removed. Sometimes, however, programmers deliberately separate the two entities purely for security reasons, so a one-to-one relationship can exist in practice.
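    A many-to-many relationship is commonly implemented with a third, linking table, which splits it into two one-to-many relationships. The text above does not describe this technique, so the sketch below is an assumption about how such a relationship might be built; the students, courses and enrollments tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title     TEXT NOT NULL);

    -- Linking table: each row pairs one student with one course.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id  INTEGER NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO students VALUES (1, 'Ivanov'), (2, 'Petrov');
    INSERT INTO courses  VALUES (10, 'Databases'), (20, 'Algebra');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
courses_of_ivanov = [row[0] for row in conn.execute("""
    SELECT c.title
    FROM courses AS c JOIN enrollments AS e ON e.course_id = c.course_id
    WHERE e.student_id = 1
    ORDER BY c.title
""")]

# One course, many students.
students_in_databases = [row[0] for row in conn.execute("""
    SELECT s.full_name
    FROM students AS s JOIN enrollments AS e ON e.student_id = s.student_id
    WHERE e.course_id = 10
    ORDER BY s.full_name
""")]
```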

    Existence of keys in a relational database

    Primary and secondary keys define the possible relationships in a database. A relation may have several candidate keys, but only one of them is chosen as the primary key. What is it? A primary key is an entity column or set of attributes through which the data of a specific row can be accessed. It must be unique, and its fields cannot contain empty values. If the primary key consists of only one attribute, it is called simple; otherwise it is composite.

    In addition to the primary key, there is also a foreign key, and many people do not understand the difference between them. Let's look at an example with two tables: “Dean’s Office” and “Students”. The “Dean’s Office” entity contains the fields “Student ID”, “Full name” and “Group”. The “Students” table has the attributes “Name”, “Group” and “GPA”. Since the student ID cannot be the same for several students, this field is the primary key. “Full name” and “Group” can be the same for several people, so they cannot identify a row on their own; instead, the “Students” table refers to the student ID from the “Dean’s Office” entity, and the field that holds this reference is the foreign key.

    Example relational database model

    For clarity, here is a simple example of a relational database model consisting of two entities. The first is a table called "Dean's Office".

    Connections must be made to create a full-fledged relational database. The entry “IN-41”, like “IN-72”, may appear more than once in the “Dean’s Office” table, and in rare cases the last, first and patronymic names of students may coincide, so these fields cannot be made the primary key. The second entity is "Students".

    As we can see, the field types in a relational database vary: some hold numeric data, others hold character data. The attribute settings should therefore specify types such as integer, char, varchar, date and others. In the "Dean's Office" table, the only unique value is the student ID, so this field can be taken as the primary key. The record in the “Students” entity that holds the full name, group and phone number is linked to it through a foreign key referencing the student ID. The connection has been established. This is an example of a one-to-one relationship. Strictly speaking, one of the tables is redundant, and the two could easily be combined into one entity, but keeping two tables is entirely reasonable if student ID numbers should not become publicly known.
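    The two-table example can be approximated in code. Since the original tables are not reproduced here, the column names, types and sample rows below are assumptions; the sketch only shows a primary key in one table and a foreign key in the other that references it, using Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;  -- SQLite checks foreign keys only when this is enabled

    CREATE TABLE dean_office (
        student_id INTEGER PRIMARY KEY,   -- unique, never empty
        full_name  TEXT NOT NULL,
        group_code TEXT NOT NULL          -- 'IN-41' may repeat, so it cannot be a key
    );

    -- One-to-one: the foreign key is also the primary key of this table.
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY REFERENCES dean_office(student_id),
        phone      TEXT,
        gpa        REAL
    );

    INSERT INTO dean_office VALUES (1, 'Ivanov I. I.', 'IN-41'), (2, 'Petrov P. P.', 'IN-41');
    INSERT INTO students VALUES (1, '555-0101', 4.5);
""")


def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A second row with an existing student ID breaks primary-key uniqueness.
duplicate_id_rejected = rejected("INSERT INTO dean_office VALUES (1, 'Sidorov S. S.', 'IN-72')")
# A row that references a non-existent student ID breaks the foreign key.
orphan_rejected = rejected("INSERT INTO students VALUES (99, '555-0199', 3.0)")

joined = conn.execute("""
    SELECT d.full_name, d.group_code, s.gpa
    FROM dean_office AS d JOIN students AS s ON s.student_id = d.student_id
""").fetchall()
```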

    RELATIONAL DATABASE AND ITS FEATURES. TYPES OF RELATIONS BETWEEN RELATIONAL TABLES

    A relational database is a collection of interconnected tables, each of which contains information about objects of a certain type. A table row contains data about one object (for example, a product, a customer), and the table columns describe various characteristics of these objects - attributes (for example, name, product code, customer information). Records, i.e. table rows, have the same structure - they consist of fields that store object attributes. Each field, i.e. column, describes only one characteristic of the object and has a strictly defined data type. All records have the same fields, but the fields hold different values for each object.

    In a relational database, each table must have a primary key - a field or combination of fields that uniquely identifies each row in the table. If a key consists of several fields, it is called composite. The key must be unique and must identify exactly one record, so that a single record can be found by its key value. Keys also serve to organize information in the database.

    Relational database tables must meet the requirements for normalizing relationships. Normalization of relations is a formal apparatus of restrictions on the formation of tables, which eliminates duplication, ensures consistency of data stored in the database, and reduces labor costs for maintaining the database.

    Let a Student table be created containing the following fields: group number, full name, student record number, date of birth, specialty name, faculty name. Such an organization of information storage will have a number of disadvantages:

    • duplication of information (the name of the specialty and faculty is repeated for each student), therefore, the volume of the database will increase;
    • updating information becomes more complicated, because every affected record in the table has to be edited.

    Table normalization is designed to address these shortcomings. There are three normal forms of relations.

    First normal form. A relational table is reduced to first normal form if and only if none of its rows contains more than one value in any of its fields and none of its key fields is empty. So, if you need to obtain information from the Student table by the student’s name, then the Full Name field should be divided into Last Name, First Name, and Patronymic parts.

    Second normal form. A relational table is defined in second normal form if it satisfies the requirements of first normal form and all its fields that are not included in the primary key have a full functional dependence on the primary key. To reduce a table to second normal form, it is necessary to determine the functional dependence of the fields. A functional dependence of fields is a dependence in which in an instance of an information object a certain value of a key attribute corresponds to only one value of a descriptive attribute.

    Third normal form. A table is in third normal form if it satisfies the requirements of second normal form and none of its non-key fields is functionally dependent on any other non-key field. For example, in the Student table (group number, full name, grade book number, date of birth, headman), three fields - grade book number, group number and headman - are in a transitive dependency: the group number depends on the grade book number, and the headman depends on the group number. To eliminate the transitive dependency, some of the fields of the Student table must be moved to a separate Group table. The tables then take the following form: Student (group number, full name, grade book number, date of birth), Group (group number, headman).
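    The Student/Group decomposition can be tried out with Python's sqlite3 module. The sample rows are invented, and the table names student_flat, study_group and student are assumptions chosen for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before: headman depends on group_no, which depends on grade_book_no.
    CREATE TABLE student_flat (
        grade_book_no INTEGER PRIMARY KEY,
        full_name     TEXT,
        birth_date    TEXT,
        group_no      TEXT,
        headman       TEXT
    );
    INSERT INTO student_flat VALUES
        (101, 'Ivanov', '2004-01-15', 'IN-41', 'Sidorova'),
        (102, 'Petrov', '2004-03-02', 'IN-41', 'Sidorova'),
        (103, 'Orlova', '2003-11-20', 'IN-72', 'Kuznetsov');

    -- After: the headman is stored once per group.
    CREATE TABLE study_group (group_no TEXT PRIMARY KEY, headman TEXT NOT NULL);
    INSERT INTO study_group SELECT DISTINCT group_no, headman FROM student_flat;

    CREATE TABLE student (
        grade_book_no INTEGER PRIMARY KEY,
        full_name     TEXT,
        birth_date    TEXT,
        group_no      TEXT REFERENCES study_group(group_no)
    );
    INSERT INTO student
        SELECT grade_book_no, full_name, birth_date, group_no FROM student_flat;
""")

group_count = conn.execute("SELECT COUNT(*) FROM study_group").fetchone()[0]

# Changing a headman now touches one row instead of one row per student.
conn.execute("UPDATE study_group SET headman = 'Petrov' WHERE group_no = 'IN-41'")
rows = conn.execute("""
    SELECT s.full_name, g.headman
    FROM student AS s JOIN study_group AS g ON g.group_no = s.group_no
    ORDER BY s.grade_book_no
""").fetchall()
```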

    The following operations are possible on relational tables:

    • Union of tables with the same structure. The result is a common table containing the records of the first table followed by those of the second (concatenation).
    • Intersection of tables with the same structure. The result contains the records that are present in both tables.
    • Difference (subtraction) of tables with the same structure. The result contains the records of the first table that are not in the subtracted one.
    • Selection (horizontal subset). The result contains the records that meet specified conditions.
    • Projection (vertical subset). The result is a relation containing some of the fields of the source tables.
    • Cartesian product of two tables. The records of the resulting table are obtained by combining each record of the first table with each record of the second.
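    The operations listed above map roughly onto SQL constructs, as the following sketch with Python's sqlite3 module suggests. The tables a and b and their contents are invented for the example. SQL's UNION removes duplicate rows, which matches the relational rule that a table has no identical rows; UNION ALL would be the literal concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Eve');
    INSERT INTO b VALUES (3, 'Eve'), (4, 'Dan');
""")


def q(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()


union        = q("SELECT id, name FROM a UNION SELECT id, name FROM b ORDER BY id")
intersection = q("SELECT id, name FROM a INTERSECT SELECT id, name FROM b")
difference   = q("SELECT id, name FROM a EXCEPT SELECT id, name FROM b ORDER BY id")
selection    = q("SELECT id, name FROM a WHERE id > 1 ORDER BY id")  # horizontal subset
projection   = q("SELECT name FROM a ORDER BY name")                 # vertical subset
product      = q("SELECT a.id, b.id FROM a CROSS JOIN b")            # every pair of rows
```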

    Relational tables can be related to each other, hence data can be retrieved from multiple tables simultaneously. Tables are linked to each other in order to ultimately reduce the size of the database. Each pair of tables is connected if they have identical columns.

    The following types of relationships exist:

    • one-to-one;
    • one-to-many;
    • many-to-many.

    A one-to-one relationship means that one record of the first table corresponds to exactly one record of the second table, and vice versa.

    A one-to-many relationship means that one record of the first table corresponds to several records of the second table.

    A many-to-many relationship means that one record of the first table corresponds to several records of the second table, and vice versa.

    A relational database is a database based on the relational data model (RDM).

    The RDM is based on the concept of a relation (from the English word relation, hence the term relational database). Relational DBMSs are used to work with relational databases. The use of relational databases was proposed by Dr. Codd of IBM in 1970. These models are characterized by a simple data structure, a user-friendly tabular representation and the ability to use the formal apparatus of relational algebra and relational calculus for data processing.

    In a relational database the main structural unit is the table (relation). The relational model is focused on organizing data in the form of two-dimensional tables. Each relational table is a two-dimensional array and has the following properties:

    • Each table element is one data element;
    • All columns in the table are homogeneous, i.e. all elements in a column have the same type (numeric, character, etc.) and length;
    • Each column has a unique name;
    • There are no identical rows in the table;
    • The order of rows and columns can be arbitrary.

    Relations are presented as tables whose rows correspond to records and whose columns correspond to the attributes of the relation (domains, fields). Each row stores data about one object, and each field characterizes one of the object’s parameters. Each table must have a name that is unique within the database.

    A field whose every value uniquely identifies the corresponding record is called a simple key (key field). If records are uniquely identified by the values of several fields, the table has a composite key. If there is no such field, it must be introduced artificially. To link two relational tables, include the key of the first table in the key of the second table (the keys may coincide); otherwise, add a foreign key - the key of the second table - to the structure of the first table.

    Basic database models (DBs)

    A database (DB) is a structured set of information related to one subject area or several related areas. Databases can be built on various principles, which are captured by the concept of a database model.

    A database model determines the method of communication between objects in the database, the method of storing information on a medium (in computer memory), and the method of retrieving and presenting data. DB models: 1) hierarchical, 2) network, 3) relational.

    1) Hierarchical (first half of the 1960s). It was intended for storing databases on paper tape and magnetic tape. The structure of connections between the data is based on graph theory and is represented as an (inverted) tree. Different objects become tree nodes, that is, they sit at different levels of the hierarchy. Connections are described in terms of father-son or ancestor-descendant. Each node at level i relates to a node at level i-1 (i>1) as a son relates to a father: a son can have only one father, while a father can have one or more sons, so an object at level i relates to objects at level i+1 as one to many (1:N, 1:∞). Flaws: 1) the user must know the structure of the tree, otherwise searching for data is difficult; 2) a data search always starts from the root and then navigates along the branches of the tree.



    2) Network (second half of the 1960s). It was developed to reduce the impact of the shortcomings of the previous model. The main difference from the hierarchical model: connections can exist between objects at the same level of the hierarchy as well as at different levels. This increased the speed of data retrieval. However, a significant flaw remains: the user must know the structure of such a tree.

    The main shortcoming of both models is a very weak mathematical basis.

    3) Relational. It is based on the well-developed apparatus of two branches of mathematics: the theory of relations (sets) and the theory of predicates. Set theory is associated with the formalization of procedures for analyzing logical conditions. It defines a two-dimensional set called a relation. In this model the basic structural unit is the table (relation). Each table must have a name that is unique within the database, written in Russian or Latin letters.

    A relational computer database, like any other database, is an information system (IS) with the following components:

    DBMS (database management system) - a specialized software tool (shell) or platform with which the user performs all the provided functions (operations) on the data. Functions: input (insertion), modification (change), extraction (selection) and deletion of data.

    The information system also has another important component - the database administrator, who is responsible for the safety and value of the data, for establishing user access rights, and so on.

    Each table consists of fields and rows. Each row stores data about one object, and each field characterizes one of the parameters (attributes) of this object. A single field may hold data of only one type. One of the attributes (fields) must identify each object in the table, which means that this field must not contain duplicate values (each value is unique). If this condition is met, the field is called a key (the key of the table's data). Every table must have a key field; this key is called the primary key. If a key consists of the values of more than one field, it is called a composite key. A simple key is preferred. If no suitable field exists, one is introduced artificially (for example, a sequence number).

    A database (DB) is a collection of information about objects, processes, events or phenomena related to a certain subject area, topic or task, organized in accordance with certain rules and maintained in computer memory. It is organized in such a way as to provide the information needs of users, as well as convenient storage of this collection of data, both as a whole and any part of it.

    A relational database is a set of interconnected tables, each of which contains information about objects of a certain type. Each row of the table contains data about one object (for example, a car, a computer, a client), and the columns of the table contain various characteristics of these objects - attributes (for example, engine number, processor brand, phone numbers of companies or clients).

    The rows of a table are called records. All table records have the same structure - they consist of fields (data elements) in which object attributes are stored (Fig. 1). Each record field contains one characteristic of the object and represents a specified data type (for example, text string, number, date). A primary key is used to identify records. A primary key is a set of table fields whose combination of values ​​uniquely identifies each record in the table.

    Fig. 1. Names of objects in the table

    Database management systems (DBMS) are used to work with data. Main functions of the DBMS:

    Data definition (description of database structure);

    Data processing;

    Data management.

    Development of the database structure is the most important task solved when designing a database. The structure of a database (the set, form and relationships of its tables) is one of the main design decisions when creating applications using a database. The database structure created by the developer is described in the DBMS data definition language.

    Any DBMS allows you to perform the following operations with data:

    Adding records to tables;

    Removing records from a table;

    Updating the values of some fields in one or more records in database tables;

    Searching for one or more records that meet a specified condition.

    To perform these operations, a query mechanism is used. The result of executing a query is either a set of records selected according to certain criteria or changes in tables. Queries to the database are written in a language created specially for this purpose, called Structured Query Language (SQL).
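    The four basic operations might look as follows when issued as SQL queries through Python's sqlite3 module. The clients table and its contents are hypothetical and serve only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)"
)

# Adding records.
conn.executemany(
    "INSERT INTO clients (name, phone) VALUES (?, ?)",
    [("Alfa", "555-0100"), ("Beta", "555-0101"), ("Gamma", None)],
)

# Updating the value of a field in the records that match a condition.
conn.execute("UPDATE clients SET phone = ? WHERE name = ?", ("555-0102", "Gamma"))

# Removing records.
conn.execute("DELETE FROM clients WHERE name = ?", ("Beta",))
conn.commit()

# Searching for records that meet a condition: the result is a set of rows.
found = conn.execute(
    "SELECT name, phone FROM clients WHERE phone LIKE ? ORDER BY name",
    ("555-01%",),
).fetchall()
```

    The first three queries change the table, while the last one returns a selection of records, which matches the two kinds of query results described above.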

    Data management typically refers to protecting data from unauthorized access, supporting multi-user data processing, and ensuring data integrity and consistency.