• Studying neural networks: where to start. Training a neural network. Backpropagation algorithm

    In the previous chapter we became familiar with concepts such as artificial intelligence, machine learning and artificial neural networks.

    In this chapter, I will describe in detail the model of an artificial neuron, talk about approaches to network training, and also describe some known types of artificial neural networks, which we will study in the following chapters.

    Simplification

    In the last chapter, I constantly talked about some serious simplifications. The reason for them is that no modern computer can quickly simulate a system as complex as our brain. In addition, as I already said, our brain is full of various biological mechanisms that are not related to information processing.

    We need a model for converting the input signal into the output signal we need. Everything else doesn't bother us. Let's start simplifying.

    Biological structure → diagram

    In the previous chapter, you realized how complex biological neural networks and biological neurons are. Instead of drawing neurons as tentacled monsters, let's just draw diagrams.

    Generally speaking, there are several ways to depict neural networks and neurons graphically. Here we will depict artificial neurons as circles.

    Instead of a complex interweaving of inputs and outputs, we will use arrows indicating the direction of signal movement.

    Thus, an artificial neural network can be represented as a collection of circles (artificial neurons) connected by arrows.

    Electrical signals → numbers

    In a real biological neural network, an electrical signal is transmitted from the network inputs to the outputs. It may change as it passes through the neural network.

    An electrical signal will always be an electrical signal. Conceptually, nothing changes. But what then changes? The magnitude of this electrical signal changes (stronger/weaker). And any value can always be expressed as a number (more/less).

    In our artificial neural network model, we do not need to implement the behavior of the electrical signal at all, since nothing will depend on its implementation anyway.

    We will supply some numbers to the network inputs, symbolizing the magnitude of the electrical signal if it existed. These numbers will move through the network and change in some way. At the output of the network we will receive some resulting number, which is the response of the network.

    For convenience, we will still call our numbers circulating in the network signals.

    Synapses → connection weights

    Let us recall the picture from the first chapter, in which the connections between neurons - synapses - were depicted in color. Synapses can strengthen or weaken the electrical signal passing through them.

    Let's characterize each such connection by a certain number, called the weight of this connection. A signal that passes through a connection is multiplied by the weight of that connection.

    This is a key point in the concept of artificial neural networks, I will explain it in more detail. Look at the picture below. Now each black arrow (connection) in this picture corresponds to a certain number ​\(w_i \) ​ (the weight of the connection). And when the signal passes through this connection, its magnitude is multiplied by the weight of this connection.

    In the above figure, not every connection is labeled with a weight simply because there is no space for the labels. In reality, each \(i\)-th connection has its own weight \(w_i\).

    Artificial Neuron

    We now move on to consider the internal structure of an artificial neuron and how it transforms the signal arriving at its inputs.

    The figure below shows the full model of an artificial neuron.

    Don't be alarmed, there is nothing complicated here. Let's look at everything in detail from left to right.

    Inputs, weights and adder

    Each neuron, including an artificial one, must have inputs through which it receives a signal. We have already introduced the concept of weights, by which the signals passing through a connection are multiplied. In the picture above, the weights are shown as circles.

    The signals received at the inputs are multiplied by their weights. The signal of the first input \(x_1\) is multiplied by the weight \(w_1\) corresponding to this input. As a result, we get \(x_1w_1\). And so on up to the \(n\)-th input. As a result, at the last input we get \(x_nw_n\).

    Now all products are transferred to the adder. Just based on its name, you can understand what it does. It simply sums all the input signals multiplied by the corresponding weights:

    \[ x_1w_1+x_2w_2+\cdots+x_nw_n = \sum\limits^n_{i=1}x_iw_i \]

    Mathematical help


    When it is necessary to briefly write down a large expression consisting of a sum of repeating/same-type terms, the sigma sign is used.

    Let's consider the simplest form of this notation:

    \[ \sum\limits^5_{i=1}i=1+2+3+4+5 \]

    Thus, below the sigma we give the counter variable \(i\) a starting value, which increases until it reaches the upper limit (in the example above it is 5).

    The upper limit can also be variable. Let me give you an example of such a case.

    Suppose we have \(n\) stores. Each store has its own number, from 1 to \(n\). Every store makes a profit. Let's take some (it does not matter which) \(i\)-th store. Its profit is equal to \(p_i\). The total profit \(P\) of all stores is then:

    \[ P = p_1+p_2+\cdots+p_i+\cdots+p_n \]

    As you can see, all terms of this sum are of the same type. Then they can be briefly written as follows:

    \[ P=\sum\limits^n_{i=1}p_i \]

    In words: “Sum up the profits of all stores, starting with the first and ending with ​\(n\) ​-th.” In the form of a formula, it is much simpler, more convenient and more beautiful.

    The result of the adder is a number called the weighted sum.

    Weighted sum (\(net\)) – the sum of the input signals multiplied by their corresponding weights.

    \[ net=\sum\limits^n_{i=1}x_iw_i \]

    The role of the adder is obvious – it aggregates all the input signals (of which there can be many) into a single number, the weighted sum, which characterizes the signal received by the neuron as a whole. Put another way, the weighted sum can be viewed as the degree of overall excitation of the neuron.

    Example

    To understand the role of the last component of an artificial neuron - the activation function - I will give an analogy.

    Let's look at a single artificial neuron. Its task is to decide whether to go on vacation to the sea. To do this, we supply various data to its inputs. Let our neuron have 4 inputs:

    1. Trip cost
    2. What's the weather like at sea?
    3. Current work situation
    4. Will there be a snack bar on the beach

    We will characterize all these parameters as 0 or 1. Accordingly, if the weather at sea is good, then we apply 1 to this input. And so with all other parameters.

    If a neuron has four inputs, then it must have four weights. In our example, the weighting coefficients can be thought of as indicators of the importance of each input for the neuron's overall decision. We distribute the input weights as follows:

    It is easy to see that the factors of cost and weather at sea (the first two inputs) play a very important role. They will also play a decisive role when the neuron makes a decision.

    Let us supply the following signals to the inputs of our neuron:

    We multiply the weights of the inputs by the signals of the corresponding inputs:

    The weighted sum for such a set of input signals is 6:

    \[ net=\sum\limits^4_{i=1}x_iw_i = 5 + 0 + 0 + 1 = 6 \]
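    As a quick check, here is a minimal MATLAB sketch of the adder for this example. The weights [5 4 1 1] and the inputs [1 0 0 1] are inferred from the terms of the sum above, so treat them as an illustration rather than fixed values.

    w = [5 4 1 1]; % connection weights: trip cost, weather, work situation, snack bar
    x = [1 0 0 1]; % input signals supplied to the neuron
    net = sum(x .* w) % weighted sum; displays net = 6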

    This is where the activation function comes into play.

    Activation function

    It is rather pointless to simply output the weighted sum as it is. The neuron must somehow process it and produce an adequate output signal. This is exactly what the activation function is for.

    It converts the weighted sum into a certain number, which is the output of the neuron (we denote the output of the neuron by the variable ​\(out \) ​).

    Different types of artificial neurons use a variety of activation functions. In general, they are denoted by the symbol \(\phi(net)\). Writing the weighted sum in parentheses means that the activation function takes the weighted sum as its parameter.

    Activation function (\(\phi(net)\)) – a function that takes the weighted sum as its argument. The value of this function is the output of the neuron (\(out\)).

    Unit step function

    This is the simplest type of activation function. The output of the neuron can only be 0 or 1. If the weighted sum is greater than or equal to a certain threshold \(b\), the output of the neuron is 1; if it is lower, the output is 0.

    How can it be used? Let's assume that we go to the sea only when the weighted sum is greater than or equal to 5. This means our threshold is 5:

    In our example, the weighted sum was 6, which means the output signal of our neuron is 1. So, we are going to the sea.

    However, if the weather at sea were bad and the trip was very expensive, but there was a snack bar and the work situation was normal (inputs: 0011), then the weighted sum would be equal to 2, which means the output of the neuron would be equal to 0. So, we are not going anywhere.

    Basically, a neuron looks at a weighted sum and if it is greater than its threshold, then the neuron produces an output equal to 1.

    Graphically, this activation function can be depicted as follows.

    The horizontal axis contains the values ​​of the weighted sum. On the vertical axis are the output signal values. As is easy to see, only two values ​​of the output signal are possible: 0 or 1. Moreover, 0 will always be output from minus infinity up to a certain value of the weighted sum, called the threshold. If the weighted sum is equal to or greater than the threshold, then the function returns 1. Everything is extremely simple.

    Now let's write this activation function mathematically. You have almost certainly come across the concept of a piecewise-defined function, in which several rules for computing the value are combined under a single function. Written as a piecewise function, the unit step function looks like this:

    \[ out(net) = \begin{cases} 0, & net < b \\ 1, & net \geq b \end{cases} \]

    There is nothing complicated about this notation. The output of the neuron (\(out\)) depends on the weighted sum (\(net\)) as follows: if \(net\) (the weighted sum) is less than some threshold (\(b\)), then \(out\) (the neuron's output) equals 0; and if \(net\) is greater than or equal to the threshold \(b\), then \(out\) equals 1.
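    A minimal MATLAB sketch of this activation function, using the threshold b = 5 from the sea-trip example (the function handle out is only an illustration):

    b = 5; % decision threshold
    out = @(net) double(net >= b); % unit step activation
    out(6) % returns 1: the weighted sum 6 clears the threshold, so we go to the sea
    out(2) % returns 0: the inputs 0011 give a weighted sum of 2, so we stay at home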

    Sigmoid function

    In fact, there is a whole family of sigmoid functions, some of which are used as activation functions in artificial neurons.

    All these functions have some very useful properties, for which they are used in neural networks. These properties will become apparent once you see graphs of these functions.

    So, the sigmoid most commonly used in neural networks is the logistic function.

    The graph of this function looks quite simple. If you look closely, you can see some resemblance to the English letter S, which is where this family of functions gets its name.

    And this is how it is written analytically:

    \[ out(net)=\frac{1}{1+\exp(-a \cdot net)} \]

    What is the parameter ​\(a \) ​? This is some number that characterizes the degree of steepness of the function. Below are logistic functions with different parameters ​\(a \) ​.
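    A minimal MATLAB sketch of the logistic function; the steepness values used here are chosen only for illustration:

    logistic = @(net, a) 1 ./ (1 + exp(-a .* net)); % logistic activation with steepness a
    logistic([-10 -1 0 1 10], 1) % outputs stay strictly between 0 and 1
    logistic([-10 -1 0 1 10], 5) % a larger a makes the transition around zero steeper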

    Let's recall our artificial neuron, which decides whether to go to the sea. In the case of the unit step function, everything was clear-cut: we either go to the sea (1) or we do not (0).

    Here the case is closer to reality. We are not completely sure (especially if you are paranoid) - is it worth going? Then using the logistic function as an activation function will result in you getting a number between 0 and 1. Moreover, the larger the weighted sum, the closer the output will be to 1 (but will never be exactly equal to it). Conversely, the smaller the weighted sum, the closer the neuron's output will be to 0.

    For example, if the output of our neuron is 0.8, this means it believes that going to the sea is still worth it. If its output were 0.2, this would mean it is almost certainly against going to the sea.

    What remarkable properties does the logistic function have?

    • it is a “compressive” function, that is, regardless of the argument (weighted sum), the output signal will always be in the range from 0 to 1
    • it is more flexible than the unit step function - its result can be not only 0 or 1, but any number in between
    • at all points it has a derivative, and this derivative can be expressed through the same function

    It is because of these properties that the logistic function is most often used as an activation function in artificial neurons.

    Hyperbolic tangent

    However, there is another sigmoid - the hyperbolic tangent. It is used as an activation function by biologists to create a more realistic model of a nerve cell.

    This function allows you to get output values ​​of different signs (for example, from -1 to 1), which can be useful for a number of networks.

    The function is written as follows:

    \[ out(net) = \tanh\left(\frac{net}{a}\right) \]

    In the above formula, the parameter ​\(a \) ​ also determines the degree of steepness of the graph of this function.

    And this is what the graph of this function looks like.

    As you can see, it looks like a graph of a logistic function. The hyperbolic tangent has all the useful properties that the logistic function has.
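    A minimal MATLAB sketch of this activation function (the value a = 1 is just an illustrative choice of steepness):

    a = 1; % steepness parameter
    out = @(net) tanh(net ./ a); % hyperbolic tangent activation
    out([-10 0 10]) % outputs lie in the range (-1, 1)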

    What have we learned?

    Now you have a complete picture of the internal structure of an artificial neuron. Here is a brief description of how it works once more.

    A neuron has inputs. They receive signals in the form of numbers. Each input has its own weight (also a number). The input signals are multiplied by the corresponding weights. We get a set of “weighted” input signals.

    The adder combines these weighted signals into the weighted sum. The weighted sum is then transformed by the activation function, and we obtain the neuron's output.

    Let us now formulate the shortest description of the operation of a neuron – its mathematical model:

    Mathematical model of an artificial neuron with \(n\) inputs:

    \[ out=\phi\left(\sum\limits^n_{i=1}x_iw_i\right) \]

    where
    \(\phi\) – activation function
    \(\sum\limits^n_{i=1}x_iw_i\) – weighted sum, i.e. the sum of the \(n\) products of the input signals and their corresponding weights.
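    To tie the pieces together, here is a minimal MATLAB sketch of this model. It uses the logistic function as \(\phi\) and the sea-trip inputs and weights from earlier in the chapter as illustrative values:

    phi = @(net) 1 ./ (1 + exp(-net)); % activation function (logistic with a = 1)
    neuron = @(x, w) phi(sum(x .* w)); % out = phi(weighted sum)
    neuron([1 0 0 1], [5 4 1 1]) % the sea-trip example passed through a sigmoid neuron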

    Types of ANN

    We have figured out the structure of an artificial neuron. Artificial neural networks consist of a collection of artificial neurons. A logical question arises - how to place/connect these same artificial neurons to each other?

    As a rule, most neural networks have a so-called input layer, which performs only one task - distributing input signals to other neurons. The neurons in this layer do not perform any calculations.

    Single-layer neural networks

    In single-layer neural networks, signals from the input layer are immediately fed to the output layer. It performs the necessary calculations, the results of which are immediately sent to the outputs.

    A single-layer neural network looks like this:

    In this picture, the input layer is indicated by circles (it is not considered a neural network layer), and on the right is a layer of ordinary neurons.

    Neurons are connected to each other by arrows. Above the arrows are the weights of the corresponding connections (weighting coefficients).

    Single-layer neural network – a network in which signals from the input layer are fed directly to the output layer, which converts the signal and immediately produces a response.

    Multilayer neural networks

    Such networks, in addition to the input and output layers of neurons, are also characterized by a hidden layer (layers). Their location is easy to understand - these layers are located between the input and output layers.

    This structure of neural networks copies the multilayer structure of certain parts of the brain.

    It is no coincidence that the hidden layer got its name: methods for training hidden-layer neurons were developed only relatively recently. Before that, only single-layer neural networks were used.

    Multilayer neural networks have much greater capabilities than single-layer ones.

    The work of hidden layers of neurons can be compared to the work of a large factory. The product (output signal) at the plant is assembled in stages. After each machine some intermediate result is obtained. Hidden layers also transform input signals into some intermediate results.

    Multilayer neural network – a neural network consisting of an input layer, an output layer and one or more hidden layers of neurons located between them.

    Feedforward networks

    You can notice one very interesting detail in the pictures of neural networks in the examples above.

    In all examples, the arrows strictly go from left to right, that is, the signal in such networks goes strictly from the input layer to the output layer.

    Feedforward networks (feedforward neural networks) – artificial neural networks in which the signal propagates strictly from the input layer to the output layer. The signal does not propagate in the opposite direction.

    Such networks are widely used and quite successfully solve a certain class of problems: forecasting, clustering and recognition.

    However, no one forbids the signal to go in the opposite direction.

    Feedback networks

    In networks of this type, the signal can also go in the opposite direction. What's the advantage?

    The fact is that in feedforward networks, the output of the network is determined by the input signal and weighting coefficients for artificial neurons.

    And in networks with feedback the outputs of neurons can be returned to the inputs. This means that the output of a neuron is determined not only by its weights and input signal, but also by previous outputs (since they returned to the inputs again).

    The ability of signals to circulate in a network opens up new, remarkable possibilities for neural networks. Such networks can be used to restore or complete signals. In other words, such neural networks have the property of short-term memory (much like a person).

    Feedback networks (recurrent neural networks) – artificial neural networks in which the output of a neuron can be fed back to its input. More generally, this means that signals can propagate from outputs back to inputs.

    Neural network training

    Now let's look at the issue of training a neural network in a little more detail. What is it? And how does this happen?

    What is network training?

    An artificial neural network is a collection of artificial neurons. Now let's take, for example, 100 neurons and connect them to each other. It is clear that when we apply a signal to the input, we will get something meaningless at the output.

    This means we need to change some network parameters until the input signal is converted into the output we need.

    What can we change in a neural network?

    Changing the total number of artificial neurons makes no sense for two reasons. Firstly, increasing the number of computing elements as a whole only makes the system heavier and more redundant. Secondly, if you gather 1000 fools instead of 100, they still won’t be able to answer the question correctly.

    The adder cannot be changed, since it performs one strictly defined function - adding. If we replace it with something or remove it altogether, then it will no longer be an artificial neuron at all.

    If we change the activation function of each neuron, we will get a neural network that is too heterogeneous and uncontrollable. In addition, in most cases, neurons in neural networks are of the same type. That is, they all have the same activation function.

    There is only one option left - change connection weights.

    Neural network training – the search for a set of weights such that the input signal, after passing through the network, is converted into the output we need.

    This approach to the term “neural network training” also corresponds to biological neural networks. Our brain consists of a huge number of neural networks connected to each other. Each of them individually consists of neurons of the same type (the activation function is the same). We learn by changing synapses - elements that strengthen / weaken the input signal.

    However, there is one more important point. If you train a network using only one input signal, it will simply “remember the correct answer.” From the outside it will seem that it “learned” very quickly. But as soon as you give it a slightly modified signal, expecting to see the correct answer, the network will produce nonsense.

    Indeed, what use is a network that detects a face in only one photo? We expect the network to be able to generalize certain features and recognize faces in other photographs too.

    It is for this purpose that training sets are created.

    Training set – a finite set of input signals (sometimes together with the correct output signals) on which the network is trained.

    After the network is trained, that is, when the network produces correct results for all input signals from the training set, it can be used in practice.

    However, before sending a freshly trained neural network into battle, the quality of its work is usually assessed on a so-called test set.

    Test set – a finite set of input signals (sometimes together with the correct output signals) on which the quality of the network is assessed.

    We understood what “network training” is – choosing the right set of weights. Now the question arises - how can you train a network? In the most general case, there are two approaches that lead to different results: supervised learning and unsupervised learning.

    Supervised learning

    The essence of this approach is that you give a signal as an input, look at the network’s response, and then compare it with a ready-made, correct response.

    An important point: do not confuse the correct answers with a known solution algorithm! You can trace a face in a photo with your finger (the correct answer), but you would not be able to explain how you did it (an algorithm). The situation is the same here.

    Then, using special algorithms, you change the weights of the neural network connections and again give it an input signal. You compare its answer with the correct one and repeat this process until the network begins to respond with acceptable accuracy (as I said in Chapter 1, the network cannot give unambiguously accurate answers).

    Supervised learning – a type of network training in which the weights are changed so that the network's answers differ minimally from the prepared correct answers.

    Where can I get the correct answers?

    If we want the network to recognize faces, we can create a training set of 1000 photos (input signals) and independently select faces from it (correct answers).

    If we want the network to predict price increases/declines, then the training sample must be made based on past data. As input signals, you can take certain days, the general state of the market and other parameters. And the correct answers are the rise and fall of prices in those days.

    It is worth noting that the teacher, of course, is not necessarily a person. The fact is that sometimes the network has to be trained for hours and days, making thousands and tens of thousands of attempts. In 99% of cases, this role is performed by a computer, or more precisely, a special computer program.

    Unsupervised learning

    Unsupervised learning is used when we do not have the correct answers to the input signals. In this case, the entire training set consists of a set of input signals.

    What happens when the network is trained in this way? It turns out that with such “training” the network begins to distinguish classes of signals supplied to the input. In short, the network begins clustering.

    For example, you demonstrate candy, pastries and cakes to the network. You do not regulate the operation of the network in any way; you simply feed data about these objects to its inputs. Over time, the network will begin to produce three different types of signals, corresponding to the objects at its input.

    Unsupervised learning – a type of network training in which the network independently classifies the input signals. The correct (reference) output signals are not shown to it.

    Conclusions

    In this chapter, you learned about the structure of an artificial neuron and gained a thorough understanding of how it works (including its mathematical model).

    Moreover, you now know about different types of artificial neural networks: single-layer, multi-layer, feedforward and feedback networks.

    You also learned about supervised and unsupervised network learning.

    You already know the necessary theory. Subsequent chapters include consideration of specific types of neural networks, specific algorithms for their training, and programming practice.

    Questions and tasks

    You should know the material in this chapter very well, since it contains basic theoretical information on artificial neural networks. Be sure to achieve confident and correct answers to all the questions and tasks below.

    Describe the simplifications of ANNs compared to biological neural networks.

    1. The complex and intricate structure of biological neural networks is simplified and represented in the form of diagrams. Only the signal processing model is left.

    2. The nature of electrical signals in neural networks is the same. The only difference is their size. We remove electrical signals, and instead we use numbers indicating the magnitude of the transmitted signal.

    The activation function is often denoted by ​\(\phi(net) \) ​.

    Write down a mathematical model of an artificial neuron.

    An artificial neuron with ​\(n \) ​ inputs converts an input signal (number) into an output signal (number) as follows:

    \[ out=\phi\left(\sum\limits^n_{i=1}x_iw_i\right) \]

    What is the difference between single-layer and multi-layer neural networks?

    Single-layer neural networks consist of a single computational layer of neurons. The input layer sends signals directly to the output layer, which converts the signal and immediately produces the result.

    Multilayer neural networks, in addition to input and output layers, also have hidden layers. These hidden layers carry out some internal intermediate transformations, similar to the stages of production of products in a factory.

    What is the difference between feedforward networks and feedback networks?

    Feedforward networks allow the signal to pass in only one direction - from inputs to outputs. Networks with feedback do not have these restrictions, and the outputs of neurons can be fed back into the inputs.

    What is a training set? What is its meaning?

    Before using the network in practice (for example, to solve current problems for which you do not have answers), you need to collect a collection of problems with ready-made answers, on which to train the network. This collection is called the training set.

    If you collect too small a set of input and output signals, the network will simply remember the answers and the learning goal will not be achieved.

    What is meant by network training?

    Network training is the process of changing the weighting coefficients of the artificial neurons of the network in order to select a combination of them that converts the input signal into the correct output.

    What is supervised and unsupervised learning?

    When training a network with a teacher, signals are given to its inputs, and then its output is compared with a previously known correct output. This process is repeated until the required accuracy of answers is achieved.

    If the network is supplied only with input signals, without comparing its outputs with ready answers, then it begins to independently classify these input signals. In other words, it performs clustering of the input signals. This type of learning is called unsupervised learning.

    Neural network without feedback - perceptron

    Tasks for neural networks

    Most problems for which neural networks are used can be considered as special cases of the following main problems.

    · Approximation – constructing a function from a finite set of values (for example, time series forecasting).

    · Constructing relations on a set of objects (for example, problems of recognizing images and sound signals).

    · Distributed information retrieval and associative memory (for example, problems of finding implicit dependencies in large data sets).

    · Filtering (for example, identifying signal changes that are “visible to the naked eye” but difficult to describe analytically).

    · Information compression (for example, neural network implementations of compression algorithms for sound, static and dynamic images).

    · Identification of dynamic systems and their control.


    The multi-layer neural network with multiple outputs shown in the figure below is a perceptron.

    The circuit can be supplemented with an adder, which, if necessary, combines the output signals of neurons into one common output.

    The number of layers in a perceptron can vary, depending on the complexity of the problem. It has been mathematically proven (Kolmogorov's theorem) that three full-fledged neural layers are enough to approximate any mathematical function (provided the number of neurons in the hidden layer can be increased without limit).

    The perceptron operates in discrete time: a static set of signals (an input vector) is applied to the input, the combined state of the outputs (the output vector) is assessed, then the next vector is applied to the input, and so on. The signal in the perceptron is assumed to propagate from input to output instantly, i.e. there are no time delays in signal transmission from neuron to neuron or from layer to layer, and no associated dynamic transient processes. Since the perceptron has no feedback (neither positive nor negative), at each moment of time any input vector of values uniquely corresponds to a certain output vector, which will not change as long as the network inputs remain unchanged.

    Perceptron theory is the basis for many other types of artificial neural networks, and perceptrons themselves are the logical starting point for the study of artificial neural networks.

    To train a neural network means to tell it what we want from it. This process is very similar to teaching a child the alphabet. After showing the child a picture of the letter “A”, we ask: “What letter is this?” If the answer is incorrect, we tell the child the answer we would like to hear: “This is the letter A.” The child remembers this example along with the correct answer, that is, certain changes occur in his memory in the right direction. We repeat the process of presenting the letters over and over again until all 33 letters are firmly memorized. This process is called “supervised learning” (training with a teacher).

    When training a neural network, we act in exactly the same way. Suppose we have a table – a database containing examples (a coded set of letter images). By presenting an image of the letter “A” to the input of the neural network, we expect (ideally) that the signal level will be maximum (=1) at output OUT1 (A is letter No. 1 in the 33-letter alphabet) and minimum (=0) at all the other outputs.

    So the table, called the training set, will look like this (only the first row is filled in, as an example):

    Letter | Input vector (X1, X2, …, X12) | Desired output vector (TARGET1, TARGET2, …, TARGET33)
    A      | …                             | …
    B      | …                             | …
    …      | …                             | …
    Yu     | …                             | …
    Ya     | …                             | …

    The pair of vectors for each example of the training set (one table row) is called a training pair.

    In practice, an untrained neural network will not perform as we would ideally expect; that is, for all or most examples, the error vectors (the element-wise differences between the actual and desired outputs) will contain elements significantly different from zero.

    A neural network training algorithm is a set of mathematical operations that uses the error vector to compute corrections to the network weights such that the total error decreases (to monitor the learning process, the sum of squared errors over all outputs is usually used). Applying these operations over and over again, we achieve a gradual reduction of the error for each example (A, B, C, etc.) of the training set.

    After such cyclic, repeated adjustment of the weights, the neural network will give correct (or almost correct) answers to all (or almost all) examples from the database, i.e. the total error will reach zero or an acceptably small level for each training pair. In this case, we say that the “neural network is trained”, i.e. it is ready to be used on new data not known in advance.

    In general terms, the supervised learning algorithm looks like this:

    1. Initialize synaptic weights with small random values.

    2. Select the next training pair from the training set; submit the input vector to the network input.

    3. Calculate the network output.

    4. Calculate the difference between the network output and the required output (target vector of the training pair).

    5. Adjust the network weights to minimize errors.

    6. Repeat steps 2 to 5 for each pair of the training set until the error on the entire set reaches an acceptable level.

    The specific type of mathematical operations performed at step 5 determines the type of learning algorithm. For example, for single-layer perceptrons a simple algorithm based on the so-called delta rule is used; for perceptrons with any number of layers the backpropagation procedure is widely used; there is also a well-known group of algorithms with interesting properties called stochastic learning algorithms, and so on. All known algorithms for training neural networks are essentially varieties of gradient-based optimization methods for a nonlinear function of many variables. The main problem arising in their practical implementation is that you can never know for sure that the resulting combination of synaptic weights is really the most effective in terms of minimizing the total error over the entire training set. This uncertainty is called the “local minima problem of the goal function.”
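    As an illustration of steps 1-6, here is a minimal MATLAB sketch of a single threshold neuron trained with the delta rule. The training data (a logical AND problem), the learning rate eta and the number of epochs are hypothetical values chosen only to make the sketch runnable.

    X = [0 0; 0 1; 1 0; 1 1]; % training inputs, one example per row
    T = [0; 0; 0; 1]; % desired outputs (logical AND)
    eta = 0.1; % learning rate
    w = 0.01 * randn(2, 1); % 1. initialize the weights with small random values
    b = 0; % bias (threshold) term
    for epoch = 1:100
        for k = 1:size(X, 1) % 2. take the next training pair
            y = double(X(k, :) * w + b >= 0); % 3. compute the network output
            err = T(k) - y; % 4. difference between the target and the output
            w = w + eta * err * X(k, :)'; % 5. adjust the weights (delta rule)
            b = b + eta * err;
        end
    end % 6. repeat until the error on the whole set is acceptably small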

    Here the goal function is understood as a chosen integral scalar indicator characterizing how well the neural network processes all the examples of the training set - for example, the sum of the squared deviations of OUT from TARGET over all training pairs. The lower the achieved value of the goal function, the higher the quality of the neural network on the given training set. Ideally (in practice achievable only for the simplest problems), it is possible to find a set of synaptic weights for which the goal function is exactly zero.

    The goal-function surface of a complex network is highly rugged and consists of hills, valleys, folds and ravines in a high-dimensional space. A network trained with a gradient method can fall into a local minimum (a shallow valley) even when a much deeper minimum lies nearby. At a local minimum point, all directions lead upward, and the algorithm is unable to escape.

    Thus, if, as a result of an attempt to train a neural network, the required accuracy was not achieved, then the researcher faces two alternatives:

    1. Assume that the process is trapped in a local minimum and try to apply some other type of learning algorithm for the same network configuration.

    2. Assume that a global minimum of the goal function has been found for this specific network configuration and try to complicate the network - increase the number of neurons, add one or more layers, move from a fully connected to a partially connected network, taking into account a priori known dependencies in the structure of the training set, etc.

    A separate class of algorithms is called unsupervised learning (learning without a teacher). In this case, the network is tasked with independently finding groups of “similar” input vectors in the presented set of examples, producing a high level at one of its outputs (without specifying in advance which one). But even with this formulation, the problem of local minima still occurs, although implicitly, without a strict mathematical definition of the goal function (since the very concept of a goal function implies a given reference response of the network, i.e. a “teacher”): has the neural network really learned to pick out clusters of input vectors in the best possible way for this particular configuration?

    4. Training a neural network.

    4.1 General information about neural networks

    Artificial neural networks are models based on modern ideas about the structure of the human brain and the information processing processes occurring in it. ANNs have already found wide application in tasks: information compression, optimization, pattern recognition, building expert systems, signal and image processing, etc.

    The relationship between biological and artificial neurons

    Figure 20 – Structure of a biological neuron

    The human nervous system consists of a huge number of interconnected neurons, about 10^11; the number of connections is estimated at 10^15.

    Let's schematically imagine a pair of biological neurons (Figure 20). A neuron has several input processes - dendrites, and one output - an axon. Dendrites receive information from other neurons, and the axon transmits it. The area where an axon connects to a dendrite (contact area) is called a synapse. The signals received by the synapses are brought to the body of the neuron, where they are summed. In this case, one part of the input signals is excitatory, and the other is inhibitory.

    When the input influence exceeds a certain threshold, the neuron enters an active state and sends a signal along the axon to other neurons.

    An artificial neuron is a mathematical model of a biological neuron (Figure 21). Let's denote an input signal by x, and the set of input signals by the vector X = (x_1, x_2, ..., x_N). The output signal of the neuron will be denoted by y.

    Let's draw a functional diagram of a neuron.

    Figure 21 – Artificial neuron

    To represent the excitatory or inhibitory effect of each input, we introduce a coefficient w_i for every input, that is, a vector W = (w_1, w_2, …, w_N); w_0 is the threshold value. The input influences X are weighted by the vector W: each input is multiplied by the corresponding coefficient w_i, the products are summed, and the signal g is generated:

    g = x_1 w_1 + x_2 w_2 + … + x_N w_N

    The output signal is some function of g:

    y = F(g),

    where F is the activation function. It can be of various types:

    1) step (threshold): F(g) = 1 if g is greater than or equal to the threshold w_0, and F(g) = 0 otherwise;

    2) linear, which is equivalent to having no threshold element at all:

    F(g) = g

    3) piecewise linear, obtained from the linear one by limiting the range of its values;

    4) sigmoid;

    5) multi-threshold;

    6) hyperbolic tangent:

    F(g) = tanh(g)


    Most often, the input values are first converted to a fixed range (typically [0, 1]). When w_i = 1 (i = 1, 2, …, N) the neuron acts as a majority element; the threshold in this case takes the value w_0 = N/2.
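    The exact formulas for several of these types were lost in this copy, so the following MATLAB sketch shows their standard forms only as an illustration (the multi-threshold variant is omitted, and the [0, 1] clipping range of the piecewise-linear function is an assumption):

    step = @(g, w0) double(g >= w0); % 1) step (threshold) function
    linearF = @(g) g; % 2) linear
    piecewise = @(g) min(max(g, 0), 1); % 3) piecewise linear, clipped to [0, 1] (assumed range)
    sigmoid = @(g) 1 ./ (1 + exp(-g)); % 4) sigmoid
    tanhF = @(g) tanh(g); % 6) hyperbolic tangent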

    Another conventional way of depicting an artificial neuron is shown in Figure 22.

    Figure 22 – Conventional symbol of an artificial neuron

    From a geometric point of view, a neuron with a linear activation function describes the equation of a line (if the input is a single value x_1) or of a plane (when the input is a vector of values X).

    Structure (architecture, topology) of neural networks

    There are many ways to organize an ANN, depending on the number of layers and the shape and direction of the connections.

    Let us show some examples of neural network organization (Figure 23).


    [Panels of Figure 23: single-layer structure; two-layer structure with feedback loops; two-layer feedforward structure; three-layer feedforward structure]

    Figure 23 – Examples of neural network structures

    Figure 24 shows a three-layer feedforward network. The layer of neurons that directly receives information from the external environment is called the input layer, and the layer that transmits information to the external environment is called the output layer. Any layer lying between them, with no contact with the external environment, is called an intermediate (hidden) layer. There may be several such layers. In multilayer networks, as a rule, the neurons of one layer have activation functions of the same type.


    Figure 24 – Three-layer neural network

    When constructing a network, the initial data are:

    – dimension of the input signal vector, that is, the number of inputs;

    – dimension of the output signal vector. The number of neurons in the output layer is usually equal to the number of classes;

    – formulation of the problem to be solved;

    – accuracy of problem solving.

    For example, when solving the problem of detecting a useful signal, a NN may have one or two outputs.

    The creation or synthesis of a neural network is a problem that has no general theoretical solution at present; it is solved on a case-by-case basis.

    Training neural networks

    One of the most remarkable properties of neural networks is their ability to learn. Despite the fact that the process of training a neural network differs from human training in the usual sense, at the end of such training similar results are achieved. The purpose of training a neural network is to configure it for a given behavior.

    The most common approach to training neural networks is connectionism. It involves training the network by adjusting the values of the weight coefficients w_ij corresponding to the various connections between neurons. The matrix W of weight coefficients w_ij of the network is called a synaptic map. Here the index i is the number of the neuron from which the connection originates (in the preceding layer), and j is the number of the neuron in the subsequent layer.

    There are two types of NN training: supervised learning and unsupervised learning.

    Supervised learning consists of presenting the network with a sequence of training pairs (examples) (X_i, H_i), i = 1, 2, …, m, called the training sequence. For each input image X_i, the network response Y_i is computed and compared with the corresponding target image H_i. The resulting mismatch is used by the learning algorithm to adjust the synaptic map so as to reduce the mismatch error. This adaptation is carried out by cyclically presenting the training set until the mismatch error reaches a sufficiently low level.

    Although the process of supervised learning is well understood and widely used in many neural network applications, it still does not fully correspond to the real processes occurring in the human brain during learning. When learning, our brain does not rely on ready-made target images; it generalizes the information coming from outside on its own.

    In the case of unsupervised learning, the training sequence consists only of input images Xi. The learning algorithm adjusts the weights so that similar input vectors correspond to identical output vectors, that is, it actually partitions the space of input images into classes. Moreover, before training, it is impossible to predict which output images will correspond to the classes of input images. It is possible to establish such a correspondence and give it an interpretation only after training.

    NN training can be viewed as either a continuous or a discrete process. Accordingly, learning algorithms can be described either by differential equations or by finite-difference equations. In the first case, the NN is implemented on analog elements, in the second on digital ones. We will only discuss finite-difference algorithms.

    In fact, a neural network is a specialized parallel processor or program that emulates a neural network on a serial computer.

    Most NN learning algorithms grew out of Hebb's concept. He proposed a simple unsupervised algorithm in which the value of the weight w_ij corresponding to the connection between the i-th and j-th neurons increases if both neurons are in an excited state. In other words, during the learning process the connections between neurons are adjusted in accordance with the degree of correlation of their states. This can be expressed as the following finite-difference equation:

    w_ij(t + 1) = w_ij(t) + α · v_i(t) · v_j(t),

    where w_ij(t + 1) and w_ij(t) are the values of the weight of the connection between neuron i and neuron j after adjustment (at step t + 1) and before adjustment (at step t), respectively; v_i(t) is the output of neuron i at step t; v_j(t) is the output of neuron j at step t; α is the learning-rate parameter.
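    A minimal MATLAB sketch of one such update; the numeric values of the weight, the outputs and α are placeholders chosen only for illustration:

    alpha = 0.1; % learning-rate parameter
    w_ij = 0.5; % current weight of the connection between neurons i and j
    v_i = 1; v_j = 1; % both neurons are in the excited state
    w_ij = w_ij + alpha * v_i * v_j % the connection is strengthened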

    Neural Network Training Strategy

    Along with the training algorithm, the network training strategy is equally important.

    One approach is to train the network sequentially on a series of examples (X_i, H_i), i = 1, 2, …, m, making up the training set. The network is first trained to respond correctly to the first image X_1, then to the second X_2, and so on. However, with this strategy there is a danger that, while learning each next example, the network will lose previously acquired skills, that is, it may “forget” previously presented examples. To prevent this, the network must be trained on all examples of the training set at once.

    X_1 = (x_11, …, x_1N)
    X_2 = (x_21, …, x_2N)
    …
    X_m = (x_m1, …, x_mN)

    Since solving the learning problem in this form is very difficult, an alternative is to minimize an objective function of the form

    E = λ_1 E_1 + λ_2 E_2 + … + λ_m E_m,

    where E_i is the training error on the i-th example and λ_i are parameters that determine the quality requirements of training for each example, such that λ_1 + λ_2 + … + λ_m = 1.
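    A minimal MATLAB sketch of this weighted objective; the number of examples and all numeric values are illustrative only:

    lambda = [0.5 0.3 0.2]; % per-example importance parameters, summing to 1
    E_i = [0.04 0.10 0.02]; % training errors on each of the m = 3 examples
    E = sum(lambda .* E_i) % weighted objective to be minimized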

    Practical part.

    Let's create a training set:

    P_o=cat(1, Mt, Mf);

    Let's set the structure of the neural network for the detection task:

    net = newff(minmax(P_o), [npr 2], {'logsig', 'logsig'}, 'trainlm', 'learngdm');

    net.trainParam.epochs = 100; % number of training cycles

    net.trainParam.show = 5; % number of cycles between displays of intermediate results

    net.trainParam.min_grad = 0; % target gradient value

    net.trainParam.max_fail = 5; % maximum permissible number of increases of the validation-set error over the achieved minimum

    net.trainParam.searchFcn = 'srchcha'; % name of the one-dimensional (line-search) optimization routine used

    net.trainParam.goal = 0; % target training error

    The newff function creates a “classical” multilayer neural network trained with backpropagation. It takes several arguments. The first argument is the matrix of minimum and maximum values of the training set P_o, obtained with the expression minmax(P_o).

    The second argument is given in square brackets and determines the number and sizes of the layers. The expression [npr 2] means that the neural network has 2 layers: the first layer contains npr = 10 neurons and the second contains 2. The number of neurons in the first layer is chosen based on the dimension of the input feature matrix; depending on the number of features, the first layer may contain, for example, 5, 7 or 12 neurons. The size of the second (output) layer is determined by the problem being solved. In the task of detecting a useful signal against a microseismic background, i.e. classification into the first and second classes, 2 neurons are specified at the output of the neural network.

    The third argument determines the type of activation function in each layer. The expression {'logsig', 'logsig'} means that each layer uses the logistic sigmoid activation function, whose range is (0, 1).

    The fourth argument specifies the training function of the neural network. The example uses a training function based on the Levenberg-Marquardt optimization algorithm, 'trainlm'.

    The first half of the vectors of matrix T are initialized with the values ​​(1, 0), and the subsequent ones – (0, 1).

    net = newff(minmax(P_o), [npr 2], {'logsig', 'logsig'}, 'trainlm', 'learngdm');

    net.trainParam.epochs = 1000;

    net.trainParam.show = 5;

    net.trainParam.min_grad = 0;

    net.trainParam.max_fail = 5;

    net.trainParam.searchFcn = 'srchcha';

    net.trainParam.goal = 0;

    Program for initializing the desired outputs of the neural network T:

    n1 = length(Mt(:, 1));

    n2 = length(Mf(:, 1));

    T1 = zeros(2, n1); T1(1, :) = 1;

    T2 = zeros(2, n2); T2(2, :) = 1;

    T = cat(2, T1, T2);

    Neural network training:

    net = train(net, P_o, T);

    Figure 25 – Neural network training plot.

    Let's check the neural network on the control set:

    Y_k=sim(net, P_k);

    The sim command feeds the data of the control set P_k to the input of the neural network net, and the results are written to the output matrix Y_k. The matrices P_k and Y_k have the same number of columns (one column per example).

    Pb = sum(round(Y_k(1, 1:100)))/100

    The estimate of the probability of correct detection of tracked vehicles is Pb = 1.

    alpha = sum(round(Y_k(1, 110:157)))/110

    The estimate of the false alarm probability is alpha = 0.

    We determine the mean squared error on the control set from the difference E_k between the desired and actual outputs of the neural network.
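    The code that computes this error is not shown in this copy; one plausible sketch is given below, where T_k denotes the desired outputs for the control set (a hypothetical name) and the error is taken as the mean of the squared deviations:

    E_k = T_k - Y_k; % element-wise difference between the desired and actual outputs
    sqe_k = sum(E_k(:).^2) / numel(E_k) % mean squared control error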

    The value of the mean squared control error is:

    sqe_k = 2.5919e-026

    Let's test the operation of the neural network. To do this, we will form a matrix of test signal characteristics:

    h3=tr_t50-mean(tr_t50);

    Mh1=MATRPRIZP (h3,500, N1, N2);

    Mh1=Mh1 (1:50,:);

    Y_t=sim(net, P_t);

    Pb=sum (round(Y_t (1,1:100)))/100

    Estimation of the probability of correct detection of tracked vehicles Pb=1

    We find the difference E between the desired and actual outputs of the neural network and determine the mean squared testing error.

    The value of the mean squared testing error is:

    sqe_t = 3.185e-025

    Conclusion: in this section we built a seismic signal detector model using a neural network trained with backpropagation. The detection problem is solved with small errors; therefore, the chosen features are suitable for detection.

    This two-layer neural network can be used to build an object detection system.


    Conclusion

    The purpose of this course work was the study of information processing methods and their application to solve object detection problems.

    During the work done, which was carried out in four stages, the following results were obtained:

    1) Histograms of the sample probability densities of the signal amplitudes, treated as random variables, were constructed.

    The distribution parameters were estimated: mathematical expectation, dispersion, standard deviation.

    We made an assumption about the amplitude distribution law and tested the hypothesis using the Kolmogorov-Smirnov and Pearson tests at a significance level of 0.05. According to the Kolmogorov-Smirnov criterion, the distribution was chosen correctly. According to the Pearson criterion, the distribution was chosen correctly only for the background signal; for it, the hypothesis of a normal distribution was accepted.

    We treated the signals as realizations of random functions and constructed correlation functions for them. From the correlation functions we determined that the signals have a random oscillatory character.

    2) We generated training and control data sets (for training and control of the neural network).

    3) For the training matrix, the parameters of the feature distributions were estimated: mathematical expectation, variance, standard deviation. For each feature of the training matrix, the distance between the given classes was calculated, and the feature with the maximum separation was selected. We calculated the decision threshold and plotted the probability density curves on one graph. The decision rule was formulated.

    4) We trained a two-layer neural network to solve the classification problem. The probabilities of correct detection and false alarms were assessed. The same indicators were assessed using test signals.


    The most important property of neural networks is their ability to learn from environmental data and improve their performance as a result of learning. Productivity increases occur over time according to certain rules. The neural network is trained through an interactive process of adjusting synaptic weights and thresholds. Ideally, a neural network gains knowledge about its environment at each iteration of the learning process.

    There are quite a few kinds of activity associated with the concept of learning, so it is difficult to give this process an unambiguous definition. Moreover, the learning process depends on one's point of view, which makes it almost impossible to give the concept any precise definition. For example, learning from the point of view of a psychologist is fundamentally different from learning from the point of view of a school teacher. From a neural network perspective, the following definition can probably be used:

    Training is a process in which the free parameters of a neural network are tuned by simulating the environment in which the network is embedded. The type of training is determined by the way these parameters are adjusted.

    This definition of the neural network learning process assumes the following sequence of events:

    1. The neural network receives stimuli from the external environment.
    2. As a result of the first point, the free parameters of the neural network are changed.
    3. After changing the internal structure, the neural network responds to excitations in a different way.

    A clearly defined set of rules for solving the learning problem, such as the sequence above, is called a learning algorithm. It is easy to guess that there is no universal learning algorithm suitable for all neural network architectures. There is only a set of tools, represented by a variety of learning algorithms, each with its own advantages. Learning algorithms differ from each other in the way the synaptic weights of the neurons are adjusted. Another distinguishing characteristic is the way the trained neural network interacts with the outside world. In this context, we speak of a learning paradigm associated with the model of the environment in which the given neural network operates.

    There are two conceptual approaches to training neural networks: supervised learning and unsupervised learning.

    Supervised training of a neural network assumes that for each input vector from the training set there is a required value of the output vector, called the target. These vectors form a training pair. The network weights are changed until an acceptable level of deviation of the output vector from the target is obtained for each input vector.

    Unsupervised learning of a neural network is a much more plausible learning model in terms of the biological roots of artificial neural networks. The training set consists only of input vectors. The neural network training algorithm adjusts the network weights so that consistent output vectors are obtained, i.e. so that the presentation of sufficiently close input vectors gives identical outputs.

    This article contains materials - mostly in Russian - for a basic study of artificial neural networks.

    An artificial neural network, or ANN, is a mathematical model, as well as its software or hardware embodiment, built on the principles of the organization and functioning of biological neural networks - the networks of nerve cells of a living organism. The science of neural networks has existed for quite a long time, but it is precisely the latest achievements of scientific and technological progress that are making this area gain popularity.

    Books

    Let's start the selection with the classic way of studying - through books. We have selected Russian-language books from a large number of candidates:

    • F. Wasserman, Neurocomputer technology: Theory and practice. 1992
      The book sets out in a publicly accessible form the basics of building neurocomputers. The structure of neural networks and various algorithms for their configuration are described. Separate chapters are devoted to the implementation of neural networks.
    • S. Khaikin, Neural networks: Complete course. 2006
      The main paradigms of artificial neural networks are discussed here. The presented material contains a strict mathematical justification for all neural network paradigms, is illustrated with examples, descriptions of computer experiments, contains many practical problems, as well as an extensive bibliography.
    • D. Forsythe, Computer Vision. Modern approach. 2004
      Computer vision is one of the most popular areas at the current stage of development of global digital computer technology. It is needed in manufacturing, robot control, process automation, medical and military applications, satellite surveillance, and in personal computers, in particular for digital image search.

    Video

    There is nothing more accessible and understandable than visual learning using video:

    • To understand what machine learning is in general, see these two lectures from the Yandex School of Data Analysis (ShAD).
    • Introduction into the basic principles of neural network design - great for continuing your introduction to neural networks.
    • A course of lectures on “Computer Vision” from the Faculty of Computational Mathematics and Cybernetics of Moscow State University. Computer vision is the theory and technology of creating artificial systems that detect and classify objects in images and video. These lectures can be considered an introduction to this interesting and complex science.

    Educational resources and useful links

    • Artificial Intelligence Portal.
    • Laboratory “I am intelligence”.
    • Neural networks in Matlab.
    • Neural networks in Python (English):
      • Classifying text using ;
      • Simple .
    • Neural network on .

    A series of our publications on the topic

    We have previously published a course #neuralnetwork@tproger on neural networks. In this list, publications are arranged in order of study for your convenience.