• Training neural networks. Backpropagation algorithm

    In the previous chapter, we became familiar with concepts such as artificial intelligence, machine learning, and artificial neural networks.

    In this chapter, I will describe in detail the artificial neuron model, talk about approaches to training the network, and also describe some well-known types of artificial neural networks that we will study in the following chapters.

    Simplification

    In the last chapter, I repeatedly mentioned serious simplifications. The reason for these simplifications is that no modern computer can quickly model a system as complex as our brain. In addition, as I already said, our brain is full of various biological mechanisms that are not related to information processing.

    We need a model for converting the input signal into the output signal we need. Everything else is of no concern to us. Let's start simplifying.

    Biological structure → diagram

    In the previous chapter, you realized how complex biological neural networks and biological neurons are. Instead of drawing neurons as tentacled monsters, let's just draw diagrams.

    Generally speaking, there are several ways to graphically depict neural networks and neurons. Here we will depict artificial neurons as circles.

    Instead of a complex interweaving of inputs and outputs, we will use arrows indicating the direction of signal movement.

    Thus, an artificial neural network can be represented as a collection of circles (artificial neurons) connected by arrows.

    Electrical signals → numbers

    In a real biological neural network, an electrical signal is transmitted from the network's inputs to its outputs. It may change as it passes through the network.

    An electrical signal will always be an electrical signal. Conceptually, nothing changes. But what then changes? The magnitude of this electrical signal changes (stronger/weaker). And any value can always be expressed as a number (more/less).

    In our artificial neural network model, we do not need to implement the behavior of the electrical signal at all, since nothing will depend on its implementation anyway.

    We will supply some numbers to the network inputs, symbolizing the magnitude of the electrical signal if it existed. These numbers will move through the network and change in some way. At the output of the network we will receive some resulting number, which is the response of the network.

    For convenience, we will still call our numbers circulating in the network signals.

    Synapses → connection weights

    Let us recall the picture from the first chapter, in which the connections between neurons - synapses - were depicted in color. Synapses can strengthen or weaken the electrical signal passing through them.

    Let's characterize each such connection with a certain number, called the weight of the connection. A signal passing through a connection is multiplied by the weight of that connection.

    This is a key point in the concept of artificial neural networks, so I will explain it in more detail. Look at the picture below. Each black arrow (connection) in this picture now corresponds to a certain number \(w_i\) (the weight of the connection). And when a signal passes through a connection, its magnitude is multiplied by that connection's weight.

    In the figure above, not every connection is labeled with a weight, simply because there is no room for the labels. In reality, each \(i\)-th connection has its own weight \(w_i\).

    Artificial Neuron

    We now move on to consider the internal structure of an artificial neuron and how it transforms the signal arriving at its inputs.

    The figure below shows the full model of an artificial neuron.

    Don't be alarmed, there is nothing complicated here. Let's look at everything in detail from left to right.

    Inputs, weights and adder

    Each neuron, including an artificial one, must have inputs through which it receives a signal. We have already introduced the concept of weights, by which the signals passing through a connection are multiplied. In the picture above, the weights are shown as circles.

    The signals received at the inputs are multiplied by their weights. The signal of the first input \(x_1\) is multiplied by the weight \(w_1\) corresponding to that input. As a result, we get \(x_1w_1\). And so on, up to the \(n\)-th input. At the last input, we get \(x_nw_n\).

    Now all products are transferred to the adder. Just based on its name, you can understand what it does. It simply sums all the input signals multiplied by the corresponding weights:

    \[ x_1w_1+x_2w_2+\cdots+x_nw_n = \sum\limits^n_{i=1}x_iw_i \]

    Mathematical help


    When it is necessary to briefly write down a large expression consisting of a sum of repeating/same-type terms, the sigma sign is used.

    Let's consider the simplest version of this notation:

    \[ \sum\limits^5_{i=1}i=1+2+3+4+5 \]

    Below the sigma, we give the counter variable \(i\) a starting value, which increases until it reaches the upper limit (5 in the example above).
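    Read as code, the sigma is simply a loop that accumulates terms; here is a tiny Python sketch of the example above:

```python
# The sigma above means: start the counter i at 1, add each term,
# and stop after the upper limit (here, 5).
total = sum(i for i in range(1, 6))  # i = 1, 2, 3, 4, 5
print(total)  # 15
```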

    The upper limit can also be variable. Let me give you an example of such a case.

    Suppose we have \(n\) stores, each with its own number from 1 to \(n\), and each making a profit. Take an arbitrary \(i\)-th store; its profit equals \(p_i\). The total profit \(P\) of all stores is:

    \[ P = p_1+p_2+\cdots+p_i+\cdots+p_n \]

    As you can see, all terms of this sum are of the same type. Then they can be briefly written as follows:

    \[ P=\sum\limits^n_{i=1}p_i \]

    In words: “Sum up the profits of all stores, from the first to the \(n\)-th.” As a formula, it is much shorter, more convenient, and more elegant.

    The result of the adder is a number called a weighted sum.

    Weighted sum (\(net\)) - the sum of the input signals multiplied by their corresponding weights.

    \[ net=\sum\limits^n_{i=1}x_iw_i \]

    The role of the adder is obvious - it aggregates all the input signals (of which there can be many) into a single number, the weighted sum, which characterizes the signal received by the neuron as a whole. The weighted sum can also be thought of as the degree of overall excitation of the neuron.
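    The adder is straightforward to express in code. A minimal Python sketch (the signals and weights here are made-up values):

```python
def weighted_sum(inputs, weights):
    # Adder: net is the sum of each input signal multiplied
    # by the weight of its connection
    return sum(x * w for x, w in zip(inputs, weights))

# Hypothetical signals and weights for illustration
net = weighted_sum([1, 0, 1], [0.5, 0.9, 1.5])
print(net)  # 2.0
```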

    Example

    To understand the role of the last component of an artificial neuron - the activation function - I will give an analogy.

    Let's look at a single artificial neuron. Its task is to decide whether to go on vacation to the sea. To do this, we feed various data to its inputs. Suppose our neuron has 4 inputs:

    1. Trip cost
    2. What's the weather like at sea?
    3. Current work situation
    4. Will there be a snack bar on the beach?

    We will characterize all these parameters as 0 or 1. Accordingly, if the weather at sea is good, then we apply 1 to this input. And so with all other parameters.

    If a neuron has four inputs, then it must have four weights. In our example, the weights can be thought of as indicators of the importance of each input for the neuron's overall decision. We assign the input weights as follows:

    It is easy to see that the factors of cost and weather at sea (the first two inputs) play a very important role. They will also play a decisive role when the neuron makes a decision.

    Let us supply the following signals to the inputs of our neuron:

    We multiply the weights of the inputs by the signals of the corresponding inputs:

    The weighted sum for such a set of input signals is 6:

    \[ net=\sum\limits^4_{i=1}x_iw_i = 5 + 0 + 0 + 1 = 6 \]

    This is where the activation function comes into play.

    Activation function

    It is rather pointless to simply output the weighted sum. The neuron must somehow process it and produce an adequate output signal. This is exactly what the activation function is for.

    It converts the weighted sum into some number, which is the output of the neuron (we denote the output of the neuron by the variable ​\(out \) ​).

    Different types of artificial neurons use a wide variety of activation functions. In general, they are denoted by the symbol \(\phi(net)\). Writing the weighted sum in parentheses means that the activation function takes it as a parameter.

    Activation function (\(\phi(net)\)) - a function that takes the weighted sum as its argument. The value of this function is the output of the neuron (\(out\)).

    Unit step function

    The simplest type of activation function. The output of a neuron can only be equal to 0 or 1. If the weighted sum is greater than a certain threshold ​\(b\) ​, then the output of the neuron is equal to 1. If lower, then 0.

    How can it be used? Let's assume that we go to the sea only when the weighted sum is greater than or equal to 5. This means our threshold is 5:

    In our example, the weighted sum was 6, which means the output signal of our neuron is 1. So, we are going to the sea.

    However, if the weather at sea were bad and the trip very expensive, but there were a snack bar and the work situation were normal (inputs: 0011), then the weighted sum would equal 2, which means the output of the neuron would equal 0. So, we're not going anywhere.

    Basically, a neuron looks at a weighted sum and if it is greater than its threshold, then the neuron produces an output equal to 1.

    Graphically, this activation function can be depicted as follows.

    The horizontal axis shows the values of the weighted sum. On the vertical axis are the values of the output signal. As is easy to see, only two values of the output signal are possible: 0 or 1. Moreover, 0 is always output from minus infinity up to a certain value of the weighted sum, called the threshold. If the weighted sum is equal to or greater than the threshold, the function returns 1. Everything is extremely simple.

    Now let's write this activation function mathematically. You have almost certainly come across the concept of a piecewise function: several rules are combined under one function, and which rule applies determines its value. As a piecewise function, the unit step function looks like this:

    \[ out(net) = \begin{cases} 0, & net < b \\ 1, & net \geq b \end{cases} \]

    There is nothing complicated about this notation. The output of the neuron (\(out\)) depends on the weighted sum (\(net\)) as follows: if \(net\) is less than some threshold \(b\), then \(out\) equals 0; if \(net\) is greater than or equal to the threshold \(b\), then \(out\) equals 1.
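    A minimal Python sketch of this activation, using the threshold \(b = 5\) from the vacation example:

```python
def unit_step(net, b=5):
    # Threshold activation: output 1 if the weighted sum
    # reaches the threshold b, otherwise 0
    return 1 if net >= b else 0

print(unit_step(6))  # 1: we go to the sea
print(unit_step(2))  # 0: we stay home
```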

    Sigmoid function

    In fact, there is a whole family of sigmoid functions, some of which are used as activation functions in artificial neurons.

    All these functions have some very useful properties, which is why they are used in neural networks. These properties will become apparent once you see their graphs.

    The sigmoid most commonly used in neural networks is the logistic function.

    The graph of this function looks quite simple. If you look closely, you can see a resemblance to the letter S, which is where the name of this family of functions comes from.

    And this is how it is written analytically:

    \[ out(net)=\frac{1}{1+\exp(-a \cdot net)} \]

    What is the parameter ​\(a \) ​? This is some number that characterizes the degree of steepness of the function. Below are logistic functions with different parameters ​\(a \) ​.

    Let's recall our artificial neuron that decides whether to go to the sea. With the unit step function, everything was obvious: we either go to the sea (1) or we don't (0).

    Here the situation is closer to reality. We are not completely sure whether it is worth going (especially if you are paranoid). Using the logistic function as the activation function then gives a number between 0 and 1. Moreover, the larger the weighted sum, the closer the output is to 1 (but it never equals 1 exactly). Conversely, the smaller the weighted sum, the closer the neuron's output is to 0.

    For example, suppose the output of our neuron is 0.8. This means it believes that going to the sea is still worthwhile. If its output were 0.2, it would be almost certainly against the trip.

    What remarkable properties does the logistic function have?

    • it is a “compressive” function, that is, regardless of the argument (weighted sum), the output signal will always be in the range from 0 to 1
    • it is more flexible than the unit step function - its result can be not only 0 and 1, but any number in between
    • at all points it has a derivative, and this derivative can be expressed through the same function

    It is because of these properties that the logistic function is most often used as an activation function in artificial neurons.
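    A small Python sketch of the logistic function (the steepness parameter \(a\) defaults to 1 here, an arbitrary choice):

```python
import math

def logistic(net, a=1.0):
    # Logistic activation: squashes any weighted sum into (0, 1);
    # the parameter a controls the steepness of the curve
    return 1.0 / (1.0 + math.exp(-a * net))

print(logistic(0.0))   # 0.5: completely undecided
print(logistic(6.0))   # close to 1: almost certainly "go"
print(logistic(-6.0))  # close to 0: almost certainly "stay"
```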

    Hyperbolic tangent

    However, there is another sigmoid - the hyperbolic tangent. It is used as an activation function by biologists to create a more realistic model of a nerve cell.

    This function produces output values of both signs (from -1 to 1), which can be useful for a number of networks.

    The function is written as follows:

    \[ out(net) = \tanh\left(\frac{net}{a}\right) \]

    In the above formula, the parameter ​\(a \) ​ also determines the degree of steepness of the graph of this function.

    And this is what the graph of this function looks like.

    As you can see, it looks like a graph of a logistic function. The hyperbolic tangent has all the useful properties that the logistic function has.
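    A Python sketch of this activation (again, the steepness parameter \(a\) is illustrative):

```python
import math

def tanh_activation(net, a=1.0):
    # Hyperbolic tangent activation: output lies in (-1, 1);
    # a controls the steepness, as in the formula above
    return math.tanh(net / a)

print(tanh_activation(0.0))  # 0.0: undecided, right in the middle
print(tanh_activation(3.0) > 0, tanh_activation(-3.0) < 0)  # True True
```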

    What have we learned?

    Now you have a complete picture of the internal structure of an artificial neuron. Let me once again give a brief description of how it works.

    A neuron has inputs. They receive signals in the form of numbers. Each input has its own weight (also a number). The input signals are multiplied by the corresponding weights. We get a set of “weighted” input signals.

    The adder then combines all the weighted input signals into the weighted sum. The activation function converts the weighted sum, and we get the output of the neuron.

    Let us now formulate the shortest description of the operation of a neuron - its mathematical model:

    Mathematical model of an artificial neuron with \(n\) inputs:

    \[ out=\phi\left(\sum\limits^n_{i=1}x_iw_i\right) \]

    where
    \(\phi\) – the activation function;
    \(\sum\limits^n_{i=1}x_iw_i\) – the weighted sum: the sum of the \(n\) products of the input signals and their corresponding weights.
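    The whole model fits in a few lines of Python. A sketch using the logistic activation and the vacation example's signals; the weights are hypothetical, since the figure with the weight values is not reproduced here:

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def neuron(inputs, weights, phi=logistic):
    # out = phi(net), where net is the weighted sum of the inputs
    net = sum(x * w for x, w in zip(inputs, weights))
    return phi(net)

# Inputs 1001 and assumed weights 5, 4, 1, 1, so net = 6
out = neuron([1, 0, 0, 1], [5, 4, 1, 1])
print(round(out, 3))  # 0.998
```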

    Types of ANN

    We have figured out the structure of an artificial neuron. Artificial neural networks consist of a collection of artificial neurons. A logical question arises - how to place/connect these same artificial neurons to each other?

    As a rule, most neural networks have a so-called input layer, which performs only one task - distributing input signals to other neurons. The neurons in this layer do not perform any calculations.

    Single-layer neural networks

    In single-layer neural networks, signals from the input layer are immediately fed to the output layer. It performs the necessary calculations, the results of which are immediately sent to the outputs.

    A single-layer neural network looks like this:

    In this picture, the input layer is indicated by circles (it is not considered a neural network layer), and on the right is a layer of ordinary neurons.

    Neurons are connected to each other by arrows. Above the arrows are the weights of the corresponding connections (weighting coefficients).

    Single-layer neural network - a network in which signals from the input layer are fed directly to the output layer, which converts the signal and immediately produces a response.

    Multilayer neural networks

    Such networks, in addition to the input and output layers of neurons, also have one or more hidden layers. Their location is easy to understand - these layers sit between the input and output layers.

    This structure of neural networks copies the multilayer structure of certain parts of the brain.

    It is no coincidence that the hidden layer got its name. The fact is that only relatively recently methods for training hidden layer neurons were developed. Before this, only single-layer neural networks were used.

    Multilayer neural networks have much greater capabilities than single-layer ones.

    The work of hidden layers of neurons can be compared to the work of a large factory. The product (output signal) at the plant is assembled in stages. After each machine some intermediate result is obtained. Hidden layers also transform input signals into some intermediate results.

    Multilayer neural network - a neural network consisting of an input layer, an output layer, and one or more hidden layers between them.
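    To make the layered structure concrete, here is a minimal Python sketch of a forward pass through a small multilayer network. The weights and the choice of logistic activation are arbitrary, purely for illustration:

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer(signal, weight_rows):
    # One layer of neurons: each row of weights belongs to one neuron
    return [logistic(sum(x * w for x, w in zip(signal, row)))
            for row in weight_rows]

def forward(signal, layers):
    # The signal passes through the hidden layer(s), then the output layer
    for weight_rows in layers:
        signal = layer(signal, weight_rows)
    return signal

hidden = [[0.5, -0.4], [0.3, 0.8]]   # hidden layer: 2 neurons, 2 inputs each
output = [[1.0, -1.0]]               # output layer: 1 neuron
result = forward([1.0, 0.0], [hidden, output])
print(result)  # a single number between 0 and 1
```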

    Feedforward networks

    You can notice one very interesting detail in the pictures of neural networks in the examples above.

    In all examples, the arrows strictly go from left to right, that is, the signal in such networks goes strictly from the input layer to the output layer.

    Feedforward networks - artificial neural networks in which the signal propagates strictly from the input layer to the output layer. The signal does not propagate in the opposite direction.

    Such networks are widely used and quite successfully solve a certain class of problems: forecasting, clustering and recognition.

    However, nothing forbids the signal from traveling in the opposite direction.

    Feedback networks

    In networks of this type, the signal can also go in the opposite direction. What's the advantage?

    The fact is that in feedforward networks, the output of the network is determined by the input signal and weighting coefficients for artificial neurons.

    In networks with feedback, the outputs of neurons can be returned to the inputs. This means that the output of a neuron is determined not only by its weights and input signal, but also by its previous outputs (since they have returned to the inputs again).

    The ability of signals to circulate in a network opens up new, remarkable possibilities for neural networks. Such networks can be used to restore or complete signals. In other words, such neural networks have the properties of short-term memory (much like a person).

    Feedback networks (recurrent neural networks) - artificial neural networks in which the output of a neuron can be fed back to its input. More generally, this means that the signal can propagate from outputs back to inputs.

    Neural network training

    Now let's look at the issue of training a neural network in a little more detail. What is it? And how does this happen?

    What is network training?

    An artificial neural network is a collection of artificial neurons. Now let's take, for example, 100 neurons and connect them to each other. It is clear that when we apply a signal to the input, we will get something meaningless at the output.

    This means we need to change some parameters of the network until the input signal is converted into the output we need.

    What can we change in a neural network?

    Changing the total number of artificial neurons makes no sense for two reasons. Firstly, increasing the number of computing elements as a whole only makes the system heavier and more redundant. Secondly, if you gather 1000 fools instead of 100, they still won’t be able to answer the question correctly.

    The adder cannot be changed, since it performs one strictly defined function - adding. If we replace it with something or remove it altogether, then it will no longer be an artificial neuron at all.

    If we change the activation function of each neuron, we will get a neural network that is too heterogeneous and uncontrollable. In addition, in most cases, neurons in neural networks are of the same type. That is, they all have the same activation function.

    There is only one option left - change connection weights.

    Neural network training - the search for a set of weights with which the input signal, after passing through the network, is converted into the output we need.

    This approach to the term “neural network training” also corresponds to biological neural networks. Our brain consists of a huge number of neural networks connected to each other. Each of them individually consists of neurons of the same type (the activation function is the same). We learn by changing synapses - elements that strengthen / weaken the input signal.

    However, there is one more important point. If you train a network using only one input signal, the network will simply “remember the correct answer.” From the outside it will seem that it “learned” very quickly, but as soon as you give it a slightly modified signal, expecting the correct answer, the network will produce nonsense.

    Indeed, why would we need a network that detects a face in only one photo? We expect the network to be able to generalize certain features and recognize faces in other photographs too.

    It is for this purpose that training sets are created.

    Training set - a finite set of input signals (sometimes together with the correct output signals) on which the network is trained.

    After the network is trained, that is, when the network produces correct results for all input signals from the training set, it can be used in practice.

    However, before putting a freshly trained neural network into action, its quality is often assessed on a so-called test set.

    Test set - a finite set of input signals (sometimes together with the correct output signals) used to assess the quality of the network's operation.

    We understood what “network training” is – choosing the right set of weights. Now the question arises - how can you train a network? In the most general case, there are two approaches that lead to different results: supervised learning and unsupervised learning.

    Supervised learning

    The essence of this approach is that you give a signal as an input, look at the network’s response, and then compare it with a ready-made, correct response.

    An important point: do not confuse having the correct answers with knowing the solution algorithm! You can trace the face in a photo with your finger (the correct answer), but you cannot explain how you did it (a known algorithm). The situation is the same here.

    Then, using special algorithms, you change the weights of the neural network connections and again give it an input signal. You compare its answer with the correct one and repeat this process until the network begins to respond with acceptable accuracy (as I said in Chapter 1, the network cannot give unambiguously accurate answers).

    Supervised learning - a type of network training in which the weights are changed so that the network's answers differ minimally from the prepared correct answers.

    Where can I get the correct answers?

    If we want the network to recognize faces, we can create a training set of 1000 photos (input signals) and independently select faces from it (correct answers).

    If we want the network to predict price increases/declines, then the training sample must be made based on past data. As input signals, you can take certain days, the general state of the market and other parameters. And the correct answers are the rise and fall of prices in those days.

    It is worth noting that the teacher, of course, is not necessarily a person. The fact is that sometimes the network has to be trained for hours and days, making thousands and tens of thousands of attempts. In 99% of cases, this role is performed by a computer, or more precisely, a special computer program.

    Unsupervised learning

    Unsupervised learning is used when we do not have the correct answers to the input signals. In this case, the entire training set consists of a set of input signals.

    What happens when a network is trained this way? It turns out that with such “training” the network begins to distinguish classes of signals supplied to its input. In short, the network begins to cluster them.

    For example, you are demonstrating candy, pastries and cakes to the network. You do not regulate the operation of the network in any way. You simply feed its inputs data about this object. Over time, the network will begin to produce signals of three different types, which are responsible for the objects at the input.

    Unsupervised learning - a type of network training in which the network independently classifies the input signals. The correct (reference) output signals are never shown.
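    A crude Python sketch of how such clustering can emerge - competitive (“winner takes all”) learning, in which the cluster center nearest to each input is nudged toward it. The data, starting centers, and learning rate are all made up for illustration:

```python
def nearest(x, centers):
    # Index of the cluster center closest to input x (squared distance)
    return min(range(len(centers)),
               key=lambda i: sum((a - c) ** 2 for a, c in zip(x, centers[i])))

def cluster(samples, centers, rate=0.5, epochs=20):
    # Winner-takes-all: pull the nearest center toward each sample
    for _ in range(epochs):
        for x in samples:
            i = nearest(x, centers)
            centers[i] = [c + rate * (a - c) for c, a in zip(centers[i], x)]
    return centers

# Two obvious groups of 1-D inputs; no correct answers are given
samples = [[0.0], [0.1], [0.9], [1.0]]
centers = cluster(samples, [[0.3], [0.7]])
print(centers)  # one center settles near each group
```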

    Conclusions

    In this chapter, you learned about the structure of an artificial neuron and gained a thorough understanding of how it works (including its mathematical model).

    Moreover, you now know about various types of artificial neural networks: single-layer and multilayer, as well as feedforward networks and networks with feedback.

    You also learned about supervised and unsupervised network learning.

    You already know the necessary theory. Subsequent chapters include consideration of specific types of neural networks, specific algorithms for their training, and programming practice.

    Questions and tasks

    You should know the material in this chapter very well, since it contains basic theoretical information on artificial neural networks. Be sure to achieve confident and correct answers to all the questions and tasks below.

    Describe the simplifications of ANNs compared to biological neural networks.

    1. The complex and intricate structure of biological neural networks is simplified and represented in the form of diagrams. Only the signal processing model is left.

    2. The nature of electrical signals in neural networks is the same; the only difference is their magnitude. We remove the electrical signals, and instead use numbers indicating the magnitude of the transmitted signal.

    3. The synapses connecting neurons are replaced by connection weights: each connection is characterized by a number by which the passing signal is multiplied.

    How is the activation function denoted?

    The activation function is often denoted by \(\phi(net)\).

    Write down a mathematical model of an artificial neuron.

    An artificial neuron with ​\(n \) ​ inputs converts an input signal (number) into an output signal (number) as follows:

    \[ out=\phi\left(\sum\limits^n_{i=1}x_iw_i\right) \]

    What is the difference between single-layer and multi-layer neural networks?

    Single-layer neural networks consist of a single computational layer of neurons. The input layer sends signals directly to the output layer, which converts the signal and immediately produces the result.

    Multilayer neural networks, in addition to input and output layers, also have hidden layers. These hidden layers carry out some internal intermediate transformations, similar to the stages of production of products in a factory.

    What is the difference between feedforward networks and feedback networks?

    Feedforward networks allow the signal to pass in only one direction - from inputs to outputs. Networks with feedback do not have these restrictions, and the outputs of neurons can be fed back into the inputs.

    What is a training set? What is its meaning?

    Before using the network in practice (for example, to solve current problems for which you do not have answers), you need to collect a collection of problems with ready-made answers, on which to train the network. This collection is called the training set.

    If you collect too small a set of input and output signals, the network will simply remember the answers and the learning goal will not be achieved.

    What is meant by network training?

    Network training is the process of changing the weighting coefficients of the artificial neurons of the network in order to select a combination of them that converts the input signal into the correct output.

    What is supervised and unsupervised learning?

    When training a network with a teacher, signals are given to its inputs, and then its output is compared with a previously known correct output. This process is repeated until the required accuracy of answers is achieved.

    If the network is only supplied with input signals, without comparing them to ready answers, it begins to classify these input signals on its own. In other words, it performs clustering of the input signals. This type of learning is called unsupervised learning.

    Now that it is clear what exactly we want to build, we can move on to the question of how to build such a neural network. This issue is resolved in two stages:

    1. Selecting the type (architecture) of the neural network.
    2. Selecting the weights (training) of the neural network.

    At the first stage, we should choose the following:

    • which neurons we want to use (number of inputs, transfer functions);
    • how to connect them together;
    • what to take as the inputs and outputs of the neural network.

    This task seems immense at first glance, but fortunately we do not have to invent a neural network from scratch - there are several dozen different neural network architectures, and the effectiveness of many of them has been proven mathematically. The most popular and most studied architectures are the multilayer perceptron, the general regression neural network, Kohonen networks, and others. You can read about all these architectures in a special section of this textbook.

    At the second stage, we should “train” the selected neural network, that is, select values of its weights such that the network works as desired. An untrained neural network is like a child - it can be taught anything. In neural networks used in practice, the number of weights can reach several tens of thousands, so training is a genuinely complex process. For many architectures, special learning algorithms have been developed that allow the weights of the neural network to be tuned in a particular way. The most popular of these algorithms is error backpropagation, used, for example, to train the perceptron.

    Training neural networks

    To train a neural network means to tell it what we want from it. This process is very similar to teaching a child the alphabet. After showing the child a picture of the letter “A”, we ask him: “What letter is this?” If the answer is incorrect, we tell the child the answer we would like him to give: “This is the letter A.” The child remembers this example along with the correct answer, that is, some changes occur in his memory in the right direction. We will repeat the process of presenting the letters over and over again until all 33 letters are firmly memorized. This process is called “supervised learning.”

    When training a neural network, we act in exactly the same way. We have a database containing examples (a set of handwritten images of letters). By presenting the image of the letter “A” to the input of the neural network, we receive from it some answer, not necessarily the correct one. We also know the correct (desired) answer - in this case, we would like the signal level to be maximal at the output of the neural network labeled “A”. Typically, the desired output in a classification problem is the set (1, 0, 0, ...), where 1 stands at the output labeled “A” and 0 at all the other outputs. By calculating the difference between the desired response and the actual response of the network, we get 33 numbers - the error vector. The error backpropagation algorithm is a set of formulas that allows the required corrections to the weights of the neural network to be calculated from the error vector. We can present the same letter (as well as different images of the same letter) to the neural network many times. In this sense, learning is more like repeating exercises in sports - training.

    It turns out that after repeated presentation of the examples, the weights of the neural network stabilize, and the neural network gives correct answers to all (or almost all) examples from the database. In this case, they say that “the neural network has learned all the examples” or “the neural network is trained.” In software implementations you can see that during the learning process the magnitude of the error (the sum of squared errors over all outputs) gradually decreases. When the error reaches zero or an acceptably small level, training is stopped, and the resulting neural network is considered trained and ready for use on new data. It is important to note that all the information a neural network has about a problem is contained in the set of examples. Therefore, the quality of neural network training directly depends on the number of examples in the training set, as well as on how completely these examples describe the task.
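As a small sketch in Python, the error vector and the sum of squared errors described above can be computed like this (the 33 outputs correspond to the letters of the alphabet; the "actual" output values are invented for illustration):

```python
# Error vector and sum of squared errors for one training example.

def error_vector(desired, actual):
    """Difference between the desired and the actual network response."""
    return [d - a for d, a in zip(desired, actual)]

def sum_squared_error(errors):
    """Sum of squared errors over all outputs; this is the quantity
    that should gradually decrease during training."""
    return sum(e * e for e in errors)

N_LETTERS = 33
desired = [1.0] + [0.0] * (N_LETTERS - 1)  # 1 at the output labeled "A"
actual = [0.6] + [0.1] * (N_LETTERS - 1)   # hypothetical untrained response

err = error_vector(desired, actual)
sse = sum_squared_error(err)
ACCEPTABLE = 0.01          # an arbitrary "acceptable small level"
print(len(err), round(sse, 2), sse < ACCEPTABLE)   # 33 0.48 False
```

Here the error is still far above the chosen threshold, so training would continue.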

    For example, it makes no sense to use a neural network to predict a financial crisis if crises are not represented in the training set. It is believed that fully training a neural network requires at least several dozen (and preferably hundreds of) examples. Let us repeat once again that training neural networks is a complex and knowledge-intensive process. Neural network training algorithms have various parameters and settings, and managing them requires an understanding of their influence.


    Once the neural network is trained, we can use it to solve useful problems. The most important feature of the human brain is that, once it has learned a certain process, it can act correctly in situations to which it was not exposed during learning. For example, we can read almost any handwriting, even if we are seeing it for the first time in our lives. Likewise, a neural network that has been properly trained can, with high probability, respond correctly to new data that has not previously been presented to it. For example, we can draw the letter “A” in a different handwriting and then ask our neural network to classify the new image. The weights of the trained neural network store quite a lot of information about the similarities and differences of letters, so you can count on the correct answer for a new variant of the image.

    Examples of ready-made neural networks

    The processes of learning and applying neural networks described above can be seen in action right now. Ward Systems Group has prepared several simple programs based on the NeuroWindows library. Each program allows the user to specify a set of examples and train a particular neural network on this set. You can then offer new examples to this neural network and observe its work.

    MODERN INFORMATION TECHNOLOGIES/2. Computer Science and Programming

    Zolotukhina Irina Andreevna, master's student

    Kostanay State University named after A. Baitursynov, Kazakhstan.

    Methods and algorithms for training neural networks.

    Abstract: This article analyzes neural networks and why they are so relevant, considers the types of neural network algorithms, and reviews the areas in which networks are applied.

    Keywords: neuron, perceptron, Rosenblatt method, Hebb method, generalization error, learning error, learning algorithm.

    Neural networks (artificial neural networks) are one of the most interesting areas of research in the field of artificial intelligence, based on simulating and reproducing the human nervous system. Scientists are especially interested in processes such as the nervous system's ability to learn, correct errors, and make decisions, which should allow us to simulate the functioning of the human brain.

    Artificial neural networks learn by analyzing positive and negative influences. They consist of neurons, which are named by analogy with their biological prototype.

    The model of an artificial neuron was first proposed by American scientists Warren McCulloch and his student Walter Pitts in 1943.

    Depending on the functions performed by neurons in the network, three types can be distinguished:

    · input neurons to which a vector encoding an input effect or image of the external environment is supplied; they usually do not carry out computational procedures;

    · intermediate neurons that form the basis of neural networks, transformations in which are performed according to expressions (1) and (1.1);

    · output neurons, whose output values ​​represent the outputs of the neural network; transformations in them are also carried out according to expressions (1) and (1.1).

    Figure 1. Structure of a formal neural network

    s = w_1·x_1 + w_2·x_2 + … + w_n·x_n + b (1)

    y = f(s) (1.1)

    Where

    · w_i – weight of synapse i, i = 1...n;

    · b – offset (bias) value;

    · s – summation result;

    · x_i – component of the input vector (input signal), i = 1...n;

    · y – output signal of the neuron;

    · n – number of neuron inputs;

    · f – nonlinear transformation (activation function).

    The nonlinear converter responds to the input signal s with an output signal f(s), which is the output y of the neuron.
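Expressions (1) and (1.1) translate directly into code. The sketch below computes s and applies an activation function f; the weights, inputs, and the two activation choices (threshold and sigmoid) are illustrative assumptions:

```python
import math

def neuron_output(x, w, b, f):
    """Formal neuron: s = w_1*x_1 + ... + w_n*x_n + b, then y = f(s),
    following expressions (1) and (1.1)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(s)

def step(s):
    return 1 if s >= 0 else 0          # threshold activation

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))  # smooth nonlinear activation

x = [1.0, 0.5, -1.0]   # input vector (made-up values)
w = [0.4, 0.6, 0.2]    # synapse weights
b = -0.1               # offset

print(neuron_output(x, w, b, step))               # s = 0.4 -> 1
print(round(neuron_output(x, w, b, sigmoid), 3))  # ≈ 0.599
```

The same weighted sum s feeds either activation; only the shape of f changes the output.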

    Advantages of the neural network approach to information technology problems over others (such as the von Neumann architecture):

    · parallelism of information processing;

    · uniform and effective teaching principle;

    · reliability of operation;

    · ability to solve informal problems.

    Applications and problems solved by artificial neural networks

    Artificial neural networks have found their application in various fields of technology. Further increases in computer performance are increasingly associated with the development of these networks, in particular, with neurocomputers, which are based on an artificial neural network.

    Range of problems solved by neural networks:

    · pattern recognition;

    · speech recognition and synthesis;

    · aerospace image recognition;

    · signal processing in the presence of high noise;

    · forecasting;

    · optimization;

    · forecasting securities quotes and exchange rates;

    · credit card fraud prevention;

    · game on the stock exchange;

    · spam filtering;

    · real estate valuation;

    · assessing the financial condition of enterprises and the risk of non-repayment of loans;

    · radar signal processing;

    · security and video surveillance systems;

    · traffic control on expressways and railways;

    · diagnostics in medicine;

    · management of complex objects;

    · extracting knowledge from large volumes of data in business, finance and scientific research;

    · real-time control, and much more.

    Training

    Training refers to improving system performance by analyzing input data. Moreover, training takes place according to certain rules.

    There are two main approaches to learning: “with a teacher” (supervised) and “without a teacher” (self-learning, unsupervised). In supervised learning, the neural network is given the correct answer (network outputs) for each input example; together, the input and the answer are called a training pair. The weights are adjusted so that the network produces answers as close as possible to the known correct ones, minimizing the error. The vectors of the training set are presented sequentially, the errors are calculated, and the weights are adjusted for each vector until the error across the entire training set reaches an acceptable level. Unsupervised learning does not require knowing the correct answer for each example of the training sample. Instead, the internal structure of the data or the dependencies between patterns are revealed, allowing the patterns to be sorted into categories.

    Supervised learning

    An input vector X is supplied to the inputs of the artificial neural network being trained.

    Let us define the error function E. Usually this is the mean squared error:

    E = (1/P) · Σ (y_i − d_i)², i = 1...P

    Where

    · P – number of examples processed by the neural network;

    · y_i – actual output of the neural network;

    · d_i – desired (ideal) output of the neural network.

    The procedure for training a neural network is reduced to the procedure for correcting connection weights. The purpose of the weight correction procedure is to minimize the error function E.
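As a small sketch in Python, the error function defined above might be computed as follows (the output and desired values are invented):

```python
def mean_squared_error(y, d):
    """E = (1/P) * sum of (y_i - d_i)^2 over the P processed examples."""
    P = len(y)
    return sum((yi - di) ** 2 for yi, di in zip(y, d)) / P

y = [0.9, 0.2, 0.4]   # actual outputs
d = [1.0, 0.0, 0.0]   # desired (ideal) outputs
print(round(mean_squared_error(y, d), 2))   # (0.01 + 0.04 + 0.16) / 3 = 0.07
```

Training adjusts the weights so that this number falls toward zero.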

    General scheme of training with a teacher:

    1. Before training starts, the weight coefficients are initialized in some way, for example randomly.

    2. At the first stage, the training examples are fed to the input in a certain order. At each iteration, the error E_L (the learning error) is calculated for the training example, and the weights are corrected by a certain algorithm. The purpose of the weight-correction procedure is to minimize the error E_L.

    3. At the second stage, correct operation is checked. Test examples are fed to the input in a certain order. At each iteration, the error E_G (the generalization error: the error the trained model shows on examples that did not participate in training) is calculated for the test example. If the result is unsatisfactory, the set of training examples is modified and the training cycle is repeated.

    If, after several iterations of the learning algorithm, the learning error E_L drops almost to zero while the generalization error E_G first decreases and then begins to increase, this is a sign of overfitting. In this case, training must be stopped.

    Figure 2. Overfitting effect
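This stopping criterion can be sketched as follows; the error curves below are synthetic, invented only to show the typical shape (E_L keeps falling while E_G turns upward):

```python
def early_stopping_epoch(e_g, patience=2):
    """Return the epoch where training should stop: the point at which
    the generalization error E_G reached its minimum, detected after it
    has risen for `patience` consecutive epochs."""
    best, best_epoch, worse = float("inf"), 0, 0
    for epoch, err in enumerate(e_g):
        if err < best:
            best, best_epoch, worse = err, epoch, 0
        else:
            worse += 1
            if worse >= patience:
                break
    return best_epoch

# Synthetic per-epoch errors: E_L keeps falling, E_G falls and then rises
e_l = [0.9, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
e_g = [1.0, 0.7, 0.5, 0.45, 0.44, 0.5, 0.6, 0.7]
print(early_stopping_epoch(e_g))   # 4: the epoch where E_G was smallest
```

In practice one would also restore the weights saved at that best epoch.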

    The Rosenblatt method of training a neural network is based on this general scheme.

    Rosenblatt method

    This method was proposed by F. Rosenblatt in the 1960s for a neural network called the perceptron. The perceptron has a threshold activation function; its diagram is shown in Figure 3.

    Figure 3. Single-layer perceptron

    The Rosenblatt training procedure for a single-layer perceptron can be represented as follows:

    w_ij(t+1) = w_ij(t) + a·(d_j − y_j)·x_i

    Where

    · x_i – the i-th input of the neural network;

    · d_j – desired (ideal) j-th output of the neural network;

    · y_j – actual j-th output of the neural network;

    · a – coefficient (learning rate), 0 < a ≤ 1.

    The weighting coefficients are changed only if the actual output value does not match the ideal output value. Below is a description of the perceptron training algorithm.

    1. We set all weights equal to zero.

    2. We carry out a cycle of presenting examples. For each example, the following procedure is performed.

    2.1. If the network gave the correct answer, then go to step 2.4.

    2.2. If a unit was expected at the output of the perceptron but a zero was received, then the weights of the connections through which a unit signal passed are increased by one.

    2.3. If a zero was expected at the output of the perceptron but a unit was received, then the weights of the connections through which a unit signal passed are reduced by one.

    2.4. Let's move on to the next example. If the end of the training set is reached, then go to step 3, otherwise we return to step 2.1.

    3. If during the execution of the second step of the algorithm, step 2.2 or 2.3 was performed at least once and no looping occurred, then proceed to step 2. Otherwise, the training is completed.
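A minimal Python sketch of steps 1-3 (the training set, a logical AND with the bias treated as a weight on a constant input of 1, is an assumption for illustration; the loop-tracking mechanism of steps 4-7 is omitted here):

```python
def perceptron_out(w, x):
    """Threshold neuron: output 1 if the weighted sum is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train_perceptron(examples, max_epochs=100):
    """Steps 1-3: zero weights, then cycle through the examples,
    adjusting by +/-1 the weights of connections that carried a 1."""
    n = len(examples[0][0])
    w = [0] * (n + 1)                       # step 1: all weights zero (incl. bias)
    for _ in range(max_epochs):
        changed = False
        for x, d in examples:               # step 2: cycle of presentations
            xs = [1] + list(x)              # constant bias input
            y = perceptron_out(w, xs)
            if y == d:
                continue                    # step 2.1: correct answer
            delta = 1 if d == 1 else -1     # steps 2.2 / 2.3
            w = [wi + delta * xi for wi, xi in zip(w, xs)]
            changed = True
        if not changed:                     # step 3: no corrections -> done
            return w
    return w

# Made-up training set: logical AND of two inputs
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(examples)
print([perceptron_out(w, [1] + list(x)) for x, _ in examples])   # [0, 0, 0, 1]
```

Because AND is linearly separable, the procedure converges after a few epochs.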

    This algorithm does not have a learning loop tracking mechanism. This mechanism can be implemented in different ways. The most economical in terms of using additional memory is as follows.

    4. k=1; m=0. We remember the weights of the connections.

    5. After the cycle of image presentations, we compare the weights of the connections with the remembered ones. If the current weights coincide with the remembered ones, then a loop occurs. Otherwise, go to step 3.

    6. m=m+1. If m<k, then move on to the second step.

    7. k=2k; m=0. We remember the weights of the connections and move on to step 2.

    Since the length of any cycle is finite, looping will be detected for a sufficiently large k.
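The doubling trick in steps 4-7 is essentially Brent's cycle-detection idea, and it can be sketched generically (the toy "weight update" functions below are made up):

```python
def detects_loop(update, state, max_steps=10_000):
    """Steps 4-7: remember a snapshot of the weights, compare after each
    presentation cycle, and double the interval k between snapshots.
    Returns True as soon as the state repeats a remembered snapshot."""
    k, m = 1, 0
    saved = state
    for _ in range(max_steps):
        state = update(state)
        if state == saved:
            return True        # the weights came back: training loops
        m += 1
        if m == k:             # time to take a new snapshot
            k, m = 2 * k, 0
            saved = state
    return False

print(detects_loop(lambda w: (w + 3) % 10, 0))          # True: period-10 cycle
print(detects_loop(lambda w: w + 1, 0, max_steps=200))  # False: never repeats
```

Only one extra copy of the weights is stored, which is why the text calls this variant the most economical in memory.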

    Unsupervised learning

    The main feature that makes unsupervised learning attractive is its “independence.” The learning process, as with supervised learning, consists of adjusting the weights of the synapses. Some algorithms, however, change the structure of the network, that is, the number of neurons and their connections; such transformations are more correctly called by the broader term self-organization and will not be considered in this article. Obviously, the adjustment of a synapse can be based only on the information available within the neuron, that is, its state and the existing weight coefficients. Based on this consideration and, more importantly, by analogy with the known principles of self-organization of nerve cells, Hebbian learning algorithms were constructed.

    Essentially, Hebb proposed that the synaptic connection between two neurons is strengthened if both neurons are excited. This can be thought of as the strengthening of a synapse according to the correlation of the levels of excitatory neurons connected by a given synapse. For this reason, the Hebb learning algorithm is sometimes called a correlation algorithm.

    The idea of the algorithm is expressed by the following equality:

    w_ij(t) = w_ij(t−1) + a · y_i^(n−1) · y_j^(n)

    Where

    · y_i^(n−1) – output value of neuron i in layer (n−1);

    · y_j^(n) – output value of neuron j in layer n;

    · w_ij(t) and w_ij(t−1) – the weight of the synapse connecting these neurons at iterations t and t−1, respectively;

    · a – learning rate coefficient.

    There is also a differential Hebbian learning method, represented by the formula

    w_ij(t) = w_ij(t−1) + a · [y_i^(n−1)(t) − y_i^(n−1)(t−1)] · [y_j^(n)(t) − y_j^(n)(t−1)] (2)

    Here y_i^(n−1)(t) and y_i^(n−1)(t−1) are the output values of neuron i in layer (n−1) at iterations t and t−1, respectively; y_j^(n)(t) and y_j^(n)(t−1) are the same for neuron j in layer n.

    As can be seen from formula (2), the synapses that connect those neurons whose outputs have most dynamically changed towards an increase learn most strongly.
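Both update rules are one-liners; the sketch below applies each to a single synapse with made-up activity values:

```python
def hebb_update(w, y_pre, y_post, a=0.1):
    """Classic Hebb rule: w_ij(t) = w_ij(t-1) + a * y_i^(n-1) * y_j^(n).
    The synapse strengthens when both neurons are excited together."""
    return w + a * y_pre * y_post

def diff_hebb_update(w, y_pre_t, y_pre_prev, y_post_t, y_post_prev, a=0.1):
    """Differential Hebb rule (formula (2)): the change is driven by how
    much each neuron's output changed between iterations t-1 and t."""
    return w + a * (y_pre_t - y_pre_prev) * (y_post_t - y_post_prev)

w = 0.5
w = hebb_update(w, y_pre=1.0, y_post=0.8)      # both active -> strengthen
print(round(w, 3))                              # 0.58
w = diff_hebb_update(w, 0.9, 0.1, 0.8, 0.1)     # both outputs grew -> strengthen
print(round(w, 3))                              # 0.58 + 0.1*0.8*0.7 = 0.636
```

Note that neither rule uses a target output: only the activity of the two connected neurons.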

    The complete learning algorithm using the above formulas will look like this:

    1. At the initialization stage, all weighting coefficients are assigned small random values.

    2. An input image is supplied to the network inputs, and the excitation signals propagate through all the layers according to the principles of classical feedforward networks: for each neuron, the weighted sum of its inputs is calculated, to which the activation (transfer) function of the neuron is then applied, yielding its output value y_i^(n), i = 0...M_n−1, where M_n is the number of neurons in layer n; n = 0...N−1, where N is the number of layers in the network.

    3. Based on the obtained output values of the neurons, the weight coefficients are changed according to the classic or differential Hebb formula.

    4. Repeat from step 2 until the network's output values stabilize to the specified accuracy. This stopping criterion, different from the one used for backpropagation networks, is needed because the adjusted synapse values are essentially unbounded.

    At the second step of the cycle, all images from the input set are presented alternately.
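The four steps above can be sketched for a single layer as follows. The patterns are made up; note that with only excitatory inputs the unbounded Hebb rule drives the active outputs into saturation, at which point they stabilize and training stops:

```python
import math
import random

random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, W):
    """Weighted sum per output neuron, followed by the activation function."""
    return [sigmoid(sum(wji * xi for wji, xi in zip(row, x))) for row in W]

def hebb_train(patterns, n_out, a=0.5, tol=1e-4, max_cycles=1000):
    n_in = len(patterns[0])
    # Step 1: small random initial weights
    W = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    prev = [forward(x, W) for x in patterns]
    for _ in range(max_cycles):
        for x in patterns:                      # present each image in turn
            y = forward(x, W)                   # step 2: feedforward pass
            for j in range(n_out):              # step 3: Hebb update
                for i in range(n_in):
                    W[j][i] += a * x[i] * y[j]
        outs = [forward(x, W) for x in patterns]
        stable = all(abs(o - p) < tol           # step 4: outputs stabilized?
                     for out, pr in zip(outs, prev) for o, p in zip(out, pr))
        if stable:
            return W, outs
        prev = outs
    return W, prev

patterns = [[1, 0, 1, 0], [0, 1, 0, 1]]
W, outs = hebb_train(patterns, n_out=2)
print(all(o > 0.99 for out in outs for o in out))   # True: outputs saturated
```

Which output combination each pattern ends up with depends on the random initialization, exactly as the following paragraph notes.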

    It should be noted that the type of responses to each class of input images is not known in advance and will be an arbitrary combination of states of neurons in the output layer, due to the random distribution of weights at the initialization stage. At the same time, the network is able to generalize similar images, classifying them into the same class. Testing the trained network allows you to determine the topology of the classes in the output layer. To bring the responses of the trained network to a convenient representation, you can supplement the network with one layer, which, for example, according to the training algorithm of a single-layer perceptron, must be forced to map the output reactions of the network into the required images.

    It should be noted that unsupervised learning is much more sensitive to the choice of optimal parameters than supervised learning. Firstly, its quality strongly depends on the initial values ​​of synapses. Secondly, learning is critical to the choice of learning radius and the rate of its change. And finally, of course, the nature of the change in the learning coefficient itself is very important. In this regard, the user will most likely need to carry out preliminary work to select optimal network training parameters.

    Despite some implementation difficulties, unsupervised learning algorithms have found widespread and successful application. In fact, the most complex artificial neural networks known today - the cognitron and neocognitron - also operate according to the unsupervised learning algorithm. They coped very well with the task of recognizing images subject to displacements in position, noise, and shape distortion. However, the neocognitron could not cope with the task when the image was rotated by a certain angle.

    In conclusion, we can say that scientists are currently researching artificial neural networks and the stability of particular configurations, but not all problems can be solved by neural networks. Although the artificial neuron is a model of the biological neuron, it is far from perfect and requires significant work and new discoveries in the field of artificial intelligence. Neural networks cannot learn the way humans do. Nevertheless, based on the material above, it is possible to create practical systems for pattern recognition, information compression, automated control, expert assessment, and much more.


    The most important property of neural networks is their ability to learn from environmental data and improve their performance as a result of learning. Performance improves over time according to certain rules. A neural network is trained through an iterative process of adjusting its synaptic weights and thresholds. Ideally, the neural network gains knowledge about its environment at each iteration of the learning process.

    There are quite a few types of activities associated with the concept of learning, so it is difficult to give this process an unambiguous definition. Moreover, the learning process depends on the point of view on it. This is what makes it almost impossible for any precise definition of this concept to emerge. For example, the learning process from the point of view of a psychologist is fundamentally different from learning from the point of view of a school teacher. From a neural network perspective, the following definition can probably be used:

    Training is a process in which the free parameters of a neural network are tuned by simulating the environment in which the network is embedded. The type of training is determined by the way these parameters are adjusted.

    This definition of the neural network learning process assumes the following sequence of events:

    1. The neural network receives stimuli from the external environment.
    2. As a result of the first point, the free parameters of the neural network are changed.
    3. After changing the internal structure, the neural network responds to excitations in a different way.

    The above list of clear rules for solving the problem of training a neural network is called a learning algorithm. It is easy to guess that there is no universal learning algorithm suitable for all neural network architectures. There is only a set of tools represented by a variety of learning algorithms, each of which has its own advantages. Learning algorithms differ from each other in the way they adjust the synaptic weights of neurons. Another distinctive characteristic is the way the trained neural network communicates with the outside world. In this context, we talk about a learning paradigm associated with the model of the environment in which a given neural network operates.

    There are two conceptual approaches to training neural networks: supervised learning and unsupervised learning.

    Supervised training of a neural network assumes that for each input vector from the training set there is a required value of the output vector, called the target. These vectors form a training pair. The network weights are changed until an acceptable level of deviation of the output vector from the target is obtained for each input vector.

    Unsupervised learning of a neural network is a much more plausible learning model in terms of the biological roots of artificial neural networks. The training set consists only of input vectors. The neural network training algorithm adjusts the network weights so that consistent output vectors are obtained, i.e. so that the presentation of sufficiently close input vectors gives identical outputs.

    Neural network without feedback: the perceptron

    Tasks for neural networks

    Most problems for which neural networks are used can be considered as special cases of the following main problems.

    · Approximation - constructing a function from a finite set of values ​​(for example, time series forecasting)

    · Building relationships on a variety of objects (for example, problems of recognizing images and sound signals).

    · Distributed information retrieval and associative memory (for example, tasks of finding implicit dependencies in large data sets).

    · Filtering (for example, identifying signal changes that are “visible to the naked eye” but difficult to describe analytically).

    · Information compression (for example, neural network implementations of compression algorithms for sounds, static and dynamic images).

    · Identification and management of dynamic systems.


    The multi-layer neural network with multiple outputs shown in the figure below is a perceptron.

    The circuit can be supplemented with an adder, which, if necessary, combines the output signals of neurons into one common output.

    The number of layers in a perceptron can vary depending on the complexity of the problem. It has been proven mathematically (Kolmogorov's theorem) that three full neural layers are enough to approximate any mathematical function, provided the number of neurons in the hidden layer can be increased without limit.

    The perceptron operates in discrete time: a static set of signals (the input vector) is applied to the input, the combined state of the outputs (the output vector) is assessed, then the next vector is applied to the input, and so on. It is assumed that a signal propagates through the perceptron from input to output instantly, i.e., there are no time delays in signal transmission from neuron to neuron or from layer to layer, and no associated transient dynamics. Since the perceptron has no feedback (neither positive nor negative), at each moment in time any input vector uniquely corresponds to a certain output vector, which will not change as long as the network's inputs remain unchanged.
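A sketch of this discrete-time operation for a small two-layer perceptron (the weights are invented; the point is that, with no feedback, the same input vector always yields the same output vector):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, layers):
    """Instantaneous feedforward pass: the input vector is pushed layer
    by layer to the output, with no feedback and no time delays."""
    for W, b in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

# A tiny 2-3-2 perceptron with made-up weights and biases
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, -0.2, 0.1]),  # hidden layer
    ([[1.0, -1.0, 0.5], [-0.7, 0.4, 0.9]], [0.1, -0.1]),         # output layer
]
x = [1.0, 0.0]
out1 = forward(x, layers)
out2 = forward(x, layers)
print(out1 == out2)   # True: identical input -> identical output
```

Repeating the call with the same input changes nothing, since the network holds no internal state between presentations.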

    Perceptron theory is the basis for many other types of artificial neural networks, and perceptrons themselves are the logical starting point for the study of artificial neural networks.

    To train a neural network means to tell it what we want from it. This process is very similar to teaching a child the alphabet. After showing the child a picture of the letter “A”, we ask: “What letter is this?” If the answer is incorrect, we tell the child the answer we would like to receive: “This is the letter A.” The child remembers this example together with the correct answer, that is, some changes occur in his memory in the right direction. We repeat the process of presenting the letters over and over again until all 33 letters are firmly memorized. This process is called “supervised learning.”

    When training a neural network, we act in exactly the same way. Suppose we have a table, a database containing examples (a coded set of images of letters). Presenting an image of the letter “A” to the input of the neural network, we expect (ideally) the signal level to be maximal (=1) at output OUT1 (A is letter No. 1 in the 33-letter alphabet) and minimal (=0) at all the other outputs.

    So the table, called the training set, will look like this (as an example, only the first row is filled in):

    Letter | X1  X2  ...  X12 | TARGET1  TARGET2  ...  TARGET33
    A      | ...              | 1        0        ...  0
    B      |                  |
    ...    |                  |
    Yu     |                  |
    Ya     |                  |

    The pair of vectors for each example of the training set (a row of the table) is called a training pair.

    In practice, an untrained neural network will not perform as we would ideally expect, that is, for all or most examples, the error vectors will contain elements significantly different from zero.

    A neural network training algorithm is a set of mathematical operations that uses the error vector to calculate corrections to the weights of the neural network such that the total error (the sum of squared errors over all outputs is usually used to monitor the learning process) decreases. Applying these operations over and over, we achieve a gradual reduction of the error for each example (A, B, C, and so on) of the training set.

    After such cyclic, repeated adjustment of the weights, the neural network will give correct (or almost correct) answers to all (or almost all) examples from the database, i.e., the total error will reach zero or an acceptably small level for each training pair. In this case, they say that the “neural network is trained,” i.e., it is ready for use on new, previously unseen data.

    In general, the supervised learning algorithm will look like this:

    1. Initialize synaptic weights with small random values.

    2. Select the next training pair from the training set; submit the input vector to the network input.

    3. Calculate the network output.

    4. Calculate the difference between the network output and the required output (target vector of the training pair).

    5. Adjust the network weights to minimize errors.

    6. Repeat steps 2 to 5 for each pair of the training set until the error on the entire set reaches an acceptable level.
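Steps 1-6 can be sketched for a single linear neuron trained with a delta-rule update (the training set, a made-up linear relation target = 2x + 1, is purely illustrative):

```python
import random

random.seed(1)

def train_delta(pairs, lr=0.1, epochs=500, goal=1e-6):
    """Steps 1-6 for one linear neuron; w[0] is a bias weight on a
    constant input of 1. Delta rule: w_i += lr * (target - out) * x_i."""
    n = len(pairs[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]      # step 1
    for _ in range(epochs):
        total = 0.0
        for x, target in pairs:                                # step 2
            xs = [1.0] + list(x)
            out = sum(wi * xi for wi, xi in zip(w, xs))        # step 3
            err = target - out                                 # step 4
            w = [wi + lr * err * xi for wi, xi in zip(w, xs)]  # step 5
            total += err * err
        if total < goal:                                       # step 6
            break
    return w

# Toy training set: five points of target = 2*x + 1
pairs = [([x / 4], 2 * (x / 4) + 1) for x in range(5)]
w = train_delta(pairs)
print(round(w[0], 2), round(w[1], 2))   # ≈ 1.0 and 2.0
```

The recovered bias and weight approach the coefficients of the generating relation, i.e., the error over the whole set reaches the acceptable level of step 6.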

    The specific mathematical operations performed at step 5 determine the type of learning algorithm. For example, single-layer perceptrons use a simple algorithm based on the so-called delta rule; for perceptrons with any number of layers, the backpropagation procedure is widely used; and there is a well-known group of algorithms with interesting properties called stochastic learning algorithms, among others. All known algorithms for training neural networks are essentially varieties of gradient methods for optimizing a nonlinear function of many variables. The main problem in their practical application is that you can never know for sure that the resulting combination of synaptic weights is really the most effective in terms of minimizing the total error over the entire training set. This uncertainty is known as the “problem of local minima of the goal function.”

    In this case, the goal function is understood as the chosen integral scalar indicator characterizing the quality with which the neural network processes all the examples of the training set: for example, the sum of squared deviations of OUT from TARGET for each training pair. The lower the achieved value of the goal function, the better the neural network performs on the given training set. Ideally (in practice achievable only for the simplest problems), it is possible to find a set of synaptic weights for which the goal function equals zero.

    The target surface of a complex network is highly rugged and consists of hills, valleys, folds and ravines in high-dimensional space. A network trained using the gradient method can fall into a local minimum (shallow valley) when there is a much deeper minimum nearby. At a local minimum point, all directions point upward, and the algorithm is unable to escape from it.
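The effect is easy to reproduce in one dimension. The function below is invented for illustration: it has two valleys of unequal depth, and plain gradient descent settles in whichever valley the starting point belongs to:

```python
def f(x):
    return (x * x - 1) ** 2 + 0.3 * x       # two valleys of unequal depth

def df(x):
    return 4 * x * (x * x - 1) + 0.3        # derivative of f

def gradient_descent(x, lr=0.01, steps=2000):
    """Plain gradient descent: at a local minimum every direction points
    upward, so the algorithm cannot escape the valley it started in."""
    for _ in range(steps):
        x -= lr * df(x)
    return x

x_right = gradient_descent(2.0)    # ends near x ≈ +0.96 (the shallow valley)
x_left = gradient_descent(-2.0)    # ends near x ≈ -1.04 (the deeper valley)
print(f(x_right) > f(x_left))      # True: the right valley is a local minimum
```

Both runs converge, but only the left one finds the global minimum; nothing in the update rule can tell the two apart.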

    Thus, if, as a result of an attempt to train a neural network, the required accuracy was not achieved, then the researcher faces two alternatives:

    1. Assume that the process is trapped in a local minimum and try to apply some other type of learning algorithm for the same network configuration.

    2. Assume that a global minimum of the goal function has been found for a given specific network configuration and try to complicate the network - increase the number of neurons, add one or more layers, move from a fully connected to a partially connected network, taking into account a priori known dependencies in the structure of the training set, etc.

    There is also a class of algorithms for learning without a teacher (unsupervised learning). In this case, the network is tasked with independently finding, in the presented set of examples, groups of input vectors that are “similar to each other,” producing a high level at one of the outputs (without determining in advance which one). Even with this formulation, the problem of local minima occurs, though in an implicit form, without a strict mathematical definition of the goal function (since the very concept of a goal function implies a given reference response of the network, i.e., a “teacher”): has the neural network really learned to select clusters of input vectors in the best possible way for this particular configuration?