Neural networks: what they are and how brands use them

    In the first half of 2016, the world heard about many developments in the field of neural networks: Google (the Go-playing network AlphaGo), Microsoft (a number of image identification services), and the startups MSQRD, Prisma, and others demonstrated their algorithms.


    The editors of the site tell you what neural networks are, what they are needed for, why they have taken over the planet right now, and not years earlier or later, how much you can earn from them and who the main market players are. Experts from MIPT, Yandex, Mail.Ru Group and Microsoft also shared their opinions.

    What are neural networks and what problems can they solve?

    Neural networks are one of the directions in the development of artificial intelligence systems. The idea is to simulate as closely as possible the workings of the human nervous system, namely its ability to learn and correct mistakes. This is the main feature of any neural network: it is capable of learning independently and acting on the basis of previous experience, making fewer and fewer mistakes each time.

    The neural network imitates not only the activity, but also the structure of the human nervous system. Such a network consists of a large number of individual computing elements (“neurons”). In most cases, each “neuron” belongs to a specific layer of the network. The input data is sequentially processed at all layers of the network. The parameters of each “neuron” can change depending on the results obtained on previous sets of input data, thus changing the order of operation of the entire system.
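    As an illustration of the structure just described, here is a minimal sketch in Python. The weights, the sigmoid activation, and the two-layer layout are purely illustrative assumptions, not a real trained network:

```python
import math

def neuron(inputs, weights):
    # A "neuron" computes a weighted sum of its inputs
    # and passes it through a nonlinear activation (here, a sigmoid).
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weight_rows):
    # A layer is a group of neurons that all receive the same inputs.
    return [neuron(inputs, w) for w in weight_rows]

# Input data is processed sequentially, layer by layer.
x = [0.5, -1.0, 2.0]
hidden = layer(x, [[0.1, 0.4, -0.2], [0.7, -0.3, 0.5]])
output = layer(hidden, [[1.0, -1.0]])
```

    Training would consist of adjusting the weight lists based on errors on previous inputs, which is exactly the parameter change the paragraph above describes.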

    The head of the Mail.ru Search department at Mail.Ru Group, Andrey Kalinin, notes that neural networks are capable of solving the same problems as other machine learning algorithms, the difference lies only in the approach to training.

    All tasks that neural networks can solve are somehow related to learning. Among the main areas of application of neural networks are forecasting, decision making, pattern recognition, optimization, and data analysis.

    Director of technological cooperation programs at Microsoft in Russia, Vlad Shershulsky, notes that neural networks are now used everywhere: “For example, many large Internet sites use them to make reactions to user behavior more natural and useful to their audience. Neural networks underlie most modern speech recognition and synthesis systems, as well as image recognition and processing systems. They are used in some navigation systems, be it industrial robots or driverless cars. Algorithms based on neural networks protect information systems from attacks by intruders and help identify illegal content online."

    In the near future (5-10 years), Shershulsky believes, neural networks will be used even more widely:

    Imagine an agricultural combine whose actuators are equipped with many video cameras. It takes five thousand pictures per minute of each plant along its path and, using a neural network, analyzes whether it is a weed and whether it is affected by disease or pests. Each plant is then treated individually. Fiction? Not really anymore. And in five years it may become the norm. - Vlad Shershulsky, Microsoft

    Mikhail Burtsev, head of the laboratory of neural systems and deep learning at the MIPT Center for Living Systems, provides a tentative map of the development of neural networks for 2016-2018:

    • systems for recognizing and classifying objects in images;
    • voice interaction interfaces for the Internet of things;
    • service quality monitoring systems in call centers;
    • systems for identifying problems (including predicting maintenance time), anomalies, cyber-physical threats;
    • intellectual security and monitoring systems;
    • replacing some of the functions of call center operators with bots;
    • video analytics systems;
    • self-learning systems that optimize the management of material flows or the placement of objects (in warehouses, transport);
    • intelligent, self-learning systems for controlling production processes and devices (including robotic ones);
    • the emergence of universal on-the-fly translation systems for conferences and personal use;
    • the emergence of technical support consultant bots, or personal assistants with functions similar to a human's.

    Director of Technology Distribution at Yandex Grigory Bakunov believes that the basis for the spread of neural networks in the next five years will be the ability of such systems to make various decisions: “The main thing that neural networks now do for a person is to save him from unnecessary decision-making. So they can be used almost anywhere where not very intelligent decisions are made by a living person. In the next five years, it is this skill that will be exploited, which will replace human decision-making with a simple machine.”

    Why have neural networks become so popular right now?

    Scientists have been developing artificial neural networks for more than 70 years. The first attempt to formalize a neural network dates back to 1943, when two American scientists (Warren McCulloch and Walter Pitts) presented an article on the logical calculus of human ideas and neural activity.

    However, until recently, says Andrey Kalinin from Mail.Ru Group, the speed of neural networks was too low for them to become widespread, so such systems were mainly used in developments related to computer vision, while in other areas other machine learning algorithms were used.

    A labor-intensive and time-consuming part of the neural network development process is its training. In order for a neural network to correctly solve the assigned problems, it is required to “run” its work on tens of millions of sets of input data. It is with the advent of various accelerated learning technologies that Andrei Kalinin and Grigory Bakunov associate the spread of neural networks.

    The main thing that has happened now is that various tricks have appeared that make it possible to create neural networks that are much less susceptible to overfitting. - Grigory Bakunov, Yandex

    “Firstly, a large and publicly available array of labeled images (ImageNet) has appeared, on which networks can be trained. Secondly, modern video cards make it possible to train neural networks and use them hundreds of times faster. Thirdly, ready-made, pre-trained neural networks that recognize images have appeared, on the basis of which you can create your own applications without spending a long time preparing a neural network for work. All this ensures very powerful development of neural networks specifically in the field of image recognition,” notes Kalinin.

    What is the size of the neural network market?

    “Very easy to calculate. You can take any field that uses low-skill labor, such as call center agents, and simply subtract all human resources. I would say that we are talking about a multi-billion dollar market, even within a single country. It is easy to understand how many people in the world are employed in low-skilled jobs. So, even speaking very abstractly, I think we are talking about a hundred billion dollar market all over the world,” says Grigory Bakunov, director of technology distribution at Yandex.

    According to some estimates, more than half of the professions will be automated - this is the maximum volume by which the market for machine learning algorithms (and neural networks in particular) can be increased. - Andrey Kalinin, Mail.Ru Group

    “Machine learning algorithms are the next step in automating any processes, in the development of any software. Therefore, the market at least coincides with the entire software market, but rather exceeds it, because it becomes possible to make new intelligent solutions that are inaccessible to old software,” continues Andrey Kalinin, head of the Mail.ru Search department at Mail.Ru Group.

    Why neural network developers create mobile applications for the mass market

    In the last few months, several high-profile entertainment projects using neural networks have appeared on the market, among them a popular video service from the social network Facebook, Russian image-processing applications (which received investment from Mail.Ru Group in June), and others.

    The capabilities of their own neural networks were demonstrated by Google (the AlphaGo technology won against the champion in Go; in March 2016 the corporation sold at auction 29 paintings drawn by neural networks), by Microsoft (the CaptionBot project, which recognizes images in photographs and automatically generates captions for them; the WhatDog project, which determines a dog's breed from a photograph; the HowOld service, which determines a person's age from a photo, and so on), and by Yandex (in June the team integrated a service for recognizing cars in photographs into the Avto.ru application; presented a musical album recorded with the help of neural networks; and in May launched the LikeMo.net project for drawing in the style of famous artists).

    Such entertainment services are created not to solve the global problems at which neural networks are aimed, but to demonstrate a neural network's capabilities and to train it.

    "Games are a characteristic feature of our behavior as a species. On the one hand, game situations can be used to simulate almost all typical scenarios of human behavior, and on the other hand, game creators and, especially, players can get a lot of pleasure from the process. There is also a purely utilitarian aspect. A well-designed game not only brings satisfaction to the players: as they play, they train the neural network algorithm. After all, neural networks are based on learning by example,” says Vlad Shershulsky from Microsoft.

    “First of all, this is done to show the capabilities of the technology. There is really no other reason. If we are talking about Prisma, then it is clear why they did it. The guys built some kind of pipeline that allows them to work with pictures. To demonstrate this, they chose a fairly simple method of creating stylizations. Why not? This is just a demonstration of how the algorithms work,” says Grigory Bakunov from Yandex.

    Andrey Kalinin from Mail.Ru Group has a different opinion: “Of course, this is impressive from the public’s point of view. On the other hand, I wouldn't say that entertainment products can't be applied to more useful areas. For example, the task of stylizing images is extremely relevant for a number of industries (design, computer games, animation are just a few examples), and the full use of neural networks can significantly optimize the cost and methods of creating content for them.”

    Major players in the neural networks market

    As Andrey Kalinin notes, by and large, most of the neural networks on the market are not much different from each other. “Everyone’s technology is approximately the same. But using neural networks is a pleasure that not everyone can afford. To independently train a neural network and run many experiments on it, you need large training sets and a fleet of machines with expensive video cards. Obviously, only large companies have such opportunities," he says.

    Among the main market players, Kalinin mentions Google and its division Google DeepMind, which created the AlphaGo network, and Google Brain. Microsoft has its own developments in this area - they are carried out by the Microsoft Research laboratory. The creation of neural networks is carried out at IBM, Facebook (a division of Facebook AI Research), Baidu (Baidu Institute of Deep Learning) and others. Many developments are being carried out at technical universities around the world.

    Yandex Technology Distribution Director Grigory Bakunov notes that interesting developments in the field of neural networks are also found among startups. “I would mention, for example, the company ClarifAI. This is a small startup founded by people who once worked at Google. Now they are perhaps the best in the world at determining the content of a picture.” Such startups include MSQRD, Prisma, and others.

    In Russia, developments in the field of neural networks are carried out not only by startups, but also by large technology companies - for example, the Mail.Ru Group holding uses neural networks for processing and classifying texts in Search and image analysis. The company is also conducting experimental developments related to bots and conversational systems.

    Yandex is also creating its own neural networks: “Basically, such networks are already used in working with images and sound, but we are exploring their capabilities in other areas. Now we are doing a lot of experiments in using neural networks in working with text.” Developments are being carried out at universities: Skoltech, MIPT, Moscow State University, Higher School of Economics and others.

    Neural networks are a class of analytical methods built on (hypothesized) principles of learning in thinking beings and of brain function, which make it possible to predict the values of some variables in new observations based on the results of other observations (of the same or other variables) after passing through a so-called training stage on the available data.

        1. Basic concepts about neural networks

    Most often, neural networks are used to solve the following problems:

      image classification - an indication that the input image, represented by a feature vector, belongs to one or more predefined classes;

      clustering - classification of images in the absence of a training sample with class labels;

      forecasting - predicting the value y(t_{n+1}) from a known sequence y(t_1), y(t_2), …, y(t_n);

      optimization - finding a solution that satisfies a system of constraints and maximizes or minimizes an objective function;

      associative memory (memory addressed by content) - memory that becomes accessible when its content is specified;

      control - computing an input action for the system such that the system follows a desired trajectory.

    The structural basis of a neural network is the formal neuron. Neural networks arose from attempts to recreate the ability of biological systems to learn by modeling the low-level structure of the brain. To do this, the neural network model is based on an element that imitates, to a first approximation, the properties of a biological neuron: the formal neuron (hereinafter simply neuron). In the human body, neurons are special cells capable of transmitting electrochemical signals.

    A neuron has a branched structure for the input of information (dendrites), a nucleus, and a branching output (axon). When connected in a certain way, neurons form a neural network. Each neuron is characterized by a certain current state and has a group of synapses - unidirectional input connections connected to the outputs of other neurons - and also has an axon, the output connection of the given neuron, through which the signal (excitation or inhibition) is sent to the synapses of the following neurons (Fig. 8.1).

    Fig. 8.1. Structure of a formal neuron.

    Each synapse is characterized by the strength of the synaptic connection, or its weight w_i, which is equivalent in physical terms to electrical conductivity.

    The current state (activation level) of a neuron is defined as the weighted sum of its inputs:

    S = Σ x_i·w_i, i = 1…n    (1)

    where the set of signals x_1, x_2, …, x_n arrives at the input of the neuron, each signal is multiplied by the corresponding weight w_1, w_2, …, w_n, and the sum forms the activation level S. The output of the neuron is a function of its activation level:

    Y = f(S)    (2)
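    Equations (1) and (2) translate almost directly into code; a minimal sketch in Python, where the threshold activation and the sample values are illustrative assumptions:

```python
def formal_neuron(x, w, f):
    # Equation (1): the activation level S is the weighted sum of the inputs.
    S = sum(xi * wi for xi, wi in zip(x, w))
    # Equation (2): the neuron's output is a function of its activation level.
    return f(S)

# A threshold (step) activation, one common choice for f.
step = lambda s: 1 if s > 0 else 0

y = formal_neuron([1, 0, 1], [0.5, -0.6, 0.3], step)  # S = 0.8, so y = 1
```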

    When neural networks operate, the principle of parallel signal processing is implemented. It is achieved by combining a large number of neurons into so-called layers and connecting in a certain way neurons of different layers, as well as, in some configurations, neurons of the same layer with each other, and the interaction of all neurons is processed layer by layer.

    Fig. 8.2. Architecture of a neural network with n neurons in the input layer and three neurons in the output layer (single-layer perceptron).

    As an example of the simplest neural network, consider a single-layer perceptron with n neurons in the input layer and three neurons in the output layer (Fig. 8.2). When signals arrive at the n inputs, they pass through synapses to the 3 output neurons. This system forms a single layer of the neural network and produces three output signals:

    y_j = f( Σ x_i·w_ij ), i = 1…n, j = 1…3

    Obviously, all the weighting coefficients of the synapses of one layer of neurons can be combined into a matrix W, in which each element w_ij specifies the value of the synaptic connection between the i-th neuron of the input layer and the j-th neuron of the output layer (3).

    W = (w_ij), i = 1…n, j = 1…3    (3)

    Thus, the process that occurs in a neural network can be written in matrix form:

    y = f(xW)    (4)

    where x and y are the input and output vectors, respectively, and f(v) is the activation function applied element-by-element to the components of vector v.
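    The matrix form above can be sketched in plain Python; the ReLU activation and the 2x3 weight matrix are illustrative choices, not taken from the text:

```python
def forward(x, W, f):
    # y = f(xW): multiply the input row vector x by the weight matrix W,
    # then apply the activation function f element-by-element.
    n_out = len(W[0])
    s = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_out)]
    return [f(v) for v in s]

# 2 inputs, 3 output neurons; W[i][j] links input i to output neuron j.
W = [[1.0, -1.0, 0.5],
     [2.0,  1.0, -1.0]]
relu = lambda v: max(0.0, v)       # ReLU used as an example activation
y = forward([1.0, 2.0], W, relu)   # s = [5.0, 1.0, -1.5] -> y = [5.0, 1.0, 0.0]
```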

    The choice of neural network structure is carried out according to the characteristics and complexity of the task. Optimal configurations already exist for solving certain types of problems. If the problem cannot be reduced to any of the known types, the developer has to solve the difficult problem of synthesizing a new configuration.

    A possible classification of existing neural networks is:

    By type of input information:

      networks that analyze binary information;

      networks that operate with real numbers.

    By teaching method:

      networks that need to be trained before they can be used;

      networks that do not need prior training and are able to learn on their own as they work.

    By the nature of information dissemination:

      unidirectional, in which information propagates only in one direction from one layer to another;

      recurrent networks, in which the output signal of an element can arrive again, as an input signal, at that same element and at other elements of this or the previous layer.

    According to the method of converting input information:

      auto-associative;

      heteroassociative.

    Developing further the question of the possible classification of neural networks, it is important to note the existence of binary and analog networks. The former operate with binary signals, and the output of each neuron can take only two values: logical zero (the "inhibited" state) and logical one (the "excited" state). Another classification divides neural networks into synchronous and asynchronous. In the first case, at each moment of time only one neuron changes its state. In the second, the state changes in a whole group of neurons at once, as a rule in an entire layer.

    Networks can also be classified by the number of layers. Fig. 8.3 shows a two-layer perceptron derived from the perceptron in Fig. 8.2 by adding a second layer consisting of two neurons.

    Fig. 8.3. Architecture of a neural network with unidirectional signal propagation: a two-layer perceptron.

    If we consider the work of neural networks that solve the problem of image classification, then in general their work comes down to classifying (generalizing) input signals belonging to an n-dimensional hyperspace into a certain number of classes. Mathematically, this happens by dividing the hyperspace with hyperplanes (written for the case of a single-layer perceptron):

    Σ x_i·w_ik = θ_k, i = 1…n,    (5)

    where k = 1…m is the class number and θ_k is the threshold of the k-th output neuron.

    Each resulting region is the decision region of a separate class. The number of such classes for a single perceptron-type neural network does not exceed 2^m, where m is the number of network outputs. However, not all of them can be separated by a given neural network.

    Good afternoon, my name is Natalia Efremova, and I am a research scientist at NtechLab. Today I will talk about the types of neural networks and their applications.

    First, I will say a few words about our company. The company is new, maybe many of you don’t yet know what we do. Last year we won the MegaFace competition. This is an international facial recognition competition. In the same year, our company was opened, that is, we have been on the market for about a year, even a little more. Accordingly, we are one of the leading companies in facial recognition and biometric image processing.

    The first part of my report will be directed to those who are unfamiliar with neural networks. I am directly involved in deep learning and have been working in this field for more than 10 years. Although the term itself appeared a little less than a decade ago, earlier there were rudiments of neural networks similar to today's deep learning systems.

    Over the past 10 years, deep learning and computer vision have developed at an incredible pace. Everything that has been done that is significant in this area has happened in the last 6 years.

    I'll talk about practical aspects: where, when, what to use in terms of deep learning for image and video processing, for image and face recognition, since I work in a company that does this. I’ll tell you a little about emotion recognition and what approaches are used in games and robotics. I will also talk about the non-standard application of deep learning, something that is just emerging from scientific institutions and is still little used in practice, how it can be applied, and why it is difficult to apply.

    The report will consist of two parts. Since most are familiar with neural networks, first I will quickly cover how neural networks work: what biological neural networks are, why it is important for us to know how they work, what artificial neural networks are, and which architectures are used in which areas.

    I apologize in advance: I will slip into English terminology at times, because I don't even know what most of it is called in Russian. Perhaps you don't either.

    So, the first part of the report will be devoted to convolutional neural networks. I will explain how convolutional neural networks (CNNs) and image recognition work, using facial recognition as an example. I'll also say a little about recurrent neural networks (RNNs) and reinforcement learning, using deep learning systems as an example.

    As a non-standard application of neural networks, I will talk about how CNN works in medicine to recognize voxel images, how neural networks are used to recognize poverty in Africa.

    What are neural networks

    The prototype for creating neural networks was, oddly enough, biological neural networks. Many of you may know how to program a neural network, but where it came from, I think some do not know. Two-thirds of all sensory information that comes to us comes from the visual organs of perception. More than one-third of the surface of our brain is occupied by the two most important visual areas - the dorsal visual pathway and the ventral visual pathway.

    The dorsal visual pathway begins in the primary visual zone, at our crown, and continues upward, while the ventral pathway begins at the back of our head and ends approximately behind the ears. All the important pattern recognition that occurs in us, everything that carries meaning, that we are aware of, takes place right there, behind the ears.

    Why is this important? It often helps when trying to understand neural networks. Firstly, everyone talks about this, and I'm already used to it; secondly, all the areas used in neural networks for image recognition came to us precisely from the ventral visual pathway, where each small zone is responsible for its own strictly defined function.

    The image comes to us from the retina, passes through a series of visual zones and ends in the temporal zone.

    In the distant 60s of the last century, when the study of the visual areas of the brain was just beginning, the first experiments were carried out on animals, because there was no fMRI. The brain was studied using electrodes implanted into various visual areas.

    The first visual area was studied by David Hubel and Torsten Wiesel in 1962. They conducted experiments on cats. The cats were shown various moving objects. What the brain cells responded to was the stimulus that the animal recognized. Even now many experiments are carried out in these draconian ways. But nevertheless, this is the most effective way to find out what every small cell in our brain is doing.

    In the same way, many more important properties of the visual areas were discovered, which we use in deep learning now. One of the most important properties is the increase in the receptive fields of our cells as we move from the primary visual areas to the temporal lobes, that is, the later visual areas. The receptive field is that part of the image that every cell of our brain processes. Each cell has its own receptive field. The same property is preserved in neural networks, as you probably all know.

    Also, as receptive fields increase, so do the complex stimuli that neural networks typically recognize.

    Here you see examples of the complexity of stimuli, the different two-dimensional shapes that are recognized in areas V2, V4 and various parts of the temporal fields in macaque monkeys. A number of MRI experiments are also being carried out.

    Here you can see how such experiments are carried out. This is a tiny section of the monkey's IT (inferotemporal) cortex during recognition of various objects; the areas where recognition occurs are highlighted.

    Let's sum it up. An important property that we want to adopt from the visual areas is that the size of the receptive fields increases, and the complexity of the objects that we recognize increases.

    Computer vision

    Before we learned to apply this to computer vision, in general, it didn’t exist as such. In any case, it did not work as well as it works now.

    We transfer all these properties to the neural network, and now it’s working, if you don’t include a small digression on the datasets, which I’ll talk about later.

    But first, a little about the simplest perceptron. It, too, is formed in the image and likeness of our brain. The simplest element resembling a brain cell is the neuron. It has input elements, which by convention are drawn from left to right, occasionally from bottom to top: on the left are the input parts of the neuron, on the right the output parts.

    The simplest perceptron is capable of performing only the simplest operations. In order to perform more complex calculations, we need a structure with a large number of hidden layers.

    In the case of computer vision, we need even more hidden layers. Only then will the system meaningfully recognize what it sees.

    So, I will tell you what happens during image recognition using the example of faces.

    For us to look at this picture and say that it shows exactly the face of the statue is quite simple. However, before 2010, this was an incredibly difficult task for computer vision. Those who have dealt with this issue before this time probably know how difficult it was to describe the object that we want to find in the picture without words.

    We had to describe the object in some geometric way, describe its parts and how those parts can relate to each other, then find this pattern in the image and compare them, and the result was usually recognized poorly. It was typically a little better than flipping a coin, slightly better than chance level.

    This is not how it works now. We divide our image either into pixels or into patches of a certain size: 2x2, 3x3, 5x5, 11x11 pixels, whatever is convenient for the system's creators; these patches serve as the input layer of the neural network.

    Signals from these input layers are transmitted from layer to layer via synapses, each layer having its own specific coefficients. So we pass from layer to layer, from layer to layer, until we obtain the result: the face has been recognized.

    Conventionally, all these parts can be divided into three classes; we will denote them X, W and Y, where X is our input image, Y is the set of labels, and W is the set of weights we need to obtain. How do we calculate W?

    Given our X and Y, this seems simple. However, the operation denoted by the asterisk is a very complex nonlinear operation which, unfortunately, has no inverse. Even with the 2 given components of the equation, it is very difficult to compute. Therefore we must gradually, by trial and error, adjust the weights W so that the error decreases as much as possible, preferably to zero.

    This process occurs iteratively, we constantly reduce until we find the value of weight W that suits us sufficiently.
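    The iterative trial-and-error weight selection described here is, in practice, gradient descent. Here is a toy sketch for a single linear neuron; the data, "true" weight, and learning rate are made up for illustration:

```python
# Toy gradient descent: fit y = w * x by iteratively reducing the error.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]    # generated with a "true" weight of 2

w = 0.0                 # initial guess for the weight
lr = 0.05               # learning rate (step size), chosen arbitrarily
for _ in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad      # step in the direction that reduces the error
```

    After the loop, w is very close to 2, though, as the speaker notes, the error rarely becomes exactly zero in real networks.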

    By the way, not a single neural network that I worked with achieved an error equal to zero, but it worked quite well.

    This is the first network to win the international ImageNet competition, in 2012: the so-called AlexNet. This is the network that first announced that convolutional neural networks exist, and since then convolutional neural networks have never given up their leading positions in international competitions.

    Despite the fact that this network is quite small (it has only 7 hidden layers), it contains 650 thousand neurons with 60 million parameters. In order to iteratively learn to find the necessary weights, we need a lot of examples.

    The neural network learns from the example of a picture and a label. Just as we are taught in childhood “this is a cat, and this is a dog,” neural networks are trained on a large number of pictures. But the fact is that until 2010 there was no large enough data set that could teach such a number of parameters to recognize images.

    The largest databases that existed before this time were PASCAL VOC, which had only 20 object categories, and Caltech 101, which was developed at the California Institute of Technology. The latter had 101 categories, and that was a lot. Those who could not find their objects in either of these databases had to build their own, which, I will say, is terribly painful.

    However, in 2010 the ImageNet database appeared, containing 15 million images divided into 22 thousand categories. This solved the problem of training neural networks. Now anyone with an academic address can easily go to the database's website, request access, and receive the database for training their neural networks. They respond quite quickly, in my opinion: the next day.

    Compared to previous data sets, this is a very large database.

    The example shows how insignificant everything that came before it was. Simultaneously with the ImageNet base, the ImageNet competition appeared, an international challenge in which all teams wishing to compete can take part.

    This year the winning network was created in China, it had 269 layers. I don’t know how many parameters there are, I suspect there are also a lot.

    Deep neural network architecture

    Conventionally, it can be divided into 2 parts: the parts that learn and the parts that do not.

    Black indicates the parts that do not learn; all other layers are capable of learning. There are many definitions of what is inside each convolutional layer. In one accepted notation, a single layer is divided into three components: a convolution stage, a detector stage, and a pooling stage.

    I won’t go into details; there will be many more reports that will discuss in detail how this works. I'll tell you with an example.

    Since the organizers asked me not to mention many formulas, I threw them out completely.

    So, the input image passes through a series of layers, which can be thought of as filters of different sizes and of varying complexity in the elements they recognize. These filters produce their own feature set, which then goes into a classifier. Usually this is either an SVM or an MLP, a multilayer perceptron, whichever is convenient.
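    The convolution, detector, and pooling stages named above can be sketched in miniature. This is pure Python over a 1-D signal for brevity; real systems convolve 2-D images, and the kernel here is an arbitrary edge-detector-like example:

```python
def conv1d(signal, kernel):
    # Convolution stage: slide the kernel over the signal.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def detector(feature_map):
    # Detector stage: element-wise nonlinearity (ReLU).
    return [max(0.0, v) for v in feature_map]

def pool(feature_map, size=2):
    # Pooling stage: keep the maximum of each window (max-pooling).
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]

x = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0]
features = pool(detector(conv1d(x, [1.0, -1.0])))  # one full "layer"
```

    Stacking many such layers, each with its own learned kernels, is what produces the filters of growing complexity the talk describes.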

    Just as in a biological neural network, objects of increasing complexity are recognized layer by layer. But as the number of layers grew, the analogy with the cortex broke down, since the cortex has a limited number of zones while here there are 269 layers, many, many levels of abstraction; only the growth in complexity, number of elements and receptive fields is preserved.

    If we look at the example of face recognition, the receptive field of the first layer will be small, then a little larger, and larger still, until finally we can recognize the entire face.

    In terms of what is inside the filters, first there will be tilted strokes plus a little color, then parts of faces, and finally entire faces recognized by each cell of the layer.

    There are people who claim that a person always recognizes better than a network. Is this true?

    In 2014, scientists decided to test how well we recognize objects compared with neural networks. They took the two best networks of the moment, AlexNet and the network of Matthew Zeiler and Fergus, and compared them with the responses of different zones of the brain of a macaque that had been taught to recognize certain objects. The objects were from the animal world, so that the monkey would not get confused, and experiments were run to see who recognized them better.

    Since it is impossible to get a clear answer from a monkey, electrodes were implanted and the response of each neuron was measured directly.

    It turned out that under normal conditions the brain cells responded as well as the state-of-the-art model of the time, Matthew Zeiler's network.

    However, as objects were shown faster and the amount of noise and the number of objects in the image grew, the recognition speed and quality of our brain and the primate brain dropped significantly, while even the simplest convolutional neural network still recognized the objects better. So, formally, neural networks here work better than our brain.

    Classic problems of convolutional neural networks

    There are actually not many of them; they fall into three classes. Among them are object identification, semantic segmentation, face recognition, recognition of human body parts, semantic edge detection, highlighting objects of attention (saliency) in an image and estimating surface normals. They can be roughly arranged into three levels, from the lowest-level tasks to the highest-level ones.

    Using this image as an example, let's look at what each task does.

    • Edge detection is the lowest-level task, for which convolutional neural networks are classically used.
    • Estimating the surface normal vector allows us to reconstruct a three-dimensional image from a two-dimensional one.
    • Saliency, identifying objects of attention, is what a person would look at first in this picture.
    • Semantic segmentation allows you to divide objects into classes by their structure, without knowing anything about the objects, that is, even before they are recognized.
    • Semantic edge detection is the extraction of boundaries divided into classes.
    • Highlighting human body parts.
    • And the highest-level task is recognition of the objects themselves, which we will now consider using the example of face recognition.

    Face recognition

    The first thing we do is run a face detector over the image to find a face. Then the face is normalized and centered and sent to the neural network for processing. After that we obtain a set, or vector, of features that uniquely describes this face.

    Then we can compare this feature vector with all the feature vectors stored in our database and get a reference to a specific person: their name, their profile, anything we can store in the database.
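    The comparison step can be sketched as follows; the feature vectors, the names and the similarity threshold here are invented for illustration, and cosine similarity is just one common choice of distance:

```python
import math

def cosine_similarity(a, b):
    # Compare two feature vectors; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# A toy "database": person name -> feature vector produced by the network.
database = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.1, 0.8, 0.5],
}

def identify(query, threshold=0.9):
    # Return the closest profile, or None if nothing is similar enough.
    name, score = max(((n, cosine_similarity(query, v))
                       for n, v in database.items()),
                      key=lambda p: p[1])
    return name if score >= threshold else None

print(identify([0.85, 0.15, 0.25]))  # alice
```

Real systems store vectors of hundreds of dimensions and use special indexes to avoid comparing against every record.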

    This is exactly how our FindFace product works; it is a free service that helps you search for people's profiles in the VKontakte database.

    In addition, we have an API for companies who want to try our products. We provide services for face detection, verification and user identification.

    So far we have developed two scenarios. The first is identification: searching for a person in a database. The second is verification: comparing two images to estimate the probability that they show the same person. In addition, we are currently developing emotion recognition, recognition in video and liveness detection, that is, determining whether there is a live person in front of the camera or just a photograph.

    Some statistics. For identification over a database of 10 thousand photos we have an accuracy of about 95%, depending on the quality of the database, and 99% accuracy for verification. Besides this, the algorithm is very robust to changes: we don't have to look at the camera, and there may be occluding objects: glasses, sunglasses, a beard, a medical mask. In some cases it can even cope with challenges that are extreme for computer vision, such as glasses together with a mask.

    The search is very fast: 0.5 seconds to go through 1 billion photos. We have developed a unique index for fast search. We can also work with low-quality images received from CCTV cameras, and we process all of this in real time. You can upload photos via the web interface or from Android and iOS and search through 100 million users and their 250 million photos.

    As I already said, we took first place in the MegaFace competition, an analogue of ImageNet but for face recognition. It has been running for several years; last year we were the best among 100 teams from around the world, including Google.

    Recurrent neural networks

    We use recurrent neural networks when recognizing a single image is not enough. In cases where it is important to maintain sequence, where the order of events matters, we use ordinary recurrent neural networks.

    They are used for natural language recognition and video processing, and are even applied to image recognition.

    I won't talk about natural language recognition: after my report there will be two more devoted to it. So I'll show how recurrent networks work using emotion recognition as an example.

    What are recurrent neural networks? They are approximately the same as ordinary neural networks, but with feedback. Feedback is needed to pass the previous state of the system back to the input of the neural network or to some of its layers.

    Let's say we process emotions. Even in a smile, one of the simplest emotions, there are several moments: from a neutral facial expression to the moment of a full smile. They follow each other sequentially. To understand this well, we need to observe how it happens and carry what was on the previous frame into the next step of the system.
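    A minimal sketch of that feedback, with a toy "smile intensity" number per frame and hand-picked weights rather than a real trained network:

```python
import math

def rnn_step(frame_feature, prev_state, w_in=1.2, w_rec=0.8):
    # Feedback: the previous state is fed back alongside the current frame.
    return math.tanh(w_in * frame_feature + w_rec * prev_state)

# Toy per-frame "smile intensity" features, as if from a convolutional network.
frames = [0.0, 0.1, 0.3, 0.6, 0.9]

state = 0.0                  # the network's memory of the previous frames
for f in frames:
    state = rnn_step(f, state)

print(round(state, 3))
```

Because the state accumulates across frames, the final response is stronger than what the last frame alone would produce, which is exactly the point of the recurrence.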

    In 2015, at the Emotion Recognition in the Wild competition, a team from Montreal presented a recurrent system for recognizing emotions that looked very simple. It had only a few convolutional layers and worked exclusively with video. This year they also added audio recognition, aggregated the frame-by-frame data from the convolutional networks and the audio-signal data with a recurrent neural network (with state feedback), and took first place in the competition.

    Reinforcement learning

    The next type of neural network, one that has been used very widely lately but has received less publicity than the previous two types, is deep reinforcement learning.

    The point is that in the previous two cases we used databases: a database of faces, or of pictures, or of emotions taken from videos. But what if we don't have this data and cannot film it: how do we teach a robot to pick up objects? We do this automatically, without knowing in advance how it works. Another example: compiling large databases for computer games is complicated, and unnecessary, since it can be done much more simply.

    Everyone has probably heard about the success of deep reinforcement learning in Atari and Go.

    Who has heard of Atari? Well, someone heard, okay. I think everyone has heard about AlphaGo, so I won’t even tell you what exactly happens there.

    What happens with Atari? The architecture of this neural network is shown on the left. It learns by playing against itself in order to get the maximum reward. The maximum reward means finishing the game as quickly as possible with the highest possible score.

    At the top right is the last layer of the neural network, which depicts the entire space of states of a system that played against itself for only two hours. Desirable game outcomes with maximum reward are shown in red, undesirable ones in blue. The network builds a kind of field and moves through its trained layers toward the state it wants to reach.
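    As a hedged illustration of reward-driven learning (not the actual Atari architecture, which uses a deep network instead of a table), here is tabular Q-learning on a made-up five-state chain world where only reaching the last state pays a reward:

```python
import random

# A tiny chain world: states 0..4, reaching state 4 gives reward 1.
# Actions: 0 = step left, 1 = step right.
N_STATES, GOAL = 5, 4
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(1)
for episode in range(200):
    s = 0
    while s != GOAL:
        # Mostly act greedily, but sometimes explore at random.
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: q[s][x])
        s2, r = step(s, a)
        # Q-learning update: move the estimate toward
        # reward + discounted best future value.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2

print([max((0, 1), key=lambda x: q[s][x]) for s in range(GOAL)])  # learned policy
```

After training, the learned policy steps right in every state, i.e. the agent has discovered the fastest path to the reward purely from trial and error.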

    In robotics the situation is a little different. Why? Here we face several difficulties. First, we don't have many databases. Second, we need to coordinate three systems at once: the robot's perception, its actions with manipulators and its memory of what was done in the previous step and how. All of this is very difficult.

    The fact is that no neural network, even a deep one, can yet cope with this task effectively enough, so deep learning is only one piece of what robots need. For example, Sergey Levine recently presented a system that teaches a robot to grasp objects.

    Here are the experiments he conducted on his 14 robotic arms.

    What is happening here? In the bins you see in front of you there are various objects: pens, erasers, mugs smaller and larger, rags, with different textures and different hardness. It is unclear how to teach a robot to grasp them. For many hours, even weeks, the robots trained to grasp these objects, and databases were compiled in the process.

    Databases are a kind of environmental response that we need to accumulate in order to be able to train the robot to do something in the future. In the future, robots will learn from this set of system states.

    Non-standard applications of neural networks

    Unfortunately, we are nearing the end; I don't have much time. I will talk about the non-standard solutions that exist today and which, according to many forecasts, will find applications in the future.

    Well, Stanford scientists recently came up with a very unusual application of a convolutional neural network: predicting poverty. What did they do?

    The concept is actually very simple. The fact is that in Africa the level of poverty exceeds all conceivable and inconceivable limits. There is not even the ability to collect socio-demographic data, so since 2005 there has been no data at all about what is happening there.

    Scientists collected daytime and nighttime satellite maps over a period of time and fed them to a neural network.

    The neural network was pre-trained on ImageNet. That is, its first layers of filters were already tuned to recognize simple things, for example the roofs of houses, in order to find settlements on the daytime maps. Then the daytime maps were compared with the nighttime illumination maps of the same area of the surface, in order to estimate how much money the population has to at least light their houses at night.
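    The idea of reusing pretrained layers can be sketched like this: a frozen "feature extractor" stands in for the ImageNet-pretrained layers, and only a small linear head is trained on the new target. All the data, features and the target here are synthetic and invented for illustration:

```python
import random

# Stand-in for an ImageNet-pretrained feature extractor: its "weights" are
# frozen, so every image (here just a list of pixel values) always maps to
# the same fixed features. Both features are hypothetical.
def pretrained_features(image):
    return [sum(image) / len(image), max(image)]

random.seed(0)
data = []
for _ in range(50):
    img = [random.random() for _ in range(16)]
    brightness = 2.0 * (sum(img) / len(img)) + 0.5   # synthetic ground truth
    data.append((pretrained_features(img), brightness))

# Only the small head on top is trained for the new task.
w, b = [0.0, 0.0], 0.0
lr = 0.1
for epoch in range(300):
    for x, y in data:
        err = w[0] * x[0] + w[1] * x[1] + b - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

mse = sum((w[0] * x[0] + w[1] * x[1] + b - y) ** 2
          for x, y in data) / len(data)
print(round(mse, 6))
```

Because the extractor is frozen, only three numbers are learned, which is why such transfer learning works even with little labeled data.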

    Here you see the results of the forecast built by the neural network, made at different resolutions. And the very last frame is the real data collected by the Ugandan government in 2005.

    You can see that the neural network made a fairly accurate forecast, even with a slight shift since 2005.

    Of course there were side effects. Scientists who do deep learning are always surprised to discover them. For example, the network learned to recognize water, forests, large construction sites and roads, all without supervision, without pre-built databases, completely on its own. Certain layers reacted, for example, to roads.

    And the last application I would like to talk about is semantic segmentation of 3D images in medicine. Medical imaging in general is a complex area that is very difficult to work with.

    There are several reasons for this.

    • We have very few databases. It is not easy to find a picture of a brain, let alone a damaged one, and it is impossible to simply take one from anywhere.
    • Even if we have such a picture, we need a physician to manually annotate all the multi-layered images, which is very time-consuming and extremely inefficient. Not all doctors have the resources for this.
    • Very high precision is required. A medical system cannot afford mistakes. If, say, some cats went unrecognized, no big deal; but if we failed to recognize a tumor, that is far worse. The requirements for system reliability are particularly stringent here.
    • Images consist of three-dimensional elements, voxels, rather than pixels, which adds complexity for system developers.
    How was this problem handled in this case? The CNN was dual-stream: one part processed the image at normal resolution, the other at a somewhat coarser resolution, in order to reduce the number of layers that need to be trained. Thanks to this, the time required to train the network was somewhat reduced.

    Where it is used: determining damage after an impact, looking for tumors in the brain, and in cardiology to determine how the heart works.

    Here is an example for determining the volume of the placenta.

    It works well automatically, but not well enough to be released into production, so this is only the beginning. There are several startups creating such medical vision systems, and in general there will be many deep learning startups in the near future. They say that venture capitalists have allocated more budget to deep learning startups over the past six months than over the previous five years.

    This area is actively developing, there are many interesting directions. We live in interesting times. If you are involved in deep learning, then it’s probably time for you to open your own startup.

    Well, I'll probably wrap it up here. Thank you very much.

    The topics of artificial intelligence and neural networks are currently more popular than ever before. Many users increasingly ask us how neural networks work, what they are and what their principle of operation is.

    These questions are as complex as they are popular, since behind them lie intricate machine learning algorithms designed for various purposes, from analyzing changes to modeling the risks associated with certain actions.

    What are neural networks and their types?

    The first question from anyone interested is: what is a neural network? In the classical definition, it is a sequence of neurons connected by synapses. Neural networks are a simplified model of their biological analogues.

    A program with a neural network structure lets the machine analyze input data and remember the results obtained from certain sources. Later this approach makes it possible to retrieve from memory the result corresponding to the current data set, if the network has already encountered it in its cycles of experience.

    Many people perceive a neural network as an analogue of the human brain. On the one hand, this judgment is close to the truth; on the other, the human brain is far too complex a mechanism to be recreated by a machine even to a fraction of a percent. A neural network is, above all, a program based on the principle of the brain, but by no means its equal.

    A neural network is a bunch of neurons, each of which receives information, processes it and transmits it to another neuron. Each neuron processes the signal in exactly the same way.

    How then do you get different results? It's all about the synapses that connect neurons to each other. One neuron can have a huge number of synapses that strengthen or weaken the signal, and they have the ability to change their characteristics over time.

    It is the correctly selected parameters of synapses that make it possible to obtain the correct result of transforming the input data at the output.

    Having defined in general terms what a neural network is, we can identify the main types of their classification. Before proceeding with the classification, it is necessary to introduce one clarification. Each network has a first layer of neurons, called the input layer.

    It does not perform any calculations or transformations; its only task is to receive and distribute the input signals to the other neurons. This is the only layer common to all types of neural networks; their further structure is the criterion for the main classification.

    • Single-layer neural network. A structure in which, after the input data enters the first, input layer, the final result is immediately produced by the output layer. The input layer is not counted, since it performs no actions besides receiving and distributing signals, as mentioned above. The second layer performs all the necessary calculations and processing and immediately produces the final result. The input neurons are connected to the main layer by synapses with different weight coefficients, which determine the strength of the connections.
    • Multilayer neural network. As the name suggests, this type of network has intermediate layers in addition to the input and output layers, their number depending on the complexity of the network itself. Such networks more closely resemble the structure of a biological neural network. They were developed relatively recently; before that, everything was implemented with single-layer networks, and accordingly the multilayer solution has far more capabilities than its ancestor. During processing, each intermediate layer represents an intermediate stage of processing and distributing the information.

    Depending on the direction of information distribution across synapses from one neuron to another, networks can also be classified into two categories.

    • Feedforward, or unidirectional, networks: a structure in which the signal moves strictly from the input layer to the output layer. Movement of the signal in the opposite direction is impossible. Such networks are widespread and currently successfully solve problems such as recognition, prediction and clustering.
    • Feedback, or recurrent, networks. Such networks allow the signal to travel not only forward but also backward. What does this give? In such networks the output can be returned to the input: a neuron's output is determined by the weights and input signals, supplemented by the previous outputs that are fed back to the input. Such networks have a kind of short-term memory, on the basis of which signals are restored and completed during processing.

    These are not the only options for classifying networks.

    They can be divided into homogeneous and hybrid, based on the types of neurons that make up the network; into heteroassociative and autoassociative; and, depending on the training method, into networks trained with or without a teacher. You can also classify networks by their purpose.

    Where are neural networks used?

    Neural networks are used to solve a variety of problems. If we rank tasks by complexity, an ordinary computer program is suitable for the simplest ones, while more complex problems, requiring simple forecasting or approximate solution of equations, call for programs using statistical methods.

    But tasks of an even more complex level require a completely different approach. This applies in particular to pattern recognition, speech recognition or complex prediction. In a person’s head, such processes occur unconsciously, that is, while recognizing and remembering images, a person is not aware of how this process occurs, and accordingly cannot control it.

    It is precisely these problems that neural networks help solve, that is, they are created to carry out processes whose algorithms are unknown.

    Thus, neural networks are widely used in the following areas:

    • recognition, currently the broadest application area;
    • predicting the next step, a feature applicable in trading and stock markets;
    • classification of input data by parameters; this function is performed by credit scoring robots, which can decide whether to approve a loan to a person based on an input set of parameters.

    The capabilities of neural networks make them very popular. They can be taught many things, such as playing games, recognizing a certain voice, and so on. Based on the fact that artificial networks are built on the principle of biological networks, they can be taught all the processes that a person performs unconsciously.

    What is a neuron and a synapse?

    So what is a neuron in artificial neural networks? It is a unit that performs computations: it receives information from the input layer of the network, performs simple calculations on it, and passes it on to the next neuron.

    The network contains three types of neurons: input, hidden and output. If the network is single-layer, it contains no hidden neurons. In addition, there are special units called bias neurons and context neurons.

    Each neuron has two types of data: input and output. For the first layer the input data equals the output data. In other cases, a neuron's input receives the total information of the previous layers, which then goes through a normalization process: all values falling outside the desired range are transformed by the activation function.

    As mentioned above, a synapse is a connection between neurons, and each synapse has its own weight. Thanks to this, the input information is modified as it is transmitted: during processing, information transmitted by a synapse with a large weight will dominate.

    It turns out that the result is influenced not by neurons, but by synapses that give a certain set of weights to the input data, since the neurons themselves perform exactly the same calculations every time.

    Initially, the weights are set at random.

    Scheme of operation of a neural network

    To imagine the principle of operation of a neural network, no special skills are required. The input layer of neurons receives certain information. It is transmitted through synapses to the next layer, with each synapse having its own weight coefficient, and each subsequent neuron can have several incoming synapses.

    As a result, the information received by the next neuron is the sum of all incoming data, each multiplied by its weight coefficient. The resulting value is substituted into the activation function, producing output information that is passed on until it reaches the final output. The first run of the network does not give correct results, since the network is not yet trained.
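    A minimal sketch of this forward pass, with hypothetical weights for a 2-3-1 network and the sigmoid as activation function:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    # Each output neuron sums all inputs times its synapse weights,
    # then passes the sum through the activation function.
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)))
            for neuron_weights in weights]

# A 2-3-1 network with arbitrarily chosen (untrained) weights.
hidden_weights = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.6]]
output_weights = [[0.3, -0.8, 0.5]]

inputs = [1.0, 0.5]          # the input layer only distributes these values
hidden = layer_forward(inputs, hidden_weights)
output = layer_forward(hidden, output_weights)
print(output)
```

With random weights the output is meaningless, which is exactly why the first run of an untrained network gives incorrect results.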

    The activation function is used to normalize the input data. There are many such functions, but there are several main ones that are most widely used. Their main difference is the range of values ​​in which they operate.

    • The linear function f(x) = x, the simplest of all, is used only for testing a newly created neural network or for passing data through in its original form.
    • The sigmoid is considered the most common activation function and has the form f(x) = 1 / (1 + e^(-x)); its range of values is from 0 to 1. It is also called the logistic function.
    • To cover negative values, the hyperbolic tangent is used: f(x) = (e^(2x) - 1) / (e^(2x) + 1); its range is from -1 to 1. If the neural network does not involve negative values, it should not be used.
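    The three functions from the list, written out directly so their ranges can be checked:

```python
import math

def linear(x):
    return x                               # range: all real numbers

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))      # range: (0, 1)

def tanh(x):
    # Hyperbolic tangent via its exponential form; range: (-1, 1).
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)

print(linear(0), sigmoid(0), tanh(0))  # 0 0.5 0.0
```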

    In order to give the network the data with which it will operate, training sets are needed.

    An iteration is a counter that increases with each training set processed.

    An epoch is an indicator of the neural network's training progress; it increases each time the network passes through a full cycle of all the training sets.

    Accordingly, to train the network correctly, you need to run through the training sets, steadily increasing the epoch counter.

    An error value is computed during training: the percentage difference between the obtained and the desired result. This indicator should decrease as the epoch counter grows; otherwise there is a developer error somewhere.
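    These notions, training sets, iterations, epochs and a decreasing error, can be seen in a toy loop: a single sigmoid neuron learning the logical OR function by gradient descent. This is a deliberately simple stand-in for real training:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training sets: input pair -> desired output (here, logical OR).
training_sets = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w1, w2, bias = 0.1, -0.1, 0.0
lr = 1.0
errors = []

for epoch in range(100):                    # one epoch = one pass over all sets
    total = 0.0
    for (x1, x2), target in training_sets:  # each set is one iteration
        out = sigmoid(w1 * x1 + w2 * x2 + bias)
        delta = (out - target) * out * (1 - out)   # gradient of squared error
        w1 -= lr * delta * x1
        w2 -= lr * delta * x2
        bias -= lr * delta
        total += (out - target) ** 2
    errors.append(total)

print(round(errors[0], 3), round(errors[-1], 3))  # the error falls with epochs
```

If the final error did not fall below the initial one, that would indicate a bug, exactly as the text says.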

    What is a bias neuron and what is it for?

    In neural networks there is another type of neuron: the bias neuron. It differs from the other neurons in that its input and output always equal one, and it has no input synapses.

    The arrangement of such neurons occurs one per layer and no more, and they cannot synapse with each other. It is not advisable to place such neurons on the output layer.

    What are they for? There are situations in which a neural network simply cannot find the right solution because the desired point is out of reach; such neurons are needed precisely to shift the domain of the function.

    That is, the synapse weight changes the bend of the function graph, while the bias neuron allows a shift along the x axis, so that the neural network can capture an area inaccessible to it without the shift. The shift can be made either to the right or to the left. Bias neurons are usually not drawn on diagrams; their weight is taken into account by default when calculating the input value.

    Bias neurons also make it possible to get a result when all other neurons output 0. In that case, regardless of the synapse weights, exactly that value would be passed to each subsequent layer.

    The presence of a bias neuron corrects the situation and yields a different result. Whether to use bias neurons is determined by testing the network with and without them and comparing the results.
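    A small sketch of the effect: with all inputs at zero, a sigmoid neuron without a bias always outputs 0.5 whatever its synapse weights, while a bias weight shifts the result. The weights here are arbitrary:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias_weight=0.0):
    # The bias neuron always outputs 1, so its synapse weight is simply
    # added to the sum; this shifts the activation along the x axis.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias_weight * 1.0
    return sigmoid(total)

zeros = [0.0, 0.0]
print(neuron(zeros, [0.4, -0.7]))                    # 0.5 for any weights
print(neuron(zeros, [0.4, -0.7], bias_weight=-2.0))  # shifted away from 0.5
```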

    But it is important to remember that to achieve results it is not enough to create a neural network. It also needs to be trained, which also requires special approaches and has its own algorithms. This process can hardly be called simple, since its implementation requires certain knowledge and effort.

    Let's begin our consideration of the material by introducing and defining the very concept of an artificial neural system.

    An artificial neural system can be thought of as an analog computing system built from simple data processing elements, mostly connected to each other in parallel. These elements perform very simple logical or arithmetic operations on their input data. The basis of an artificial neural system's functioning is that a weight coefficient is associated with each element of the system, and these weights represent the information stored in the system.

    Diagram of a typical artificial neuron

    A neuron can have many inputs but only one output. The human brain contains on the order of 10^11 neurons, and each neuron can have thousands of connections to others. The input signals of a neuron are multiplied by the weight coefficients and added together to obtain the neuron's total input I: I = Σ wᵢxᵢ, where wᵢ are the weights and xᵢ the input signals.

    Fig. 1. A typical artificial neuron

    The function that relates a neuron's output to its inputs is called the firing (activation) function; it has the form of a sigmoid. Formalizing the neuron's response this way drives the output toward one of its bounds for very small and very large input signals. In addition, each neuron has an associated threshold value, θ, which is subtracted from the total input signal when computing the output. As a result, the output of the neuron, O, is often described as follows: O = 1 / (1 + e^(-(I - θ))).


    Fig. 2. A backpropagation network

    A backpropagation network is, as a rule, divided into three layers, although additional layers may be formed. The layers located between the input and output layers are called hidden layers, since only the input and output layers are visible to the outside world. A network that computes the logical XOR operation produces a true value at its output only when not all of its inputs are true and not all of its inputs are false. The number of nodes in the hidden layer can vary depending on the goals of the project.
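    A sketch of such a three-layer XOR network with hand-picked threshold units (one workable choice of weights, not the only one, and not weights found by backpropagation):

```python
def fire(inputs, weights, threshold):
    # A threshold unit: output 1 if the weighted sum exceeds the threshold.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def xor(x1, x2):
    h1 = fire([x1, x2], [1, 1], 0.5)     # hidden unit: acts as OR
    h2 = fire([x1, x2], [1, 1], 1.5)     # hidden unit: acts as AND
    return fire([h1, h2], [1, -2], 0.5)  # output: OR and not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))
```

The output is true exactly when the inputs differ, which a single-layer network famously cannot compute; this is why the hidden layer is needed.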

    Characteristics of neural networks

    It should be noted that neural networks do not require programming in the usual sense of the word. They are trained with special neural network training algorithms, such as counterpropagation and backpropagation. The programmer "programs" the network by specifying inputs and the corresponding outputs, and the network learns by automatically adjusting the weights of the synaptic connections between neurons.

    The weight coefficients, together with the neurons' threshold values, determine how data propagates through the network and thereby set the correct response to the data used during training. Training the network to give the right answers can take considerable time; how much depends on how many patterns must be learned, as well as on the capabilities of the hardware and supporting software. However, once training is complete, the network can produce answers at high speed.

    In its architecture, an artificial neural system differs from other computing systems. A classical information system associates discrete pieces of information with particular memory elements: it usually stores the data about a specific object in a group of adjacent memory cells. Accordingly, access to and manipulation of the data is achieved by a one-to-one correspondence between an object's attributes and the addresses of the memory cells where they are stored.

    In contrast to such systems, models of artificial neural systems are developed based on modern theories of brain functioning, according to which information is represented in the brain using weights. However, there is no direct correlation between a specific weight coefficient value and a specific element of stored information.

    This distributed representation of information is similar to the image storage and presentation technology used in holograms. According to this technology, the lines of the hologram act like diffraction gratings. With their help, when a laser beam passes through, the stored image is reproduced, however, the data themselves are not directly interpreted.


    Neural network as a means of solving a problem

    A neural network is a suitable means of solving a problem when a large amount of empirical data is available but no algorithm can provide a sufficiently accurate solution at the required speed. In this context, the artificial neural system's way of representing data offers significant advantages over other information technologies. These advantages can be summarized as follows:

    1. Neural network memory is fault-tolerant. When individual parts of the network are removed, the quality of the stored information degrades, but the information itself does not disappear entirely. This is because it is stored in distributed form.
    2. The quality of the information degrades gradually, in proportion to the part of the network that was removed. There is no catastrophic loss of information.
    3. Data in a neural network is stored naturally in associative memory. Associative memory is memory in which presenting only part of the data is enough to retrieve all of it. This distinguishes it from ordinary memory, where data is obtained by specifying the exact address of the corresponding memory elements.
    4. Neural networks can extrapolate and interpolate based on the information stored in them. Training gives the network the ability to find important features and relationships in the data; it can then identify the same kinds of connections in new data it receives. For example, in one experiment a neural network was trained on a hypothetical example; after training, it was able to correctly answer questions it had never been trained on.
    5. Neural networks are plastic. Even after a certain number of neurons are removed, the network can be retrained to its original level of performance (provided enough neurons remain). The same feature is characteristic of the human brain: individual parts may be damaged, yet over time, through training, the original level of skills and knowledge can be restored.
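The distributed storage and associative recall described in points 1-3 can be illustrated with a toy associative memory. The sketch below is a minimal Hopfield-style network, chosen here as an illustration (the article does not name a specific model): one pattern is stored smeared across an entire weight matrix, and a corrupted cue is enough to restore the whole pattern.

```python
import numpy as np

# Illustrative sketch: a tiny Hopfield-style associative memory.
# The pattern is stored in distributed form across all pairwise
# connection weights, not at any single address.
def store(pattern):
    p = np.asarray(pattern)
    w = np.outer(p, p).astype(float)  # Hebbian rule: w_ij = p_i * p_j
    np.fill_diagonal(w, 0)            # no self-connections
    return w

def recall(w, cue, steps=5):
    s = np.asarray(cue, dtype=float)
    for _ in range(steps):
        # each neuron takes the sign of its weighted input
        s = np.where(w @ s >= 0, 1, -1)
    return s

pattern = [1, -1, 1, 1, -1, -1, 1, -1]
w = store(pattern)
cue = [1, -1, 1, 1, 1, 1, 1, -1]   # same pattern with two bits flipped
restored = recall(w, cue)           # converges back to the stored pattern
```

Because every weight carries a little of the pattern, a partial cue pulls the network back to the complete memory; by the same token, zeroing a few weights only degrades recall gradually rather than erasing anything outright.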

    Thanks to these features, artificial neural systems are attractive for use in robotic spacecraft, oil-industry equipment, underwater vehicles, process-control equipment and other technical devices that must operate for long periods without repair in hostile environments. Artificial neural systems not only address the reliability problem but, thanks to their plasticity, also offer a way to reduce operating costs.

    In general, however, artificial neural systems are not well suited to applications that require complex mathematical calculations or finding a provably optimal solution. Nor is an artificial neural system the best option when an algorithmic solution already exists that has proven effective in practice for problems of the same kind.