• What is Big Data: characteristics, classification, examples. Encyclopedia of Marketing

    Big data is a concept used in information technology and marketing. The term refers to the analysis and management of very large volumes of data. In other words, big data is information that, because of its sheer volume, cannot be processed by traditional methods.

    Modern life is hard to imagine without digital technologies. The world's data warehouses are constantly being replenished, so it is necessary both to keep changing the conditions for storing information and to look for new ways to increase the capacity of storage media. According to experts, the growth of big data, and the acceleration of that growth, is a present-day reality. Information is generated non-stop: huge volumes come from news sites, file-sharing services and social networks, yet these are only a small part of the total volume produced.

    According to the IDC Digital Universe study, within five years the volume of data on Earth will reach forty zettabytes, which works out to 5,200 GB of information for every person on the planet.


    It is common knowledge that people are not the main producers of information. The main sources are machines that interact continuously: the operating systems of computers, tablets and mobile phones, intelligent systems, monitoring tools, surveillance systems, and so on. Together they set a rapid pace of data growth, which increases the need for both physical and virtual servers and, in turn, drives the expansion and construction of new data centers.

    Most often, big data is defined as information whose volume exceeds the capacity of a PC's hard drive and which cannot be processed by the traditional methods used for smaller data sets.

    To summarize, big data processing technology ultimately comes down to 3 main areas, which, in turn, solve 3 types of problems:

    1. Storing and managing huge amounts of data - up to hundreds of terabytes and petabytes - that relational databases cannot handle effectively.
    2. Organization of unstructured information - texts, images, videos and other types of data.
    3. Big data analysis (big data analytics) - this covers ways of working with unstructured information, creating analytical data reports, and introducing predictive models.

    The market for big data projects is closely tied to the business analytics (BA) market, whose volume in 2012 was about $100 billion and which includes network technologies, software, technical services and servers.

    Automation of company operations, in particular revenue assurance (RA) solutions, is also inextricably linked with big data technologies. Systems in this area now include tools for detecting inconsistencies and for in-depth data analysis, and they help identify possible losses or inaccuracies in information that could reduce the sector's results.

    Russian companies confirm that there is demand for big data technologies; they note that the main factors driving big data in Russia are the growing volume of data and the need to make management decisions faster and with higher quality.

    What role does big data play in marketing?

    It's no secret that information is one of the main components of successful forecasting and development of a marketing strategy, if you know how to use it.

    Big data analysis is indispensable in determining the target audience, its interests and activity. In other words, the skillful use of big data allows you to accurately predict the development of a company.

    Using the well-known RTB (real-time bidding) auction model as an example, big data analysis makes it easy to ensure that advertising is shown only to those potential buyers who are interested in purchasing a service or product.
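    As an illustration of the RTB idea, here is a minimal sketch of a bid decision. The function name, the interest sets and the 0.3 threshold are all invented for this example; real RTB systems score impressions with far richer models.

```python
# Illustrative sketch (not a real RTB API): decide whether to bid on an
# ad impression based on a user's inferred interests.

def should_bid(user_interests: set, campaign_keywords: set, threshold: float = 0.3) -> bool:
    """Bid only when enough of the campaign's keywords match the user."""
    if not campaign_keywords:
        return False
    overlap = len(user_interests & campaign_keywords) / len(campaign_keywords)
    return overlap >= threshold

# Example: a user who browses running gear matches a sneaker campaign.
user = {"running", "fitness", "sneakers"}
campaign = {"sneakers", "sportswear", "running"}
print(should_bid(user, campaign))  # True: 2 of 3 keywords match
```

    In a real exchange this decision runs in milliseconds for every impression, which is exactly the "velocity" aspect of big data discussed later in this article.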

    Applications of big data in marketing:

    1. Allows you to recognize potential buyers and attract the appropriate audience on the Internet.
    2. Helps assess customer satisfaction.
    3. Helps match the service offered to the needs of the buyer.
    4. Facilitates the search and implementation of new methods to increase customer loyalty.
    5. Simplifies the creation of projects that will subsequently be in demand.

    A concrete example is the Google Trends service. With it, a marketer can forecast seasonal demand for a particular product, along with the geography and fluctuation of search interest. By comparing this information with the statistics of your own website, it is fairly easy to draw up an advertising budget broken down by region and month.


      How and where to store big data

      The file system is where big data is organized and stored. The information is spread across a large number of hard drives on many machines.

      A "map" keeps track of where each piece of information is stored.

      To insure against unforeseen circumstances, each piece of information is usually saved in several copies; storing three copies is the common recommendation.

      For example, after collecting individual transactions in a retail chain, all information about each individual transaction would be stored on multiple servers and hard drives, and a “map” would index the file location for each specific transaction.

      In order to organize data storage in large volumes, you can use standard technical equipment and publicly available software (for example, Hadoop).
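      The storage scheme described above (a file system spread over many disks, a "map" of block locations, and three copies of every piece of data) can be sketched in a few lines. All node and block names here are invented; real systems such as Hadoop HDFS use much more sophisticated placement policies.

```python
import itertools

REPLICAS = 3  # each block is stored on three nodes, as described above

def place_blocks(blocks, nodes):
    """Build a 'map' from block id to the nodes holding its copies.

    A naive round-robin placement sketch: with more nodes than replicas,
    each block's copies land on distinct nodes.
    """
    node_cycle = itertools.cycle(nodes)
    block_map = {}
    for block in blocks:
        block_map[block] = [next(node_cycle) for _ in range(REPLICAS)]
    return block_map

# Retail-transaction blocks spread over a small cluster.
blocks = ["txn-0001", "txn-0002", "txn-0003"]
nodes = ["node-a", "node-b", "node-c", "node-d"]
bmap = place_blocks(blocks, nodes)
print(bmap["txn-0001"])  # ['node-a', 'node-b', 'node-c']
```

      Looking up a transaction then means consulting the map first, exactly as the retail example above describes: the map tells you which servers hold the copies, and any one of them can serve the read.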

      Big data and business analytics: the difference between concepts

      Today, business analysis describes results achieved over a specific period. The speed at which big data can now be processed makes analysis predictive, and its recommendations can be relied on going forward. Big data technologies also make it possible to analyze more types of data than the tools used in business analytics, so you are not limited to warehouses of structured data but can draw on significantly wider resources.

      Business analytics and big data are similar in many ways, but there are the following differences:

      • Big data handles volumes of information significantly larger than business analytics does, which is what defines the very concept of big data.
      • Big data can process rapidly arriving and changing data, which makes interactivity possible: in most cases the results are generated faster than a web page loads.
      • Big data can process data that has no structure; work with such data can begin only after its collection and storage are ensured, and it requires algorithms capable of identifying the main patterns in the resulting arrays.

      The process of business analytics differs from the way big data works. Business analytics usually obtains results by summing specific values: annual sales volume, for example, is calculated as the sum of all paid invoices. Work with big data instead builds a model step by step:

      • putting forward a hypothesis;
      • building a statistical, visual and semantic model;
      • testing the validity of the hypothesis against these models;
      • putting forward the next hypothesis.

      To complete the research cycle, the visual results must be interpreted (interactive, knowledge-driven queries). An adaptive machine learning algorithm can also be developed.
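      The hypothesis cycle described above can be sketched with a toy correlation test: propose a hypothesis (a candidate driver of sales), build a simple statistical model (Pearson correlation), test it, and move on to the next hypothesis. All figures below are invented for illustration.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy weekly data: does ad spend or temperature drive sales?
sales       = [10, 12, 15, 18, 22]
ad_spend    = [1, 2, 3, 4, 5]       # hypothesis 1
temperature = [20, 5, 30, 8, 25]    # hypothesis 2

for name, xs in [("ad_spend", ad_spend), ("temperature", temperature)]:
    r = pearson(xs, sales)
    verdict = "keep" if abs(r) > 0.8 else "reject; try next hypothesis"
    print(f"{name}: r={r:.2f} -> {verdict}")
```

      Here ad spend correlates strongly with sales while temperature does not, so the first hypothesis survives and the second is replaced by the next candidate, which is exactly the loop the bullet list describes.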

      Expert opinion

      You cannot blindly rely only on the opinions of analysts

      Vyacheslav Nazarov,

      General Director of the Russian representative office of Archos, Moscow

      About a year ago, guided by expert opinion, we launched a completely new product on the market: a tablet game console. Its compactness and sufficient processing power won recognition among computer game fans. It should be noted that this group, despite its narrowness, had fairly high purchasing power. The new product initially collected many positive reviews in the media and approving assessments from our partners. However, it soon became clear that sales were quite low; the device never became a mass-market success.

      Error. Our flaw was that we did not fully study the interests of the target audience. Users who prefer to play on a tablet do not need advanced graphics, since they mostly play simple games, while serious gamers are accustomed to more powerful computer platforms. There was no mass advertising of our product, the marketing campaign was weak, and in the end the tablet did not find its buyer in either group.

      Consequences. Production had to be cut by almost 40% against the originally planned volumes. There were no major losses, but there was no planned profit either, and we were forced to adjust some strategic objectives. The most valuable thing we irretrievably lost was time.

      Advice. You need to think ahead: product lines should be planned two or three steps in advance. What does that mean? When launching a model range today, it is desirable to understand its fate tomorrow and to have at least an approximate picture of what will happen to it in a year and a half. Complete detail is unlikely, but a basic plan still needs to be drawn up.

      And you should not trust analysts entirely. Experts' assessments must be compared with your own statistics and with the current market situation. If your product is not fully developed, do not release it: for the buyer the first impression is the most important, and winning him over afterward will not be easy.

      Very important advice in case of failure: decide quickly. You absolutely cannot just watch and wait. Solving a problem without delay is always much easier and cheaper than fixing a neglected one.

      What problems does the big data system create?

      There are three main groups of problems of big data systems, which in foreign literature are combined into 3V - Volume, Velocity and Variety, that is:

    1. Volume.
    2. Processing speed.
    3. Lack of structure.

    The issue of storing large volumes of information comes down to organizing the right conditions, that is, creating the space and capacity to hold it. Speed is associated not so much with the slowdowns of outdated processing methods as with interactivity: the faster information is processed, the more productive the result.

    1. The problem of unstructured data stems from the diversity of sources and their formats and quality. Successful integration and processing of big data requires both preparatory work and analytical tools or systems.
    2. The sheer "magnitude" of the data is also a major factor. It is hard to determine data volumes in advance, and therefore hard to calculate what financial investment and technologies will be required. However, for certain volumes, such as terabytes, new processing methods are already used successfully and are constantly being improved.
    3. The lack of generally accepted principles for working with big data is another problem, complicated by the aforementioned heterogeneity of data flows. New methods of big data analysis are being created to solve it; according to representatives of universities in New York, Washington and California, a separate discipline, even a science, of big data is not far off. This, along with the high cost, is the main reason companies are in no hurry to launch big data projects.
    4. Difficulties also arise in selecting data for analysis and in choosing the algorithm of actions. There is as yet no shared understanding of which data carries valuable information and requires big data analytics and which can be ignored. A related problem is also clear: the market lacks industry professionals who can perform in-depth analysis, report on solving a problem, and thereby bring profit.
    5. There is also a moral side to the question: does collecting data without the user's knowledge differ from a gross invasion of privacy? It is worth noting that data collection can improve quality of life: continuous data collection in Google and Yandex systems, for example, helps these companies improve their services to match consumer needs. Their systems record every user click, location, site visited, message and purchase, which makes behavior-based advertising possible. The user did not consent to this collection: no such choice was offered. This leads to the next problem: how secure is the stored information? Information about potential buyers, their purchase histories and browsing can help solve many business problems, but whether the platforms buyers use are safe is a highly controversial issue. Many point out that today not a single data storage facility, not even military servers, is sufficiently protected from hacker attacks.

    Step-by-step use of big data

    Stage 1. Embedding the technology in a strategic company project.

    The tasks of technical specialists include preliminary elaboration of the development concept: analysis of development paths in areas that need it most.

    To determine the team's composition and tasks, a conversation is held with customers and the required resources are analyzed. The organization then decides whether to outsource all tasks completely or to create a hybrid team of its own specialists and outside experts.

    According to statistics, many companies use exactly this scheme: an in-house team of experts monitors quality and sets the direction, while outside specialists directly test hypotheses about developing a particular area.

    Stage 2. Finding a data scientist.

    The manager assembles the project team and is responsible for the project's development. HR employees play a direct role in creating the internal team.

    First of all, such a team needs a data analysis engineer, also known as a data scientist, who will form hypotheses and analyze the data array. The correlations he identifies will later be used to create new products and services.

    The HR department's task is especially important at the initial stages. Its employees decide who exactly will do the project work, where to find them and how to motivate them. A data scientist is not easy to find; such specialists are essentially "one of a kind".

    Every serious company needs a specialist of this profile, otherwise the project loses focus. The analytics engineer combines the roles of developer, analyst and business analyst. In addition, he must have the communication skills to present the results of his work and the breadth of knowledge to explain his reasoning in detail.


    Search examples

    1. A taxi company called "Big Data" was organized in Moscow: during the ride, passengers answered questions in the field of professional analytics, and those who answered most questions correctly were offered a job. The main drawback of this recruitment technique is most people's reluctance to participate in such projects; only a few agreed to the interview.

    2. Holding a special competition in business analytics with some kind of prize. A large Russian bank used this method. As a result, more than 1,000 people participated in the hackathon competition. Those who achieved the highest success in the competition were offered a job. Unfortunately, most of the winners did not express a desire to receive the position, since their motivation was only the prize. But still, several people agreed to work in the team.

    3. Searching among data specialists who understand business analytics and can bring order by building the correct algorithm of actions. The necessary skills of an analyst include programming and knowledge of Python, R, Statistica, RapidMiner and other tools that are no less important for a business analyst.

    Stage 3. Creating a team for development.

    A well-coordinated team is needed. When advanced analytics is introduced as a company innovation, a manager is required to create and develop the business analytics direction.

    The research engineer constructs and tests hypotheses for successfully developing the chosen direction.

    The business development manager must organize the development of the chosen line of business, create new products and coordinate them with customers. His responsibilities also include calculating business cases.

    The development manager must work closely with everyone. The analytics engineer and the business development manager identify needs and opportunities for big data analysis through meetings with the employees responsible for various areas of the project. After analyzing the situation, the manager creates the cases on which the company will base decisions about further developing a direction, service or product.


    3 principles of working with big data

    We can highlight the main methods of working with big data:

    1. Horizontal scalability. Because the amount of data keeps growing, any system that processes large volumes of information must be expandable: if the volume of data doubles, the amount of hardware in the cluster should double accordingly.
    2. Fault tolerance. Horizontal scalability implies a large number of machines in a cluster; the Yahoo Hadoop cluster, for example, has more than 42,000. Machine failures are therefore inevitable, and all methods of working with big data must take possible malfunctions into account and survive them without consequences.
    3. Data locality. Data stored in large systems is distributed across a fairly large number of machines. Therefore, in a situation where data is stored on server No. 1 and processed on server No. 2, we cannot exclude the possibility that their transfer will cost more than processing. That is why during design, great attention is paid to ensuring that data is stored and processed on one computer.

    All methods of working with big data, one way or another, adhere to these three principles.
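    A toy sketch of the first and third principles: each "node" computes a partial result over its own local partition, and only the small partial results cross the network to be combined. Node names and values are invented for illustration.

```python
# Sketch: each "node" processes only its local partition (data locality),
# and adding nodes adds capacity (horizontal scalability).

partitions = {
    "node-1": [4, 8, 15],
    "node-2": [16, 23],
    "node-3": [42],
}

# Step 1: each node computes a partial result over its own data only.
partials = {node: sum(values) for node, values in partitions.items()}

# Step 2: only the small partial results are shipped and combined.
total = sum(partials.values())
print(total)  # 108
```

    Shipping three small sums instead of six raw records is trivial here, but with petabytes per node the same pattern is what makes processing affordable: moving the program to the data, not the data to the program.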

    How to use the big data system

    Effective big data solutions for a wide variety of business areas are achieved through the many combinations of software and hardware that currently exist.

      An important advantage of big data is the ability to combine new tools with those already used in this area. This plays a particularly important role in cross-disciplinary projects, such as multi-channel sales and customer support.

    To work with big data, a certain sequence is important:

      • First, data is collected;
      • then the information is structured, using dashboards (structuring tools) for this purpose;
      • at the next stage, insights and contexts are created, on the basis of which recommendations for decision-making are formed. Given the high cost of data collection, the main task is to determine the purpose for which the information will be used.
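      The three-step sequence above (collect, structure, derive a recommendation) can be sketched as follows. The event fields, regions and quantities are invented for illustration.

```python
# Step 1 - collect: raw, unaggregated events as they arrive.
raw_events = [
    {"region": "north", "product": "umbrella", "qty": 3},
    {"region": "north", "product": "umbrella", "qty": 5},
    {"region": "south", "product": "sunscreen", "qty": 7},
]

# Step 2 - structure: aggregate per region, the kind of view a dashboard shows.
dashboard = {}
for e in raw_events:
    dashboard[e["region"]] = dashboard.get(e["region"], 0) + e["qty"]

# Step 3 - insight: recommend acting where demand is highest.
top_region = max(dashboard, key=dashboard.get)
print(f"Prioritize inventory in: {top_region}")  # north (qty 8 vs 7)
```

      Each stage consumes the previous one's output, which is why the article stresses deciding the purpose of the data up front: the aggregation in step 2 only makes sense if you know which recommendation step 3 is supposed to produce.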

    Example. Advertising agencies may use location information aggregated from telecommunications companies. This approach will provide targeted advertising. The same information is applicable in other areas related to the provision and sale of services and goods.

    The information obtained in this way may be key in deciding whether to open a store in a particular area.

      Consider the case of outdoor billboards in London: measuring an audience this way is only possible if a special measuring device is placed near each billboard. Mobile operators, by contrast, always know basic information about their subscribers: their location, marital status, and so on.

    Another potential area of ​​application for big data is collecting information about the number of visitors to various events.

      Example. The organizers of football matches cannot know the exact number of attendees in advance. They could, however, obtain such information from mobile operators: where potential visitors are located over a certain period - a month, a week, a day - before the match. The organizers would then be able to plan the location of the event according to the preferences of the target audience.

    Big data also provides incomparable benefits for the banking sector, which can use the processed data to identify unscrupulous cardholders.

      Example. When a cardholder reports a card lost or stolen, the bank can compare the location of the card being used for payment with the location of the holder's mobile phone to verify the claim. If the bank sees that the payment card and the holder's phone are in the same zone, it is most likely the owner who is using the card.
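      The bank check described above can be sketched as a simple proximity test between the card transaction and the holder's phone, here using the haversine great-circle distance. The 5 km "same zone" radius is an assumption made for this example; a real bank would use far more signals.

```python
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def same_zone(card_pos, phone_pos, radius_km=5.0):
    """Flag a payment as likely legitimate if card and phone are close."""
    return distance_km(*card_pos, *phone_pos) <= radius_km

# Payment in central Moscow, phone about a kilometre away: likely the owner.
print(same_zone((55.751, 37.618), (55.760, 37.620)))  # True
```

      If the two positions were in different cities, the same check would return False and the transaction could be escalated for review.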

    Thanks to insights of this kind, the use of information opens up many new opportunities for companies, and the big data market continues to develop.

    The main difficulty in implementing big data is the complexity of calculating the business case, a process complicated by the large number of unknowns.

    It is quite difficult to make any forecasts for the future, while data about the past is not always within reach. In this situation, the most important thing is planning your initial actions:

    1. Defining the specific problem to which big data processing will be applied helps determine the concept and set the vector of further actions. Focusing data collection on this problem, and using all available tools and methods to get a clearer picture, will also greatly simplify decision-making later.
    2. The likelihood that a big data project will succeed with a team lacking the relevant skills and experience is extremely low. The knowledge required for such complex research is usually acquired through long practice, which is why previous experience matters so much in this field. Equally important is a culture of using the information such research produces: the data creates many opportunities, including opportunities for abuse, so basic rules of correct data handling must be followed.
    3. Insights are the core value of the technology. The market still has an acute shortage of strong specialists who understand both the laws of business and the value and applicability of information. Data analysis is a key means of achieving business goals, and a company must deliberately develop the corresponding model of behavior and perception. Then big data will pay off and play a positive role in solving business management problems.

    Successful cases of big data implementation

    Some of the cases listed below were more successful in data collection, others - in big data analytics and ways to apply the data obtained during the study.

    1. Tinkoff Credit Systems used the EMC Greenplum platform for massively parallel computing. The continuous growth in the bank's flow of card users made it necessary to speed up data processing, so the bank decided to apply big data and work with unstructured information, as well as with corporate information obtained from disparate sources. Their specialists also noted that an analytical layer of the federal data warehouse is being introduced on the website of the Russian Federal Tax Service; it is planned to become the basis of a space providing access to tax system data for processing and statistics.
    2. The Russian startup Synqera, which performs online big data analysis and developed the Simplate platform, deserves separate mention. The idea is that large amounts of data about consumers, their purchases, age, mood and state of mind are processed. A cosmetics chain installed sensors at its checkouts that recognize customer emotions; after the mood is determined, information about the buyer and the time of purchase is analyzed, and the buyer then receives targeted information about discounts and promotions. The solution increased consumer loyalty and the seller's income.
    3. Also worth describing is the use of big data technologies at Dunkin' Donuts, which, like the previous example, used online analysis to increase profits. At retail outlets, displays showed special offers whose content changed every minute, based on the time of day and the products in stock. From cash receipts the company learned which items were in greatest demand. This approach increased both income and inventory turnover.

    Thus, processing big data has a positive effect on solving business problems. An important factor, of course, is the choice of strategy and the use of the latest developments in the field of big data.

    Company information

    Archos. Field of activity: production and sale of electronic equipment. Territory: sales offices are open in nine countries (Spain, China, Russia, USA, France, etc.). Number of branch staff: 5 (in the Russian representative office).

    "Big Data" is a topic actively discussed by technology companies. Some have become disillusioned with big data, while others, on the contrary, are making the most of it for business. A fresh analytical review of the Russian and global Big Data markets, prepared by the Moscow Exchange together with IPOboard analysts, shows which trends are most relevant in the market now. We hope the information will be interesting and useful.

    WHAT IS BIG DATA?

    Key Features
    Big Data is currently one of the key drivers of information technology development. This direction, relatively new for Russian business, has become widespread in Western countries. This is due to the fact that in the era of information technology, especially after the boom of social networks, a significant amount of information began to accumulate for each Internet user, which ultimately gave rise to the development of Big Data.

    The term "Big Data" causes much debate. Many believe it refers only to the amount of accumulated information, but the technical side should not be forgotten: the area also covers storage technologies, computing, and services.

    It should be noted that this area involves processing large volumes of information that are difficult to handle using traditional methods.

    Below is a comparison table between traditional and Big Data databases.

    The field of Big Data is characterized by the following features:
    Volume – the accumulated database represents a large amount of information that is labor-intensive to process and store by traditional means; it requires a new approach and improved tools.
    Velocity – this attribute refers both to the increasing speed of data accumulation (90% of today's information was collected over the last 2 years) and to the speed of data processing; real-time processing technologies have recently grown in demand.
    Variety – the ability to simultaneously process structured and unstructured information of various formats. The distinguishing mark of structured information is that it can be classified; an example is information about customer transactions.
    Unstructured information includes video, audio files, free text, and information coming from social networks. Today about 80% of information is unstructured, and it needs complex analysis to become useful for further processing.
    Veracity – reliability of the data, to which users attach increasing importance. Internet companies, for instance, struggle to separate actions performed by a robot from those performed by a person on a company's website, which complicates data analysis.
    Value – the value of the accumulated information. Big Data must be useful to the company and bring it some value, for example by helping improve business processes, reporting or cost optimization.

    If the above 5 conditions are met, the accumulated volumes of data can be classified as large.

    Areas of application of Big Data

    The scope of use of Big Data technologies is extensive. Thus, with the help of Big Data, you can learn about customer preferences, the effectiveness of marketing campaigns, or conduct risk analysis. Below are the results of a survey by the IBM Institute on the areas of use of Big Data in companies.

    As can be seen from the diagram, most companies use Big Data in the field of customer service, the second most popular area is operational efficiency; in the field of risk management, Big Data is less common at the moment.

    It should also be noted that Big Data is one of the fastest growing areas of information technology; according to statistics, the total amount of data received and stored doubles every 1.2 years.
    Between 2012 and 2014, the amount of data transferred monthly over mobile networks grew by 81%. According to Cisco estimates, mobile traffic in 2014 was 2.5 exabytes (an exabyte is 10^18 bytes) per month and is expected to reach 24.3 exabytes in 2019.
    Thus, Big Data is an already established area of ​​technology, even despite its relatively young age, which has become widespread in many areas of business and plays an important role in the development of companies.

    Big Data Technologies
    Technologies used for collecting and processing Big Data can be divided into 3 groups:
    • Software;
    • Equipment;
    • Services.

    The most common data processing (DP) approaches include:
    SQL – a structured query language that allows you to work with databases. Using SQL, data can be created and modified, and the data array is managed by the corresponding database management system.
    NoSQL – the term stands for Not Only SQL. It covers a number of approaches to implementing databases that differ from the models used in traditional relational DBMSs. They are convenient when the data structure is constantly changing, for example when collecting and storing information from social networks.
    MapReduce – a distributed computing model used for parallel computations over very large data sets (petabytes or more). In this model, it is not the data that is transferred to the program for processing, but the program that is sent to the data; a query is thus a separate program. Processing proceeds in two stages, Map and Reduce: Map selects preliminary data, and Reduce aggregates it.
    Hadoop – used to implement search and contextual mechanisms for high-load sites such as Facebook, eBay, and Amazon. A distinctive feature is that the system is protected from the failure of any cluster node, since each data block has at least one copy on another node.
    SAP HANA – a high-performance NewSQL platform for data storage and processing. It provides high query-processing speed. Another distinctive feature is that SAP HANA simplifies the system landscape, reducing the cost of supporting analytical systems.
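As an illustration of the Map/Reduce principle described above, here is a minimal word-count sketch in plain Python (a toy model, not tied to Hadoop or any other framework):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: select preliminary data -- emit a (word, 1) pair for every word."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: aggregate all values that share a key."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

docs = ["big data is big", "data moves to the program"]
counts = reduce_phase(map_phase(docs))
print(counts["big"], counts["data"])  # 2 2
```

In a real MapReduce system, the pairs emitted by Map are shuffled across cluster nodes by key before Reduce runs, so the aggregation itself is parallel; here both phases run in a single process purely for illustration.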

    Technological equipment includes:

    • servers;
    • infrastructure equipment.
    Servers here also include data storage systems.
    Infrastructure equipment includes platform acceleration tools, uninterruptible power supplies, server console sets, etc.

    Services.
    Services include services for building a database system architecture, arranging and optimizing infrastructure, and ensuring data storage security.

    Software, hardware, and services together form comprehensive platforms for data storage and analysis. Companies such as Microsoft, HP, EMC offer services for the development, deployment and management of Big Data solutions.

    Applications in industries
    Big Data has become widespread in many business sectors. They are used in healthcare, telecommunications, trade, logistics, financial companies, as well as in government administration.
    Below are some examples of Big Data applications in some of the industries.

    Retail
    The databases of retail stores can accumulate a lot of information about customers, inventory management systems, and supplies of commercial products. This information can be useful in all areas of store activity.

    Thus, with the help of accumulated information, you can manage the supply of goods, their storage and sale. Based on the accumulated information, it is possible to predict the demand and supply of goods. Also, a data processing and analysis system can solve other problems of a retailer, for example, optimizing costs or preparing reporting.

    Financial services
    Big Data makes it possible to analyze a borrower’s creditworthiness and is also useful for credit scoring and underwriting. The introduction of Big Data technologies will reduce the time for reviewing loan applications. With the help of Big Data, it is possible to analyze the transactions of a specific client and offer banking services that suit him.

    Telecom
    In the telecommunications industry, Big Data has become widespread among mobile operators.
    Along with financial organizations, mobile operators have some of the most voluminous databases, which allows them to conduct the most in-depth analysis of the accumulated information.
    The main purpose of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic, and determine the social affiliation of the subscriber.

    In addition to using Big Data for marketing purposes, technologies are used to prevent fraudulent financial transactions.

    Mining and petroleum industries
    Big Data is used both in the extraction of minerals and in their processing and marketing. Based on the information received, enterprises can draw conclusions about the efficiency of field development, monitor the schedule for major repairs and the condition of equipment, and forecast demand for products and prices.

    According to a survey by Tech Pro Research, Big Data is most widespread in the telecommunications industry, as well as in engineering, IT, financial and government enterprises. According to the results of this survey, Big Data is less popular in education and healthcare. The survey results are presented below:

    Examples of using Big Data in companies
    Today, Big Data is being actively implemented in foreign companies. Companies such as Nasdaq, Facebook, Google, IBM, VISA, Master Card, Bank of America, HSBC, AT&T, Coca Cola, Starbucks and Netflix are already using Big Data resources.

    The applications of the processed information are varied and vary depending on the industry and the tasks that need to be performed.
    Next, examples of the application of Big Data technologies in practice will be presented.

    HSBC uses Big Data technologies to combat fraudulent transactions with plastic cards. With the help of Big Data, the company increased the efficiency of the security service by 3 times, and the recognition of fraudulent incidents by 10 times. The economic effect from the introduction of these technologies exceeded $10 million.

    VISA’s antifraud system automatically identifies fraudulent transactions; it currently helps prevent fraudulent payments amounting to $2 billion annually.

    IBM’s Watson supercomputer analyzes the flow of data on monetary transactions in real time. According to IBM, Watson increased the number of fraudulent transactions detected by 15%, reduced false positives by 50%, and increased the amount of money protected from such transactions by 60%.

    Procter & Gamble uses Big Data to design new products and create global marketing campaigns. P&G has created dedicated Business Spheres offices where information can be viewed in real time.
    This gives the company’s management the opportunity to instantly test hypotheses and conduct experiments. P&G believes that Big Data helps in forecasting company performance.

    Office supplies retailer OfficeMax uses Big Data technologies to analyze customer behavior. Big Data analysis made it possible to increase B2B revenue by 13% and reduce costs by $400,000 per year.

    According to Caterpillar, its distributors miss out on $9 to $18 billion in profits each year simply because they do not implement Big Data processing technologies. Big Data would allow customers to manage their fleets more efficiently by analyzing information coming from sensors installed on the machines.

    Today it is already possible to analyze the condition of key components, their degree of wear, and manage fuel and maintenance costs.

    Luxottica Group is a manufacturer of eyewear under such brands as Ray-Ban, Persol and Oakley. The company uses Big Data technologies to analyze the behavior of potential customers and for “smart” SMS marketing. As a result, Luxottica Group identified more than 100 million of its most valuable customers and increased the effectiveness of its marketing campaigns by 10%.

    With the help of Yandex Data Factory, the developers of World of Tanks analyze player behavior. Big Data technologies made it possible to analyze the behavior of 100 thousand World of Tanks players across more than 100 parameters (information about purchases, games, experience, etc.). The analysis produced a forecast of user churn, which allows the developer to reduce player departure and work with game participants in a targeted manner. The resulting model turned out to be 20-30% more effective than standard gaming-industry analysis tools.

    The German Ministry of Labor uses Big Data in work related to analyzing incoming applications for unemployment benefits. Analysis of the information showed that 20% of benefits were being paid undeservedly; with the help of Big Data, the Ministry of Labor reduced costs by 10 billion euros.

    The Toronto Children’s Hospital implemented Project Artemis, an information system that collects and analyzes data on infants in real time. The system monitors 1,260 indicators of each child’s condition every second. Project Artemis makes it possible to predict an unstable condition in a child and to begin disease prevention early.

    OVERVIEW OF THE WORLD BIG DATA MARKET

    Current state of the world market
    In 2014, Big Data, according to the Data Collective, became one of the priority investment areas in the venture industry. According to the Computerra information portal, this is due to the fact that developments in this area have begun to bring significant results for their users. Over the past year, the number of companies with implemented projects in the field of big data management increased by 125%, and the market volume grew by 45% compared to 2013.

    The majority of Big Data market revenue, according to Wikibon, in 2014 was made up of services, their share was equal to 40% of total revenue (see chart below):

    If we consider Big Data for 2014 by subtype, the market will look like this:

    According to Wikibon, applications and analytics accounted for 36% of Big Data revenue in 2014, computing equipment for 17%, and data storage technologies for 15%. The least revenue was generated by NoSQL technologies, infrastructure equipment, and corporate networking.

    The most popular Big Data technologies are in-memory platforms from SAP (HANA), Oracle, and others; the results of the T-Systems survey showed that they were chosen by 30% of the companies surveyed. The second most popular were NoSQL platforms (18% of users); companies also used analytical platforms from Splunk and Dell, chosen by 15% of companies. According to the survey results, Hadoop/MapReduce products turned out to be the least in demand for solving Big Data problems.

    According to an Accenture survey, in more than 50% of companies using Big Data technologies, Big Data costs range from 21% to 30%.
    According to a further Accenture analysis, 76% of companies believe that these costs will increase in 2015, while 24% of companies will not change their Big Data budgets. This suggests that in these companies Big Data has become an established area of IT and an integral part of the company’s development.

    The results of the Economist Intelligence Unit survey confirm the positive effect of implementing Big Data. 46% of companies say that using Big Data technologies they have improved customer service by more than 10%, 33% of companies have optimized inventory and improved the productivity of fixed assets, and 32% of companies have improved planning processes.

    Big Data in different countries of the world
    Today, Big Data technologies are most often implemented in US companies, but other countries around the world have already begun to show interest. In 2014, according to IDC, countries in Europe, the Middle East, Asia (excluding Japan) and Africa accounted for 45% of the market for software, services and equipment in the field of Big Data.

    Also, according to the CIO survey, companies from the Asia-Pacific region are rapidly adopting new solutions in the field of Big Data analysis, secure storage and cloud technologies. Latin America is in second place in terms of the number of investments in the development of Big Data technologies, ahead of European countries and the USA.
    Next, a description and forecasts for the development of the Big Data market in several countries will be presented.

    China
    The volume of information in China is 909 exabytes, which is 10% of the total volume of information in the world. By 2020 the volume will reach 8,060 exabytes, and China’s share of global data will also increase, reaching 18% within 5 years. The growth of China’s Big Data has some of the fastest dynamics in the world.

    Brazil
    At the end of 2014, Brazil had accumulated 212 exabytes of information, which is 3% of the global volume. By 2020 the volume will grow to 1,600 exabytes, or 4% of the world’s information.

    India
    According to EMC, the volume of accumulated data in India at the end of 2014 was 326 exabytes, which is 5% of the total volume of information. By 2020 the volume will grow to 2,800 exabytes, or 6% of the world’s information.

    Japan
    The volume of accumulated data in Japan at the end of 2014 was 495 exabytes, which is 8% of the total volume of information. By 2020 the volume will grow to 2,200 exabytes, but Japan’s share will decrease to 5% of the total volume of information worldwide.
    Thus, Japan’s share of the market will shrink by more than 30% in relative terms.

    Germany
    According to EMC, the volume of accumulated data in Germany at the end of 2014 is 230 exabytes, which is 4% of the total volume of information in the world. By 2020, the volume of information will grow to 1100 exabytes and amount to 2%.
    In the German market, a large share of revenue, according to Experton Group forecasts, will be generated by the services segment, the share of which in 2015 will be 54%, and in 2019 will increase to 59%; the shares of software and hardware, on the contrary, will decrease.

    Overall, the market size will grow from 1.345 billion euros in 2015 to 3.198 billion euros in 2019, an average growth rate of 24%.
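The 24% average growth rate quoted for Germany is simply the compound annual rate implied by the two endpoint figures from the text (a quick check in Python):

```python
# CAGR implied by the Experton Group figures for the German market:
# 1.345 bn EUR in 2015 -> 3.198 bn EUR in 2019, i.e. 4 years of growth.
start_bn, end_bn, years = 1.345, 3.198, 4
cagr = (end_bn / start_bn) ** (1 / years) - 1
print(f"{cagr:.1%}")  # about 24.2%, matching the cited ~24%
```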
    Thus, based on the analytics of CIO and EMC, we can conclude that the developing countries of the world in the coming years will become markets for the active development of Big Data technologies.

    Main market trends
    According to IDG Enterprise, in 2015 companies’ spending on Big Data will average $7.4 million per company: large companies intend to spend approximately US$13.8 million, and small and medium-sized companies about US$1.6 million.
    Most of the investment will be in areas such as data analysis, visualization and data collection.
    Based on current trends and market demand, investments in 2015 will be used to improve data quality, improve planning and forecasting, and increase data processing speed.
    Companies in the financial sector, according to Bain & Company’s Insights Analysis, will make significant investments: in 2015 they plan to spend $6.4 billion on Big Data technologies, with an average investment growth rate of 22% until 2020. Internet companies plan to spend $2.8 billion, with an average growth rate of 26% for Big Data spending.
    When conducting the Economist Intelligence Unit survey, priority areas for Big Data development in 2014 and in the next 3 years were identified, the distribution of responses is as follows:

    According to IDC forecasts, market development trends are as follows:

    • In the next 5 years, costs for cloud solutions in the field of Big Data technologies will grow 3 times faster than costs for local solutions. Hybrid platforms for data storage will become in demand.
    • The growth of applications using sophisticated and predictive analytics, including machine learning, will accelerate in 2015, with the market for such applications growing 65% faster than applications that do not use predictive analytics.
    • Media analytics will triple in 2015 and will become a key driver of growth in the Big Data technology market.
    • The trend of introducing solutions for analyzing the constant flow of information that is applicable to the Internet of Things will accelerate.
    • By 2018, 50% of users will interact with services based on cognitive computing.
    Market Drivers and Limiters
    IDC experts identified 3 drivers of the Big Data market in 2015:

    According to an Accenture survey, data security issues are now the main barrier to the implementation of Big Data technologies, with more than 51% of respondents confirming that they are worried about ensuring data protection and confidentiality. 47% of companies reported the impossibility of implementing Big Data due to limited budgets, 41% of companies indicated a lack of qualified personnel as a problem.

    Wikibon predicts that the Big Data market will grow to $38.4 billion in 2015, up 36% year-on-year. In the coming years, there will be a decline in growth rates to 10% in 2017. Taking into account these forecasts, the market size in 2020 will be equal to 68.7 billion US dollars.

    The distribution of the global Big Data market by business category will look like this:

    As can be seen from the diagram, the majority of the market will be occupied by technologies in the field of improving customer service. Targeted marketing will be the second priority for companies until 2019; in 2020, according to Heavy Reading, it will give way to solutions to improve operational efficiency.
    The segment “improving customer service” will also have the highest growth rate, with an increase of 49% annually.
    The market forecast for Big Data subtypes will look like this:

    The predominant market share, as can be seen from the diagram, is occupied by professional services. The highest growth rate will be in applications with analytics: their share will increase from the current 12% to 18% in 2020, when this segment will reach 12.3 billion US dollars. The share of computing equipment, on the contrary, will fall from 20% to 14%, amounting to about 9.3 billion US dollars in 2020. The cloud technologies market will gradually grow, reaching 6.3 billion US dollars in 2020, while the share of data storage solutions will decrease from 15% in 2014 to 13% in 2020, equal to 8.9 billion US dollars in monetary terms.
    According to Bain & Company’s Insights Analysis forecast, the distribution of the Big Data market by industry in 2020 will be as follows:

    • The financial industry will spend $6.4 billion on Big Data with an average growth rate of 22% per year;
    • Internet companies will spend $2.8 billion and the average cost growth rate will be 26% over the next 5 years;
    • Public sector costs will be commensurate with the costs of Internet companies, but the growth rate will be lower - 22%;
    • The telecommunications sector will grow at a CAGR of 40% to reach US$1.2 billion in 2020;

    Energy companies will invest a relatively small amount in these technologies - $800 million, but the growth rate will be one of the highest - 54% annually.
    Thus, the largest share of the Big Data market in 2020 will be taken by companies in the financial industry, and the fastest growing sector will be energy.
    Following analysts' forecasts, the total market size will increase in the coming years. Market growth will be achieved through the implementation of Big Data technologies in developing countries of the world, as can be seen from the graph below.

    The projected market size will depend on how developing countries perceive Big Data technologies and whether they will be as popular as in developed countries. In 2014, developing countries of the world accounted for 40% of the volume of accumulated information. According to EMC's forecast, the current market structure, with a predominance of developed countries, will change as early as 2017. According to EMC analytics, in 2020 the share of developing countries will be more than 60%.
    According to Cisco and EMC, developing countries around the world will work quite actively with Big Data, largely due to the availability of technology and the accumulation of information to the Big Data level. The world map on the next page shows the forecast for the increase in volume and growth rate of Big Data by region.

    ANALYSIS OF THE RUSSIAN MARKET

    Current state of the Russian market

    According to the results of a study by CNews Analytics and Oracle, the level of maturity of the Russian Big Data market has increased over the past year. Respondents, representing 108 large enterprises from various industries, demonstrated a higher degree of awareness of these technologies, as well as an established understanding of the potential of such solutions for their business.
    As of 2014, according to IDC, Russia has accumulated 155 exabytes of information, which is only 1.8% of the world's data. The volume of information by 2020 will reach 980 exabytes and occupy 2.2%. Thus, the average growth rate of information volume will be 36% per year.
    IDC estimates the Russian market at $340 million, of which $100 million are SAP solutions, approximately $240 million are similar solutions from Oracle, IBM, SAS, Microsoft, etc.
    The growth rate of the Russian Big Data market is no less than 50% per year.
    It is predicted that positive dynamics will continue in this sector of the Russian IT market, even amid general economic stagnation. This is because businesses continue to demand solutions that improve operational efficiency, optimize costs, improve forecasting accuracy, and minimize companies’ possible risks.
    The main service providers in the field of Big Data on the Russian market are:
    • Oracle
    • Microsoft
    • Cloudera
    • Hortonworks
    • Teradata.
    Market overview by industry and experience in using Big Data in companies
    According to CNews, in Russia only 10% of companies have begun to use Big Data technologies, while worldwide the share of such companies is about 30%. Readiness for Big Data projects is growing in many sectors of the Russian economy, according to a report from CNews Analytics and Oracle. More than a third of the surveyed companies (37%) have started working with Big Data technologies, of which 20% are already using such solutions and 17% are beginning to experiment with them. Another third of respondents are currently considering this possibility.

    In Russia, Big Data technologies are most popular in banking and telecoms, but they are also in demand in the mining industry, energy, retail, logistics companies and the public sector.
    Next, examples of the use of Big Data in Russian realities will be considered.

    Telecom
    Telecom operators have some of the most voluminous databases, which allows them to conduct the most in-depth analysis of accumulated information.
    One of the areas of application of Big Data technology is subscriber loyalty management.
    The main purpose of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic, and determine the social affiliation of the subscriber. In addition to using information for marketing purposes, telecom technologies are used to prevent fraudulent financial transactions.
    One of the striking examples of this industry is VimpelCom. The company uses Big Data to improve the quality of service at the level of each subscriber, compile reports, analyze data for network development, combat spam and personalize services.

    Banks
    A significant proportion of Big Data users are specialists from the financial industry. One of the successful experiments was carried out at the Ural Bank for Reconstruction and Development, where the information base began to be used to analyze clients, the bank began to offer specialized loan offers, deposits and other services. Within a year of using these technologies, the company's retail loan portfolio grew by 55%.
    Alfa-Bank analyzes information from social networks, processes loan applications, and analyzes the behavior of users of the company’s website.
    Sberbank also began processing a massive amount of data to segment clients, prevent fraudulent activities, cross-sell, and manage risks. In the future, it is planned to improve the service and analyze customer actions in real time.
    The All-Russian Regional Development Bank analyzes the behavior of plastic card holders. This makes it possible to identify transactions that are atypical for a particular client, thereby increasing the likelihood of detecting theft of funds from plastic cards.

    Retail
    In Russia, Big Data technologies have been implemented by both online and offline trading companies. Today, according to CNews Analytics, Big Data is used by 20% of retailers. 75% of retail professionals consider Big Data necessary for the development of a competitive company promotion strategy. According to Hadoop statistics, after the implementation of Big Data technology, profits in trading organizations increase by 7-10%.
    M.Video specialists report improved logistics planning after the implementation of SAP HANA; as a result of the implementation, the preparation of annual reports was cut from 10 days to 3, and daily data loading time was reduced from 3 hours to 30 minutes.
    Wikimart uses these technologies to generate recommendations for site visitors.
    One of the first offline stores to introduce Big Data analysis in Russia was Lenta. With the help of Big Data, the retailer began to study customer information from cash register receipts. The retailer collects information to build behavioral models, which makes it possible to make more informed decisions at the operational and commercial levels.

    Oil and gas industry
    In this industry, the scope of Big Data is quite wide. Big Data technologies can be applied to the extraction of minerals from the subsoil: they help analyze the extraction process itself and the most effective ways of conducting it, monitor drilling, analyze the quality of raw materials, and support the processing and marketing of the final product. In Russia, Transneft and Rosneft have already begun to use these technologies.

    Government bodies
    In countries such as Germany, Australia, Spain, Japan, Brazil and Pakistan, Big Data technologies are used to solve national issues. These technologies help government authorities more effectively provide services to the population and provide targeted social support.
    In Russia, these technologies have begun to be adopted by government agencies such as the Pension Fund, the Federal Tax Service, and the Compulsory Health Insurance Fund. The potential for projects using Big Data is great; these technologies could help improve the quality of services and, as a result, the standard of living of the population.

    Logistics and transport
    Big Data can also be used by transport companies. Using Big Data technologies, you can track your car fleet, take into account fuel costs, and monitor customer requests.
    Russian Railways implemented Big Data technologies together with SAP. These technologies helped reduce the reporting preparation time by 43.5 times (from 14.5 hours to 20 minutes), and increase the accuracy of cost distribution by 40 times. Big Data was also introduced into planning and tariff regulation processes. In total, the companies use more than 300 systems based on SAP solutions, 4 data centers are involved, and the number of users is 220,000.

    Main drivers and limiters of the market
    The drivers for the development of Big Data technologies in the Russian market are:
    • Increased interest on the part of users in the capabilities of Big Data as a way to increase the competitiveness of a company;
    • Development of methods for processing media files at a global level;
    • Transfer of servers processing personal information to the territory of Russia, in accordance with the adopted law on the storage and processing of personal data;
    • Implementation of the industry plan for import substitution of software. This plan includes government support for domestic software manufacturers, as well as the provision of preferences for domestic IT products when purchasing at public expense.
    • In the new economic situation, when the dollar exchange rate has almost doubled, there will be a trend towards an increasing use of the services of Russian cloud service providers rather than foreign ones.
    • Creation of technology parks that contribute to the development of the information technology market, including the Big Data market;
    • State program for the implementation of grid systems based on Big Data technologies.

    The main barriers to the development of Big Data in the Russian market are:

    • Ensuring data security and confidentiality;
    • Lack of qualified personnel;
    • Insufficient accumulated information resources to the Big Data level in most Russian companies;
    • Difficulties in introducing new technologies into established information systems of companies;
    • The high cost of Big Data technologies, which leads to a limited number of enterprises that have the opportunity to implement these technologies;
    • Political and economic uncertainty, which led to the outflow of capital and the freezing of investment projects in Russia;
    • Rising prices for imported products and a surge in inflation, according to IDC, are slowing down the development of the entire IT market.
    Russian market forecast
    As of today, the Russian Big Data market is not as developed as the markets of advanced economies. Most Russian companies show interest in Big Data but have not yet dared to take advantage of these opportunities.
    Examples of large companies that have already benefited from the use of Big Data technologies are increasing awareness of the capabilities of these technologies.
    Analysts also have quite optimistic forecasts regarding the Russian market. IDC believes that the Russian market share will increase over the next 5 years, unlike the German and Japanese markets.
    By 2020, the volume of Big Data in Russia will grow from the current 1.8% to 2.2% of the global data volume. The amount of information will grow, according to EMC, from the current 155 exabytes to 980 exabytes in 2020.
    At the moment, Russia continues to accumulate the volume of information to the level of Big Data.
    According to a CNews Analytics survey, 44% of surveyed companies work with no more than 100 terabytes of data, and only 13% work with volumes above 500 terabytes.

    Nevertheless, the Russian market, following global trends, will increase. As of 2014, IDC estimates the market size at $340 million.
    The market growth rate in previous years was 50% per year; if it remains at the same level, then in 2018 the market volume will reach $1.7 billion. The share of the Russian market in the world market will be about 3%, increasing from the current 1.2%.
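The $1.7 billion projection follows from compounding the 2014 base at the stated growth rate (a sketch using the IDC figures cited above):

```python
# Project the Russian Big Data market from the cited 2014 base of
# 340 million USD at the cited 50% annual growth rate, through 2018.
base_musd = 340
growth = 0.50
years = 2018 - 2014
projected_musd = base_musd * (1 + growth) ** years
print(round(projected_musd))  # 1721, i.e. about 1.7 billion USD
```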

    The most receptive industries to the use of Big Data in Russia include:

    • Retail and banks – for them, analysis of the customer base and assessment of the effectiveness of marketing campaigns are most important;
    • Telecom – customer base segmentation and traffic monetization;
    • Public sector – reporting, analysis of applications from the public, etc.;
    • Oil companies – monitoring of work and planning of production and sales;
    • Energy companies – creation of intelligent electric power systems, operational monitoring and forecasting.
    In developed countries, Big Data has become widespread in the fields of healthcare, insurance, metallurgy, Internet companies and manufacturing enterprises; most likely, in the near future, Russian companies from these areas will also appreciate the effect of introducing Big Data and will adapt these technologies in their industries.
    In Russia, as well as in the world, in the near future there will be a trend towards data visualization, analysis of media files and the development of the Internet of things.
    Despite the general stagnation of the economy, analysts predict further growth of the Big Data market in the coming years, primarily because Big Data technologies give users a competitive advantage: increased operational efficiency, an additional flow of customers, risk minimization, and data forecasting capabilities.
    Thus, we can conclude that the Big Data segment in Russia is at the formation stage, but the demand for these technologies is increasing every year.

    Main results of the market analysis

    World market
    At the end of 2014, the Big Data market was characterized by the following parameters:
    • market volume amounted to 28.5 billion US dollars, an increase of 45% compared to the previous year;
    • the majority of Big Data market revenue came from services, their share was equal to 40% of total revenue;
    • 36% of revenue came from Big Data applications and analytics, 17% from computing equipment and 15% from data storage technologies;
    • the most popular tools for solving Big Data problems are in-memory platforms from vendors such as SAP (HANA) and Oracle;
    • the number of companies with implemented projects in the field of Big Data management increased by 125%;
    The market forecast for the next years is as follows:
    • in 2015 the market volume will reach 38.4 billion US dollars, in 2020 – 68.7 billion US dollars;
    • the average growth rate will be 16% annually;
    • average company costs for Big Data technologies will be $13.8 million for large companies and $1.6 million for small and medium-sized businesses;
    • technologies will be most widespread in the areas of customer service and targeted marketing;
    • In 2017, the global market structure will change towards the predominance of user companies from developing countries.
    Russian market
    The Russian Big Data market is at the stage of formation, the results of 2014 are as follows:
    • market volume reached USD 340 million;
    • the average market growth rate in previous years was 50% annually;
    • the total volume of accumulated information was 155 exabytes;
    • 10% of Russian companies began to use Big Data technologies;
    • Big Data technologies were more popular in the banking sector, telecoms, Internet companies and retail.
    The Russian market forecast for the coming years is as follows:
    • the volume of the Russian market in 2015 will reach 500 million US dollars, and in 2018 – 1.7 billion US dollars;
    • the share of the Russian market in the global market will be about 3% in 2018;
    • the amount of accumulated data in 2020 will be 980 exabytes;
    • data volume will grow to 2.2% of global data volume in 2020;
    • Technologies of data visualization, media file analysis and the Internet of things will become most popular.
    Based on the results of the analysis, we can conclude that the Big Data market is still in the early stages of development, and in the near future we will see its growth and expansion of the capabilities of these technologies.


    Column by HSE teachers about myths and cases of working with big data


    Konstantin Romanov and Alexander Pyatigorsky, teachers at the School of New Media at the National Research University Higher School of Economics (Pyatigorsky is also director of digital transformation at Beeline), wrote a column for the site about the main misconceptions around big data, with examples of using the technology and its tools. The authors suggest the publication will help company managers make sense of the concept.

    Myths and misconceptions about Big Data

    Big Data is not marketing

    The term Big Data has become very fashionable: it is used in millions of situations and with hundreds of different interpretations, often unrelated to what Big Data actually is. Concepts often get substituted in people's minds, and Big Data is confused with a marketing product. Moreover, in some companies Big Data is even part of the marketing department. The result of big data analysis can indeed feed marketing activity, but nothing more. Let's see how it works.

    If we compile a list of customers who bought goods worth more than three thousand rubles in our store two months ago and then send these users some kind of offer, that is typical marketing. We derive a clear pattern from structured data and use it to increase sales.

    However, if we combine CRM data with streaming information from, say, Instagram and analyze it, we may find a pattern: a person who reduced his activity on Wednesday evening and whose latest photo shows kittens should receive a certain offer. That is already Big Data. We found a trigger, passed it on to the marketers, and they used it for their own purposes.
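    The hypothetical "kittens on Wednesday" pattern can be sketched in code. Everything here, the users, the fields and the rule itself, is invented for illustration; a real system would mine such triggers from millions of records rather than hard-code them:

    ```python
    # Toy illustration: combine structured CRM records with signals derived
    # from unstructured social activity to surface a trigger segment.
    crm = [
        {"user": "anna", "last_purchase_rub": 3500},
        {"user": "boris", "last_purchase_rub": 1200},
        {"user": "vera", "last_purchase_rub": 4100},
    ]
    social = {
        "anna": {"activity_drop_wed": True, "last_photo_tag": "kittens"},
        "boris": {"activity_drop_wed": False, "last_photo_tag": "car"},
        "vera": {"activity_drop_wed": True, "last_photo_tag": "kittens"},
    }

    def trigger_segment(crm_rows, social_data):
        """Return users matching the (invented) hidden pattern."""
        matched = []
        for row in crm_rows:
            signals = social_data.get(row["user"], {})
            if signals.get("activity_drop_wed") and signals.get("last_photo_tag") == "kittens":
                matched.append(row["user"])
        return matched

    print(trigger_segment(crm, social))  # → ['anna', 'vera']
    ```

    The point of the sketch is the join itself: neither data source reveals the segment on its own.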

    It follows that the technology usually works with unstructured data, and even when the data is structured, the system keeps looking for hidden patterns in it, which marketing does not do.

    Big Data is not IT

    The second extreme of this story: Big Data is often confused with IT. This is due to the fact that in Russian companies, as a rule, IT specialists are the drivers of all technologies, including big data. Therefore, if everything happens in this department, the company as a whole gets the impression that this is some kind of IT activity.

    In fact, there is a fundamental difference here: Big Data is an activity aimed at obtaining a specific product, which is not at all related to IT, although technology cannot exist without it.

    Big Data is not always the collection and analysis of information

    There is another misconception about Big Data. Everyone understands that this technology involves large amounts of data, but what kind of data is meant is not always clear. Anyone can collect and use information; these days this is possible not only in the movies but in any company, even a very small one. The only question is what exactly to collect and how to use it to your advantage.

    But it should be understood that Big Data technology will not be the collection and analysis of absolutely any information. For example, if you collect data about a specific person on social networks, it will not be Big Data.

    What is Big Data really?

    Big Data consists of three elements:

    • data;
    • analytics;
    • technologies.

    Big Data is not just one of these components but the combination of all three. People often substitute concepts: some believe Big Data is just data, others that it is technology. But in fact, no matter how much data you collect, you can do nothing with it without the necessary technologies and analytics. And good analytics without data is even worse.

    As for data, this means not only texts but also, for example, all the photos posted on Instagram – in general, everything that can be analyzed and used for different purposes and tasks. In other words, Data refers to huge volumes of internal and external data of various structures.

    Analytics is also needed, because the task of Big Data is to build some patterns. That is, analytics is the identification of hidden dependencies and the search for new questions and answers based on the analysis of the entire volume of heterogeneous data. Moreover, Big Data poses questions that cannot be directly derived from this data.

    When it comes to images, the fact that you post a photo of yourself wearing a blue T-shirt doesn't mean anything. But if you use photography for Big Data modeling, it may turn out that right now you should offer a loan, because in your social group such behavior indicates a certain phenomenon in action. Therefore, “bare” data without analytics, without identifying hidden and non-obvious dependencies is not Big Data.

    So we have big data. The array is huge. We also have analytics. But how do we turn this raw data into a specific solution? We need technologies that allow us not only to store it (which was impossible before) but also to analyze it.

    Simply put, if you have a lot of data, you will need technologies, for example, Hadoop, which make it possible to store all the information in its original form for later analysis. This kind of technology arose in Internet giants, since they were the first to face the problem of storing a large amount of data and analyzing it for subsequent monetization.

    In addition to tools for optimized and cheap data storage, you need analytical tools, as well as add-ons to the platform used. For example, a whole ecosystem of related projects and technologies has already formed around Hadoop. Here are some of them:

    • Pig – a high-level data-flow language for data analysis.
    • Hive – data analysis using an SQL-like language.
    • Oozie – a workflow scheduler for Hadoop jobs.
    • HBase – a non-relational database modeled on Google Bigtable.
    • Mahout – machine learning.
    • Sqoop – transfers data between relational databases (RDBMS) and Hadoop.
    • Flume – moves log data into HDFS.
    • Zookeeper, MRUnit, Avro, Giraph, Ambari, Cassandra, HCatalog, Fuse-DFS and so on.

    All of these tools are available to everyone for free, but there are also a number of paid add-ons.
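    To make the programming model behind Hadoop concrete, here is a rough, single-machine simulation of MapReduce on the classic word-count task. Real Hadoop distributes the same map, shuffle and reduce steps across a cluster; the two-line corpus is invented for illustration:

    ```python
    from collections import defaultdict

    # Map: turn each input record into (key, value) pairs.
    def map_phase(lines):
        for line in lines:
            for word in line.split():
                yield word.lower(), 1

    # Shuffle + reduce: group pairs by key, then aggregate each group.
    def reduce_phase(pairs):
        grouped = defaultdict(int)
        for key, value in pairs:
            grouped[key] += value
        return dict(grouped)

    docs = ["Big Data is not IT", "Big Data is not marketing"]
    counts = reduce_phase(map_phase(docs))
    print(counts["big"], counts["data"], counts["marketing"])  # → 2 2 1
    ```

    In a real cluster, each mapper runs where its slice of the data is stored, and only the shuffled key/value pairs travel over the network.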

    In addition, specialists are needed: a developer and an analyst (the so-called Data Scientist). A manager is also needed who can understand how to apply this analytics to solve a specific problem, because in itself it is completely meaningless if it is not integrated into business processes.

    All three must work as a team. A manager who gives a Data Scientist the task of finding a certain pattern must understand that the specialist will not always find exactly what was asked for. In that case, the manager should listen carefully to what the Data Scientist did find, since those findings often turn out to be even more interesting and useful for the business. The manager's job is then to apply them to the business and turn them into a product.

    Despite the fact that now there are many different kinds of machines and technologies, the final decision always remains with the person. To do this, the information needs to be visualized somehow. There are quite a lot of tools for this.

    The most telling example is geoanalytical reports. The Beeline company works a lot with the governments of different cities and regions. Very often, these organizations order reports like “Traffic congestion in a certain location.”

    It is clear that such a report should reach government agencies in a simple and understandable form. If we provide them with a huge and completely incomprehensible table (that is, information in the form in which we receive it), they are unlikely to buy such a report - it will be completely useless, they will not get from it the knowledge that they wanted to receive.

    Therefore, no matter how good the data scientists are and no matter what patterns they find, you will not be able to work with this data without good visualization tools.

    Data sources

    The array of available data is very large and varied, so its sources can be divided into several groups.

    Internal company data

    Although 80% of the data collected belongs to this group, this source is not always used. Often this is data that seemingly no one needs at all, for example, logs. But if you look at them from a different angle, you can sometimes find unexpected patterns in them.

    Shareware sources

    This includes data from social networks, the Internet and everything else that can be accessed for free. Why only conditionally free? On the one hand, this data is available to everyone, but if you are a large company, obtaining it at the scale of a subscriber base of tens or hundreds of thousands, or millions, of customers is no longer an easy task. That is why paid services exist to provide this data.

    Paid sources

    This includes companies that sell data for money. These may be telecoms, DMPs, Internet companies, credit bureaus and aggregators. In Russia, telecoms do not sell data. Firstly, it is economically unprofitable, and secondly, it is prohibited by law. Therefore, they sell the results of their processing, for example, geoanalytical reports.

    Open data

    The state accommodates business and gives it the opportunity to use the data it collects. This is more developed in the West, but Russia also keeps up with the times: for example, there is the Open Data Portal of the Moscow Government, where information on various urban infrastructure facilities is published.

    For residents and guests of Moscow, the data is presented in tabular and cartographic form, and for developers in special machine-readable formats. Although the project operates in a limited mode, it is developing, which means it is also a data source you can use for your business tasks.

    Research

    As already noted, the task of Big Data is to find a pattern. Often, research conducted around the world can become a fulcrum for finding a particular pattern - you can get a specific result and try to apply similar logic for your own purposes.

    Big Data is an area in which not all the laws of mathematics apply. For example, “1” + “1” is not “2”, but much more, because by mixing data sources the effect can be significantly enhanced.

    Product examples

    Many people are familiar with the music service Spotify. It is great because it does not ask users what their mood is today but calculates it from the sources available to it. It always knows what you need right now, jazz or hard rock. This is the key difference that wins it fans and distinguishes it from other services.

    Such products are usually called sense products - those that feel their client.

    Big Data technology is also used in the automotive industry. For example, Tesla does this - their latest model has autopilot. The company strives to create a car that itself will take the passenger where he needs to go. Without Big Data, this is impossible, because if we use only the data that we receive directly, as a person does, then the car will not be able to improve.

    When we drive a car ourselves, our neurons make decisions based on many factors we do not even notice. For example, we may not realize why we decided not to accelerate immediately at a green light, and then it turns out the decision was correct: a car rushed past at breakneck speed, and we avoided an accident.

    You can also give an example of using Big Data in sports. In 2002, the general manager of the Oakland Athletics baseball team, Billy Beane, decided to break the paradigm of how to recruit athletes - he selected and trained players “by the numbers.”

    Usually managers look at the success of players, but in this case everything was different - in order to get results, the manager studied what combinations of athletes he needed, paying attention to individual characteristics. Moreover, he chose athletes who in themselves did not have much potential, but the team as a whole turned out to be so successful that they won twenty matches in a row.

    Director Bennett Miller later made a film about this story, Moneyball (released in Russia as "The Man Who Changed Everything"), starring Brad Pitt.

    Big Data technology is also useful in the financial sector. Not a single person in the world can independently and accurately determine whether it is worth giving someone a loan. In order to make a decision, scoring is performed, that is, a probabilistic model is built, from which one can understand whether this person will return the money or not. Further, scoring is applied at all stages: you can, for example, calculate that at a certain moment a person will stop paying.
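    The scoring idea can be sketched with a toy logistic-regression model: applicant features are mapped to a default probability, and the loan decision compares it to a threshold. The features, weights, bias and threshold below are all invented for illustration; real banks fit such parameters to historical repayment data:

    ```python
    import math

    # Invented model parameters for a toy credit-scoring sketch.
    WEIGHTS = {"late_payments": 0.9, "debt_to_income": 2.0}
    BIAS = -3.0

    def default_probability(features):
        """Logistic regression: squash a weighted sum into a probability."""
        z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
        return 1.0 / (1.0 + math.exp(-z))

    def approve(features, threshold=0.2):
        """Approve the loan if the predicted default risk is low enough."""
        return default_probability(features) < threshold

    good = {"late_payments": 0, "debt_to_income": 0.3}
    risky = {"late_payments": 4, "debt_to_income": 0.8}
    print(approve(good), approve(risky))  # → True False
    ```

    The same model, evaluated repeatedly over a loan's lifetime, gives the "at a certain moment a person will stop paying" prediction mentioned above.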

    Big data allows you not only to make money, but also to save it. In particular, this technology helped the German Ministry of Labor to reduce the cost of unemployment benefits by 10 billion euros, since after analyzing the information it became clear that 20% of benefits were paid undeservedly.

    Technologies are also used in medicine (this is especially typical for Israel). With the help of Big Data, you can perform a much more accurate analysis than a doctor with thirty years of experience can do.

    Any doctor, when making a diagnosis, relies only on his own experience. When a machine does it, it draws on the experience of thousands of such doctors and all the existing case histories. It takes into account what material the patient's house is made of, what area the patient lives in, the level of smoke and pollution there, and so on. That is, it considers many factors that doctors do not.

    An example of the use of Big Data in healthcare is Project Artemis, implemented by the Toronto children's hospital. It is an information system that collects and analyzes data on infants in real time, processing 1,260 health indicators for each child every second. The project is aimed at predicting instability in a child's condition and preventing childhood diseases.

    Big data is also starting to be used in Russia: for example, Yandex has a big data division. The company, together with AstraZeneca and the Russian Society of Clinical Oncology RUSSCO, launched the RAY platform, intended for geneticists and molecular biologists. The project allows us to improve methods for diagnosing cancer and identifying predisposition to cancer. The platform will launch in December 2016.

    The constant acceleration of data growth is an integral part of modern reality. Social networks, mobile devices, measuring instruments and business information are just a few of the sources capable of generating gigantic amounts of data.

    The term Big Data has become quite common, but not everyone is aware yet of how quickly and deeply technologies for processing large amounts of data are changing the most diverse aspects of society. Changes are taking place in various areas, giving rise to new problems and challenges, including in information security, where its most important aspects, such as confidentiality, integrity and availability, should be in the foreground.

    Unfortunately, many modern companies are resorting to Big Data technology without creating the proper infrastructure to securely store the vast amounts of data they collect and store. On the other hand, blockchain technology is currently rapidly developing, which is designed to solve this and many other problems.

    What is Big Data?

    In fact, the definition of the term is straightforward: “big data” means the management of very large volumes of data, as well as their analysis. If we look more broadly, this is information that cannot be processed by classical methods due to its large volumes.

    The term Big Data itself appeared relatively recently. According to Google Trends, the active growth in the term's popularity began at the end of 2011.

    The first products and solutions directly related to big data processing began to appear in 2010. By 2011, most of the largest IT companies, including IBM, Oracle, Microsoft and Hewlett-Packard, were actively using the term Big Data in their business strategies, and information technology market analysts gradually began active research into the concept.

    Currently, this term has gained significant popularity and is actively used in a variety of fields. However, it cannot be said with certainty that Big Data is some kind of fundamentally new phenomenon - on the contrary, big data sources have existed for many years. In marketing, these include databases of customer purchases, credit histories, lifestyles, and so on. Over the years, analysts have used this data to help companies predict future customer needs, assess risks, shape consumer preferences, and more.

    Currently, the situation has changed in two aspects:

    — more sophisticated tools and methods for analyzing and comparing different data sets have appeared;
    — analysis tools have been supplemented with many new data sources, owing to the widespread transition to digital technologies and to new methods of data collection and measurement.

    Researchers predict that Big Data technologies will be most actively used in manufacturing, healthcare, trade, government administration and in other diverse areas and industries.

    Big Data is not some specific array of data but a set of methods for processing it. The defining characteristic of big data is not only its volume but also other properties that reflect how labor-intensive its processing and analysis are.

    The initial data for processing can be, for example:

    — logs of Internet user behavior;
    — Internet of Things;
    — social media;
    — meteorological data;
    — digitized books from major libraries;
    — GPS signals from vehicles;
    — information about transactions of bank clients;
    — data on the location of mobile network subscribers;
    — information about purchases in large retail chains, etc.

    Over time, the volume of data and the number of its sources keep growing, and against this background new methods of information processing emerge while existing ones improve.

    Basic principles of Big Data:

    — Horizontal scalability – data arrays can be huge, which means the big data processing system must expand dynamically as volumes grow.
    — Fault tolerance – even if individual pieces of equipment fail, the system as a whole must remain operational.
    — Data locality – in large distributed systems, data is spread across many machines; whenever possible, to save resources, data is processed on the same server where it is stored.
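    The data-locality principle can be illustrated with a toy sketch: each "node" aggregates its own partition where the data lives, and only small partial results cross the "network". The node names and numbers are invented; a real system would run each aggregation as a process on the machine holding the partition:

    ```python
    # Data lives in partitions on different "nodes".
    partitions = {
        "node-1": [3, 5, 2],
        "node-2": [7, 1],
        "node-3": [4, 4, 4],
    }

    def local_aggregate(node, values):
        # Conceptually runs on the node where the data is stored;
        # only the small summary (the sum) leaves that node.
        return sum(values)

    partials = {node: local_aggregate(node, vals)
                for node, vals in partitions.items()}
    total = sum(partials.values())  # cheap final combine step
    print(total)  # → 30
    ```

    Adding a node with new partitions only adds one more local pass, which is the horizontal-scalability principle in miniature.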

    For all three principles to hold, and hence for big data to be stored and processed efficiently, new breakthrough technologies are needed, such as blockchain.

    Why do we need big data?

    The scope of Big Data is constantly expanding:

    — Big data can be used in medicine. A diagnosis can then be made not only from the patient's medical history but also taking into account the experience of other doctors, information about the environmental situation in the patient's area of residence, and many other factors.
    — Big Data technologies can be used to organize the movement of unmanned vehicles.
    — By processing large amounts of data, you can recognize faces in photos and videos.
    — Big Data technologies can be used by retailers - trading companies can actively use data arrays from social networks to effectively configure their advertising campaigns, which can be maximally targeted to a particular consumer segment.
    — This technology is actively used in organizing election campaigns, including for analyzing political preferences in society.
    — The use of Big Data technologies is relevant for solutions of the income assurance (RA) class, which include tools for detecting inconsistencies and in-depth data analysis, allowing timely identification of probable losses or distortions of information that could lead to a decrease in financial results.
    — Telecommunications providers can aggregate big data, including geolocation; in turn, this information may be of commercial interest to advertising agencies, which can use it to display targeted and local advertising, as well as to retailers and banks.
    — Big data can play an important role in deciding to open a retail outlet in a certain location based on data about the presence of a powerful targeted flow of people.
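    The revenue-assurance (RA) use case above amounts to spotting inconsistencies in large data streams. A minimal sketch is a z-score check that flags values deviating sharply from the recent norm; the revenue figures and threshold are invented, and production RA systems use far richer rules:

    ```python
    import statistics

    def flag_anomalies(daily_revenue, z_threshold=2.5):
        """Return indices of days whose revenue deviates sharply from the mean."""
        mean = statistics.mean(daily_revenue)
        sd = statistics.pstdev(daily_revenue)
        return [i for i, v in enumerate(daily_revenue)
                if sd > 0 and abs(v - mean) / sd > z_threshold]

    # Invented example: day 6 shows a suspicious drop (possible loss
    # or data distortion worth investigating).
    revenue = [100, 102, 98, 101, 99, 100, 40, 103]
    print(flag_anomalies(revenue))  # → [6]
    ```

    Flagged days would then be handed to analysts to decide whether the anomaly is a real loss, fraud, or a reporting error.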

    Thus, the most obvious practical application of Big Data technology lies in the field of marketing. Thanks to the development of the Internet and the proliferation of all kinds of communication devices, behavioral data (such as the number of calls, shopping habits and purchases) is becoming available in real time.

    Big data technologies can also be effectively used in finance, in sociological research and in many other areas. Experts say that all these big data opportunities are just the visible part of the iceberg, since these technologies are used on a much larger scale in intelligence and counterintelligence, in military affairs, and in everything commonly called information warfare.

    In general terms, the sequence of working with Big Data consists of collecting data, structuring the information received using reports and dashboards, and then formulating recommendations for action.

    Let's briefly consider the possibilities of using Big Data technologies in marketing. As you know, for a marketer, information is the main tool for forecasting and strategy development. Big data analysis has long been successfully used to determine the target audience, interests, demand and activity of consumers. Big data analysis, in particular, makes it possible to display advertising (based on the RTB auction model - Real Time Bidding) only to those consumers who are interested in a product or service.

    The use of Big Data in marketing allows businessmen to:

    — better understand their consumers and attract a similar (look-alike) audience on the Internet;
    — assess the degree of customer satisfaction;
    — understand whether the proposed service meets expectations and needs;
    — find and implement new ways to increase customer trust;
    — create projects that are in demand, etc.

    For example, the Google Trends service can show a marketer a forecast of seasonal demand for a specific product, along with fluctuations and the geography of clicks. If you compare this information with the statistics collected by the corresponding plugin on your own website, you can draw up a plan for distributing the advertising budget by month, region and other parameters.
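    Such a plan can be sketched as a simple proportional allocation: spend in each month in proportion to a seasonal-demand index of the kind Google Trends provides. The index values and the budget below are invented for illustration:

    ```python
    def allocate_budget(total, seasonal_index):
        """Split a total budget across months proportionally to demand weights."""
        weight_sum = sum(seasonal_index.values())
        return {month: round(total * weight / weight_sum, 2)
                for month, weight in seasonal_index.items()}

    # Invented seasonal-demand index (e.g. relative search interest, 0-100).
    index = {"May": 40, "June": 80, "July": 100, "August": 60}
    plan = allocate_budget(280_000, index)
    print(plan)  # July, the peak month, gets the largest share: 100000.0
    ```

    A real plan would add a regional dimension and constraints (minimum spend, channel caps), but the proportional core stays the same.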

    According to many researchers, the success of the Trump election campaign lies in the segmentation and use of Big Data. The team of the future US President was able to correctly divide the audience, understand its desires and show exactly the message that voters want to see and hear. Thus, according to Irina Belysheva from the Data-Centric Alliance, Trump’s victory was largely possible thanks to a non-standard approach to Internet marketing, which was based on Big Data, psychological and behavioral analysis and personalized advertising.

    Trump's political strategists and marketers used a specially developed mathematical model that made it possible to analyze the data of all US voters in depth and systematize it, achieving ultra-precise targeting not only by geography but also by voters' intentions, interests, psychotype, behavioral characteristics and so on. Having achieved this, the marketers organized personalized communication with each group of citizens based on their needs, moods, political views, psychological characteristics and even skin color, using a tailored message for almost every individual voter.

    As for Hillary Clinton, her campaign used "time-tested" methods based on sociological data and standard marketing, dividing the electorate only into formally homogeneous groups (men, women, African Americans, Hispanics, poor, rich, etc.).

    As a result, the winner was the one who appreciated the potential of new technologies and methods of analysis. Notably, Hillary Clinton's campaign expenses were twice as high as her opponent's.

    (Data: Pew Research.)

    Main problems of using Big Data

    In addition to the high cost, one of the main factors hindering the implementation of Big Data in various areas is the problem of choosing the data to be processed: that is, determining which data needs to be retrieved, stored and analyzed, and which should not be taken into account.

    Another problem with Big Data is ethical. In other words, a logical question arises: can such data collection (especially without the user’s knowledge) be considered a violation of privacy?

    It's no secret that the information stored by the Google and Yandex search engines allows these IT giants to constantly improve their services, make them more user-friendly and create new interactive applications. To do this, search engines collect data about user activity on the Internet, IP addresses, geolocation, interests and online purchases, personal data, email messages, etc. All this allows them to display contextual advertising that matches the user's behavior online. Users are usually not asked for consent to this, nor given the opportunity to choose what information about themselves to provide. That is, by default everything is collected into Big Data, which is then stored on the sites' data servers.

    This leads to the next important problem regarding the security of data storage and use. For example, is a particular analytical platform to which consumers automatically transfer their data secure? In addition, many business representatives note a shortage of highly qualified analysts and marketers who can effectively handle large volumes of data and solve specific business problems with their help.

    Despite all the difficulties with the implementation of Big Data, the business intends to increase investments in this area. According to Gartner research, the leaders in industries investing in Big Data are media, retail, telecom, banking and service companies.

    Prospects for interaction between blockchain and Big Data technologies

    Integrating blockchain with Big Data has a synergistic effect and opens up a wide range of new opportunities for business, including allowing companies to:

    — gain access to detailed information about consumer preferences, on the basis of which you can build detailed analytical profiles for specific suppliers, products and product components;
    — integrate detailed data on transactions and consumption statistics of certain groups of goods by various categories of users;
    — receive detailed analytical data on supply and consumption chains, control product losses during transportation (for example, weight loss due to drying and evaporation of certain types of goods);
    — counteract product counterfeiting, increase the effectiveness of the fight against money laundering and fraud, etc.

    Access to detailed data on the use and consumption of goods will significantly reveal the potential of Big Data technology for optimizing key business processes, reducing regulatory risks, revealing new opportunities for monetization and creating products that will best meet current consumer preferences.

    As is known, representatives of the largest financial institutions are already showing significant interest in blockchain technology. According to Oliver Bussmann, head of IT at the Swiss financial holding company UBS, blockchain technology can "reduce transaction processing time from several days to several minutes".

    The potential for analysis from the blockchain using Big Data technology is enormous. Distributed ledger technology ensures the integrity of information, as well as reliable and transparent storage of the entire transaction history. Big Data, in turn, provides new tools for effective analysis, forecasting, economic modeling and, accordingly, opens up new opportunities for making more informed management decisions.

    The tandem of blockchain and Big Data can be successfully used in healthcare. As is known, imperfect and incomplete data on a patient’s health greatly increases the risk of an incorrect diagnosis and incorrectly prescribed treatment. Critical data about the health of clients of medical institutions should be maximally protected, have the properties of immutability, be verifiable and should not be subject to any manipulation.

    The information in the blockchain meets all of the above requirements and can serve as high-quality and reliable source data for in-depth analysis using new Big Data technologies. In addition, with the help of blockchain, medical institutions could exchange reliable data with insurance companies, justice authorities, employers, scientific institutions and other organizations that need medical information.

    Big Data and information security

    In a broad sense, information security is the protection of information and supporting infrastructure from accidental or intentional negative impacts of a natural or artificial nature.

    In the field of information security, Big Data faces the following challenges:

    — problems of data protection and ensuring their integrity;
    — the risk of outside interference and leakage of confidential information;
    — improper storage of confidential information;
    — the risk of information loss, for example, due to someone’s malicious actions;
    — risk of misuse of personal data by third parties, etc.

    One of the main big data problems that blockchain is designed to solve lies in the area of information security. By ensuring compliance with all its basic principles, distributed ledger technology can guarantee the integrity and reliability of data, and because there is no single point of failure, blockchain keeps information systems stable. Distributed ledger technology can help solve the problem of trust in data and enable universal data sharing.

    Information is a valuable asset, which means that ensuring the basic aspects of information security must be at the forefront. In order to survive the competition, companies must keep up with the times, which means that they cannot ignore the potential opportunities and advantages that blockchain technology and Big Data tools contain.

    Big Data is not only the data itself, but also the technologies for processing and using it, and the methods for finding the necessary information in large arrays. The problem of big data remains open and vital for any system that has been accumulating a wide variety of information for decades.

    The term is associated with the expression “Volume, Velocity, Variety” – the principles on which work with big data is based: the amount of information, the speed of its processing, and the variety of information stored in an array. Recently a fourth principle has been added to these three – Value, the value of the information. That is, the information must be useful and necessary in theoretical or practical terms, which would justify the costs of storing and processing it.

    An example of a typical source of big data is social networks - every profile or public page represents one small drop in an unstructured ocean of information. Moreover, regardless of the amount of information stored in a particular profile, interaction with each user should be as fast as possible.

    Big data is continuously accumulating in almost every area of human life: any industry that involves either human interaction or computation. This includes social media, medicine and banking, as well as device systems that produce numerous measurements every day – for example, astronomical observations, meteorological data and readings from Earth-sensing instruments.

    Information from all kinds of tracking systems in real time also goes to the servers of a particular company. Television and radio broadcasting, call databases of cellular operators - the interaction of each individual person with them is minimal, but in the aggregate all this information becomes big data.

    Big data technologies have become integral to research and commerce. Moreover, they are beginning to take over the sphere of public administration - and everywhere the introduction of increasingly effective systems for storing and manipulating information is required.

    The term “big data” first appeared in the press in 2008, when Clifford Lynch, editor of a special issue of Nature, published an article on the future development of science using technologies for working with large quantities of data. Until 2009 the term was considered only from the standpoint of scientific analysis, but after several more publications the press began to use the concept of Big Data widely – and continues to use it today.

    In 2010, the first attempts to solve the growing problem of big data began to appear. Software products were released, the action of which was aimed at minimizing risks when using huge amounts of information.

    By 2011, large companies such as Microsoft, Oracle, EMC and IBM had become interested in big data – they were the first to incorporate Big Data developments into their strategies, and quite successfully.

    Universities began studying big data as a separate subject in 2013 – now problems in this area are addressed not only by data science, but also by engineering together with computational disciplines.

    The main methods of data analysis and processing include the following:

    1. Deep analysis methods (Data Mining).

    These methods are quite numerous, but they share one thing: the mathematical tools they use are combined with achievements from the field of information technology.

    2. Crowdsourcing.

    This technique allows you to obtain data simultaneously from several sources, and the number of the latter is practically unlimited.

    3. A/B testing.

    From the entire volume of data, a control set of elements is selected, which is alternately compared with other similar sets where one of the elements was changed. Conducting such tests helps determine which parameter fluctuations have the greatest impact on the control population. Thanks to the volume of Big Data, it is possible to carry out a huge number of iterations, with each of them getting closer to the most reliable result.
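    The procedure above can be sketched as a two-proportion z-test; the visitor counts, conversion numbers and the `z_score` helper below are hypothetical, invented purely for illustration.

```python
import math

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic comparing a control set A with a variant B."""
    p_a = conv_a / n_a          # conversion rate of the control set
    p_b = conv_b / n_b          # conversion rate of the set with one element changed
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under "no difference"
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical iteration: 10,000 visitors per set; variant B changes one page element
z = z_score(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a real effect at the 5% significance level
```

    With Big Data volumes, many such iterations can be run, each one narrowing in on which parameter changes actually move the control population.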

    4. Predictive analytics.

    Specialists in this field try to predict and plan in advance how the controlled object will behave in order to make the most profitable decision in this situation.

    5. Machine learning (artificial intelligence).

    It is based on empirical analysis of information and the subsequent construction of self-learning algorithms for systems.
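    A minimal sketch of this idea (the data and the `fit_line` function are invented for illustration, not a production algorithm): a system that improves its own parameters from empirical observations, here by gradient descent on a simple line fit.

```python
def fit_line(xs, ys, lr=0.01, epochs=2000):
    """Learn slope w and intercept b of y ≈ w*x + b from observed pairs."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w  # each pass nudges the parameters toward lower error
        b -= lr * grad_b
    return w, b

# Hypothetical observations roughly following y = 3x + 1
w, b = fit_line([0, 1, 2, 3, 4], [1.1, 3.9, 7.2, 10.1, 12.9])
print(f"w ≈ {w:.2f}, b ≈ {b:.2f}")
```

    The same self-correcting loop, scaled up to millions of observations and parameters, is what underlies the machine learning systems the article describes.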

    6. Network analysis.

    This is the most common method for studying social networks: after statistical data is obtained, the nodes created in the network are analyzed, that is, the interactions between individual users and their communities.
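    As a toy sketch of such node analysis (the interaction log below is invented for illustration), counting each user's degree reveals the hubs of a community:

```python
from collections import Counter

# Hypothetical interaction log: pairs of users who exchanged messages or likes
interactions = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("dave", "alice"), ("dave", "bob"), ("erin", "alice"),
]

# Degree of a node: how many interactions the user participates in
degree = Counter()
for u, v in interactions:
    degree[u] += 1
    degree[v] += 1

print(degree.most_common(2))  # → [('alice', 4), ('bob', 3)]
```

    On a real social network the edge list would hold billions of pairs, but the analysis of nodes and their communities follows the same principle.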

    In 2017, when big data ceased to be something new and unknown, its importance not only did not decrease, but increased even more. Now experts are betting that big data analysis will become available not only to giant organizations, but also to small and medium-sized businesses. This approach is planned to be implemented using the following components:

    • Cloud storage.

    Data storage and processing are becoming faster and more economical - compared to the costs of maintaining your own data center and possible expansion of staff, renting a cloud seems to be a much cheaper alternative.

    • Using Dark Data.

    So-called “dark data” is all of a company's non-digitized information, which plays no key role in day-to-day use but can motivate a switch to a new format for storing information.

    • Artificial Intelligence and Deep Learning.

    Machine learning technology that imitates the structure and operation of the human brain is ideally suited for processing large amounts of constantly changing information. In this case, the machine does everything a person would do, but with a significantly lower likelihood of error.

    • Blockchain.

    This technology makes it possible to speed up and simplify numerous online transactions, including international ones. Another advantage of Blockchain is that it reduces transaction costs.

    • Self-service and reduced prices.

    In 2017, “self-service platforms” are planned for introduction – free platforms where representatives of small and medium-sized businesses can independently evaluate and systematize the data they store.

    All marketing strategies are in one way or another based on manipulating information and analyzing existing data. That is why the use of big data makes it possible to predict and adjust the further development of the company.

    For example, an RTB auction created on the basis of big data allows you to use advertising more effectively - a certain product will be shown only to that group of users who are interested in purchasing it.

    What are the benefits of using big data technologies in marketing and business?

    1. With their help, you can much more quickly create new projects that are likely to be in demand among buyers.
    2. They help correlate customer requirements with an existing or planned service and adjust it accordingly.
    3. Big data methods make it possible to assess the current satisfaction of all users and of each one individually.
    4. Increased customer loyalty is achieved through big data processing methods.
    5. Attracting your target audience online becomes easier thanks to the ability to handle huge amounts of data.

    For example, one of the most popular services for predicting the likely popularity of a product is Google Trends. It is widely used by marketers and analysts, allowing them to obtain statistics on past demand for a given product and a forecast for the coming season. This allows company managers to distribute the advertising budget more effectively and decide where it is best to invest money.

    Examples of using Big Data

    The active introduction of Big Data technologies into the market and into modern life began after world-famous companies with clients in almost every part of the globe started using them.

    These include technology giants such as Facebook, Google and IBM, as well as financial institutions such as MasterCard, VISA and Bank of America.

    For example, IBM applies big data techniques to ongoing monetary transactions. With their help, 15% more fraudulent transactions were identified, which made it possible to increase the amount of protected funds by 60%. Problems with false alarms of the system were also resolved - their number was reduced by more than half.

    The VISA company similarly used Big Data, tracking fraudulent attempts to carry out particular operations. Thanks to this, it saves more than $2 billion annually from leakage.

    The German Labor Ministry managed to cut costs by 10 billion euros by introducing a big data system into its work on issuing unemployment benefits. At the same time, it was revealed that a fifth of citizens receive these benefits without reason.

    Big Data has not spared the gaming industry either. Thus, the World of Tanks developers conducted a study of information about all players and compared the available indicators of their activity. This helped predict the possible future outflow of players - based on the assumptions made, representatives of the organization were able to interact more effectively with users.

    Notable organizations using big data also include HSBC, Nasdaq, Coca-Cola, Starbucks and AT&T.

    The biggest problem with big data is the cost of processing it, which includes both expensive equipment and the wages of qualified specialists capable of servicing huge amounts of information. Obviously, the equipment will have to be updated regularly so that it does not lose functionality as the volume of data grows.

    The second problem is again related to the large amount of information that needs to be processed. When a study produces not two or three results but a great many, it is very difficult to remain objective and select from the general flow of data only those results that will have a real impact on the state of a phenomenon.

    The third problem is Big Data privacy. With most customer services moving their data online, it is very easy to become the next target for cybercriminals. Even simply storing personal information, without conducting any online transactions, can carry undesirable consequences for cloud storage clients.

    The problem of information loss. Precaution requires not limiting yourself to a single one-time backup, but keeping at least two or three backup copies. However, as volumes grow, the difficulty of maintaining redundancy grows with them – and IT specialists are trying to find the optimal solution to this problem.
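    One common safeguard against silent corruption across multiple backup copies is checksum verification; the record and the `checksum` helper below are hypothetical, a minimal sketch rather than a full backup system.

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 digest used to detect silent corruption in a stored copy."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical scenario: one primary record, two backup copies (the second corrupted)
primary = b"customer record #42: balance=1500"
backups = [b"customer record #42: balance=1500",
           b"customer record #42: balance=15O0"]

reference = checksum(primary)
for i, copy in enumerate(backups, start=1):
    status = "OK" if checksum(copy) == reference else "CORRUPTED"
    print(f"backup {i}: {status}")
```

    Running such a verification pass periodically lets a damaged copy be detected and re-created from a healthy one before all copies degrade.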

    Big data technology market in Russia and the world

    As of 2014, services made up 40% of the big data market. Revenue from Big Data–related computer equipment was slightly lower (38%), and the remaining 22% came from software.

    According to statistics, the most useful products in the global segment for solving Big Data problems are In-memory and NoSQL analytical platforms. Log-file analytical software and Columnar platforms occupy 15 and 12 percent of the market, respectively. Hadoop/MapReduce, by contrast, copes with big data problems not very effectively in practice.

    Results of implementing big data technologies:

    • increasing the quality of customer service;
    • optimization of supply chain integration;
    • optimization of organizational planning;
    • acceleration of interaction with clients;
    • increasing the efficiency of processing customer requests;
    • reduction in service costs.

    Best books on Big Data



    Suitable for an initial study of big data processing technologies – it introduces the subject simply and clearly. It shows how the abundance of information has influenced everyday life and all its spheres: science, business, medicine and so on. It contains numerous illustrations, so it is absorbed without much effort.

    "Introduction to Data Mining" by Pang-Ning Tan, Michael Steinbach and Vipin Kumar

    Also useful for beginners, this book on Big Data explains working with large data sets on the principle “from simple to complex.” It covers many important initial-stage points: preparation for processing, visualization, OLAP, as well as some methods of data analysis and classification.

    A practical guide to working with big data using the Python programming language. Suitable both for engineering students and for professionals who want to deepen their knowledge.

    "Hadoop for Dummies", Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk

    Hadoop is a project created specifically for working with distributed programs that organize the execution of actions on thousands of nodes simultaneously. Getting to know it will help you understand in more detail the practical application of big data.