

    Lesson 2
    Main stages of modeling





    After studying this topic, you will learn:

    - what modeling is;
    - what can serve as a prototype for modeling;
    - what place modeling occupies in human activity;
    - what the main stages of modeling are;
    - what a computer model is;
    - what a computer experiment is.

    Computer experiment

    To bring new design developments to life, introduce new technical solutions into production, or test new ideas, an experiment is needed. An experiment is a trial performed with an object or a model. It consists of performing certain actions on the experimental sample and determining how it reacts to them.

    At school you conduct experiments in biology, chemistry, physics, and geography lessons.

    Experiments are carried out when testing new product samples at enterprises. Usually a specially created installation is used, which allows the experiment to be carried out in laboratory conditions, or the real product itself is subjected to all kinds of tests (a full-scale experiment). To study, for example, the operational properties of a unit or component, it is placed in a thermostat, frozen in special chambers, tested on vibration stands, dropped, and so on. It is not so bad if it is a new watch or a vacuum cleaner: the loss when one is destroyed is not great. But what if it is an airplane or a rocket?

    Laboratory and full-scale experiments demand large material costs and much time, but their significance is nevertheless very great.

    With the development of computer technology, a new and unique research method has appeared: the computer experiment. In many cases, computer studies of models have come to assist, and sometimes even replace, experimental samples and test benches. Conducting a computer experiment involves two stages: drawing up an experiment plan and conducting the research.

    Experimental plan

    The experimental plan must clearly reflect the sequence of work with the model. The first point of such a plan is always testing the model. 

    Testing is the process of checking the correctness of the constructed model.

    A test is a set of initial data that allows one to determine the correctness of the construction of the model.

    To be sure that the obtained modeling results are correct, you need to:

    ♦ check the developed algorithm for constructing the model;
    ♦ make sure that the constructed model correctly reflects the properties of the original that were taken into account during modeling.

    To check the correctness of the model construction algorithm, a test set of initial data is used, for which the final result is known in advance or predetermined in other ways.

    For example, if your model uses calculation formulas, select several sets of initial data and compute the results “manually”. These are test cases. Once the model is built, you run it with the same input data and compare the simulation results with the conclusions obtained by calculation. If the results coincide, the algorithm is developed correctly; if not, you need to find and eliminate the reason for the discrepancy. Test data may not reflect the real situation at all and may carry no semantic content. However, the results obtained during testing may lead you to change the original information or symbolic model, primarily in the part where the semantic content is embedded.
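
    By way of illustration (a sketch of ours, not from the textbook), here is how such a check might look in Python: the “model” is an invented falling-body formula, and the reference values are assumed to have been computed by hand in advance.

        # The "model" is an invented falling-body formula d = g*t**2/2;
        # the reference values are assumed to be hand-calculated in advance.
        def model_distance(t, g=9.81):
            return g * t ** 2 / 2      # simulated model output

        reference = {                  # test set with manual results
            1.0: 4.905,                # 9.81 * 1 / 2
            2.0: 19.62,                # 9.81 * 4 / 2
            3.0: 44.145,               # 9.81 * 9 / 2
        }

        for t, expected in reference.items():
            actual = model_distance(t)
            ok = abs(actual - expected) < 1e-6
            print(f"t={t}: model={actual}, manual={expected}, match={ok}")

    If every comparison matches, the calculation algorithm behind the model can be considered correct on this test set.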

    To make sure that the constructed model reflects the properties of the original that were taken into account during the modeling, it is necessary to select a test example with real source data.

    Conducting research

    After testing, when you have confidence in the correctness of the constructed model, you can proceed directly to conducting research. 

    The plan must include an experiment or series of experiments that satisfy the modeling objectives. Each experiment must be accompanied by an understanding of the results, which serves as the basis for analyzing the modeling results and making decisions.

    The scheme for preparing and conducting a computer experiment is shown in Figure 11.7.

    Fig. 11.7. Computer experiment scheme

    Analysis of simulation results

    The ultimate goal of modeling is making a decision, which should be made on the basis of a comprehensive analysis of the modeling results. This stage is decisive - either you continue the research or finish it. Figure 11.2 shows that the results analysis stage cannot exist independently. The findings often contribute to conducting an additional series of experiments, and sometimes to changing the problem.

    The basis for developing a solution is the results of testing and experiments. If the results do not correspond to the goals of the task, it means that mistakes were made at the previous stages. This may be an incorrect formulation of the problem, an overly simplified information model, an unsuccessful choice of modeling method or environment, or a violation of technological techniques when building the model. If such errors are identified, the model needs to be adjusted, that is, you must return to one of the previous stages. The process is repeated until the experimental results meet the modeling goals.

    The main thing is to always remember: an identified error is also a result. As popular wisdom says, you learn from mistakes. The great Russian poet A. S. Pushkin also wrote about this:

    Oh, how many wonderful discoveries
    Are prepared for us by the spirit of enlightenment,
    And experience, son of difficult mistakes,
    And genius, friend of paradoxes,
    And chance, God the inventor...

    Test questions and assignments

    1. Name the two main types of modeling problems.

    2. In the famous “Problem Book” by G. Oster there is the following problem:

    The evil witch, working tirelessly, turns 30 princesses a day into caterpillars. How many days will it take her to turn 810 princesses into caterpillars? How many princesses will have to be turned into caterpillars per day to complete the job in 15 days?
    Which question can be classified as “what will happen if...” type, and which question can be classified as “how to do so that...”?

    3. List the most well-known purposes of modeling.

    4. Formalize the humorous problem from G. Oster’s “Problem Book”:

    From two booths located at a distance of 27 km from one another, two pugnacious dogs jumped out towards each other at the same time. The first one runs at a speed of 4 km/h, and the second one runs at 5 km/h.
    How long will it take for the fight to start? 

    5. Name as many characteristics of the object “pair of shoes” as possible. Compose an information model of the object for different purposes:

    ■ choosing shoes for a hiking trip;
    ■ selecting a suitable shoe box;
    ■ buying shoe care cream.


    6. What characteristics of a teenager are important for recommendations on choosing a profession?

    7. For what reasons is the computer widely used in modeling?

    8. Name the computer modeling tools you know.

    9. What is a computer experiment? Give an example.

    10. What is model testing?

    11. What errors occur during the modeling process? What should you do when an error is discovered?

    12. What is the analysis of simulation results? What conclusions are usually drawn?

    Annotation: Basic testing concepts. Phases and stages of testing. Types of tests. Test Driven Development

    Introduction

    Testing is one of the most well-established methods of quality assurance in software development.

    From a technical point of view, testing consists of executing an application on a certain set of input data and comparing the obtained results with previously known (reference) results, in order to establish that the application's various properties and characteristics match the required ones. As one of the main phases of the software development process (application design - code development - testing), testing accounts for a fairly large share of the total labor of product development. A widely known estimate of the distribution of labor among the phases of creating a software product is 40%-20%-40%.

    From a mathematical point of view, testing can be viewed as interpreting a certain formula and checking its truth on certain sets. Indeed, a program can be represented as a formula f = f1 * f2 * f3 * ... * fn, where f1, f2, ..., fn are programming-language operators, and their superposition is the program.

    The truth of such a formula can be substantiated using a formal approach, that is, by deducing the required formulas and statements (theorems) from the original axiom formulas using formal procedures (inference rules). The advantage of the formal approach is that it avoids appealing to an infinite range of values and operates only with a finite set of symbols at each step of the proof. However, constructing a formal system and formalizing the program itself are often very complex processes. An alternative approach to justifying truth is interpretation.

    The interpretive approach substitutes constants into formulas and then interprets the formulas as meaningful statements over elements of sets of concrete values. The truth of the interpreted formulas is checked on a finite set of possible values. The difficulty of the approach is that the number of value combinations is often very large, and the combinations themselves consist of many values, so processing all combinations requires significant resources. There are various methods that allow the number of combinations to be reduced. The main problem of testing is determining whether a set of tests is sufficient to conclude that the program is implemented correctly, as well as finding a test set with this property.
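
    As a loose illustration (our own example, not from the text), the sketch below “interprets” a tiny program on a finite set of values: the specification plays the role of the formula whose truth is checked, and the size of the combination set hints at how quickly exhaustive checking grows.

        # The "formula" under test and its specification are invented.
        import itertools

        def program(a, b):
            return max(a, b)           # the program, viewed as a formula

        def holds(a, b, result):
            # Interpreted statement: the result is not less than either
            # argument and is taken from the arguments themselves.
            return result >= a and result >= b and result in (a, b)

        domain = range(-2, 3)          # a finite set of concrete values
        combos = list(itertools.product(domain, repeat=2))
        print(len(combos), "combinations to check")  # 25; grows as |D|**n

        assert all(holds(a, b, program(a, b)) for a, b in combos)
        print("the formula holds on the entire finite domain")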

    Static testing uses formal analysis methods to identify, without executing the program under test, incorrect constructions or incorrect relationships between program objects (formal errors), relying on special code-analysis tools (code checkers).

    Dynamic testing (testing proper) identifies errors only in a running program, using special test-automation tools (a testbed or testbench).

    Testing Basics

    Test Criteria Classes

    Structural criteria use information about the structure of the program (so-called “white box” criteria), which presupposes knowledge of the program's source code or of its specification in the form of a control-flow graph. Structural criteria are based on the main elements of the control graph: statements, branches, and paths. A small illustration follows the list below.

    • The statement-testing criterion (criterion C0) requires that the set of tests, taken together, ensure that every statement is executed at least once.
    • The branch-testing criterion (criterion C1) requires that the set of tests, taken together, ensure that every branch is traversed at least once.
    • The path-testing criterion (criterion C2) requires that the set of tests, taken together, ensure that every path is traversed at least once.
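
    To illustrate the difference between C0 and C1, here is a small invented sketch: a single test executes every statement (C0 holds) yet never takes the false branch of the condition, so C1 requires a second test.

        # An invented illustration of the C0 vs C1 difference.
        def classify(x):
            result = "non-negative"
            if x < 0:
                result = "negative"
            return result

        # One test executes every statement, so C0 is satisfied:
        assert classify(-1) == "negative"

        # But the false (fall-through) branch of the "if" was never taken.
        # Satisfying C1 requires a second test that skips the assignment:
        assert classify(5) == "non-negative"
        print("C0 needed one test; C1 needed two")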

    Functional criteria are formulated in terms of the requirements for the software product (so-called “black box” criteria). They provide, first of all, control over the degree to which the customer's requirements are fulfilled in the product. Since the requirements are formulated for the product as a whole, they reflect the interaction of the application under test with its environment. The problem with functional testing is, first of all, its labor intensity: the documents recording the requirements for a software product are, as a rule, quite voluminous, yet the corresponding verification must be comprehensive.

    The following special types of functional criteria are distinguished:

    • testing of specification items;
    • testing of input data classes;
    • testing of rules - the set of tests, taken together, must verify each rule, provided the input and output values are described by a set of rules of some grammar;
    • testing of output data classes;
    • testing of functions;
    • combined criteria for programs and specifications.

    Stochastic testing criteria are formulated in terms of checking for specified properties in the application under test by means of testing a certain statistical hypothesis. They are applied when testing complex software systems, when the set of deterministic tests (X, Y) would be of enormous size.

    Mutation criteria are focused on checking the properties of a software product based on the Monte Carlo approach.

    The mutation testing method consists of introducing mutations (small errors) into the developed program P, i.e., artificially creating mutant programs P1, P2, ... . Program P and its mutants are then run on the same test set (X, Y).

    If the correctness of the program P is confirmed on the set (X, Y) and, in addition, all the errors introduced into the mutant programs are detected, then the test set (X, Y) satisfies the mutation criterion, and the program under test is declared correct. If the tests fail to detect some of the mutations, the test set (X, Y) must be extended and testing continued.
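
    As a toy sketch of this procedure (the programs and the test set are invented here): mutant P2 below is “killed” by the initial test set, while P1 survives, signalling that (X, Y) must be extended.

        # Toy mutation testing; programs and tests invented for the sketch.
        def p(x):            # program under test: doubles its argument
            return x * 2

        def p1(x):           # mutant P1: "*" mutated to "+"
            return x + 2

        def p2(x):           # mutant P2: constant 2 mutated to 3
            return x * 3

        tests = [(2, 4)]     # initial test set (X, Y)

        def passes(prog):
            return all(prog(x) == y for x, y in tests)

        # P passes and P2 is killed (2 * 3 != 4), but P1 survives, because
        # p1(2) == 4 as well: the set does not meet the mutation criterion.
        for name, prog in (("P", p), ("P1", p1), ("P2", p2)):
            print(name, "passes" if passes(prog) else "is killed")

        tests.append((3, 6))  # extend (X, Y); now p1(3) == 5 != 6 kills P1
        print("P1", "passes" if passes(p1) else "is killed")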

    Testing phases

    When testing, there are usually three phases: unit, integration and system testing.

    Unit testing is testing a program at the level of individual modules, functions, or classes. Its purpose is to reveal errors in the implementation of algorithms that are localized within a module, and to determine how ready the system is to move to the next level of development and testing. Unit testing follows the “white box” principle, that is, it relies on knowledge of the internal structure of the program, and it often includes code-coverage analysis of some kind.
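
    For instance, a minimal unit test using Python's standard unittest module might look as follows; the function under test is invented for this sketch, and each test case targets one branch of the condition in white-box fashion.

        # A minimal unit-test sketch using Python's standard unittest
        # module; is_leap_year() is invented for this example.
        import unittest

        def is_leap_year(year):
            """Module under test: the Gregorian leap-year rule."""
            return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

        class LeapYearTest(unittest.TestCase):
            # White-box style: each case exercises one branch of the rule.
            def test_divisible_by_4(self):
                self.assertTrue(is_leap_year(2024))

            def test_century_is_not_leap(self):
                self.assertFalse(is_leap_year(1900))

            def test_divisible_by_400(self):
                self.assertTrue(is_leap_year(2000))

        if __name__ == "__main__":
            unittest.main()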

    Integration testing is testing a part of a system consisting of two or more modules. The main task of integration testing is to search for defects associated with errors in the implementation and interpretation of interface interaction between modules. The main difference between unit and integration testing is the objectives, that is, the types of defects detected, which, in turn, determine the strategy for selecting input data and analysis methods.

    System testing is qualitatively different from the integration and unit levels. It considers the system under test as a whole and operates at the level of user interfaces. The main task of system testing is to reveal defects related to the operation of the system as a whole, such as incorrect use of system resources, unintended combinations of user-level data, incompatibility with the environment, unintended usage scenarios, missing or incorrect functionality, inconvenience of use, and the like.

    System testing is carried out on the project as a whole, using the “black box” method. The structure of the program does not matter; only the inputs and outputs visible to the user are available for testing. Both the code and the user documentation are subject to testing.

    In addition, regression testing is distinguished: a testing cycle performed when changes are made during the system-testing phase or during product maintenance. The main problem of regression testing is choosing between full and partial retesting and deciding how to extend the test suites. In partial retesting, only those parts of the project associated with the changed components are checked.
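
    A schematic sketch of such test selection (the component-to-test mapping is invented): tests are chosen by tracing which components the change touched.

        # Toy partial retesting: the component-to-test mapping is invented.
        TESTS_BY_COMPONENT = {
            "auth":    ["test_login", "test_logout"],
            "billing": ["test_invoice", "test_refund"],
            "report":  ["test_export"],
        }

        changed = {"billing"}  # components touched by the change set

        # Re-run only the tests associated with the changed components.
        regression_suite = [test
                            for component in sorted(changed)
                            for test in TESTS_BY_COMPONENT[component]]
        print(regression_suite)  # ['test_invoice', 'test_refund']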

    Testing stages

    Each testing phase includes the following steps:

    1. Setting goals (testing requirements), including specifying which parts of the system will be tested, which aspects of their operation will be selected for verification, what quality level is desired, etc.
    2. Planning: creating a schedule for developing tests for each subsystem under test; estimating the required human, software, and hardware resources; drawing up a schedule of test cycles. It is important to note that the testing schedule must be consistent with the development schedule of the system being created.
    3. Test development (test code for the system under test).
    4. Running tests: implementation of test cycles.
    5. Analysis of results.

    A test cycle is a cycle of test execution that includes steps 4 and 5 of the testing process. It consists of running the developed tests on a uniquely defined slice of the system (a state of the code of the system being developed). Such a slice of the system is typically called a build.

    A test plan is a document, or a set of documents, that specifies the testing resources, the list of functions and subsystems to be tested, the testing strategy, the schedule of test cycles, the fixed test configuration (the composition and specific parameters of the hardware and software environment), and the list of test metrics to be collected and analyzed during the test cycle (for example, metrics assessing the degree to which the set of requirements is covered by tests).

    Tests are developed based on specifications, both manually and with automated tools. In addition to the code itself, the concept of a “test” includes its general description and a detailed description of the steps performed in the test.

    To assess the quality of tests, various metrics are used relating to the number of defects found, code coverage, coverage of functional requirements, and the variety of scenarios.

    All information about defects discovered during testing (type, detection conditions, cause, correction conditions, time spent on correction) is entered into the defect database.

    Information about the test plan, the tests, and the defects is used at the end of each test cycle to generate a test report and to adjust the test system for the next iteration.

    Types of tests

    The test plan identifies and documents various types of tests.

    Types of testing by type of subsystem or product are as follows:

    1. Testing of the main functionality, when the system itself, which is the main manufactured product, is tested.
    2. Installation testing includes testing of initial system installation scripts, re-installation scripts (on top of an existing copy), uninstallation testing, installation testing in the presence of errors in the installed package, in the environment or in the script, etc.
    3. Testing of user documentation includes checking the completeness and clarity of the description of the rules and features of using the product, the presence of a description of all scenarios and functionality, the syntax and grammar of the language, the functionality of examples, etc.

    Types of testing based on the method of selecting input values:

    1. Functional testing, which checks:
      • coverage of functional requirements;
      • coverage of use cases.
    2. Stress testing, which tests extreme conditions of product use.
    3. Boundary testing.
    4. Performance testing.
    5. Testing for compliance with standards.
    6. Testing compatibility with other software and hardware systems.
    7. Testing work with the environment.
    8. Testing work on a specific platform.

    Test Driven Development

    Let's consider a testing approach slightly different from the one above. Test Driven Development (TDD) is a software development process that involves writing and automating unit tests before the corresponding classes or modules are written. This ensures that all responsibilities of any piece of software are defined before they are even coded.

    TDD specifies the following order of programming steps:

    • Red - write a small test that doesn't work and maybe doesn't even compile.
    • Green - make the test run as quickly as possible, without worrying about the correctness of the design and cleanliness of the code. Write just enough code to make the test work.
    • Refactoring - Remove any duplication from the code you write.

    Once developers have mastered TDD, they find that they write significantly more tests than before and move forward in small steps that previously might have seemed pointless.

    Once the programmer has made one test work and can be sure that this part of the functionality is covered, he makes a second test work, then a third, a fourth, and so on. The more difficult the problem facing the programmer, the smaller the area of functionality each test should cover. The result is 100% coverage of the code by unit tests, which, as a rule, cannot be achieved with the classical approach to testing.
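
    A minimal sketch of one red-green step (the fizzbuzz example is ours, not from the text): the test is written first, and then just enough code is written to make it pass.

        # A minimal sketch of one TDD step (red -> green); the fizzbuzz
        # example is invented here, not taken from the text above.

        # Red: the test is written first. Run before fizzbuzz() exists,
        # it fails with a NameError - the test "doesn't even compile".
        def test_fizzbuzz():
            assert fizzbuzz(3) == "Fizz"
            assert fizzbuzz(5) == "Buzz"
            assert fizzbuzz(15) == "FizzBuzz"
            assert fizzbuzz(7) == "7"

        # Green: write just enough code to make the test pass.
        def fizzbuzz(n):
            if n % 15 == 0:
                return "FizzBuzz"
            if n % 3 == 0:
                return "Fizz"
            if n % 5 == 0:
                return "Buzz"
            return str(n)

        test_fizzbuzz()  # now passes; next come refactoring and a new test
        print("TDD step complete")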

    There are certainly problems that cannot (at least at present) be solved by tests alone. In particular, TDD cannot mechanically demonstrate the adequacy of the developed code in the areas of data security and inter-process communication. Security, of course, relies on code that must be free of defects, but it also relies on human participation in data-protection procedures. And the subtle problems that arise in inter-process communication cannot be reliably reproduced simply by running some code.

    Results

    The more actively new information systems are developed, the more complex architectures become, and the more new technologies appear, the more important the testing process becomes. There are more and more networked applications and applications for mobile devices. Testing such systems is much more difficult than testing single-user programs for home PCs, and it requires efficient test-automation algorithms. In addition, the task of security testing of information systems, in all its manifestations, remains urgent. The video game industry also needs new approaches to testing.

    Testing accompanies almost the entire development process, including its earliest stages. Technologies for testing specifications and requirements still need improvement. A current task is to develop tests that exercise the development process, the business requirements, and the goals of the entire organization; that is, to develop more effective tests covering the most varied characteristics of an information system.

    In addition, research continues on tests aimed at a specific development model (waterfall, spiral) or at a specific programming paradigm. For example, agent-based testing has been proposed for testing component-oriented systems; neural networks have been proposed for testing active Java applets; and knowledge-based systems have been proposed for testing agents that live on the Web (robots, spiders).

    Thus, although the testing process is largely well defined and many of its stages are fully automated, plenty of room remains for research and practical work.

    The best way to evaluate whether we have tested a product well is to analyze missed defects: those that our users, implementers, and business have already encountered. From them you can assess a great deal: what we did not check thoroughly enough, which areas of the product deserve more attention, what the overall miss percentage is, and how it changes over time. Everything is fine with this metric (perhaps the most common one in testing), but... By the time we have released the product and learned about the missed errors, it may already be too late: an angry article about us has appeared on Habré, competitors are rapidly spreading the criticism, customers have lost trust in us, and management is displeased.

    To prevent this from happening, we usually try to assess the quality of testing in advance, before release: how well and thoroughly do we test the product? What areas lack attention, where are the main risks, what is the progress? And to answer all these questions, we evaluate test coverage.

    Why evaluate?

    Any metric takes time to evaluate, and that time could be spent testing, filing bugs, or preparing autotests. What magical benefit do test-coverage metrics bring that justifies sacrificing testing time?
    1. Finding our weak areas. Naturally, we need this not in order to grieve, but to know where improvement is needed: which functional areas are not covered by tests? What have we not checked? Where are the greatest risks of missing errors?
    2. We rarely get 100% from a coverage assessment. What should be improved? Where should we go first? What is the percentage now? How will a given task increase it? How quickly can we reach 100%? All these questions bring transparency and clarity to our process, and the answers are provided by the coverage assessment.
    3. Focus of attention. Suppose our product has about 50 different functional areas. A new version comes out, we start testing the first of them, and we find typos, buttons shifted by a couple of pixels, and other little things... By then the testing time is over, and this one functionality has been tested in detail... And the other 49? Coverage assessment allows us to prioritize tasks based on current realities and deadlines.

    How to evaluate?

    Before implementing any metric, it is important to decide how you will use it. Start by answering exactly this question - most likely, you will immediately understand how best to count it. In this article I will simply share some examples and my own experience of how this can be done - not for you to copy the solutions blindly, but so that your imagination can build on this experience while you think through the solution that is ideal for you.

    We evaluate the coverage of requirements by tests

    Let’s say you have analysts on your team, and they do not spend their working hours in vain. Based on their work, requirements have been created in an RMS (Requirements Management System): HP QC, MS TFS, IBM Doors, Jira (with add-on plugins), etc. They enter into this system requirements that meet the requirements for requirements (pardon the tautology). The requirements are atomic, traceable, specific... In short, ideal conditions for testing. What can we do in this case? With a scripted approach, link requirements and tests: run the tests in the same system, create the requirement-test links, and at any moment you can see a report on which requirements have tests, which do not, when those tests were last run, and with what result.
    We get a coverage map, we cover all uncovered requirements, everyone is happy and satisfied, we don’t miss any mistakes...

    Okay, let's come back down to earth. Most likely you don't have detailed requirements, they are not atomic, some of the requirements have been lost entirely, and you have no time to document every test, or even every second one. You can despair and cry, or you can admit that testing is a compensatory process: the worse things are with analysis and development on a project, the harder we ourselves must try to compensate for the problems of the other participants. Let's look at the problems one by one.

    Problem: Requirements are not atomic.

    Analysts are human too and sometimes make mistakes, and this usually spells problems for the whole project. For example, you are developing a text editor, and among others you may have two requirements in your system: “html formatting must be supported” and “when opening a file of an unsupported format, a pop-up window with a question must appear.” How many tests are required for a basic check of the first requirement? And of the second? The answers most likely differ by a factor of about a hundred!!! We cannot say that one test for the first requirement is enough, whereas for the second it most likely is.

    Thus, the presence of a test for a requirement does not guarantee us anything at all! What does our coverage statistics mean in this case? Almost nothing! We'll have to decide!

    1. In this case, automatic calculation of requirements coverage by tests can be dropped; it carries no meaning anyway.
    2. For each requirement, starting with the highest-priority ones, we prepare tests. While preparing them, we analyze what tests this requirement needs and how many will be enough - we carry out a full test analysis instead of waving it off with “there is one test, and that's fine.”
    3. Depending on the system used, we export or upload the tests for each requirement and... test these tests! Are there enough of them? Ideally, such a review should be carried out together with the analyst and the developer of the functionality. Print out the tests, lock your colleagues in a meeting room, and don't let them out until they say “yes, these tests are enough” (this happens only with a written sign-off, when the words are said just to close the question without actually analyzing the tests; in a live discussion your colleagues will pour out criticism, missed tests, misunderstood requirements, and so on - not always pleasant, but very useful for testing!).
    4. After the tests have been finalized for the requirement and their completeness agreed upon, the requirement can be given the status “covered by tests” in the system. This information means much more than “there is at least 1 test here.”

    Of course, such an agreement process takes a lot of resources and time, especially at first, before you gain practice. Therefore, apply it only to high-priority requirements and new features. Over time you will pull in the remaining requirements, and everyone will be happy! But... what if there are no requirements at all?

    Problem: there are no requirements at all.

    Requirements are absent from the project; they are discussed orally, and everyone does what he wants or can, as he understands it. We test the same way. As a result, we get a huge number of problems not only in testing and development, but also in initially incorrect implementation of features - we wanted something completely different! Here I could recommend the option “define and document the requirements yourself,” and I have even used this strategy a couple of times in my practice, but in 99% of cases the testing team has no such resources - so let's take a much less resource-intensive route:
    1. Create a feature list. Yourselves! As a Google spreadsheet, as PBIs in TFS - choose any form, just not plain text: we still need to track statuses! In this list we include all the functional areas of the product and try to stick to one common level of decomposition (you can list software objects, or user scenarios, or modules, or web pages, or API methods, or screen forms...) - just not all of these at once! ONE decomposition format, whichever is easier and more visual for you, will keep you from overlooking important things.
    2. We agree on the COMPLETENESS of this list with analysts, developers, the business, and within our team... Do everything you can not to lose important parts of the product! How deep to go in the analysis is up to you: in my practice there have been only a few products for which we created more than 100 lines in the table, and those were giant products. Most often, 30-50 lines are an achievable result for subsequent careful processing. In a small team without dedicated test analysts, a longer feature list will be too difficult to maintain.
    3. After that, we work through the feature list by priority, processing each line just as described for requirements above: we write tests, discuss them, and agree on their sufficiency. We mark the features that already have enough tests. We get statuses, progress, and a growing test suite through communication with the team. Everyone is happy!

    But... What if requirements are maintained, but not in a traceable format?

    Problem: Requirements are not traceable.

    There is a huge amount of documentation on the project; analysts type at 400 characters per minute; you have specifications, terms of reference, instructions, and certificates (most often this happens at the customer's request), and all of this serves as requirements - and the project has long been confused about where to look for which information.
    We repeat the previous section, helping the whole team get organized!
    1. We create a feature list (see above), but without a detailed description of the requirements.
    2. For each feature, we collect links to technical specifications, specifications, instructions, and other documents.
    3. We go by priorities, prepare tests, and agree on their completeness. Everything is the same, except that by gathering all the documents into one table we gain easy access to them, transparent statuses, and agreed-upon tests. In the end, everything is great and everyone is happy!

    But... Not for long... It seems that over the past week, analysts have updated 4 different specifications based on customer requests!!!

    Problem: requirements change all the time.

    Of course, it would be nice to test some kind of fixed system, but our products are usually living. The customer asked for something, something changed in the legislation external to our product, and somewhere analysts found an error in the analysis of the year before last... Requirements live their own lives! What to do?
    1. Suppose you have already collected links to the terms of reference and specifications in a feature-list table, PBIs, requirements, wiki notes, etc. Suppose you already have tests for these requirements. And now a requirement changes! This may mean a change in the RMS, a task in the TMS (Task Management System), or a letter in the mail. In any case, it leads to the same consequence: your tests are out of date! Or they may be out of date. Which means they need updating (test coverage of an old version of the product doesn't really count, does it?).
    2. In the feature list, in the RMS, in the TMS (Test Management System: TestRail, Sitechco, etc.), the affected tests must be immediately and without fail marked as outdated! In HP QC or MS TFS this can be done automatically when requirements are updated; in a Google spreadsheet or a wiki you will have to do it by hand. Either way, you should see at once that the tests are outdated. This means the full iterative path lies ahead: update, redo the test analysis, rewrite the tests, agree on the changes, and only then mark the feature/requirement as “covered by tests” again.

    In this case, we get all the benefits of coverage assessment, and in dynamics! Everyone is happy!!! But...
    But you have spent so much time working on requirements that now there is not enough time either to test or to document tests. In my opinion (and there is room for a religious dispute here!), requirements are more important than tests, and it is better this way! At least the requirements are in order, the whole team is informed, and the developers are doing exactly what is needed. BUT THERE IS NO TIME LEFT FOR DOCUMENTING TESTS!

    Problem: There is not enough time to document tests.

    In fact, the source of this problem may be not only a lack of time but a perfectly conscious choice not to document tests (we don't like doing it, we want to avoid the pesticide paradox, the product changes too often, etc.). But how can test coverage be assessed in this case?
    1. You still need the requirements, either as full-fledged requirements or as a feature list, so one of the approaches described above, depending on how the analysts work on the project, will still be necessary. Got the requirements or feature list?
    2. We describe, and briefly agree on orally, a testing strategy, without documenting the individual tests! The strategy may be recorded in a table column, on a wiki page, or in a requirement in the RMS, and it too must be agreed upon. Different people will test differently under this strategy, but you will know when something was last tested and by what strategy. And that, you must admit, is also not bad! And everyone will be happy.

    But... what other “but”? None at all! We will work around everything, and may quality products be with us!

    Why is testing necessary?

    In this section we will look at the most basic concepts and principles used in the testing process. We will find out what testing actually is, why it is needed, and who does it. We will consider the goals, principles, and main stages of testing, get a feel for the psychological attitude a real tester should have, and finally debunk a few myths about testing. We are sure you will find it interesting.
    Let's start with what “testing” is. To begin with, let's abstract from dry academic definitions and look at this concept from the point of view of everyday use.
    When we test something, we ask ourselves a simple question: “does it work as we expect?” or, in other words: does the actual behavior of the test object match our expectations? If the answer is positive, great; if not, we were deceived in our expectations, which means something needs to be corrected.
    Testing is necessary because we all make mistakes. Some of them may be minor, while others may have the most devastating consequences. Everything that is produced by man may contain errors (that’s how we humans are designed). This is why any product needs to be verified - tested before it can be used effectively and safely.
    The same is true for software.
    Software – computer programs, functions, as well as accompanying documentation and data relevant to the operation of the computer system.
    Computer technology penetrates ever deeper into our daily lives. Software controls the operation of many things around us: from mobile phones and computers to washing machines and credit cards. And we have all encountered program errors of one kind or another: a text editor that freezes while you work on a thesis, an ATM that “ate” a card, or simply a website that will not load - none of this makes our life easier.
    However, not all errors are equally dangerous: for different programs and different systems, the risk levels may differ.
    Risk:
    – a factor that may lead to negative consequences in the future; as a rule, it is expressed in terms of the likelihood of such consequences occurring and their impact on the system.
    – something that has not yet happened and may not happen at all; potential problem.
    In addition, the level of risk will depend on the likelihood of negative consequences occurring.
    For example, the same minor error, say a typo, can have completely different risk levels in different programs:
    – a typo in the description of your interests on a social-network page is unlikely to have significant consequences, other than making your friends smile;
    – the same simple typo in the description of a large company's activities, posted on its website, is already dangerous, since it indirectly suggests the unprofessionalism of its employees;
    – a typo in the code of a program that calculates the radiation level of an X-ray machine (for example, 100 instead of 10) can have the direst consequences: harm to people's health and safety will turn into lost confidence in the company and lawsuits with many zeros.

    Your goal as a system administrator
    is to implement effective strategies for
    maximizing your computer resources.


    D. Gunter, S. Barnett, L. Gunter.
    Integration of Windows NT and Unix

    IT specialists must not only keep up with the numerous tests published in the computer press, but also develop test procedures themselves, which are needed both when choosing a supplier and when building their own solution. We will therefore try to answer the questions that arise in the difficult process of testing, especially when it concerns systems as complex as servers.

    What is being tested and why?

    Computer periodicals often publish reviews of programs, hardware, and solutions. Of particular interest, as a rule, are comparative reviews of functionally homogeneous products that present test results. These detailed tables are believed to help users, administrators, and IT professionals at least stay aware of what is happening in the field, and even decide on the choice of a product.

    So, what factors are taken into account in such cases, what is the object of research and what types of tests are most popular?

    Testing criteria are usually:

    • product functionality;
    • ease of learning;
    • ease of installation;
    • quality of documentation and support;
    • performance;
    • for equipment, the design is sometimes also taken into account.

    There are also quite ambiguous criteria. Not long ago, one review of Web servers counted “a high degree of integration with the operating system” as a positive factor in the overall rating. But if an application crash can bring down the operating system (and the probability of this is proportional to the degree of integration), is that really such an advantage?

    Are a hundred rabbits equal to one tiger?

    Separately, I would like to dwell on the price/performance ratio, which is typical when evaluating hardware. At first glance, this is indeed the only objective criterion linking the technical characteristics of the system under study with the consumer's wallet. However, not everything is as simple as it seems: this approach works only at the moment of purchase and takes no account of the cost of ownership, the safety of investments in equipment or software, or the possibility of further upgrades.

    A typical example is the comparison of the top models of Intel-based systems with the entry-level models of RISC platform lines. Yes, indeed, within a given price range, machines with Intel architecture are comparable to, and in some cases even superior to, RISC systems. However, what is the ceiling for one platform is merely the starting level for another, and so on.

    Conclusions: be critical of the criteria by which a product is evaluated - you and the testers may have different tastes. Try telling Unix fans that for the sake of a convenient GUI for system configuration they should accept the need to reboot after changing IP parameters. As for a compact system-unit design, it is good only until you need to squeeze an extra hard drive into the slim case.

    In short, rethink test results to suit your needs.

    Server testing specifics

    If the computer does not turn on, it is faulty.
    If it doesn’t turn off, it’s a server.
    Folk sign

    In our opinion, one of the fundamental requirements for servers is reliability. Performance, of course, is also important, since it affects the system's response time, the most important characteristic from the user's point of view; but it is reliability that determines the availability of a service. The timeliness of its provision and the relevance and integrity of the information also depend on reliability.

    In addition, it should be borne in mind that specialized servers, i.e., servers providing only one service, are still the exception rather than the rule. Typically, one such computer combines a number of functions: an application server, for example, may also serve as a file server, print server, backup service controller, and so on. Communication servers typically work with several application-level protocols, each served by its own “daemon”.

    Finally, a characteristic feature of server operation is the presence of peak loads. Their causes vary widely, from the start of the working day in a large organization (especially if all users arrive at work on time) to the restoration of a dropped connection at an Internet service provider, when the backlog of mail and newsgroups hits the communication servers.

    These factors - the demand for increased reliability while providing multiple services under peak loads - should be key in determining the ideology of server testing.

    Unfortunately, most reviews published in computer periodicals are devoted either to comparing the performance of different hardware solutions on a set of test tasks executed sequentially, or to comparative testing of a single service (for example, testing Web servers from different manufacturers). In the worst cases of this approach, a comparative review of the capabilities of similar solutions is called “testing” only because the author of the publication performed the installation and “exercised” the product a little.

    Test conditions

    First, a little theory. Glenford Myers, in his book “Software Reliability”, gives several “axioms of testing”. Following them, let's consider what to test and how.

    From time to time, reports of an almost sporting nature appear in the computer press: a product from company N showed record performance in the M test. How informative are the tests conducted by the manufacturing companies?

    It is impossible to test your own program

    Tests are often written by company employees for a specific product. Processor performance tests written so as to showcase the advantages of a particular processor have become the talk of the town: for example, the size of the test program is chosen so that it fits in cache memory, and so on. The graphical presentation of such results is often rather biased as well.

    Knowing the architecture of their applications and how they use OS resources allows software developers to configure the system so as to obtain the best results for their own program. Whether other software or services will feel comfortable with such operating-system settings, and whether the application under test will “seize resources”, matters not at all.

    The author encountered this phenomenon while trying to tune the Netscape Enterprise Web Server under Solaris (SPARC). The server's HTTP performance was increased almost 6 (!) times (according to testing with MS InetLoad), but in the comprehensive test the gain turned out to be only threefold, while the POP3 server's performance doubled, the news server's remained unchanged, and SMTP showed results twice as bad as before the changes.

    In addition, manufacturers, knowing the characteristics of a particular test set, can optimize system parameters specifically for it. An example of this is the Netscape Web page, which provides recommendations on how to configure Netscape Enterprise Server for testing using SPECweb96.

    Testing is carried out to detect errors

    In the case of servers and server software, this means that the device should be forced to operate in the most unfavorable mode, i.e., subjected to a “survivability” test. This can be achieved by testing the server in the following operating configuration (a sketch of such a test driver follows the list):

    • all services must be running;
    • all services must be tested simultaneously (a comprehensive test);
    • a stream of requests simulating typical user activity is sent to each service;
    • this activity is periodically increased during the test until at least one service can no longer cope with processing the requests.
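
    Purely as an illustration, a rough Python sketch of such a test driver is given below; the endpoints are hypothetical placeholders, and a real survivability test would use protocol-specific clients (mail, news, FTP) rather than plain HTTP probes.

        # Hypothetical endpoints stand in for the real services under test.
        import concurrent.futures
        import time
        import urllib.request

        SERVICES = [
            "http://localhost:8080/",
            "http://localhost:8081/",
        ]

        def probe(url, timeout=5.0):
            """One simulated user request; False means the service failed."""
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    return resp.status == 200
            except Exception:
                return False

        def run_step(rate):
            """Send `rate` concurrent requests to each service at once."""
            failures = {url: 0 for url in SERVICES}
            with concurrent.futures.ThreadPoolExecutor(max_workers=rate) as pool:
                futures = {pool.submit(probe, url): url
                           for url in SERVICES for _ in range(rate)}
                for done in concurrent.futures.as_completed(futures):
                    if not done.result():
                        failures[futures[done]] += 1
            return failures

        rate = 10
        while rate <= 640:             # periodically increase the activity
            failures = run_step(rate)
            print(f"rate={rate}: failures={failures}")
            if any(failures.values()):
                break                  # a service stopped coping: stop here
            rate *= 2
            time.sleep(1.0)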

    Two notes are relevant to this configuration:

    1. User behavior model.

    In relation to users, the administrator must be a pessimist. Survival testing should be constructed accordingly.

    Provide for the maximum number of actions that would simply never occur to you in a normal state of mind. Estimate (or check) whether the system will keep functioning normally in such a situation, and, just as important, whether the user will receive a clear message from the system that this is not worth doing again, and why.

    2. The service has stopped handling requests: possible options.

    According to the degree of severity, such failures can be divided into 4 groups:

    • performance degradation: the service cannot keep up with processing but responds correctly (returning the corresponding error code, such as “Too many connections”);
    • abnormal termination of the service without negative consequences for the system: the program has finished its work, is unloaded from memory, and the system resources are released;
    • abnormal termination of the service that degrades system performance: the program either “hangs” in the process list without releasing resources, or seizes additional resources while terminating;
    • system collapse: at best followed by a reboot, at worst by a freeze.

    Prepare tests for both correct and incorrect inputs

    This axiom details the previous one in terms of input information flows.

    How will the system react to receiving a letter several tens of megabytes in size? Will it get stuck in the queue, blocking your mail system indefinitely (especially if communication with the recipient host is regularly interrupted), or will it be discarded, with the user notified that such actions are unacceptable?

    Advice from the same book by G. Myers: “Try not to let the system make the user angry, as this may lead to some unexpected situations at the input - rule No. 5 of minimizing user errors in dialog systems. Being a pessimist does not mean being a misanthrope!”

    And what about the news server - is a maximum article size set there?

    Could someone intent on downloading half of your FTP site open three dozen parallel FTP sessions, and if so, how would this affect your channel and the experience of other visitors to the FTP server?

    As an example confirming the correctness of this approach, recall the incident with the missile cruiser Yorktown, where an operator input error resulted in failure of the engine control system. Or another, cited by Myers himself: “Operators of the New York police vehicle dispatch system SPRINT amused themselves in their free time by trying to disable it by entering deliberately incorrect messages.” That was in the early 1970s. Perhaps morals have softened since then, but it is unlikely.

    Avoid irreproducible tests

    In the case of testing servers and server software, this axiom is especially relevant. First, testing them requires dedicated load generators (Client-Side Load Generators, CSLG), usually groups of workstations that execute the client part of the test and supply the stream of requests to the server. Second, the results may be affected by the state of the network connecting the server and the CSLG. In addition, performance in many cases depends on the history of calls to the server: most server applications use caching, and access to the cache is much faster than access to the disk subsystem. The application cache may fill up during preliminary or debugging runs of the test programs, and the results will change accordingly. Moreover, in comprehensive testing the applications may influence one another: for example, the number of complex requests processed per unit of time by POP3 or IMAP servers depends on the size of the mail spool, which may have been increased by a previous SMTP test. Finally, operating-system settings also affect performance.

    All decent reviews have a section on “how the tests were carried out”, more detailed in some publications and less so in others; there still seems to be no standard for describing and recording testing. An excellent counterexample is the SPECweb96 test: its description takes into account the specifics of testing a server application. Unlike traditional descriptions, it requires logging the additional settings of the operating system and of the application under study - something usually mentioned only in passing even in the best descriptions of testing.

    Perhaps you yourself will come to the realization that you need to conduct your own test. This need may arise in the following cases:

    • you are planning to expand your network, which will lead to increased load on the servers located in it;
    • you intend to update (or change) the software;
    • you decide to change your server (or servers) to more productive ones;
    • Finally, maybe you've just decided to figure out the "limits to growth" of your system.

    Your first step will probably be to look through published reviews. In order to use data obtained by someone else, treat it critically and try to understand, among other things, the motivation of the people who performed the testing. The rest is up to you: understanding the goal, choosing or writing an adequate set of tests, and conducting the testing itself correctly. I hope the considerations outlined in this article will help you with this.