• What principle of information encoding is used in a computer. Coding information in a computer. Encoding numerical information

    One of the main advantages of a computer is that it is an amazingly versatile machine. Anyone who has ever encountered it knows that doing arithmetic calculations is not at all the main method of using a computer. Computers perfectly reproduce music and videos, with their help you can organize speech and video conferences on the Internet, create and process graphic images, and the ability to use a computer in the field of computer games at first glance looks completely incompatible with the image of a super-arithmometer, grinding hundreds of millions of digits per second.

    When compiling an information model of an object or phenomenon, we must agree on how to understand certain designations. That is, agree on the type of presentation of information.

    A person expresses his thoughts in the form of sentences made up of words. They are an alphabetical representation of information. The basis of any language is the alphabet - a finite set of various signs (symbols) of any nature that make up a message.

    The same entry can carry different meanings. For example, the set of numbers 251299 can indicate: the mass of an object; object length; distance between objects; phone number; recording the date December 25, 1999.

    To present information, different codes can be used and, accordingly, you need to know certain rules - the laws of recording these codes, i.e. be able to code.

    Code - a set of symbols for presenting information.

    Coding - the process of representing information in the form of code.

    To communicate with each other we use a code - the Russian language. When speaking, this code is transmitted by sounds, when writing - by letters. The driver transmits the signal using a horn or flashing headlights. You encounter information encoding when crossing the road in the form of traffic lights. Thus, coding comes down to using a set of symbols according to strictly defined rules.

    Information can be encoded in various ways: orally; in writing; gestures or signals of any other nature.

    Encoding data in binary code.

    As technology developed, different ways of encoding information appeared. In the first half of the 19th century, the American inventor Samuel Morse invented an amazing code that still serves humanity today. Information is encoded with three symbols: a long signal (dash), a short signal (dot), and no signal (pause) to separate letters.

    Computer technology also has its own system, called binary coding, based on representing data as a sequence of just two characters: 0 and 1. These characters are called binary digits; the English term binary digit is abbreviated to bit.

    One bit can express two concepts: 0 or 1 (yes or no, black or white, true or false, and so on). If the number of bits is increased to two, then four different concepts can be expressed:

    00 01 10 11
    Three bits can encode eight different values:

    000 001 010 011 100 101 110 111
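As a quick illustration, all eight 3-bit combinations can be enumerated programmatically (a minimal Python sketch):

```python
from itertools import product

# Enumerate every combination of three binary digits, in order.
codes = ["".join(bits) for bits in product("01", repeat=3)]
print(codes)       # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(codes))  # exactly 2**3 = 8 values
```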

    By increasing the number of bits in a binary coding system by one, we double the number of values that can be expressed in that system, so the general formula is:

    N = 2^m,

    where N is the number of independent encoded values;

    m is the bit depth of binary coding adopted in this system.
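The formula N = 2^m can be checked directly in Python:

```python
# N = 2**m: the number of distinct values expressible with m bits.
for m in range(1, 9):
    print(m, "bits ->", 2 ** m, "values")

# In particular, 8 bits give 256 values and 16 bits give 65536.
assert 2 ** 8 == 256
assert 2 ** 16 == 65536
```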

    The same information can be presented (encoded) in several forms. With the advent of computers, the need arose to encode all types of information that both an individual and humanity as a whole deal with. But humanity began to solve the problem of encoding information long before the advent of computers. The grandiose achievements of mankind - writing and arithmetic - are nothing more than a system for encoding speech and numerical information. Information never appears in its pure form, it is always presented somehow, encoded somehow.

    Binary coding is one of the common ways of representing information. In computers, robots and numerically controlled machines, as a rule, all the information that the device deals with is encoded in the form of words of the binary alphabet.

    Coding of symbolic (text) information.

    The main operation performed on individual text characters is character comparison.

    When comparing characters, the most important aspects are the uniqueness of the code for each character and the length of this code, and the choice of encoding principle itself is practically irrelevant.

    Various conversion tables are used to encode texts. It is important that the same table is used when encoding and decoding the same text.

    Conversion table is a table containing a list of encoded characters ordered in some way, according to which the character is converted into its binary code and back.

    The most popular conversion tables: DKOI-8, ASCII, CP1251, Unicode.

    Historically, 8 bits or 1 byte was chosen as the code length for character encoding. Therefore, most often one character of text stored in a computer corresponds to one byte of memory.

    With a code length of 8 bits, there can be 2^8 = 256 different combinations of 0 and 1, so no more than 256 characters can be encoded using one conversion table. With a code length of 2 bytes (16 bits), 2^16 = 65536 characters can be encoded.

    Encoding numerical information

    The similarity in the encoding of numeric and text information is as follows: in order to compare data of this type, different numbers (as well as different characters) must have a different code. The main difference between numeric data and symbolic data is that in addition to the comparison operation, various mathematical operations are performed on numbers: addition, multiplication, root extraction, logarithm calculation, etc. The rules for performing these operations in mathematics are developed in detail for numbers represented in the positional number system.

    In the process of development, humanity has come to realize the need to store and transmit certain information over distances. In the latter case, it required its conversion into signals. This process is called data encoding. Text information, as well as graphic images, can be converted into numbers. Our article will tell you how this can be done.

    Transmitting information over a distance

    Information can be transmitted over a distance in several ways:

    • courier and postal service;
    • acoustic (for example, through a loudspeaker);
    • based on one or another method of telecommunication (wired, radio, optical, radio relay, satellite, fiber optic).

    The most common at present are transmission systems of the latter type. However, to use them, the information must first be encoded in a suitable way; it is extremely difficult to do this using numbers in the decimal number system familiar to modern people.

    Binary number system

    At the dawn of the computer era, scientists were preoccupied with finding the simplest possible way to represent numbers in a computer. The issue was resolved when Claude Shannon proposed using the binary number system. It had been known since the 17th century, and its implementation required a device with 2 stable states corresponding to logical "1" and logical "0". Plenty of such devices were known at the time - from a core that could be either magnetized or demagnetized, to a transistor that could be either open or closed.

    Presentation of color pictures

    The method of encoding such images with numbers is somewhat more complicated to implement. The picture must first be decomposed into 3 primary colors (red, green and blue), since mixing them in certain proportions can produce any shade perceived by the human eye. This method of encoding a picture with numbers, using 24 binary bits per dot, is called RGB, or True Color.
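A minimal Python sketch of 24-bit True Color: three 8-bit channels packed into one number (the helper names here are our own, chosen for illustration):

```python
def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channels into one 24-bit True Color value."""
    assert all(0 <= c <= 255 for c in (r, g, b))
    return (r << 16) | (g << 8) | b

def unpack_rgb(value: int):
    """Recover the three channels from a packed 24-bit value."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

print(hex(pack_rgb(255, 255, 255)))  # 0xffffff - white
print(unpack_rgb(0xFF8000))          # (255, 128, 0) - an orange shade
```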

    When it comes to printing, the CMYK system is used. It is based on the idea that each of the basic RGB components can be assigned the color that is its complement to white: cyan, magenta and yellow. Although these three would suffice, a fourth component - black - is added to reduce printing costs. Thus, representing graphics in the CMYK system requires 32 binary bits, and the mode itself is usually called full color.


    Representation of sounds

    Is there a way to encode this kind of information using numbers? Yes, although the existing methods are not considered perfect. They include:

    • FM method. It is based on the decomposition of any complex sound into a sequence of elementary harmonic signals of different frequencies, which can be described by a code.
    • Wave-table method. Samples - sound snippets for various musical instruments - are stored in pre-compiled tables. Numeric codes express the type and model number of the instrument, the pitch, intensity and duration of the sound, etc.


    Now you know that binary coding is one of the common ways of representing information, which played a huge role in the development of computer technology.

    With the advent of technical means of storing and transmitting information, new ideas and coding techniques arose.

    The first technical means of transmitting information over a distance was the telegraph, invented in 1837 by the American Samuel Morse.

    A telegraph message is a sequence of electrical signals transmitted from one telegraph apparatus through wires to another telegraph apparatus.

    These technical circumstances led Morse to the idea of using only two types of signals - short and long - to encode messages transmitted over telegraph lines.

    This coding method is called Morse code. In it, each letter of the alphabet is encoded by a sequence of short signals (dots) and long signals (dashes). Letters are separated from each other by pauses - the absence of signals. The code table below shows Morse code in relation to the Russian alphabet. There are no special punctuation marks in it; they are usually spelled out in words: "tchk" for period, "zpt" for comma, etc.


    A code table is a correspondence between a set of characters (symbols) and their codes.

    The most famous telegraph message is the distress signal "SOS" (Save Our Souls).

    Here's what it looks like in Morse code:

    ... --- ...

    Three dots represent the letter S, three dashes the letter O. Two pauses separate the letters from each other.

    A characteristic feature of Morse code is the variable length of the codes of different letters, which is why Morse code is called a non-uniform code. Letters that appear more often in text have shorter codes than rare letters: for example, the code for the letter "E" is a single dot, while rare letters have codes several times longer. Why is this done? To shorten the length of the entire message. But because of the variable code length, the problem arises of separating letters from each other in the text, so a pause (skip) must be used for separation. Consequently, the Morse telegraph alphabet is in fact ternary, since it uses three characters: dot, dash, pause.
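A minimal Python sketch of such non-uniform coding, using a small fragment of the International (Latin) Morse table; the letter codes here are the standard international ones, shown only for illustration:

```python
# A small fragment of International Morse code (Latin letters).
MORSE = {"E": ".", "T": "-", "S": "...", "O": "---", "A": ".-", "B": "-..."}

def encode_morse(word: str) -> str:
    # Because codes have different lengths, a pause (here: a space)
    # must separate the letters.
    return " ".join(MORSE[ch] for ch in word.upper())

print(encode_morse("SOS"))  # ... --- ...
# Frequent letters get short codes: E is 1 symbol, B is 4.
print(len(MORSE["E"]), len(MORSE["B"]))
```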

    Morse code is an uneven telegraph code where each letter and sign is represented by long and short signals, the so-called “dashes” and “dots”.

    A uniform telegraph code was invented by the Frenchman Jean-Maurice-Émile Baudot at the end of the 19th century. It used only two types of signals. It does not matter what you call them: dot and dash, plus and minus, zero and one. What matters is that they are two different electrical signals.

    In the Baudot code, the code length of all characters of the alphabet is the same and equals five. In this case, there is no problem of separating letters from each other: each five signals is a text sign.

    Baudot code - This is the first method of binary coding of information in the history of technology. Thanks to Baudot's idea, it was possible to automate the process of transmitting and printing letters. A keyboard telegraph apparatus was created. Pressing a key with a certain letter generates a corresponding five-pulse signal, which is transmitted over the communication line. The receiving device, under the influence of this signal, prints the same letter on a paper tape.
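The advantage of a uniform code - no separators needed - is easy to demonstrate. The sketch below uses sequential 5-bit codes of our own devising, NOT the historical Baudot letter assignments:

```python
import string

# Illustrative uniform 5-bit code: 26 Latin letters fit easily,
# since 2**5 = 32. (These codepoints are not the real Baudot table.)
ALPHABET = string.ascii_uppercase
ENCODE = {ch: format(i, "05b") for i, ch in enumerate(ALPHABET)}
DECODE = {code: ch for ch, code in ENCODE.items()}

def to_line(text: str) -> str:
    return "".join(ENCODE[ch] for ch in text)

def from_line(bits: str) -> str:
    # No pauses needed: every 5 bits is exactly one letter.
    return "".join(DECODE[bits[i:i + 5]] for i in range(0, len(bits), 5))

signal = to_line("HELLO")
print(signal)             # 25 bits, five per letter
print(from_line(signal))  # HELLO
```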

    Baudot code - a uniform 5-bit telegraph code using two different electrical signals.

    Encoding text information in a computer is sometimes an essential condition for a device to work correctly or for a particular fragment to display properly. How this process works when a computer handles text, visual and audio information is what we will analyze in this article.

    Introduction

    An electronic computer (which we simply call a computer in everyday life) perceives text in a very specific way. For it, encoding text information is very important, since it perceives each text fragment as a group of symbols isolated from each other.

    What are the symbols?

    Not only Russian, English and other letters act as symbols for a computer, but also punctuation marks and other characters. Even the space we use to separate words when typing on a computer is perceived by the device as a symbol. In some ways it is very reminiscent of higher mathematics, because there, according to many professors, zero has a double meaning: it is both a number and at the same time does not mean anything. Even for philosophers, the question of white space can be a pressing issue. A joke, of course, but, as they say, there is some truth in every joke.

    What kind of information is there?

    So, to perceive information, the computer must run processing routines. What kinds of information are there, anyway? The topic of this article is the encoding of textual information. We will pay special attention to this task, but we will also touch on related micro-topics.

    Information can be text, numeric, audio, graphic. The computer must run processes that encode textual information in order to display on the screen what we, for example, type on a keyboard. We will see symbols and letters; that much is clear. But what does the machine see? It perceives absolutely all information - and not just text - as a certain sequence of zeros and ones, which form the basis of the so-called binary code. Accordingly, the process that converts the information received by the device into something it can understand is called "binary encoding of text information".

    Brief principle of operation of binary code

    Why is binary coding of information the most widespread in electronic machines? Text encoded with zeros and ones can be absolutely any sequence of symbols and signs. But that is not its only advantage: the principle behind this coding method is very simple yet quite functional. When there is an electrical impulse, it is marked (conditionally, of course) with a one; when there is no impulse, it is marked with a zero. That is, text coding is based on building a sequence of electrical impulses. A logical sequence made up of binary code symbols is called machine language. Encoding and processing text information with binary code allows operations to be carried out in a fairly short period of time.

    Bits and bytes

    Each digit perceived by a machine carries a certain amount of information, equal to one bit. This applies to every one and every zero that makes up a particular sequence of encoded information.

    Accordingly, the amount of information in any case can be determined simply by knowing the number of characters in the binary code sequence. They will be numerically equal to each other. 2 digits in the code carry 2 bits of information, 10 digits - 10 bits, and so on. The principle of determining the information volume that lies in a particular fragment of binary code is quite simple, as you can see.

    Coding text information in a computer

    Right now you are reading an article that consists of a sequence, as we believe, of letters of the Russian alphabet. And the computer, as mentioned earlier, perceives all information (and in this case too) as a sequence not of letters, but of zeros and ones, indicating the absence and presence of an electrical impulse.

    The thing is that you can encode one character that we see on the screen using a conventional unit of measurement called a byte. As written above, binary code has a so-called information load. Let us recall that numerically it is equal to the total number of zeros and ones in the selected code fragment. So, 8 bits make 1 byte. Combinations of signals can be very different, as can be easily seen by drawing a rectangle on paper consisting of 8 cells of equal size.

    It turns out that text information can be encoded using an alphabet with a capacity of 256 characters. What's the point? The point is that each character will have its own binary code. The combinations "tied" to particular characters run from 00000000 to 11111111; in the decimal number system these codes are the numbers from 0 to 255.
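A minimal Python sketch of this character-to-byte correspondence, using the built-in `ord` function:

```python
# Each character corresponds to one byte: an integer 0..255,
# i.e. a binary code from 00000000 to 11111111.
for ch in "Ab!":
    code = ord(ch)  # the decimal code of the character
    print(ch, code, format(code, "08b"))
```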

    Do not forget that now there are various tables that use the encoding of letters of the Russian alphabet. These are, for example, ISO and KOI-8, Mac and CP in two variations: 1251 and 866. It is easy to make sure that text encoded in one of these tables will not be displayed correctly in an encoding other than this one. This is due to the fact that in different tables different symbols correspond to the same binary code.
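The mismatch between tables is easy to reproduce. A minimal Python sketch, assuming a Python 3 interpreter with its standard codecs: the same bytes are decoded through two different Russian code tables and yield different text:

```python
# The same bytes, interpreted through two different code tables.
data = "привет".encode("cp1251")  # encode with one table...
right = data.decode("cp1251")     # ...decode with the same table
wrong = data.decode("koi8_r")     # ...or with a different one

print(right)           # привет
print(wrong)           # unreadable "mojibake"
print(wrong != right)  # True
```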

    This was a problem at first. However, programs now have built-in algorithms that convert text to the correct form. The early 1990s were marked by the creation of an encoding called Unicode, in which each character has 2 bytes at its disposal. This allows text to be encoded with a far larger number of characters. 256 and 65536: is there a difference?

    Graphics coding

    Coding text and graphic information has some similarities. As you know, a computer peripheral called a "monitor" is used to display graphic information. Computer graphics are now widely used in a variety of fields, and fortunately the hardware capabilities of personal computers make it possible to solve quite complex graphics problems.

    Processing video information has become possible only in recent years. Text is much "lighter" than graphics, which is understandable: graphic files are far larger, and this must be taken into account when storing and transmitting them. Such problems can be overcome by knowing how graphical information is represented.

    Let's first figure out what groups this type of information is divided into. Firstly, it is raster. Secondly, vector.

    Raster images are quite similar to checkered paper on which each cell is painted one color or another, a principle somewhat reminiscent of a mosaic. In raster graphics the image is divided into separate elementary parts called pixels (from "picture element" - a dot of the image). The pixels are ordered in rows, and a certain number of pixels makes up the graphic grid, also called a raster. Given these two definitions, we can say that a raster image is nothing more than a collection of pixels displayed on a rectangular grid.

    Raster size and pixel size affect image quality: the larger the raster, the higher the quality. The raster size is the screen resolution (for example, 800 by 600 pixels), which every user has probably heard of. Another important characteristic of a screen is pixel density: how many pixels fit per unit of length, typically measured in pixels per inch. The more pixels per unit of length, the higher the quality, since the "grain" is reduced.

    Audio stream processing

    Coding of text and audio information, like other types of coding, has its own features. We will now talk about the latter: encoding audio information.

    The representation of an audio stream (as well as an individual sound) can be produced using two methods.

    Analogue form of audio information representation

    In this case, the quantity can take on a truly huge number of different values. Moreover, these same values ​​do not remain constant: they change very quickly, and this process is continuous.

    Discrete form of representation of audio information

    If we talk about the discrete method, the quantity can take only a limited number of values, and changes occur in jumps. Not only audio but also graphic information can be encoded discretely; the same applies to the analog form.

    Analog audio information is stored on vinyl records, for example. But the CD is already a discrete way of presenting audio information.

    At the very beginning, we talked about the fact that the computer perceives all information in machine language. To do this, information is encoded in the form of a sequence of electrical impulses - zeros and ones. Encoding audio information is no exception to this rule. To process sound on a computer, you first need to turn it into that very sequence. Only after this can operations be performed on a stream or a single sound.

    During encoding, the stream undergoes time sampling: the continuous sound wave is divided into small time intervals, and an amplitude value is set for each interval separately.

    Conclusion

    So, what did we find out during this article? Firstly, absolutely all information that is displayed on a computer monitor is encoded before appearing there. Secondly, this coding involves translating information into machine language. Thirdly, machine language is nothing more than a sequence of electrical impulses - zeros and ones. Fourthly, there are separate tables for encoding different characters. And, fifthly, graphic and sound information can be presented in analog and discrete form. Here, perhaps, are the main points that we have discussed. One of the disciplines that studies this area is computer science. Coding of textual information and its basics are explained at school, since there is nothing complicated about it.

    We got acquainted with number systems - ways of encoding numbers. Numbers give information about the number of items. This information must be encoded and presented in some kind of number system. Which of the known methods to choose depends on the problem being solved.
    Until recently, computers mainly processed numerical and textual information. But a person receives most of the information about the outside world in the form of images and sound. In this case, the image turns out to be more important. Remember the proverb: “It is better to see once than to hear a hundred times.” Therefore, today computers are beginning to work more and more actively with images and sound. We will definitely consider ways to encode such information.

    Binary coding of numeric and text information.

    Any information is encoded in a computer using sequences of two digits - 0 and 1. The computer stores and processes information in the form of combinations of electrical signals: a voltage of 0.4-0.6 V corresponds to logical zero, and a voltage of 2.4-2.7 V to logical one. Sequences of 0 and 1 are called binary codes, and the digits 0 and 1 are bits (binary digits). This encoding of information in a computer is called binary coding. Thus, binary encoding is encoding with the minimum possible number of elementary symbols - encoding by the simplest means - which is why it is remarkable from a theoretical point of view.
    Engineers are attracted to binary coding because it is easy to implement technically: electronic circuits for processing binary codes need to be in only one of two states - signal / no signal, or high voltage / low voltage.
    In their work, computers operate with real and integer numbers represented as two, four, eight and even ten bytes. To represent the sign of a number, an additional sign bit is used, usually located before the numeric digits. For positive numbers the sign bit is 0, for negative numbers it is 1. To write the internal representation of a negative integer (-N), you must:
    1) take the binary code of the number N and invert it, replacing each 0 with 1 and each 1 with 0;
    2) add 1 to the resulting number.

    Take, for example, the number -1082. Since one byte is not enough to represent 1082, it is written in 2 bytes, or 16 bits: 0000010000111010. Its inverted code is 1111101111000101; adding 1 gives -1082 = 1111101111000110.
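These two steps can be checked with a short Python sketch (the helper name is our own):

```python
def twos_complement(n: int, bits: int = 16) -> str:
    """Binary representation of n in two's complement with the given width."""
    return format(n & ((1 << bits) - 1), f"0{bits}b")

print(twos_complement(1082))   # 0000010000111010
print(twos_complement(-1082))  # 1111101111000110
```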
    If a PC could only handle single bytes, it would be of little use. In reality, a PC works with numbers that are written in two, four, eight and even ten bytes.
    Since the late 60s, computers have increasingly been used to process text information. To represent text, 256 different characters are typically used: capital and small letters of the Latin alphabet, digits, punctuation marks, etc. In most modern computers, each character corresponds to a sequence of eight zeros and ones, called a byte.
    A byte is an eight-bit combination of zeros and ones.
    When encoding information in these computers, 256 different sequences of 8 zeros and ones are used, which allows 256 characters to be encoded. For example, the capital Russian letter "М" has the code 11101101, the letter "И" has the code 11101001, and the letter "Р" has the code 11110010. Thus, the word "МИР" ("world") is encoded by a sequence of 24 bits, or 3 bytes: 111011011110100111110010.
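The byte codes quoted here match the KOI8-R table, so the example can be verified with a short Python sketch (assuming a Python 3 interpreter with its standard codecs):

```python
# The letter codes above match the KOI8-R encoding.
word = "МИР"  # Russian for "world"
data = word.encode("koi8_r")
bits = "".join(format(b, "08b") for b in data)
print(bits)                              # 111011011110100111110010
print(len(data), "bytes =", len(bits), "bits")  # 3 bytes = 24 bits
```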
    The number of bits in a message is called the message information volume. This is interesting!

    Initially, only the Latin alphabet was used in computers. It has 26 letters. So, five pulses (bits) would be enough to designate each one. But the text contains punctuation marks, decimal numbers, etc. Therefore, in the first English-language computers, a byte - a machine syllable - included six bits. Then seven - not only to distinguish large letters from small ones, but also to increase the number of control codes for printers, signal lights and other equipment. In 1964, the powerful IBM-360 appeared, in which the byte finally became equal to eight bits. The last eighth bit was needed for pseudographics characters.
    Assigning a particular binary code to a symbol is a matter of convention, which is recorded in the code table. Unfortunately, there are five different encodings of Russian letters, so texts created in one encoding will not be reflected correctly in another.
    Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("Information Exchange Code, 8-bit"). The most common encoding is the standard Microsoft Windows Cyrillic encoding, denoted by the abbreviation CP1251 ("CP" stands for "Code Page"). Apple has developed its own encoding of Russian letters (Mac) for Macintosh computers. The International Standards Organization (ISO) has approved the ISO 8859-5 encoding as a standard for the Russian language. Finally, a new international standard, Unicode, has appeared, which allocates not one byte for each character but two, so it can encode not 256 characters but as many as 65536.
    All of these encodings continue the ASCII (American Standard Code for Information Interchange) code table, which encodes 128 characters.

    ASCII character table:

    code symbol code symbol code symbol code symbol code symbol code symbol
    32 Space 48 0 64 @ 80 P 96 ` 112 p
    33 ! 49 1 65 A 81 Q 97 a 113 q
    34 " 50 2 66 B 82 R 98 b 114 r
    35 # 51 3 67 C 83 S 99 c 115 s
    36 $ 52 4 68 D 84 T 100 d 116 t
    37 % 53 5 69 E 85 U 101 e 117 u
    38 & 54 6 70 F 86 V 102 f 118 v
    39 ' 55 7 71 G 87 W 103 g 119 w
    40 ( 56 8 72 H 88 X 104 h 120 x
    41 ) 57 9 73 I 89 Y 105 i 121 y
    42 * 58 : 74 J 90 Z 106 j 122 z
    43 + 59 ; 75 K 91 [ 107 k 123 {
    44 , 60 < 76 L 92 \ 108 l 124 |
    45 - 61 = 77 M 93 ] 109 m 125 }
    46 . 62 > 78 N 94 ^ 110 n 126 ~
    47 / 63 ? 79 O 95 _ 111 o 127 DEL

    Binary coding of text occurs as follows: when you press a key, a certain sequence of electrical impulses is transmitted to the computer, and each character corresponds to its own sequence of electrical impulses (zeros and ones in machine language). The keyboard and screen driver program determines the character using the code table and creates its image on the screen. Thus, texts and numbers are stored in the computer's memory in binary code and converted programmatically into images on the screen.

    Binary coding of graphic information.

    Since the 80s, the technology of processing graphic information on a computer has been developing rapidly. Computer graphics are widely used in scientific research and simulation, computer animation, business graphics, games, etc.
    Graphic information on the display screen is presented as an image formed from dots (pixels). Look closely at a newspaper photograph and you will see that it also consists of tiny dots. If these are only black and white dots, each can be encoded with 1 bit. But if there are shades, two bits allow 4 shades to be encoded: 00 - white, 01 - light gray, 10 - dark gray, 11 - black. Three bits allow 8 shades to be encoded, etc.
    The number of bits required to encode one shade of color is called color depth.
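The relationship between color depth and the number of shades is again N = 2^m; a minimal Python sketch (helper names are our own):

```python
import math

def shades(depth_bits: int) -> int:
    """Number of shades expressible with the given color depth."""
    return 2 ** depth_bits

def depth_needed(n_shades: int) -> int:
    """Bits of color depth required to encode n distinct shades."""
    return math.ceil(math.log2(n_shades))

print(shades(1), shades(2), shades(3))  # 2 4 8
print(depth_needed(256))                # 8  (one byte per pixel)
print(depth_needed(16_777_216))         # 24 (True Color)
```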

    In modern computers, the resolution (the number of dots on the screen) and the number of colors depend on the video adapter and can be changed in software.
    Color images can use different modes: 16 colors, 256 colors, 65536 colors (High Color), 16777216 colors (True Color). In High Color mode, 16 bits, or 2 bytes, are needed per dot.
    The most common screen resolution is 800 by 600 pixels, i.e. 480000 dots. Let's calculate the amount of video memory required for High Color mode: 2 bytes * 480000 = 960000 bytes.
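The same calculation as a one-line Python check:

```python
# Video memory for an 800x600 screen in High Color (2 bytes per pixel).
width, height, bytes_per_pixel = 800, 600, 2
video_memory = width * height * bytes_per_pixel
print(video_memory, "bytes")      # 960000 bytes
print(video_memory / 1024, "KB")  # 937.5 KB
```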
    Larger units are also used to measure the amount of information:

    1 KB (kilobyte) = 1024 bytes; 1 MB (megabyte) = 1024 KB; 1 GB (gigabyte) = 1024 MB.

    Therefore, 960000 bytes is approximately equal to 937.5 KB. If a person speaks for eight hours a day without a break, then over the course of 70 years of life he will speak about 10 gigabytes of information (that's 5 million pages - a stack of paper 500 meters high).
    Information transfer rate is the number of bits transmitted per second. The transmission rate of 1 bit per second is called 1 baud.

    A bitmap, which is a binary image code, is stored in the computer's video memory, from where it is read by the processor (at least 50 times per second) and displayed on the screen.


    Binary coding of audio information.

    Since the early 90s, personal computers have been able to work with audio information. Every computer with a sound card can save audio information as files (a file is a named portion of information stored on disk) and play it back. Special software (audio file editors) opens up wide possibilities for creating, editing and listening to sound files. Speech recognition programs are being created, making it possible to control the computer with your voice.
    It is the sound card that converts the analog signal into a discrete ("digitized") recording and, conversely, converts "digitized" sound back into an analog (continuous) signal that goes to the speaker input.


    When binary coding an analog audio signal, the continuous signal is sampled, i.e. replaced by a series of its individual samples (readings). The quality of binary encoding depends on two parameters: the number of discrete signal levels and the number of samples per second. The number of samples per second, or sampling frequency, in audio adapters can vary: 11 kHz, 22 kHz, 44.1 kHz, etc. If the number of levels is 65536, then 16 bits are allocated per sample (2^16 = 65536). A 16-bit audio adapter encodes and reproduces audio more accurately than an 8-bit one.
    The number of bits required to encode one audio level is called audio depth.
    The volume of a mono audio file (in bytes) is determined by the formula:

    V = f · i · t / 8,

    where f is the sampling frequency (Hz), i is the audio depth (bits), and t is the duration of the sound (seconds).
    With stereophonic sound, the volume of the audio file doubles, with quadraphonic sound it quadruples.
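    This size calculation, and the doubling for stereo, can be checked with a short Python sketch; the function name and the CD-quality example values are our own choices:

```python
# Uncompressed audio size in bytes:
# sampling frequency (Hz) * depth (bits) * duration (s) * channels / 8.
def audio_file_bytes(sample_rate_hz: int, depth_bits: int,
                     seconds: float, channels: int = 1) -> int:
    return int(sample_rate_hz * depth_bits * seconds * channels / 8)

# One minute of mono sound at 44.1 kHz, 16 bits:
mono = audio_file_bytes(44100, 16, 60)
print(mono)                                # 5292000 bytes (~5 MB)
print(audio_file_bytes(44100, 16, 60, 2))  # stereo: twice as much
```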
    As programs become more complex and their functions expand, and with the emergence of multimedia applications, the volume of programs and data grows. If in the mid-80s the usual volume of programs and data was tens and only occasionally hundreds of kilobytes, then by the mid-90s it had begun to amount to tens of megabytes. The amount of RAM has grown accordingly.


    A modern computer can process numerical, text, graphic, sound and video information. All these types of information in a computer are presented in binary code, that is, an alphabet with a capacity of two characters (0 and 1) is used. This is due to the fact that it is convenient to represent information in the form of a sequence of electrical impulses: there is no impulse (0), there is an impulse (1). Such coding is usually called binary, and the logical sequences of zeros and ones themselves are called machine language.

    Each digit of machine binary code carries an amount of information equal to one bit.

    This conclusion can be made by considering the numbers of the machine alphabet as equally probable events. When writing a binary digit, you can choose only one of two possible states, which means it carries an amount of information equal to 1 bit. Therefore, two digits carry 2 bits of information, four digits carry 4 bits, etc. To determine the amount of information in bits, it is enough to determine the number of digits in the binary machine code.

    Encoding text information

    Currently, most users use a computer to process text information, which consists of symbols: letters, numbers, punctuation marks, etc.

    A single cell with an information capacity of 1 bit can encode only 2 different states. In order for each character that can be entered from the keyboard using the Latin layout to receive its own unique binary code, 7 bits are required. From a sequence of 7 bits, in accordance with Hartley's formula, N = 2⁷ = 128 different combinations of zeros and ones, i.e. binary codes, can be obtained. By assigning each character its binary code, we obtain an encoding table. A person operates with symbols, a computer with their binary codes.
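    Hartley's formula is easy to tabulate; a minimal Python sketch (the function name is ours):

```python
# Hartley's formula: i bits give N = 2**i distinct binary codes.
def code_count(bits: int) -> int:
    return 2 ** bits

for bits in (1, 7, 8, 16):
    print(bits, 'bit(s) ->', code_count(bits), 'codes')
# 7 bits yield the 128 codes of the basic ASCII table,
# 8 bits the 256 codes of an extended table.
```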

    For the Latin keyboard layout there is a single encoding table for the whole world, so text typed using the Latin layout will be displayed adequately on any computer. This table is called ASCII (American Standard Code for Information Interchange, pronounced “askee”). Below is the full ASCII table, with the codes given in decimal form. From it you can determine that when you enter, say, the symbol “*” from the keyboard, the computer perceives it as the code 42₁₀; in turn, 42₁₀ = 101010₂ is the binary code of the symbol “*”. Codes 0 to 31 are control codes and are not shown in this table.

    ASCII character table

    In order to encode one character, an amount of information equal to 1 byte is used, i.e. I = 1 byte = 8 bits. Using a formula that relates the number of possible events K and the amount of information I, you can calculate how many different symbols can be encoded (assuming that symbols are possible events):

    K = 2ᴵ = 2⁸ = 256,

    i.e., an alphabet with a capacity of 256 characters can be used to represent text information.

    The essence of encoding is that each character is assigned a binary code from 00000000 to 11111111 or a corresponding decimal code from 0 to 255.
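    In Python the character-to-code correspondence described above can be observed directly with the built-in ord() and chr() functions:

```python
# ord() returns a character's numeric code, chr() converts back;
# format(..., '08b') shows the eight-bit binary code the computer stores.
code = ord('*')
print(code)                 # 42
print(format(code, '08b'))  # 00101010
print(chr(42))              # *
```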

    It must be remembered that currently five different code tables are used to encode Russian letters (KOI-8, CP1251, CP866, Mac, ISO), and texts encoded using one table will not be displayed correctly in another encoding. This can be visualized as a fragment of a combined character encoding table.

    Different symbols are assigned to the same binary code.

    [Fragment of a combined encoding table: the binary code, its decimal code, and the character it denotes in each encoding]

    However, in most cases, it is not the user who takes care of transcoding text documents, but special programs - converters that are built into applications.

    Since 1997, the latest versions of Microsoft Office have supported a new encoding called Unicode. Unicode is an encoding table that uses 2 bytes, i.e. 16 bits, to encode each character. Based on such a table, N = 2¹⁶ = 65,536 characters can be encoded.

    Unicode includes almost all modern scripts, including: Arabic, Armenian, Bengali, Burmese, Greek, Georgian, Devanagari, Hebrew, Cyrillic, Coptic, Khmer, Latin, Tamil, Hangul, Han (China, Japan, Korea), Cherokee, Ethiopian, Japanese (katakana, hiragana, kanji) and others.

    For academic purposes, many historical scripts have been added, including: ancient Greek, Egyptian hieroglyphs, cuneiform, Mayan writing, and the Etruscan alphabet.

    Unicode provides a wide range of mathematical and musical symbols and pictograms.

    There are two code ranges for Cyrillic characters in Unicode:

    Cyrillic (#0400 - #04FF)

    Cyrillic Supplement (#0500 - #052F).

    But the adoption of the Unicode table in its pure form is held back by the fact that if the code of each character occupies two bytes instead of one, storing text takes twice as much disk space, and transmitting it over communication channels takes twice as long.

    Therefore, in practice the Unicode representation UTF-8 (Unicode Transformation Format) is now more common. UTF-8 provides the best compatibility with systems that use 8-bit characters. Text consisting only of characters with numbers below 128 turns into plain ASCII text when written in UTF-8. Other Unicode characters are represented as sequences of 2 to 4 bytes. Since the world's most common characters, those of the Latin alphabet, still occupy 1 byte in UTF-8, this encoding is more economical than pure Unicode.
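    The varying byte lengths of UTF-8 can be seen in Python (the sample characters are our own choice):

```python
# ASCII characters cost 1 byte in UTF-8; other scripts cost 2 to 4 bytes.
for ch in ('A', 'Я', '€', '𝄞'):
    print(ch, '->', len(ch.encode('utf-8')), 'byte(s)')
# 'A' is ASCII (1 byte), 'Я' is Cyrillic (2), '€' takes 3,
# and the musical G clef '𝄞' lies outside the 16-bit range (4).
```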

    To determine the numeric code of a character, you can use a code table. To do this, select “Insert” - “Symbol” from the menu, after which the Symbol dialog panel appears on the screen. The dialog box shows a character table for the selected font. The characters in this table are arranged line by line, sequentially from left to right, starting with the Space character.

    There is a constant exchange of information flows in the world. Sources can be people, technical devices, various things, objects of inanimate and living nature. Either one object or several can receive information.

    For data exchange to work, information is encoded and processed on the transmitter side (the data is prepared and converted into a form convenient for transmission, processing and storage), transmitted, and then decoded on the receiver side (the encoded data is converted back into its original form). These are interrelated tasks: the source and the receiver must use matching information processing algorithms, otherwise the encoding-decoding process is impossible. Coding and processing of graphic and multimedia information is usually implemented on the basis of computer technology.

    Encoding information on a computer

    There are many ways to process data (texts, numbers, graphics, video, sound) using a computer. All information processed by a computer is represented in binary code, using the digits 1 and 0, called bits. Technically, this method is implemented very simply: 1 means an electrical signal is present, 0 means it is absent. From a human point of view, such codes are inconvenient: the long strings of zeros and ones that encode characters are very difficult to decipher at a glance. But this recording format immediately shows clearly what information coding is. For example, the number 8 in eight-bit binary form looks like the following sequence of bits: 00001000. What is difficult for a person is simple for a computer: it is easier for electronics to process many simple elements than a small number of complex ones.
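    The eight-bit form of a number is easy to verify in Python:

```python
# Decimal 8 written as an eight-bit binary code, and back again.
print(format(8, '08b'))    # 00001000
print(int('00001000', 2))  # 8
```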

    Text coding

    When we press a button on the keyboard, the computer receives a specific code for the pressed button, looks it up in the standard ASCII character table (American Standard Code for Information Interchange), “understands” which button was pressed, and passes this code on for further processing (for example, to display the character on the monitor). To store a character code in binary form, 8 bits are used, so the maximum number of combinations is 256. The first 128 characters are used for control characters, numbers and Latin letters. The second half is intended for national symbols and pseudographics.


    It will be easier to understand what information encoding is with an example. Let's look at the codes for the English letter “C” and the Russian letter “С”. Note that the characters are uppercase, and their codes differ from the lowercase ones. The English letter looks like 01000011, while the Russian one (in CP1251) looks like 11010001. What looks identical to a person on the monitor screen is perceived by the computer completely differently. Note also that the codes of the first 128 characters remain unchanged, while from 129 onwards the same binary code may correspond to different letters depending on the code table used. For example, the decimal code 194 can correspond to the letter “б” in KOI8, “В” in CP1251, “Т” in ISO, while in the CP866 and Mac encodings no letter corresponds to this code at all. Therefore, when we open a text and see alphabetic gibberish instead of Russian words, it means this information encoding does not suit us, and we need to choose another character converter.
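    Python's codec machinery can reproduce this situation directly; a small sketch (the encoding names follow Python's spelling of the code tables mentioned above):

```python
# The English 'C' and the Cyrillic 'С' look alike but have different codes.
print(format(ord('C'), '08b'))                 # 01000011
print(format('С'.encode('cp1251')[0], '08b'))  # 11010001

# One byte, three different letters depending on the code table:
b = bytes([194])
print(b.decode('koi8_r'), b.decode('cp1251'), b.decode('iso8859_5'))  # б В Т
```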

    Encoding numbers

    In the binary number system there are only two possible digit values: 0 and 1. The basic operations on binary numbers are studied by a branch of mathematics called binary arithmetic, and these operations have their own peculiarities. Take, for example, the number 45 typed on the keyboard. Each digit has its own eight-bit code in the ASCII table, so the number occupies two bytes (16 bits): 4 - 00110100, 5 - 00110101. In order to use this number in calculations, it is converted by special algorithms into the binary number system, as an eight-bit binary number: 45 - 00101101.
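    The two representations of "45" can be compared in Python:

```python
# "45" as typed text: one eight-bit ASCII code per digit (two bytes).
print([format(ord(d), '08b') for d in '45'])  # ['00110100', '00110101']

# 45 as a number ready for arithmetic: a single eight-bit value.
print(format(45, '08b'))                      # 00101101
```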

    In the 50s, graphical display of data was first implemented on computers, which were most often used for scientific and military purposes. Today, visualization of information received from a computer is a common and familiar phenomenon for any person, but in those days it made an extraordinary revolution in working with technology. Perhaps the influence of the human psyche had an effect: clearly presented information is better absorbed and perceived. A big breakthrough in the development of data visualization occurred in the 80s, when coding and processing of graphic information received powerful development.

    Analog and discrete graphics representation

    Audio coding

    Coding of multimedia information consists of converting the analog nature of sound into a discrete one for more convenient processing. The ADC (analog-to-digital converter) receives the input signal, measures its amplitude at certain time intervals, and outputs a digital sequence describing the amplitude changes. No physical transformations occur.

    The output signal is discrete, so the higher the amplitude measurement frequency (the sampling rate), the more accurately the output signal corresponds to the input one, and the better the encoding and processing of multimedia information. An ordered sequence of digital readings received through the ADC is also commonly called a sample. The process itself is called sampling (discretization).


    The reverse conversion is performed by a DAC (digital-to-analog converter): based on the digital data arriving at its input, it generates an electrical signal of the required amplitude at certain points in time.

    Sampling parameters

    The main sampling parameters are not only the measurement frequency but also the bit depth: the accuracy with which the amplitude change is measured for each sample. The more accurately the amplitude value is captured at each moment during digitization, the higher the signal quality after the ADC and the more faithful the wave reconstruction during the reverse conversion.
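    The sampling and quantization described in this section can be sketched in Python; the 440 Hz tone, the 8 kHz rate, and the 8-bit depth are our illustrative choices, and a real ADC of course does this in hardware:

```python
import math

# Sample a sine wave `sample_rate` times per second and quantize each
# reading into one of 2**bits integer levels (0 .. 2**bits - 1).
def sample_wave(freq_hz, sample_rate, bits, n_samples):
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # -1.0 .. 1.0
        # Map the -1..1 amplitude onto an integer code (quantization).
        code = min(levels - 1, int((amplitude + 1) / 2 * levels))
        codes.append(code)
    return codes

# Five 8-bit samples of a 440 Hz tone taken at 8 kHz:
print(sample_wave(440, 8000, 8, 5))
```

A higher sample rate or bit depth makes this integer sequence track the original wave more closely, which is exactly the quality trade-off the text describes.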