• ASCII character codes. ASCII (American Standard Code for Information Interchange) is the basic text encoding for the Latin alphabet

    Every computer works with a set of characters. At a minimum, this set contains the 26 uppercase and 26 lowercase letters, the digits, and special characters (period, space, and so on). To be stored in a computer, each character is assigned an integer; these integers are called codes. Standards were developed so that different computers would use the same codes for the same characters.

    ASCII standard

    ASCII stands for American Standard Code for Information Interchange. Each ASCII character has 7 bits, so there are at most 128 characters (Table 1). Codes 0 to 1F (hexadecimal) are control characters and are not printed. Many of these non-printable characters were intended for data transmission. For example, a message might consist of the start-of-heading character SOH, the header itself, the start-of-text character STX, the text itself, the end-of-text character ETX, and finally the end-of-transmission character EOT. In practice, however, data sent over a network is split into packets, and the packet format itself marks where a transmission begins and ends, so these control characters are almost never used for that purpose today.
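    The message layout described above can be sketched in a few lines of Python. This is only an illustration: the function names `frame` and `unframe` are hypothetical, and the sketch assumes the header and text never contain the control bytes themselves.

```python
# ASCII control characters used as delimiters (codes 01 to 04).
SOH, STX, ETX, EOT = b"\x01", b"\x02", b"\x03", b"\x04"


def frame(header: bytes, text: bytes) -> bytes:
    """Wrap a header and a text as SOH header STX text ETX EOT."""
    return SOH + header + STX + text + ETX + EOT


def unframe(message: bytes) -> tuple[bytes, bytes]:
    """Split a framed message back into (header, text)."""
    if not (message.startswith(SOH) and message.endswith(ETX + EOT)):
        raise ValueError("not a framed message")
    header, sep, rest = message[1:].partition(STX)
    if not sep:
        raise ValueError("missing STX")
    return header, rest[:-2]  # drop the trailing ETX and EOT


message = frame(b"to:bob", b"hello")
print(message)
print(unframe(message))
```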

    Table 1 - ASCII code table

    Hex  Command  Meaning                    Hex  Command  Meaning
    0    NUL      Null                       10   DLE      Data link escape
    1    SOH      Start of heading           11   DC1      Device control 1
    2    STX      Start of text              12   DC2      Device control 2
    3    ETX      End of text                13   DC3      Device control 3
    4    EOT      End of transmission        14   DC4      Device control 4
    5    ENQ      Enquiry                    15   NAK      Negative acknowledgement
    6    ACK      Acknowledgement            16   SYN      Synchronous idle
    7    BEL      Bell                       17   ETB      End of transmission block
    8    BS       Backspace                  18   CAN      Cancel
    9    HT       Horizontal tab             19   EM       End of medium
    A    LF       Line feed                  1A   SUB      Substitute
    B    VT       Vertical tab               1B   ESC      Escape
    C    FF       Form feed                  1C   FS       File separator
    D    CR       Carriage return            1D   GS       Group separator
    E    SO       Shift out                  1E   RS       Record separator
    F    SI       Shift in                   1F   US       Unit separator

    Hex  Symbol   Hex  Symbol   Hex  Symbol   Hex  Symbol   Hex  Symbol   Hex  Symbol
    20   space    30   0        40   @        50   P        60   `        70   p
    21   !        31   1        41   A        51   Q        61   a        71   q
    22   "        32   2        42   B        52   R        62   b        72   r
    23   #        33   3        43   C        53   S        63   c        73   s
    24   $        34   4        44   D        54   T        64   d        74   t
    25   %        35   5        45   E        55   U        65   e        75   u
    26   &        36   6        46   F        56   V        66   f        76   v
    27   '        37   7        47   G        57   W        67   g        77   w
    28   (        38   8        48   H        58   X        68   h        78   x
    29   )        39   9        49   I        59   Y        69   i        79   y
    2A   *        3A   :        4A   J        5A   Z        6A   j        7A   z
    2B   +        3B   ;        4B   K        5B   [        6B   k        7B   {
    2C   ,        3C   <        4C   L        5C   \        6C   l        7C   |
    2D   -        3D   =        4D   M        5D   ]        6D   m        7D   }
    2E   .        3E   >        4E   N        5E   ^        6E   n        7E   ~
    2F   /        3F   ?        4F   O        5F   _        6F   o        7F   DEL

    Unicode standard

    ASCII works well for English but is inconvenient for other languages. German has umlauts, French has accents, and some languages use entirely different alphabets. An early international version of ASCII was IS 646, a 7-bit standard that let national variants replace a few punctuation codes with local letters. The next step was to use 8 bits, which added another 128 characters; the best-known such extension, Latin-1, added mostly Latin letters with accents and diacritics. Then came IS 8859, which introduced the concept of a code page. Further extensions were attempted, but none was universal. Finally, Unicode was created (standardized as IS 10646). The original idea behind Unicode was to assign each character a single fixed 16-bit value, called a code point, giving 65,536 code points in total. To ease the transition, Unicode uses Latin-1 for code points 0 to 255, which makes converting ASCII to Unicode easy. The standard solved many problems, but not all. New characters keep arriving: Japanese, for example, calls for about 20 thousand additional characters, and Braille also needs to be included.
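    The overlap between Latin-1 and the first 256 Unicode code points can be checked directly. The following is a small sketch using Python's built-in `latin-1` codec; the helper name `latin1_code_point` is made up for this illustration.

```python
def latin1_code_point(byte_value: int) -> int:
    """Decode one Latin-1 byte and return the Unicode code point of the result."""
    return ord(bytes([byte_value]).decode("latin-1"))


# A plain ASCII letter and two accented Latin-1 letters.
for byte_value in (0x41, 0xE9, 0xFC):
    ch = chr(latin1_code_point(byte_value))
    print(f"byte {byte_value:02X} -> {ch!r} -> U+{ord(ch):04X}")

# Every Latin-1 byte value equals its Unicode code point.
print(all(latin1_code_point(b) == b for b in range(256)))
```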

    [8-bit encodings: ASCII, KOI-8R and CP1251] The first encoding tables, created in the USA, did not use the eighth bit of a byte. Text was represented as a sequence of bytes, but the eighth bit was not part of the character code (it was reserved for service purposes).

    The ASCII (American Standard Code for Information Interchange) table became the generally accepted standard. The first 32 characters of the ASCII table (00 to 1F) are non-printing characters, designed to control a printing device and similar equipment. The rest, from 20 to 7F, are ordinary printable characters (apart from the last one, 7F, which is the DEL control character).

    Table 1 - ASCII encoding

    Dec Hex Oct Char Description
    0 0 000 NUL null
    1 1 001 SOH start of heading
    2 2 002 STX start of text
    3 3 003 ETX end of text
    4 4 004 EOT end of transmission
    5 5 005 ENQ enquiry
    6 6 006 ACK acknowledge
    7 7 007 BEL bell
    8 8 010 BS backspace
    9 9 011 HT horizontal tab
    10 A 012 LF line feed, new line
    11 B 013 VT vertical tab
    12 C 014 FF form feed, new page
    13 D 015 CR carriage return
    14 E 016 SO shift out
    15 F 017 SI shift in
    16 10 020 DLE data link escape
    17 11 021 DC1 device control 1
    18 12 022 DC2 device control 2
    19 13 023 DC3 device control 3
    20 14 024 DC4 device control 4
    21 15 025 NAK negative acknowledge
    22 16 026 SYN synchronous idle
    23 17 027 ETB end of transmission block
    24 18 030 CAN cancel
    25 19 031 EM end of medium
    26 1A 032 SUB substitute
    27 1B 033 ESC escape
    28 1C 034 FS file separator
    29 1D 035 GS group separator
    30 1E 036 RS record separator
    31 1F 037 US unit separator
    32 20 040 space
    33 21 041 !
    34 22 042 "
    35 23 043 #
    36 24 044 $
    37 25 045 %
    38 26 046 &
    39 27 047 '
    40 28 050 (
    41 29 051 )
    42 2A 052 *
    43 2B 053 +
    44 2C 054 ,
    45 2D 055 -
    46 2E 056 .
    47 2F 057 /
    48 30 060 0
    49 31 061 1
    50 32 062 2
    51 33 063 3
    52 34 064 4
    53 35 065 5
    54 36 066 6
    55 37 067 7
    56 38 070 8
    57 39 071 9
    58 3A 072 :
    59 3B 073 ;
    60 3C 074 <
    61 3D 075 =
    62 3E 076 >
    63 3F 077 ?
    Dec Hex Oct Char
    64 40 100 @
    65 41 101 A
    66 42 102 B
    67 43 103 C
    68 44 104 D
    69 45 105 E
    70 46 106 F
    71 47 107 G
    72 48 110 H
    73 49 111 I
    74 4A 112 J
    75 4B 113 K
    76 4C 114 L
    77 4D 115 M
    78 4E 116 N
    79 4F 117 O
    80 50 120 P
    81 51 121 Q
    82 52 122 R
    83 53 123 S
    84 54 124 T
    85 55 125 U
    86 56 126 V
    87 57 127 W
    88 58 130 X
    89 59 131 Y
    90 5A 132 Z
    91 5B 133 [
    92 5C 134 \
    93 5D 135 ]
    94 5E 136 ^
    95 5F 137 _
    96 60 140 `
    97 61 141 a
    98 62 142 b
    99 63 143 c
    100 64 144 d
    101 65 145 e
    102 66 146 f
    103 67 147 g
    104 68 150 h
    105 69 151 i
    106 6A 152 j
    107 6B 153 k
    108 6C 154 l
    109 6D 155 m
    110 6E 156 n
    111 6F 157 o
    112 70 160 p
    113 71 161 q
    114 72 162 r
    115 73 163 s
    116 74 164 t
    117 75 165 u
    118 76 166 v
    119 77 167 w
    120 78 170 x
    121 79 171 y
    122 7A 172 z
    123 7B 173 {
    124 7C 174 |
    125 7D 175 }
    126 7E 176 ~
    127 7F 177 DEL

    As you can easily see, this encoding contains only Latin letters, and only those used in English. It also has arithmetic and other service symbols. But there are no Russian letters, nor even the special Latin letters needed for German or French. This is easy to explain: the encoding was developed specifically as an American standard. As computers came into use throughout the world, other characters needed to be encoded.

    To do this, it was decided to use the eighth bit of each byte. This made 128 more values available (from 80 to FF) for encoding characters. The first of the eight-bit tables, “extended ASCII”, included various Latin characters used in some languages of Western Europe. It also contained other additional symbols, including pseudographics.

    Pseudographic characters allow you to provide some semblance of graphics by displaying only text characters on the screen. For example, the file management program FAR Manager works using pseudographics.

    There were no Russian letters in the Extended ASCII table. Russia (formerly the USSR) and other countries created their own encodings that made it possible to represent specific “national” characters in 8-bit text files - Latin letters of the Polish and Czech languages, Cyrillic (including Russian letters) and other alphabets.

    In all the encodings that became widespread, the first 128 characters (that is, byte values with the eighth bit equal to 0) are the same as in ASCII. So a pure ASCII file reads correctly in any of these encodings; the letters of the English alphabet are represented in the same way.
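    This property is easy to check with the codecs that ship with Python. The sketch below is only an illustration; the list of codec names and the helper `decodings` are choices made for this example.

```python
# Eight-bit encodings mentioned in this article, by their Python codec names.
EIGHT_BIT = ["latin_1", "cp866", "koi8_r", "cp1251", "iso8859_5"]


def decodings(data: bytes) -> dict[str, str]:
    """Decode the same bytes with every encoding in EIGHT_BIT."""
    return {name: data.decode(name) for name in EIGHT_BIT}


ascii_bytes = bytes(range(128))  # every code with the eighth bit equal to 0
same = all(text == ascii_bytes.decode("ascii")
           for text in decodings(ascii_bytes).values())
print(same)  # the lower half agrees with ASCII in all of them

# A byte with the eighth bit set is read differently by different tables.
print(decodings(b"\xe1"))
```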

    ISO (the International Organization for Standardization) adopted the ISO 8859 group of standards, which defines 8-bit encodings for different language groups. For example, ISO 8859-1 is an extended ASCII table for the USA and Western Europe, and ISO 8859-5 is a table for the Cyrillic alphabet (including Russian).

    However, for historical reasons, the ISO 8859-5 encoding did not take root. In reality, the following encodings are used for the Russian language:

    - Code Page 866 (CP866), also known as “DOS” or the “alternative GOST encoding”. Widely used until the mid-90s; now used only to a limited extent. Practically never used for distributing texts on the Internet.
    - KOI-8. Developed in the 70s and 80s. It is the generally accepted standard for email on the Russian Internet. It is also widely used in operating systems of the Unix family, including Linux. The Russian-language version of KOI-8 is called KOI-8R; there are versions for other Cyrillic languages (for example, KOI8-U is the version for Ukrainian).
    - Code Page 1251 (CP1251, Windows-1251). Developed by Microsoft to support the Russian language in Windows.

    The main advantage of CP866 was that it kept the pseudographics characters in the same places as in Extended ASCII, so foreign text-mode programs, for example the famous Norton Commander, could work without changes. CP866 is now used by Windows programs that run in text windows or in full-screen text mode, including FAR Manager.

    Texts in CP866 have been quite rare in recent years (but it is used to encode Russian file names in Windows). Therefore, we will dwell in more detail on two other encodings - KOI-8R and CP1251.



    In the CP1251 encoding table, Russian letters are arranged in alphabetical order (with the exception, however, of the letter Ё). This arrangement makes alphabetical sorting very easy for computer programs.
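    The layout can be inspected with Python's `cp1251` codec. The sketch below checks that the 32 uppercase letters other than Ё occupy consecutive codes, and shows how sorting by raw byte values misplaces a word that starts with ё; the sample word list is made up for this illustration.

```python
# The 32 uppercase Russian letters in alphabetical order, with Ё left out.
UPPER = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

codes = list(UPPER.encode("cp1251"))
print(codes == list(range(0xC0, 0xE0)))       # consecutive codes C0..DF
print("Ё".encode("cp1251").hex().upper())     # Ё sits apart from the others

# Sorting by byte values is alphabetical except for words starting with ё.
words = ["яма", "ёж", "арбуз"]
print(sorted(words, key=lambda w: w.encode("cp1251")))
```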

    In KOI-8R, by contrast, the order of Russian letters looks random. In reality it is not.

    In many older programs, the 8th bit was lost when text was processed or transmitted. (Such programs are now practically extinct, but in the late 80s and early 90s they were widespread.) To get the 7-bit value from an 8-bit value, just subtract 8 from the most significant hexadecimal digit; for example, E1 becomes 61.

    Now compare KOI-8R with the ASCII table (Table 1). You will find that Russian letters are placed in clear correspondence with similar-sounding Latin ones. If the eighth bit disappears, lowercase Russian letters turn into uppercase Latin letters, and uppercase Russian letters turn into lowercase Latin letters. So, E1 in KOI-8 is the uppercase Russian “А”, while 61 in ASCII is the lowercase Latin “a”.

    So KOI-8 keeps Russian text readable when the 8th bit is lost: “Привет всем” (“Hello everyone”) becomes “pRIWET WSEM”.
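    The loss of the eighth bit can be simulated by masking each byte with 7F. This is a minimal sketch using Python's `koi8_r` codec and the Russian original of the phrase, “Привет всем”; the function name `strip_eighth_bit` is hypothetical.

```python
def strip_eighth_bit(text: str, encoding: str = "koi8_r") -> str:
    """Encode text, clear the top bit of every byte, and read the result as ASCII."""
    raw = text.encode(encoding)
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(hex(0xE1 & 0x7F))                 # E1 becomes 61
print(strip_eighth_bit("Привет всем"))  # still readable in Latin letters
```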

    Today, both the alphabetical order of characters in the encoding table and readability after the loss of the 8th bit have lost their decisive importance. Modern computers do not lose the eighth bit during transmission or processing, and alphabetical sorting takes the encoding into account instead of simply comparing codes. (Incidentally, the CP1251 codes are not completely alphabetical either: the letter Ё is out of place.)

    Because two encodings are in common use, when working on the Internet (mail, browsing Web sites) you can sometimes see a meaningless set of letters instead of Russian text. For example, “я СБЮФЕМХЕЛ”. These are just the words “С уважением” (“with respect”), but they were encoded in CP1251 and the computer decoded the text using the KOI-8 table. If, on the contrary, the same words were encoded in KOI-8 and the computer decoded them using the CP1251 table, the result would be “у ХЧБЦЕОЙЕН”.
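    Both mix-ups can be reproduced by encoding with one table and decoding with the other. The sketch below uses the Russian phrase “С уважением” (“with respect”) and Python's `cp1251` and `koi8_r` codecs; the variable names are chosen for this illustration only.

```python
text = "С уважением"  # "with respect"

# Written as CP1251, read as KOI8-R.
cp1251_read_as_koi8 = text.encode("cp1251").decode("koi8_r")
# Written as KOI8-R, read as CP1251.
koi8_read_as_cp1251 = text.encode("koi8_r").decode("cp1251")

print(cp1251_read_as_koi8)
print(koi8_read_as_cp1251)

# Reversing the wrong step recovers the original text.
print(cp1251_read_as_koi8.encode("koi8_r").decode("cp1251"))
```

    Choosing the encoding manually in a mail program or browser amounts to redoing the decoding step with the right table, as in the last line.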

    Sometimes a computer decodes Russian-language text using a table that is not intended for Russian at all. Then, instead of Russian letters, a meaningless set of symbols appears (for example, Latin letters of Eastern European languages); these are often called “krokozyabry”.

    In most cases, modern programs determine the encoding of Internet documents (emails and Web pages) on their own. But sometimes they misfire, and then you see strange sequences of Russian letters or “krokozyabry”. As a rule, in such a situation it is enough to select the encoding manually in the program menu to display the real text.

    Information from the page http://open-office.edusite.ru/TextProcessor/p5aa1.html was used for this article.


    Insert an ASCII or Unicode character into a document

    If you only need to enter a few special characters or symbols, you can use keyboard shortcuts. For a list of ASCII characters, see the following tables or the article Inserting National Alphabets Using Keyboard Shortcuts.

    Inserting ASCII characters

    To insert an ASCII character, press and hold the ALT key while typing the character code. For example, to insert the degree symbol (°), press and hold the ALT key, then type 0176 on the numeric keypad.

    To enter numbers, use the numeric keypad rather than the numbers on the main keyboard. If you need to enter numbers on the numeric keypad, make sure the NUM LOCK indicator is on.

    Inserting Unicode Characters

    To insert a Unicode character, enter the character code, then press ALT and X. For example, to insert a dollar symbol ($), enter 0024 and press ALT and X. For all Unicode character codes, see .

    Important: Some Microsoft Office programs, such as PowerPoint and InfoPath, do not support converting Unicode codes to characters. If you need to insert a Unicode character in one of these programs, use .

    Notes:

      If ALT+X converts the wrong character code, select only the correct code, then press ALT+X again.

      Alternatively, precede the code with "U+". For example, typing "1U+B5" and pressing ALT+X always produces "1µ", whereas typing "1B5" and pressing ALT+X produces "Ƶ".
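    The codes typed before ALT+X are hexadecimal Unicode code points, so the same lookups can be reproduced in Python with the built-in `chr`. This sketch only illustrates the code-to-character mapping, not how Office parses the text; the helper `char_from_hex_code` is hypothetical.

```python
def char_from_hex_code(code: str) -> str:
    """Turn a hexadecimal code such as '0024' or 'U+B5' into its character."""
    digits = code.upper().removeprefix("U+")
    return chr(int(digits, 16))


for code in ("0024", "U+B5", "1B5"):
    print(code, "->", char_from_hex_code(code))
```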

    Using Character Map

    Character Map is a program built into Microsoft Windows that lets you view the characters available in a selected font.

    Using Character Map, you can copy individual characters or a group of characters to the clipboard and paste them into any program that can display them.

    Opening Character Map

      In Windows 10, type "character" in the search box on the taskbar and select Character Map from the search results.

      In Windows 8, type "character" on the Start screen and select Character Map from the search results.

      In Windows 7, click the Start button, select All Programs, Accessories, System Tools, and then click Character Map.

    Characters are grouped by font. Click the font list to select a character set. To select a character, click it, then click the Select button. To insert it, copy it with the Copy button, then right-click the desired location in the document and select Paste.

    Frequently used character codes

    For a complete list of characters, see Character Map on your computer, a table of ASCII character codes, or the Unicode character code charts organized by script.

    [Tables of frequently used codes: glyphs for currency, legal symbols, mathematical symbols, fractions, punctuation and dialectic symbols, and form symbols, followed by commonly used diacritic codes.]

    Non-printing ASCII control characters

    The characters numbered 0 to 31 in the ASCII table are used to control some peripheral devices, such as printers. For example, the form feed/new page character is number 12. This character tells the printer to move to the beginning of the next page.

    Table of non-printing ASCII control characters

    Decimal  Character                      Decimal  Character
    0        Null                           16       Data link escape
    1        Start of heading               17       Device control 1
    2        Start of text                  18       Device control 2
    3        End of text                    19       Device control 3
    4        End of transmission            20       Device control 4
    5        Enquiry                        21       Negative acknowledge
    6        Acknowledge                    22       Synchronous idle
    7        Bell                           23       End of transmission block
    8        Backspace                      24       Cancel
    9        Horizontal tab                 25       End of medium
    10       Line feed/new line             26       Substitute
    11       Vertical tab                   27       Escape
    12       Form feed/new page             28       File separator
    13       Carriage return                29       Group separator
    14       Shift out                      30       Record separator
    15       Shift in                       31       Unit separator

    Dec Hex Symbol        Dec Hex Symbol
    000 00  NUL           128 80  Ђ
    001 01  SOH           129 81  Ѓ
    002 02  STX           130 82  ‚
    003 03  ETX           131 83  ѓ
    004 04  EOT           132 84  „
    005 05  ENQ           133 85  …
    006 06  ACK           134 86  †
    007 07  BEL           135 87  ‡
    008 08  BS            136 88  €
    009 09  TAB           137 89  ‰
    010 0A  LF            138 8A  Љ
    011 0B  VT            139 8B  ‹
    012 0C  FF            140 8C  Њ
    013 0D  CR            141 8D  Ќ
    014 0E  SO            142 8E  Ћ
    015 0F  SI            143 8F  Џ
    016 10  DLE           144 90  ђ
    017 11  DC1           145 91  ‘
    018 12  DC2           146 92  ’
    019 13  DC3           147 93  “
    020 14  DC4           148 94  ”
    021 15  NAK           149 95  •
    022 16  SYN           150 96  (en dash)
    023 17  ETB           151 97  (em dash)
    024 18  CAN           152 98  (unused)
    025 19  EM            153 99  ™
    026 1A  SUB           154 9A  љ
    027 1B  ESC           155 9B  ›
    028 1C  FS            156 9C  њ
    029 1D  GS            157 9D  ќ
    030 1E  RS            158 9E  ћ
    031 1F  US            159 9F  џ
    032 20  SP (space)    160 A0  NBSP (no-break space)
    033 21  !             161 A1  Ў
    034 22  "             162 A2  ў
    035 23  #             163 A3  Ј
    036 24  $             164 A4  ¤
    037 25  %             165 A5  Ґ
    038 26  &             166 A6  ¦
    039 27  '             167 A7  §
    040 28  (             168 A8  Ё
    041 29  )             169 A9  ©
    042 2A  *             170 AA  Є
    043 2B  +             171 AB  «
    044 2C  ,             172 AC  ¬
    045 2D  -             173 AD  SHY (soft hyphen)
    046 2E  .             174 AE  ®
    047 2F  /             175 AF  Ї
    048 30  0             176 B0  °
    049 31  1             177 B1  ±
    050 32  2             178 B2  І
    051 33  3             179 B3  і
    052 34  4             180 B4  ґ
    053 35  5             181 B5  µ
    054 36  6             182 B6  ¶
    055 37  7             183 B7  ·
    056 38  8             184 B8  ё
    057 39  9             185 B9  №
    058 3A  :             186 BA  є
    059 3B  ;             187 BB  »
    060 3C  <             188 BC  ј
    061 3D  =             189 BD  Ѕ
    062 3E  >             190 BE  ѕ
    063 3F  ?             191 BF  ї
    064 40  @             192 C0  А
    065 41  A             193 C1  Б
    066 42  B             194 C2  В
    067 43  C             195 C3  Г
    068 44  D             196 C4  Д
    069 45  E             197 C5  Е
    070 46  F             198 C6  Ж
    071 47  G             199 C7  З
    072 48  H             200 C8  И
    073 49  I             201 C9  Й
    074 4A  J             202 CA  К
    075 4B  K             203 CB  Л
    076 4C  L             204 CC  М
    077 4D  M             205 CD  Н
    078 4E  N             206 CE  О
    079 4F  O             207 CF  П
    080 50  P             208 D0  Р
    081 51  Q             209 D1  С
    082 52  R             210 D2  Т
    083 53  S             211 D3  У
    084 54  T             212 D4  Ф
    085 55  U             213 D5  Х
    086 56  V             214 D6  Ц
    087 57  W             215 D7  Ч
    088 58  X             216 D8  Ш
    089 59  Y             217 D9  Щ
    090 5A  Z             218 DA  Ъ
    091 5B  [             219 DB  Ы
    092 5C  \             220 DC  Ь
    093 5D  ]             221 DD  Э
    094 5E  ^             222 DE  Ю
    095 5F  _             223 DF  Я
    096 60  `             224 E0  а
    097 61  a             225 E1  б
    098 62  b             226 E2  в
    099 63  c             227 E3  г
    100 64  d             228 E4  д
    101 65  e             229 E5  е
    102 66  f             230 E6  ж
    103 67  g             231 E7  з
    104 68  h             232 E8  и
    105 69  i             233 E9  й
    106 6A  j             234 EA  к
    107 6B  k             235 EB  л
    108 6C  l             236 EC  м
    109 6D  m             237 ED  н
    110 6E  n             238 EE  о
    111 6F  o             239 EF  п
    112 70  p             240 F0  р
    113 71  q             241 F1  с
    114 72  r             242 F2  т
    115 73  s             243 F3  у
    116 74  t             244 F4  ф
    117 75  u             245 F5  х
    118 76  v             246 F6  ц
    119 77  w             247 F7  ч
    120 78  x             248 F8  ш
    121 79  y             249 F9  щ
    122 7A  z             250 FA  ъ
    123 7B  {             251 FB  ы
    124 7C  |             252 FC  ь
    125 7D  }             253 FD  э
    126 7E  ~             254 FE  ю
    127 7F  DEL           255 FF  я

    Table of ASCII and Windows (CP1251) character codes. Codes 0 to 31 and code 127 are special (control) characters.
    Description of special (control) characters

    The control characters of the ASCII table were originally used for data exchange over teletype, for data entry from punched tape, and for simple control of external devices.
    Today most ASCII control characters no longer serve these purposes and can be used for other purposes.

    Code     Description
    NUL, 00  Null, empty
    SOH, 01  Start Of Heading
    STX, 02  Start of TeXt, the beginning of the text
    ETX, 03  End of TeXt, the end of the text
    EOT, 04  End of Transmission
    ENQ, 05  Enquiry, a request for confirmation
    ACK, 06  Acknowledgment, "I confirm"
    BEL, 07  Bell, ring
    BS, 08   Backspace, go back one character
    TAB, 09  Tab, horizontal tab
    LF, 0A   Line Feed. Nowadays denoted as \n in most programming languages
    VT, 0B   Vertical Tab
    FF, 0C   Form Feed, page feed, new page
    CR, 0D   Carriage Return. Nowadays denoted as \r in most programming languages
    SO, 0E   Shift Out, change the color of the ink ribbon in the printing device
    SI, 0F   Shift In, return the color of the ink ribbon in the printing device back
    DLE, 10  Data Link Escape, switching the channel to data transmission
    DC1, 11  Device Control, device control symbol
    DC2, 12  Device Control, device control symbol
    DC3, 13  Device Control, device control symbol
    DC4, 14  Device Control, device control symbol
    NAK, 15  Negative Acknowledgment, "I do not confirm"
    SYN, 16  Synchronous idle, synchronization symbol
    ETB, 17  End of Transmission Block
    CAN, 18  Cancel, cancels what was previously transmitted
    EM, 19   End of Medium
    SUB, 1A  Substitute. Placed in place of a symbol whose value was lost or corrupted during transmission
    ESC, 1B  Escape, starts a control sequence
    FS, 1C   File Separator
    GS, 1D   Group Separator
    RS, 1E   Record Separator
    US, 1F   Unit Separator
    DEL, 7F  Delete, erase the last character
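    Several of these control characters have escape sequences in programming languages. As a small illustration in Python, the codes behind the escapes can be printed with `ord`; the dictionary name `CODES` is chosen for this example.

```python
# Python escape sequences for some ASCII control characters.
ESCAPES = {"BEL": "\a", "BS": "\b", "TAB": "\t", "LF": "\n",
           "VT": "\v", "FF": "\f", "CR": "\r"}

# Map each abbreviation to its numeric code.
CODES = {name: ord(ch) for name, ch in ESCAPES.items()}

for name, code in CODES.items():
    print(f"{name:3} {code:02X}")
```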

    The set of characters with which text is written is called an alphabet.

    The number of characters in an alphabet is its cardinality (size).

    The formula relating the size of the alphabet to the amount of information per character is N = 2^b,

    where N is the cardinality of the alphabet (the number of characters),

    b is the number of bits (the information weight of one character).

    An alphabet of 256 characters can accommodate almost all the characters needed. Such an alphabet is called sufficient.

    Because 256 = 2^8, the weight of one character is 8 bits.

    A unit of 8 bits is called 1 byte:

    1 byte = 8 bits.

    With such an alphabet, the binary code of each character in computer text takes up 1 byte of memory.
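    The relation N = 2^b can be turned around to find the number of bits needed for an alphabet of a given size. The sketch below is one way to do this in Python; the function name `bits_per_symbol` is hypothetical, and for sizes that are not powers of two it returns the smallest whole number of bits that is enough.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest b such that 2**b >= alphabet_size."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    return (alphabet_size - 1).bit_length()


for n in (2, 128, 256, 65536):
    print(n, "symbols ->", bits_per_symbol(n), "bits")
```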

    How is text information represented in computer memory?

    The convenience of byte-by-byte character encoding is obvious because a byte is the smallest addressable part of memory and, therefore, the processor can access each character separately when processing text. On the other hand, 256 characters is quite a sufficient number to represent a wide variety of symbolic information.

    Now the question arises, which eight-bit binary code to assign to each character.

    Clearly this is a matter of convention; many encoding schemes are possible.

    All characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the serial number of the character in the binary number system.

    A table in which all characters of the computer alphabet are assigned serial numbers is called an encoding table.

    Different types of computers use different encoding tables.

    The ASCII table (pronounced "ask-ee"; American Standard Code for Information Interchange) has become the international standard for PCs.

    The ASCII code table is divided into two parts.

    Only the first half of the table is the international standard, i.e. the symbols with numbers from 0 (00000000) to 127 (01111111).

    ASCII encoding table structure
    Serial number Code Symbol
    0 - 31 00000000 - 00011111

    Symbols with numbers from 0 to 31 are usually called control characters.
    Their function is to control how text is displayed on the screen or printed, to sound a signal, to mark up text, and so on.

    32 - 127 00100000 - 01111111

    Standard part of the table (English). This includes the lowercase and uppercase letters of the Latin alphabet, the decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols.
    Character 32 is a space, i.e. an empty position in the text.
    All the others are displayed as specific visible signs.

    128 - 255 10000000 - 11111111

    Alternative part of the table (Russian).
    The second half of the ASCII code table, called the code page (128 codes, starting from 10000000 and ending with 11111111), can have different options, each option has its own number.
    The code page is primarily used to accommodate national alphabets other than Latin. In Russian national encodings, characters from the Russian alphabet are placed in this part of the table.

    First half of the ASCII code table

    Please note that in the encoding table the letters (uppercase and lowercase) are arranged in alphabetical order, and the digits are in ascending order. This lexicographic ordering of symbols is called the principle of sequential coding of the alphabet.

    For letters of the Russian alphabet, the principle of sequential coding is also observed.
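    The principle of sequential coding can be checked by comparing the codes of neighbouring characters. The sketch below does this with Python's built-in codecs; the helper `is_sequential` is hypothetical. For Russian letters the check passes in CP1251 (leaving aside the letter ё), whereas KOI8-R deliberately orders its letters differently.

```python
import string


def is_sequential(chars: str, encoding: str = "ascii") -> bool:
    """True if each character's code is exactly one more than the previous one."""
    codes = chars.encode(encoding)
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


print(is_sequential(string.ascii_uppercase))   # A..Z
print(is_sequential(string.ascii_lowercase))   # a..z
print(is_sequential(string.digits))            # 0..9
print(is_sequential("абвгде", "cp1251"))       # Russian letters in CP1251
print(is_sequential("абв", "koi8_r"))          # not sequential in KOI8-R

# Sequential digits make it easy to turn a digit character into its value.
print(ord("7") - ord("0"))
```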

    Second half of the ASCII code table

    Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise when Russian text is transferred from one computer to another or from one software system to another.

    Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("Information Exchange Code, 8-bit"). This encoding was used back in the 70s on computers of the ES computer series, and from the mid-80s it began to be used in the first Russified versions of the UNIX operating system.

    The CP866 encoding ("CP" stands for "Code Page") remains from the early 90s, the time when the MS-DOS operating system was dominant.

    Apple computers running the Mac OS operating system use their own Mac encoding.

    In addition, the International Standards Organization (ISO) has approved another encoding called ISO 8859-5 as a standard for the Russian language.

    The most common encoding today is the Microsoft Windows one, abbreviated CP1251.

    Since the late 90s, the problem of standardizing character encoding has been addressed by a new international standard called Unicode. It was originally designed as a 16-bit encoding, i.e. it allocates 2 bytes of memory for each character. Of course, this doubles the amount of memory occupied. But such a code table can include up to 65,536 characters. The complete specification of the Unicode standard includes all the existing, extinct and artificially created alphabets of the world, as well as many mathematical, musical, chemical and other symbols.
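    The doubling of memory can be seen by encoding the same text with one byte and with two bytes per character. The sketch below uses Python's `utf-16-le` codec as a stand-in for the two-byte form described above; that choice is an assumption of this example, and other Unicode encodings store characters differently.

```python
text = "ASCII"

one_byte_each = text.encode("ascii")       # 1 byte per character
two_bytes_each = text.encode("utf-16-le")  # 2 bytes per character

print(len(one_byte_each), len(two_bytes_each))
print(two_bytes_each)

# A Russian letter also fits in one 16-bit code.
print("Я".encode("utf-16-le").hex().upper())
```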

    Let's try using an ASCII table to imagine what words will look like in computer memory.

    Internal representation of words in computer memory
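    As an illustration, the bytes of a word can be printed as eight-bit binary codes. The sample word "file" and the function name `bit_patterns` are chosen for this sketch only.

```python
def bit_patterns(word: str) -> list[str]:
    """Return the 8-bit binary code of each character of an ASCII word."""
    return [format(byte, "08b") for byte in word.encode("ascii")]


for letter, bits in zip("file", bit_patterns("file")):
    print(letter, bits)
```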

    Sometimes a text in Russian received from another computer cannot be read: only gibberish is visible on the monitor screen. This happens because computers use different character encodings for the Russian language.