• Image compression algorithms. JPEG, JPEG2000, JPEG-LS. Lossy and lossless image compression

    "Implementation of algorithms

    JPEG and JPEG2000"

    Completed:

    student of group 819

    Ugarov Dmitry

    Operating principles of the JPEG and JPEG2000 algorithms

    1. JPEG algorithm

    JPEG (Joint Photographic Experts Group) is a widely used method for compressing photographic images. The file format that contains the compressed data is usually also called JPEG; the most common extensions for such files are .jpeg, .jfif, .jpg, .JPG, or .JPE. Of these, .jpg is the most popular extension on all platforms.

    The JPEG algorithm is a compression algorithm with loss of quality.

    Scope of application

    The format is a lossy compression format, so it is incorrect to think of JPEG as storing data at 8 bits per channel (24 bits per pixel). On the other hand, since JPEG compressed and decompressed data are typically represented in 8 bits per channel format, this terminology is sometimes used. Compression of black-and-white halftone images is also supported.

    When saving a JPEG file, you can specify the degree of quality, and therefore the degree of compression, which is usually given in some conventional units, for example from 1 to 100 or from 1 to 10. A larger number corresponds to better quality, but the file size increases. Usually the difference in quality between 90 and 100 is practically imperceptible to the eye. It should be remembered that the restored image is never a bit-for-bit copy of the original. A common misconception is that JPEG quality is the same as the percentage of information stored.

    Coding stages

    The JPEG compression process includes a number of steps:

    1. Convert the image to the optimal color space;

    If the luma/chrominance (YCbCr) color space is used, a better compression ratio is achieved. At this stage of encoding, the RGB color model is converted to YCbCr using the following relationships:

    Y = 0.299*R + 0.587*G + 0.114*B

    Cb = -0.1687*R - 0.3313*G + 0.5*B

    Cr = 0.5*R - 0.4187*G - 0.0813*B.
    During decoding, the appropriate inverse transform can be used:
    R = Y + 1.402*Cr

    G = Y - 0.34414*Cb - 0.71414*Cr

    B = Y + 1.772*Cb.
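
    As an illustration, here is a minimal Python sketch of this conversion using the coefficients above. It is a per-pixel formulation; real encoders vectorize this and (per JFIF) also add a +128 offset to Cb and Cr, which is omitted here to match the formulas as written.

    def rgb_to_ycbcr(r, g, b):
        # Forward transform with the coefficients given above.
        y  =  0.299 * r + 0.587 * g + 0.114 * b
        cb = -0.1687 * r - 0.3313 * g + 0.5 * b
        cr =  0.5 * r - 0.4187 * g - 0.0813 * b
        return y, cb, cr

    def ycbcr_to_rgb(y, cb, cr):
        # Inverse transform used during decoding.
        r = y + 1.402 * cr
        g = y - 0.34414 * cb - 0.71414 * cr
        b = y + 1.772 * cb
        return r, g, b
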
    A note on how Y, Cb and Cr relate to the human visual system:

    The eye, or more precisely the retina, has two types of cells acting as visual analyzers: night-vision cells, which perceive only shades of gray (from bright white to dark black), and day-vision cells, which perceive color tint. The first type of cell detects a brightness level similar to the Y value; the other cells, responsible for the perception of hue, determine values related to the chrominance differences.


    2. Subsampling of color components by averaging groups of pixels;

    Most of the visual information to which the human eye is most sensitive consists of high-frequency, grayscale luminance (Y) components of the YCbCr color space. The other two chromaticity components (Cb and Cr) contain high-frequency color information to which the human eye is less sensitive. Therefore, a certain part of it can be discarded and, thus, the number of pixels taken into account for color channels can be reduced.

    1) type 4:2:0 (the image is divided into 2x2 pixel squares, and within each square all pixels receive the same Cb and Cr values, while the brightness Y remains different for each pixel); a sketch of this 2x2 averaging is given after this list.

    2) type 4:2:2 (combination by chromaticity components occurs only horizontally in groups of two pixels).

    3) 4:4:4 type means that each pixel in each row has its own unique value of the Y, Cb and Cr components. (Fig. 1 a)

    4) Type 4:2:2 in more detail: by subsampling the chrominance signal by a factor of 2 horizontally, a 4:4:4 YCbCr stream becomes a 4:2:2 YCbCr stream. The notation "4:2:2" means that a single line contains 4 brightness values for every 2 chromaticity values (see Fig. 1 b). The 4:2:2 YCbCr signal is only very slightly inferior in image quality to the 4:4:4 YCbCr signal, but the required bandwidth is reduced by 33% of the original.
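
    A minimal sketch of the 4:2:0 averaging mentioned above, assuming a chroma plane (Cb or Cr) is given as a list of lists with even dimensions; each 2x2 block is replaced by its average, while the Y plane is left untouched:

    def subsample_420(chroma):
        # Average each 2x2 block of a chroma plane (Cb or Cr).
        h, w = len(chroma), len(chroma[0])
        out = []
        for i in range(0, h, 2):
            row = []
            for j in range(0, w, 2):
                avg = (chroma[i][j] + chroma[i][j + 1] +
                       chroma[i + 1][j] + chroma[i + 1][j + 1]) / 4.0
                row.append(avg)
            out.append(row)
        return out  # half the width and half the height of the input plane
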

    3. Application of discrete cosine transforms to reduce redundancy of image data;

    The main stage of the algorithm is the discrete cosine transform (DCT), a variant of the Fourier transform. It is used when working with images for various purposes, not only for compression. The transition to a frequency representation of the pixel values lets us look at the image differently, process it and, what interests us here, compress it. Moreover, knowing the transform coefficients, we can always perform the inverse operation and recover the original image.

    DCT applied directly to an image block (in our case 8x8 pixels) has the form:

    F(u,v) = 1/4 * w(u) * w(v) * SUM(x=0..7) SUM(y=0..7) f(x,y) * cos((2*x+1)*u*PI/16) * cos((2*y+1)*v*PI/16)

    where x, y are the spatial coordinates of the pixel (0..7),

    f(x,y) - pixel values ​​of the original macroblock (for example, brightness)

    u,v - pixel coordinates in frequency representation (0..7)

    w(u) =1/SQRT(2) for u=0, in other cases w(u)=1 (SQRT - square root)

    w(v) =1/SQRT(2) for v=0, in other cases w(v)=1

    Or, in matrix form: F = C * f * C^T, where C is the 8x8 matrix with elements C(u,x) = 1/2 * w(u) * cos((2*x+1)*u*PI/16) and C^T is its transpose.
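
    A direct (unoptimized) Python sketch of the formula above for one 8x8 block; block is an 8x8 list of pixel values, which baseline JPEG shifts by -128 before the transform (that shift is an assumption of the caller here):

    import math

    def dct_8x8(block):
        # Direct evaluation of the 8x8 DCT formula given above (four nested loops).
        F = [[0.0] * 8 for _ in range(8)]
        for u in range(8):
            for v in range(8):
                w_u = 1 / math.sqrt(2) if u == 0 else 1.0
                w_v = 1 / math.sqrt(2) if v == 0 else 1.0
                s = 0.0
                for x in range(8):
                    for y in range(8):
                        s += (block[x][y] *
                              math.cos((2 * x + 1) * u * math.pi / 16) *
                              math.cos((2 * y + 1) * v * math.pi / 16))
                F[u][v] = 0.25 * w_u * w_v * s
        return F
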

    4. Quantization of each block of DCT coefficients using weighting functions optimized taking into account human visual perception;

    The discrete cosine transform prepares the information for lossy compression and rounding. For each element of the matrix being transformed there is a corresponding element of the quantization matrix. The resulting matrix is obtained by dividing each element of the transformed matrix by the corresponding element of the quantization matrix and rounding the result to the nearest integer. When the quantization matrix is compiled, its large elements are placed in the lower right corner, so that the data in that corner after the discrete cosine transform (precisely the values whose rounding is least painful) are rounded most coarsely. Accordingly, the lost information is less important to us than the information that remains.
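
    A sketch of this step: F is the 8x8 matrix of DCT coefficients and Q a quantization matrix of the same size; the particular Q and its scaling by the user-selected quality are left to the encoder and are assumptions here.

    def quantize(F, Q):
        # Element-wise division by the quantization matrix, rounded to the nearest integer.
        return [[int(round(F[i][j] / Q[i][j])) for j in range(8)] for i in range(8)]

    def dequantize(Fq, Q):
        # Approximate reconstruction performed by the decoder.
        return [[Fq[i][j] * Q[i][j] for j in range(8)] for i in range(8)]
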


    5. Secondary Compression Stage

    The final stage of the JPEG encoder is encoding the resulting matrix.

    5.1 Zigzag permutation of 64 DCT coefficients

    So, after we have performed the DCT transformation on the 8x8 block of values, we have a new 8x8 block. Then, this 8x8 block is traversed in a zigzag pattern like this:

    (The numbers in the 8x8 block indicate the order in which we scan the 2-dimensional 8x8 matrix)

    0, 1, 5, 6,14,15,27,28,

    2, 4, 7,13,16,26,29,42,

    3, 8,12,17,25,30,41,43,

    9,11,18,24,31,40,44,53,

    10,19,23,32,39,45,52,54,

    20,22,33,38,46,51,55,60,

    21,34,37,47,50,56,59,61,

    35,36,48,49,57,58,62,63

    As you can see, first comes the upper left corner (0,0), then the value at (0,1), then (1,0), then (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0), etc.

    After we zigzag the 8x8 matrix, we now have a vector with 64 coefficients (0..63). The point of this zigzag vector is that we are looking through the 8x8 DCT coefficients in order of increasing spatial frequencies. So, we get a vector sorted by spatial frequency criteria: the first value on the vector (index 0) corresponds to the lowest frequency in the image - it is denoted by the term DC. As the index on the vector increases, we obtain values ​​corresponding to higher frequencies (a value with index 63 corresponds to the amplitude of the highest frequency in the 8x8 block). The rest of the DCT coefficients are denoted by AC.
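
    A sketch of the zigzag reordering using the index matrix shown above (ZIGZAG[i][j] is the position in the output vector of the coefficient at row i, column j):

    ZIGZAG = [
        [ 0,  1,  5,  6, 14, 15, 27, 28],
        [ 2,  4,  7, 13, 16, 26, 29, 42],
        [ 3,  8, 12, 17, 25, 30, 41, 43],
        [ 9, 11, 18, 24, 31, 40, 44, 53],
        [10, 19, 23, 32, 39, 45, 52, 54],
        [20, 22, 33, 38, 46, 51, 55, 60],
        [21, 34, 37, 47, 50, 56, 59, 61],
        [35, 36, 48, 49, 57, 58, 62, 63],
    ]

    def zigzag_scan(block):
        # Place each 8x8 coefficient at its zigzag position in a 64-element vector.
        vec = [0] * 64
        for i in range(8):
            for j in range(8):
                vec[ZIGZAG[i][j]] = block[i][j]
        return vec
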

    5.2 Run-length encoding of zeros (RLE)

    Now we have a vector with a long sequence of zeros. We can take advantage of this by encoding the consecutive zeros. IMPORTANT: you will see why later, but here we skip the encoding of the first vector coefficient (the DC coefficient), which is encoded differently. Consider the original 64-element vector as a 63-element vector (the 64-element vector without its first coefficient).

    Let's say we have 57, 45, 0, 0, 0, 0, 23, 0, -30, -8, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ..., 0 (only zeros to the end).

    Here is how JPEG run-length coding (RLC) is done for this example:

    (0,57); (0,45); (4,23); (1,-30); (0,-8); (2,1); EOB

    As you can see, for each value other than 0 we encode the number of consecutive PRECEDING zeros before the value, and then we add the value itself. Another note: EOB is short for End of Block, a special encoded value (marker). If we have reached a position in the vector from which only zeros remain to the end, we mark that position with an EOB and complete the run-length coding of the quantized vector.

    [Note that if the quantized vector does not end in zeros (its last element is nonzero), there will be no EOB marker.]

    Writing the EOB as the special pair (0,0), the example becomes: (0,57); (0,45); (4,23); (1,-30); (0,-8); (2,1); (0,0)

    Another basic point: let's say that somewhere in the quantized vector we have:

    57, eighteen zeros, 3, 0, 0, 0, 0, 2, thirty-three zeros, 895, EOB

    JPEG Huffman encoding imposes the restriction that the number of preceding zeros is encoded as a 4-bit value, so it cannot exceed 15.

    So the previous example should be coded as:

    (0,57); (15,0); (2,3); (4,2); (15,0); (15,0); (1,895); (0,0)

    (15,0) is a special encoded value (marker) indicating a run of 16 consecutive zeros.
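
    A sketch of this run-length step for the 63 AC coefficients in zigzag order: pairs are (zero-run, value), runs longer than 15 are split off as (15,0), and a trailing run of zeros is replaced by the EOB pair (0,0).

    def rle_ac(ac):                      # ac: the 63 AC coefficients in zigzag order
        pairs, run = [], 0
        for v in ac:
            if v == 0:
                run += 1
                continue
            while run > 15:              # emit (15,0) markers for runs of 16 zeros
                pairs.append((15, 0))
                run -= 16
            pairs.append((run, v))
            run = 0
        if run > 0:                      # trailing zeros collapse into the EOB pair
            pairs.append((0, 0))
        return pairs

    Applied to the two examples above, this sketch reproduces exactly the pair sequences shown.
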

    5.3 Final step - Huffman encoding

    First, an IMPORTANT note: instead of storing the actual value, the JPEG standard specifies that we store the minimum number of bits needed to hold the value (this is called the category of the value), followed by a bit-coded representation of the value, as in the following table:

    Values                               Category  Bits
    -1, 1                                    1     0, 1
    -3,..,-2, 2,..,3                         2     00, 01, 10, 11
    -7,..,-4, 4,..,7                         3     000, 001, 010, 011, 100, 101, 110, 111
    -15,..,-8, 8,..,15                       4     0000,..,0111, 1000,..,1111
    -31,..,-16, 16,..,31                     5     00000,..,01111, 10000,..,11111
    -63,..,-32, 32,..,63                     6     .
    -127,..,-64, 64,..,127                   7     .
    -255,..,-128, 128,..,255                 8     .
    -511,..,-256, 256,..,511                 9     .
    -1023,..,-512, 512,..,1023              10     .
    -2047,..,-1024, 1024,..,2047            11     .
    -4095,..,-2048, 2048,..,4095            12     .
    -8191,..,-4096, 4096,..,8191            13     .
    -16383,..,-8192, 8192,..,16383          14     .
    -32767,..,-16384, 16384,..,32767        15     .

    Returning now to the previous example:

    (0,57); (0,45); (4,23); (1,-30); (0,-8); (2,1); (0,0)

    let's bit-code only the right-hand value of each of these pairs, except for pairs that are special tokens such as (0,0) or, when present, (15,0). 57 is in category 6 and is bit-coded as 111001.

    45, similarly, would be encoded as (6, 101101)

    -30 -> (5, 00001)

    And now, we'll write the string of pairs again:

    (0,6), 111001; (0,6), 101101; (4,5), 10111; (1,5), 00001; (0,4), 0111; (2,1), 1; (0,0)

    Each pair of values enclosed in parentheses can be represented in one byte, since each of the two values fits in a 4-bit half: the count of preceding zeros never exceeds 15, and neither does the category (the values encoded in a JPG file lie in the range -32767..32767). In this byte, the high four bits hold the number of preceding zeros and the low four bits hold the category of the nonzero value.

    The final encoding step is to Huffman-encode this byte and then write to the JPG file, as a bitstream, the Huffman code of this byte followed by the bit representation of the value.

    For example, for byte 6 (equivalent to (0,6)) we have Huffman code = 111000;

    69 = (4,5) - 1111111110011001

    21 = (1,5) - 11111110110

    4 = (0,4) - 1011

    33 = (2,1) - 11011

    0 = EOB= (0,0) - 1010

    The final bitstream written to the JPG file on disk for the previous example's 63 coefficients (remember, we skipped the first coefficient) is:

    111000 111001 111000 101101 1111111110011001 10111 11111110110 00001

    1011 0111 11011 1 1010
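
    A small sketch of the category computation and bit-coding described above (for nonzero values only; negative values use the complemented bit pattern shown in the table):

    def category_and_bits(v):
        # Category = minimum number of bits needed to hold |v|; v must be nonzero.
        cat = abs(v).bit_length()
        if v < 0:
            v = v + (1 << cat) - 1       # e.g. -30 becomes 1, giving bits 00001 in category 5
        bits = format(v, "0{}b".format(cat))
        return cat, bits

    For example, category_and_bits(57) returns (6, '111001') and category_and_bits(-30) returns (5, '00001'), matching the pairs above.
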
    Advantages and disadvantages

    The disadvantages of the format include the fact that at high compression levels the block structure of the data makes itself felt: the image "splits into squares" (each 8x8 pixels in size). This effect is especially noticeable in areas with low spatial frequency (smooth transitions, for example a clear sky). In areas with high spatial frequency (for example, contrasting edges), characteristic "artifacts" appear: an irregular structure of pixels with distorted color and/or brightness. In addition, small color details disappear from the image. Remember also that this format does not support transparency.

    However, despite its shortcomings, JPEG has become very widespread due to its high compression ratio relative to the alternatives that existed at the time of its introduction.

    2. JPEG2000 algorithm

    The JPEG-2000 algorithm was developed by the same group of photography experts that developed JPEG. The formation of JPEG as an international standard was completed in 1992. In 1997, it became clear that a new, more flexible and powerful standard was needed, which was finalized by the winter of 2000.

    The main differences between the algorithm in JPEG 2000 and the algorithm in JPEG are as follows:

    1) Better image quality at a high degree of compression or, equivalently, a higher compression ratio at the same quality (for high compression ratios). In effect, this means a noticeable reduction in the size of the "Web-quality" graphics used by most sites.

    2) Support for encoding individual areas with better quality. It is known that certain areas of an image are critical for human perception (for example, the eyes in a photograph), while the quality of others can be sacrificed (for example, the background). With "manual" optimization, the compression rate is increased until quality is lost in some important part of the image. Now it becomes possible to set the quality in critical areas and compress other areas more strongly, i.e. we get an even greater final compression ratio with subjectively equal image quality.

    3) The main compression algorithm has been replaced by a wavelet transform. Besides the increase in compression ratio already noted, this made it possible to get rid of the 8x8 blockiness that appears as the compression ratio is increased. In addition, progressive rendering of the image is now part of the standard from the start (Progressive JPEG, actively used on the Internet, appeared much later than JPEG itself).

    4) To increase the compression ratio, the algorithm uses arithmetic compression. The JPEG standard also originally included arithmetic compression, but this was later replaced by the less efficient Huffman compression because arithmetic compression was protected by patents. Now the main patent has expired, and there is an opportunity to improve the algorithm.

    5) Supports lossless compression. In addition to the usual lossy compression, the new JPEG now supports lossless compression. Thus, it becomes possible to use JPEG for compressing medical images, in printing, when saving text for recognition by OCR systems, etc.

    6) Supports compression of single-bit (2-color) images. For saving single-bit images (ink drawings, scanned text, etc.), the GIF format was previously widely recommended, since DCT-based compression is very ineffective for images with sharp color transitions. In JPEG, a 1-bit image was converted to 8-bit before compression, i.e. it grew 8 times, after which an attempt was made to compress it, often by less than 8 times. Now JPEG 2000 can be recommended as a universal algorithm.

    7) Transparency is supported at the format level. It will now be possible to smoothly apply a background when creating WWW pages not only in GIF, but also in JPEG 2000. In addition, not only 1 bit of transparency is supported (pixel is transparent/opaque), but a separate channel, which will allow you to set a smooth transition from an opaque image to transparent background.

    In addition, at the format level it supports the inclusion of copyright information in the image and resistance to bit errors during transmission and broadcasting; external tools (plug-ins) can be requested for decompression or processing, and an image description, search information, and so on can be included.

    Coding stages

    The JPEG2000 compression process includes a number of steps:

    1. Convert the image to the optimal color space.

    At this stage of encoding, the RGB color model is converted to a luminance/chrominance (YUV) representation; when decompressing, the corresponding inverse transformation is applied.
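
    JPEG2000 defines two variants of this transform: an irreversible one (essentially the same YCbCr conversion shown earlier for JPEG) used for lossy coding, and a reversible integer color transform (RCT) used for lossless coding. Below is a minimal sketch of the reversible variant in its standard formulation; it is not necessarily the exact transform used in this work's implementation.

    def rct_forward(r, g, b):
        # Reversible color transform (integer arithmetic, exactly invertible).
        y = (r + 2 * g + b) // 4
        u = b - g
        v = r - g
        return y, u, v

    def rct_inverse(y, u, v):
        # Exact inverse of rct_forward.
        g = y - (u + v) // 4
        r = v + g
        b = u + g
        return r, g, b
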

    2. Discrete wavelet transform.

    Discrete wavelet transform (DWT) can also be of two types - for the case of lossy compression and for lossless compression.

    This transformation, in the one-dimensional case, is a scalar product of the corresponding filter coefficients with a row of values. Since many of the coefficients are zero, the forward and inverse wavelet transforms can be written with the following formulas (to transform the edge elements of a row, the row is extended by 2 pixels in each direction with values mirrored symmetrically about its edge pixels):
    y(2*n + 1) = x(2*n + 1) - (int)(x(2*n) + x(2*n + 2)) / 2

    y(2*n) = x(2*n) + (int)(y(2*n - 1) + y(2*n + 1) + 2) / 4

    and vice versa

    x(2*n) = y(2*n) - (int)(y(2*n - 1) + y(2*n + 1) + 2) / 4

    x(2*n + 1) = y(2*n + 1) + (int)(x(2*n) + x(2*n + 2)) / 2.
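
    A direct Python sketch of these lifting formulas for one row, with the symmetric edge extension mentioned above. x is a list of integer sample values; odd output positions hold the high-pass (detail) coefficients and even positions the low-pass (approximation) coefficients. Floor division is used here, whereas the formulas above use an (int) cast; the two differ only for negative intermediate values.

    def dwt53_forward(x):
        # One level of the 5/3 lifting transform with symmetric edge extension.
        n = len(x)
        def ext(i):                       # mirror indices that fall past the edges
            if i < 0:
                return -i
            if i >= n:
                return 2 * (n - 1) - i
            return i
        y = [0] * n
        # Predict step: detail coefficients at odd positions.
        for i in range(1, n, 2):
            y[i] = x[i] - (x[ext(i - 1)] + x[ext(i + 1)]) // 2
        # Update step: approximation coefficients at even positions.
        for i in range(0, n, 2):
            y[i] = x[i] + (y[ext(i - 1)] + y[ext(i + 1)] + 2) // 4
        return y
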

    3. Quantization of coefficients.

    Just like the JPEG algorithm, quantization is used when encoding an image into the JPEG2000 format. The discrete wavelet transform, like its analogue, sorts the coefficients by frequency. But, unlike JPEG, in the new format there is one quantization matrix for the entire image.


    4. Secondary Compression Stage

    Like JPEG, the last step of the compression algorithm in the new format is lossless encoding. But, unlike the previous format, JPEG2000 uses an arithmetic compression algorithm.

    Software implementation

    In this work, the JPEG and JPEG2000 algorithms are implemented. Both algorithms implement forward and inverse coding (the last stage, secondary compression, is absent). The JPEG calculation takes quite a long time (about 30 seconds) because of the "direct" calculation of the DCT. If you need to increase the speed, you should precompute the DCT matrix (the changes should be made in the DCT class).
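
    For reference, a sketch of the speed-up mentioned here: precompute the 8x8 DCT basis matrix C once and compute each block as C * f * C^T instead of evaluating the four nested loops of the direct formula. The names below are illustrative, not those of the classes in this work.

    import math

    # Precompute the 8x8 DCT basis matrix C once: C[u][x] = w(u)/2 * cos((2x+1)*u*pi/16).
    C = [[(math.sqrt(0.5) if u == 0 else 1.0) * 0.5 *
          math.cos((2 * x + 1) * u * math.pi / 16)
          for x in range(8)] for u in range(8)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(8)) for j in range(8)]
                for i in range(8)]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    def dct_8x8_fast(block):
        # F = C * f * C^T, the separable matrix form of the 2-D DCT.
        return matmul(matmul(C, block), transpose(C))
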

    Let's move on to the program:


    1. After launch, a window appears in which you can work with the image, and you can save it by clicking button (2) and entering the desired name in the dialog box.

  • With a sufficiently large Quality Factor, the image will change greatly. If this is a JPEG algorithm, then blocks of size 8x8 will be clearly visible. (In the case of the JPEG2000 algorithm, there will be no block division)
  • Before:

    After:



    The good old JPEG, despite a lot of undeniable advantages, still has significant limitations. A new method of image compression, the development of which had been underway for a long time, was called upon to remove them. Now that JPEG2000 has become an officially recognized format, this should serve as the beginning of its active support by various software manufacturers.

    Surely many who work with graphics on a computer have wondered: how can an image that occupies a very impressive amount of space in the PC's memory be squeezed into a much smaller size on disk? I remember that at the dawn of my publishing career the word "compression" was so mysterious and surprising to me... In fact, how does image compression work - after all, without it, it is now impossible to imagine the Web, digital photography, or color printing?

    So, compression. It may or may not lead to a loss of quality. The latter case covers methods such as RLE (Run Length Encoding, which produces pairs like (skip, value), where skip is the number of consecutive zeros and value is the value following them) and LZW (compression by the Lempel-Ziv-Welch method), implemented in the PSD, GIF, and TIFF formats. They are also widely used by archivers such as RAR and ZIP. The average degree of lossless compression is 2-3 times.

    If you need to compress an image more, you cannot do without losing quality. What are the principles? Firstly, any image contains a certain redundancy, the removal of which will not lead to a noticeable change in the quality of the image. Secondly, the human eye is more sensitive to changes in brightness than color. Therefore, different degrees of compression are applied to different image channels - information is lost, but this is not visually noticeable. In addition, the eye's sensitivity to small image elements is low, which makes it possible to remove them without compromising quality. This way you can compress the image (even if the deterioration in quality is already noticeable) up to an acceptable threshold. The degree of quality degradation is determined for each specific case. For printing, only minimal distortions are allowed, but for posting on the Internet (depending on the purpose) much more.

    The most popular among lossy compression methods is JPEG, which even at thirtyfold compression retains sufficient image quality. Incidentally, most modern data compression methods (for example, Layer-3, known as mp3, as well as MPEG) implement mechanisms similar to JPEG. Let's take a closer look at this format, especially since its newest incarnation, JPEG2000, was finally approved not long ago, incorporating all the additions made to JPEG/MPEG over ten years of development.

    JPEG

    The name of the compression algorithm is an abbreviation for the Joint Photographic Expert Group, an initiative group formed from experts from the ITU (International Telecommunication Union) and ISO (International Organization for Standardization). That is why its name contains the prefix Joint. In 1992, JPEG was declared the international standard for graphics.

    When compressing using the JPEG method, quality is always lost. In this case, there is always a choice: to give preference to quality at the expense of volume (the file size will be compressed by approximately three times) or, on the contrary, to achieve a minimum image size at which it will still remain recognizable (the degree of compression can reach 100). Compression, in which the difference in quality between the resulting image and the original is still unnoticeable, results in a 10-20 times reduction in file size.

    Scope of application

    JPEG is the best compressor for photographic-quality full-color and monochrome images. If you want to save an image with an index palette, it is first converted to full color. When compressing using the JPEG method, you need to keep in mind that everything depends on the nature of the images: much less volume will be occupied by those where the color changes are insignificant and there are no sharp color transitions. JPEG is used wherever photographic images need to be stored: in digital cameras, printing (EPS DCS 2.0), and the Internet is unthinkable without it.

    There are several types of JPEG compression, but we will consider only the two used in Adobe Photoshop, the standard package for working with raster images: baseline and progressive. The other two methods - arithmetic and lossless - are exotic and, for a number of reasons, have not become widespread.

    How does compression occur?

    1. The first stage converts the image's color model (usually RGB) into one where the brightness and color components are separated (for example, YCbCr or YUV), which allows an optimal choice of compression level for each channel (taking into account the characteristics of eye perception). The conversion occurs as follows:

    Y = 0.299*R + 0.587*G + 0.114*B

    Cb = (B - Y)/0.886/2 + 128

    Cr = (R - Y)/0.701/2 + 128

    2. The next stage is so-called prefiltering (subsampling): neighboring pixels in each of the Cb and Cr channels are grouped in pairs in the horizontal and vertical directions, while the brightness channel Y is left unchanged. Each group of four pixels then receives the average value of the corresponding Cb and Cr components. For brevity, such a scheme can be designated 4:1:1 (the same notation is used in CorelDRAW's JPEG export window). Taking into account that each pixel is encoded by 3 bytes (256 levels for each of the three channels), the amount of data is automatically halved (instead of 12 bytes for 4 pixels it is enough to transfer only 4 + 1 + 1 = 6 bytes). From a mathematical point of view such a transformation leads to a significant loss of information, but the human eye does not perceive the loss, since ordinary photographic images contain significant redundancy.

    3. The information that has passed this primary "cleaning" is again grouped, separately in each channel, into blocks, this time 8x8 in size, after which the basic compression is applied to them: the discrete cosine transform (DCT). As a result, information about the distribution of pixel brightness is converted into another form, where it is described by its distribution over spatial frequencies. DCT has a number of advantages over other transforms (for example, the Fourier transform), providing better information recovery.

    Instead of an array of 64 values ​​(8x8 pixels) for each block that makes up the image, we get an array of 64 frequencies. Let's look at how DCT works using an example. Let's say the brightness of the pixels in one block of our image has the form shown in Fig. 1 on the left, then the result of the transformation will be as shown on the right.


    Despite significant accuracy, some loss of information does occur at this stage - this is why JPEG always leads to loss of quality. The main purpose of the transformation is to find out the overall picture of the distribution of large (in the figure - top left) and small (bottom right) objects, which will be useful later when eliminating unimportant information.

    4. The next stage is quantization: removing from the block information that is barely noticeable to the eye. All components are divided by coefficients that determine the significance of each of them for high-quality restoration of the original image, and the result is rounded to an integer value. It is this procedure that introduces the greatest loss of quality and reduces the final volume of the image. High-frequency components are quantized coarsely, while low-frequency components are quantized more accurately, since they are the most noticeable. To somewhat smooth out the decrease in quality, the brightness channel uses smaller division factors than the color channels. But more often (this is done to speed up calculations), instead of specially selected values, only one is taken - the one entered by the user when choosing the degree of compression.

    Here, for example, is what the Photoshop window looks like when saving an image with the Save for Web operation, where the Quality parameter (or rather, a value derived from it) is precisely that rounding factor (Fig. 2).

    As a result of quantization, a set of components is obtained, from which the original image is reconstructed with a given accuracy (Fig. 3).


    Fig. 4 shows the result of reconstructing a black-and-white square using one, four and fifteen components, respectively.

    5. After the main work of image compression has been completed, the remaining transformations come down to secondary tasks: the remaining components are collected in sequence in such a way that those responsible for large details come first, followed by ever smaller ones. If you look at the picture, the movement of the encoder looks like a zigzag line. This stage is called ZigZag (Fig. 5).


    Then the resulting sequence is compressed: first with the usual RLE, then with the Huffman method.

    6. Finally comes a purely technical stage: the data is enclosed in a container and provided with a header in which all compression parameters are recorded so that the image can be restored. However, sometimes this information is not included in the header, which gives an additional gain in compression, but in that case you need to be sure that the application that will read the file knows about the parameters.

    That, in general, is the whole transformation. Now let's calculate what compression was achieved in our example. We kept 7 values from which the original 8x8 block will be restored. So, the compression from the DCT transformation in both color channels was 8x8/7 ≈ 9 times. Let's allocate not seven but 11 coefficients to the brightness channel, which gives 8x8/11 ≈ 6. For all three channels it will be (9+9+6)/3 = 8 times. The reduction during the "thinning" of the image at the second stage gives an additional factor of two (the 4:1:1 scheme, taking into account the peculiarities of encoding the brightness component), which gives a final result of 16 times. This is a rough calculation that does not take some aspects into account, but it reflects the real picture. To get a thirty-fold reduction in file size, you need to keep only 3-4 components.

    The image restoration process proceeds in reverse order: first, the components are multiplied by the values ​​​​from the quantization table, and approximate coefficients for the inverse cosine transform are obtained. The better the quality selected during compression, the higher the degree of approximation to the original coefficients, which means the image will be restored more accurately. There is only one action left to add: just before completion, make some adjustments (noise) to the boundary pixels from neighboring blocks in order to remove sharp differences between them.

    Disadvantages of JPEG

    1. Inability to achieve high compression rates due to block size restrictions (8x8 only).
    2. Blocky structure at high levels of compression.
    3. Rounds sharp corners and blurs fine elements in an image.
    4. Only RGB images are supported (JPEG for CMYK images can only be used in EPS format via DCS).
    5. The image cannot be displayed until it has completely loaded.

    It's been ten years since JPEG was approved as a standard. During this time, groups of researchers proposed a number of significant additions to the original version, which resulted in the emergence of a new standard at the end of last year.

    JPEG2000

    In 1997, work began on creating a universal encoding system that would remove all the limitations imposed by JPEG and could work effectively with all types of images - black and white, grayscale, full color and multi-component - regardless of content (whether photographs, fairly small text, or even drawings). Along with international standardization organizations, such industry giants as Agfa, Canon, Fujifilm, Hewlett-Packard, Kodak, LuraTech, Motorola, Ricoh, Sony and others took part in its development.

    Since the new algorithm claimed to be universal, it was additionally required to support various methods of data transmission (in real time and over narrow bandwidth), which is especially critical in multimedia applications, for example in live broadcasts over the Internet.

    Basic requirements for the JPEG2000 format:

    1. Achieving a higher degree of compression compared to JPEG.
    2. Supports monochrome images, which will allow you to use it to compress images with text.
    3. Possibility of compression without loss at all.
    4. Outputs images with gradual improvement in detail (as in progressive GIF).
    5. The use of priority areas in the image, for which the quality can be set higher than the rest of the image.
    6. Decoding in real time (no delays).

    Compression principle

    The main compression mechanism in JPEG2000, unlike JPEG, uses wavelet transformation - a system of filters applied to the entire image. Without going into details of compression, we will only note the main points.

    First, just as for JPEG, the image is converted into the YCrCb system, after which the initial removal of redundant information occurs (by the already familiar combining of adjacent pixels into 2x2 blocks). Then the entire image is divided into parts of the same size (tiles), each of which, independently of the others, undergoes further transformations (this reduces the requirements for memory and computing resources). Next, each channel is filtered by low-pass and high-pass filters separately along rows and along columns, as a result of which, after the first pass, four smaller images (subbands) are formed in each part. All of them carry information about the original image, but their information content is very different (Fig. 6).

    For example, the image obtained after low-pass filtering of the rows and columns (top left) carries the most information, while the one obtained after high-pass filtering carries the least. The information content of the images obtained after low-pass filtering of the rows and high-pass filtering of the columns (and vice versa) is intermediate. The most informative image is filtered again, and the resulting components, as with JPEG compression, are quantized. This happens several times: for lossless compression the cycle is usually repeated 3 times; with losses, 10 iterations are considered a reasonable compromise between size, quality and decompression speed. The result is one small image and a set of pictures with small details, which sequentially and with a certain accuracy restore it to normal size. Obviously, the highest degree of compression is obtained on large images, since a larger number of cycles can be used.
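
    Building on the dwt53_forward sketch given in the JPEG2000 coding-stages section above, here is a sketch of one level of this decomposition: rows are filtered, then columns, and the result is separated into the four subbands; repeating the process on the low-pass (LL) subband gives the next level. The even-size assumption and the function names are illustrative.

    def dwt2d_one_level(img):
        # img: 2-D list of integers with even dimensions (an assumption of this sketch).
        rows = [dwt53_forward(row) for row in img]                    # filter every row
        cols = [dwt53_forward([rows[i][j] for i in range(len(rows))])
                for j in range(len(rows[0]))]                         # then every column
        # cols[j][i] now holds the coefficient for row i, column j.
        h, w = len(img), len(img[0])
        def pick(rpar, cpar):
            return [[cols[j][i] for j in range(cpar, w, 2)] for i in range(rpar, h, 2)]
        ll = pick(0, 0)   # low-pass in both directions (fed into the next level)
        hl = pick(0, 1)   # high-pass horizontally, low-pass vertically
        lh = pick(1, 0)   # low-pass horizontally, high-pass vertically
        hh = pick(1, 1)   # high-pass in both directions
        return ll, hl, lh, hh
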

    Practical implementation

    Since the foundations of JPEG2000 compression were laid, a number of companies have developed fairly effective algorithms for its implementation.

    Among the major software developers, Corel can be noted (by the way, it was one of the first to introduce into its packages support for the wi format, based on wavelet transformations, for which it deserves credit): all images supplied on the CDs with the CorelDRAW package, up to the ninth version, were compressed in exactly this way.

    Later, Adobe joined in. Some of the ideas contained in JPEG2000 were applied by the developers of Photoshop 6 in the form of advanced options when saving an image in the JPEG format (the regular one, based on the cosine transform). Among them is progressive JPEG (the Progressive option in the Save for Web window). This algorithm is intended primarily for real-time systems and works exactly like progressive GIF: first a rough copy of the image appears, consisting of only a few large blocks, and over time, as the remaining data is loaded, the structure becomes clearer and clearer until, finally, the final image is completely restored. Unlike GIF, this algorithm places a heavy load on the viewer, since it has to perform the entire transformation cycle for each version transferred.

    Among other additions, we note the inclusion in the file of several JPEG-compressed copies of the image with different degrees of compression, resolution and even color models. Accordingly, in Photoshop 6 it became possible to select individual areas in an image and apply different compression settings to them (Region-Of-Interest; such a mechanism was first proposed back in 1995), using lower values in the quantization table. To do this, specify the required area (for example, as a new channel in the image) and click the mask icon next to the Quality item. In the window that appears, you can experiment with the image by moving the sliders; the finished result is displayed on the screen, allowing you to quickly find the necessary compromise between quality and size.

    Specialized converters and viewers

    Since the standard does not stipulate specific implementations of the compression/decompression methods, this gives scope to third-party developers of compression algorithms. In fact, you can use either a simplified wavelet transform algorithm and thereby speed up compression, or, conversely, a more complex one that accordingly requires more system resources.

    Solutions from other companies are available as commercial products. Some are implemented as separate programs (JPEG 2000, developed by Aware), others as additional modules for the most common raster editors (ImagePress JPEG2000 by Pegasus Imaging and the LEAD JPEG2000 module from LEAD Technologies). The company LuraTech, which has been working on this issue for a long time, stands out among them. It promotes its LuraWave technology in the self-contained product LuraWave SmartCompress (the third version is already available) and offers modules for Photoshop, PaintShop and PhotoPaint. A distinctive feature is high speed (almost instant conversion) even with images several megabytes in size. Accordingly, the price of this module is the highest: $79.

    To view JPEG2000 images in browsers, you need to install a special viewer module (all developers offer it for free). Inserting an image into an HTML document, like any plug-in, comes down to using the EMBED construct (with additional parameters); for example, the parameters can specify that a progressive transmission method will be used. That is, in our example (a file 139 KB in size), first only 250 bytes are transferred, on the basis of which a rough image is built; then, after 500 bytes have been loaded, the image is refined (this continues until the LIMIT value is reached).

    If you want to get a better image, you need to select Improve from the menu that pops up using the right button (Fig. 9). In four downloads, the entire image will be downloaded completely.


    Conclusions

    So, JPEG2000 shows objectively better results than JPEG only at high compression ratios. With compression of 10-20 times, there is not much difference. Will it be able to displace, or simply compete with, the widespread format? In the near future it is unlikely: in most cases the quality/size ratio provided by JPEG is quite acceptable, and the extra 10-20% of compression that JPEG2000 provides at visually the same quality is unlikely to lead to an increase in its popularity.

    But digital camera manufacturers are showing keen interest in the new format, since the size of light-sensitive sensors is steadily increasing every year and it is becoming increasingly difficult to store images in memory. Then the new format will become more widespread, and who knows, perhaps after some time JPEG2000 will become JPEG's equal. In any case, Analog Devices recently released a specialized chip in which compression/decompression using the new technology is implemented in hardware, and the US Department of Defense is already actively using the new format for recording photographs obtained from spy satellites.

    Facts and speculation

    1. JPEG loses quality when opening and resaving the file.

    Not true. Quality is lost only if a quality level lower (i.e., stronger compression) than the one with which the image was saved is selected.

    2. JPEG loses quality when editing the file.

    Is it true. When you save the modified file, all transformations are performed again - so avoid frequent image editing. This only applies when the file is closed; if the file remains open, there is no cause for concern.

    3. The result of compression with the same parameters in different programs will be the same.

    Not true. Different programs interpret user input differently. For example, in one program the quality of the saved image is indicated (as, for example, in Photoshop), in another - the degree of its compression (the inverse value).

    4. When setting the maximum quality, the image is saved without any loss of quality.

    Not true. JPEG always compresses with losses. But setting, for example, 90% quality instead of 100% results in a reduction in file size greater than the deterioration in quality perceived by the eye.

    5. Any JPEG file can be opened in any editor that understands the JPEG format.

    Not true. This type of JPEG, called progressive JPEG, is not understood by some editors.

    6. JPEG does not support transparency.

    Is it true. Sometimes it may seem that some part of the image is transparent, but in fact its color is simply chosen to match the background color in the html page.

    7. JPEG compresses better than GIF.

    Not true. They have different areas of application. In general, a typical “GIF” image after conversion to JPEG will have a larger volume.

    JPEG2000 vs JPEG

    1. With twenty to thirty times compression, JPEG2000 and JPEG give approximately the same quality (by the way, Photoshop cannot compress an ordinary photograph more than this limit).

    2. At higher compression ratios, the quality of JPEG2000 is significantly higher than that of JPEG, which allows compression up to 50 times practically without visible losses, and with some losses (we are talking about images for the Internet) up to 100 or even 200 times.

    3. At high levels of compression in those areas where a smooth color change occurs, the image does not acquire the block structure characteristic of a simple JPEG. JPEG2000 also somewhat smears and rounds out sharp edges - see photos (Fig. 7 and 8).

    They show the results of compressing a test file with different degrees of compression (on the left, saved in Photoshop in JPG format; on the right, in JPEG2000 format). For the image in Fig. 7, compression levels of 20, 40, 70 and 145 were chosen (they can be specified explicitly when saving in JPEG2000); the JPG compression level was chosen so that the file size was the same as after compression with JPEG2000. As they say, the results speak for themselves. For clarity, a second experiment was carried out on an image with sharper details (with compression levels of 10, 20, 40 and 80). The advantage is again on the side of JPEG2000 (Fig. 8).


    4. Since copies with different resolutions are in fact stored in one JPEG2000 file, those who make image galleries on the Internet have no need to create thumbnails for them.

    5. Of particular interest is compression without distortion (lossless mode). Thus, the test file with LZW compression from Photoshop took 827 KB, while compressed with JPEG2000 it took 473 KB.

    6. Compared to JPEG, its more advanced namesake consumes significantly more system resources. But the power of computers, which has significantly increased over the past couple of years, makes it possible to successfully solve image compression problems using a new method.

    7. Lack of JPEG2000 support in browsers. To view such images, you need to download a fairly large additional module (1.2 MB).

    8. Lack of free software for saving images in the new format.



    Scope of application


    Widespread support for the JPEG format by various software often leads to JPEG encoding of images that were not intended for that purpose - without any gain in compression compared to properly made PNG or GIF, but with unfortunate consequences for quality. For example, an attempt to record an image in JPEG containing small contrast details (especially color ones) will lead to the appearance of characteristic, clearly visible artifacts even with a high “quality level”.

    Compression

    Compression converts the image from the RGB color space to YCbCr (YUV). It should be noted that the JPEG standard (ISO/IEC 10918-1) does not in any way regulate the choice of YCbCr, allowing other types of conversion (for example, with a number of components other than three), and compression without conversion (directly to RGB), however, the specification JFIF (JPEG File Interchange Format, proposed in 1991 by specialists from C-Cube Microsystems, and which has now become a de facto standard) involves the use of the RGB->YCbCr conversion.

    After the RGB->YCbCr conversion, the image channels Cb and Cr, which are responsible for color, may be subsampled: each block of 4 pixels (2x2) of the brightness channel Y is assigned averaged values of Cb and Cr (the "4:2:0" thinning scheme). In this case, for each 2x2 block, instead of 12 values (4 Y, 4 Cb and 4 Cr), only 6 are used (4 Y, one averaged Cb and one averaged Cr). If higher quality of the reconstructed image is required, thinning can be performed in only one direction - vertically (the "4:4:0" scheme) or horizontally ("4:2:2") - or not performed at all ("4:4:4").

    The standard also allows decimation with averaging of Cb and Cr not for a 2x2 block, but for four pixels located sequentially (vertically or horizontally), that is, for 1x4 or 4x1 blocks ("4:1:1" scheme). It is also possible to use different types of thinning for Cb and Cr, but in practice such schemes are extremely rare.

    Next, the brightness component Y and the color components Cb and Cr are divided into blocks of 8x8 pixels. Each such block is subjected to a discrete cosine transform (DCT). The resulting DCT coefficients are quantized (in general, different quantization matrices are used for Y, Cb and Cr) and packed using Huffman codes. The JPEG standard also allows for the use of much more efficient arithmetic encoding, however, due to patent restrictions (the patent for the arithmetic QM encoder described in the JPEG standard belongs to IBM), it is not used in practice.

    The matrices used to quantize the DCT coefficients are stored in the header part of the JPEG file. They are usually constructed so that high-frequency coefficients are subject to stronger quantization than low-frequency ones. This results in coarsening of small details in the image. The higher the compression ratio, the more strongly all coefficients are quantized.

    Varieties of JPEG compression schemes

    The JPEG standard provides two main ways to represent encoded data.

    The most common, supported by most available codecs, is the sequential JPEG data representation, which involves sequential traversal of the encoded image block by block from left to right, from top to bottom. The operations described above are performed on each encoded image block, and the encoding results are sequentially placed in the output stream in the form of a single “scan” (an array of encoded data). The main or "baseline" encoding mode allows only this representation. Extended mode, along with sequential mode, also allows progressive JPEG data representation.

    In the case of progressive JPEG, the compressed data is written to the output stream as a set of scans, each of which describes the entire image with an increasing degree of detail. This is achieved either by recording in each scan not the full set of DCT coefficients, but only some part of them: first - low-frequency ones, in subsequent scans - high-frequency ones (the “spectral selection” method, i.e. spectral samples), or by sequential, from scan to scan, refinement of DCT coefficients (method of “successive approximation”, i.e. successive approximations). This progressive representation of data is especially useful when transmitting compressed images using low-speed communication channels, since it allows you to get an overview of the entire image after only a small part of the JPEG file has been transmitted.

    Both described schemes (sequential and progressive JPEG) are based on DCT and fundamentally cannot produce a reconstructed image absolutely identical to the original. However, the standard also allows compression that does not use DCT but is built on a linear predictor (lossless JPEG), guaranteeing a complete, bit-for-bit match of the original and restored images. At the same time, the compression ratio for photographic images rarely reaches 2, but the guaranteed absence of distortion is in demand in some cases. Noticeably higher compression ratios can be obtained with the JPEG-LS method which, despite the similarity of names, is not directly related to the JPEG standard ISO/IEC 10918-1 (ITU-T Recommendation T.81) and is described by ISO/IEC 14495-1 (ITU-T Recommendation T.87).

    Syntax and structure

    The JPEG file contains a sequence of markers, each of which begins with a 0xFF byte indicating the start of the marker, followed by an identifier byte. Some markers consist only of this pair of bytes, while others also contain additional data: a two-byte field giving the length of the informational part of the marker (including the length of this field itself, but not counting the two bytes at the start of the marker, i.e. 0xFF and the identifier) followed by the data itself.
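
    A sketch of walking this marker structure: it reads a file, prints each marker's identifier byte and the declared length of its data part, and stops at the start-of-scan marker, after which entropy-coded data rather than plain marker segments follows. The function name and output format are illustrative.

    import struct

    def list_markers(path):
        # Walk the marker structure of a JPEG file up to the start of the scan data.
        with open(path, "rb") as f:
            data = f.read()
        i = 0
        while i + 1 < len(data):
            if data[i] != 0xFF or data[i + 1] in (0x00, 0xFF):
                i += 1                                    # skip stuffed and fill bytes
                continue
            marker = data[i + 1]
            if marker in (0x01, 0xD8, 0xD9) or 0xD0 <= marker <= 0xD7:
                print("FF%02X  (standalone marker, no length field)" % marker)
                i += 2
                continue
            if i + 3 >= len(data):
                break
            (length,) = struct.unpack(">H", data[i + 2:i + 4])
            print("FF%02X  length=%d" % (marker, length))  # length includes its own two bytes
            if marker == 0xDA:                             # SOS: entropy-coded data follows
                break
            i += 2 + length
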

    Basic JPEG markers
    Marker    Bytes    Length    Purpose
    Problems with lossy archiving algorithms

    Conventional algorithms were the first to be used for image archiving - those that were, and still are, used in backup systems, when creating software distributions, and so on. These algorithms archive the information unchanged. However, the main trend recently has been the use of new classes of images, and the old algorithms no longer meet archiving requirements: many images were hardly compressed at all, even though "at first glance" they had obvious redundancy. This led to the creation of a new type of algorithm - compression with loss of information. As a rule, the compression ratio, and therefore the degree of quality loss, can be set, providing a compromise between image size and quality.

    One of the serious problems of computer graphics is that an adequate criterion for assessing image quality loss has not yet been found. And quality is lost constantly: during digitization, during conversion to a limited color palette, during conversion to another color representation system for printing and, what is especially important for us, during lossy archiving. A simple example criterion is the standard deviation of pixel values (the L2 measure, or root mean square, RMS):

    d(X, Y) = SQRT( (1/(n*m)) * SUM(i=1..n) SUM(j=1..m) (x(i,j) - y(i,j))^2 )

    According to it, the image will be severely damaged if the brightness is reduced by only 5% (the eye will not notice this - the brightness setting varies much more on different monitors). At the same time, images with “snow” - a sharp change in the color of individual dots, weak stripes or “moiré” will be considered “almost unchanged” (Explain why?). Other criteria also have their unpleasant sides.

    Consider, for example, the maximum deviation:

    d(X, Y) = MAX over all i, j of |x(i,j) - y(i,j)|

    This measure, as you might guess, is extremely sensitive to outliers in individual pixels: even if only one pixel in the entire image changes its value significantly (which is almost imperceptible to the eye), this measure will report the image as severely damaged.

    The measure used in practice today is called the peak signal-to-noise ratio (PSNR):

    PSNR(X, Y) = 10 * log10( 255^2 * n * m / SUM(i,j) (x(i,j) - y(i,j))^2 )

    This measure is essentially similar to the standard deviation, but it is somewhat more convenient to use because of its logarithmic scale. It has the same disadvantages as the standard deviation.
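
    A sketch of both measures for two equal-size 8-bit grayscale images given as flat lists of pixel values (255 is assumed as the peak value in PSNR):

    import math

    def rms(a, b):
        # Root-mean-square difference between two images.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

    def psnr(a, b):
        # Peak signal-to-noise ratio in decibels; larger means closer to the original.
        e = rms(a, b)
        return float("inf") if e == 0 else 20 * math.log10(255.0 / e)
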

    Our eyes are the best judge of image quality loss. Archiving is considered excellent if it is impossible to distinguish the original from the decompressed image by eye, and good if it is possible to tell which image has been compressed only by comparing the two pictures side by side. As the compression ratio increases further, side effects characteristic of the particular algorithm become noticeable. In practice, even with excellent quality preservation, specific changes may regularly be introduced into the image. Therefore, lossy archiving algorithms are not recommended for compressing images that will later be printed at high quality or processed by image recognition programs. Unpleasant effects with such images, as we have already said, can occur even with simple image scaling.

    JPEG algorithm

    JPEG is one of the newest and quite powerful algorithms. It is practically the de facto standard for full-color images. The algorithm operates on 8x8 areas in which brightness and color change relatively smoothly. As a result, when the matrix of such an area is decomposed into a double series in cosines (see the formulas below), only the first coefficients are significant. Thus, JPEG compression relies on the smoothness of color changes in the image.

    The algorithm was developed by a group of photography experts specifically for compressing 24-bit images. JPEG stands for Joint Photographic Expert Group, a division within ISO, the International Organization for Standardization. The name of the algorithm is pronounced "jay-peg". In general, the algorithm is based on the discrete cosine transform (hereinafter DCT), applied to the image matrix to obtain a new coefficient matrix. The inverse transform is applied to recover the original image.

    DCT decomposes the image according to the amplitudes of certain frequencies. Thus, when transforming, we obtain a matrix in which many coefficients are either close or equal to zero. In addition, due to the imperfections of human vision, it is possible to approximate the coefficients more roughly without noticeable loss of image quality.

    For this purpose, coefficient quantization is used. In the simplest case, this is an arithmetic bitwise shift to the right. With this conversion, some information is lost, but large compression ratios can be achieved.

    How the algorithm works

    So, let's look at the algorithm in more detail. Let us compress a 24-bit image.

    Step 1.

    We convert the image from the RGB color space, with components responsible for the red, green and blue components of the point color, into the YCrCb color space (sometimes called YUV).

    In it, Y is the brightness component, and Cr, Cb are the components responsible for color (chromatic red and chromatic blue). Due to the fact that the human eye is less sensitive to color than to brightness, it becomes possible to archive arrays for Cr and Cb components with high losses and, accordingly, high compression ratios. This kind of conversion has been used in television for a long time. A narrower frequency band is allocated there for signals responsible for color.

    A simplified conversion from the RGB color space to YCrCb can be represented by a transition matrix:

    | Y  |   |  0.299   0.587   0.114  |   | R |
    | Cb | = | -0.1687 -0.3313  0.5    | * | G |
    | Cr |   |  0.5    -0.4187 -0.0813 |   | B |

    The inverse transformation is carried out by multiplying the YUV vector by the inverse matrix.

    Step 2.

    We split the original image into 8x8 matrices and form three working DCT matrices from each: 8 bits separately for each component. At higher compression ratios this step may be slightly more complex: the image is divided for the Y component as in the first case, while for the Cr and Cb components the matrices are formed by taking every second row and every second column, i.e. from an original 16x16 block only one working 8x8 DCT matrix is obtained. In this case, as is easy to see, we lose 3/4 of the useful information about the color components of the image and immediately get a twofold compression. We can afford to do this because we are working in the YCrCb space; as practice has shown, this does not noticeably affect the resulting RGB image.

    Step 3.

    We apply DCT to each working matrix. We obtain a matrix in which the coefficients in the upper left corner correspond to the low-frequency components of the image, and those in the lower right to the high-frequency components.

    In simplified form, this transformation can be written as F = C * f * C^T, using the DCT basis matrix C defined earlier.

    Step 4.

    We perform quantization. In principle, this is simply dividing the working matrix by the quantization matrix element by element. For each component (Y, U and V), in the general case, its own quantization matrix q (hereinafter referred to as MC) is specified. This step is where the compression ratio is controlled and where the biggest losses occur. It is clear that by specifying MK with large coefficients, we will get more zeros and, therefore, a higher compression ratio.

    Specific artifacts of the algorithm are also associated with quantization. At large quantization coefficients, the loss in the low frequencies can be so great that the image breaks up into 8x8 squares. Losses in the high frequencies can manifest themselves as the so-called "Gibbs effect", when a kind of "halo" forms around contours with a sharp color transition.
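
    A minimal sketch of the quantization step in Python (our own illustration; the quantization matrix here is an arbitrary example, not the table recommended by the standard):

    def quantize(coeffs, qmatrix):
        """Element-wise division of the 8x8 DCT coefficients by the quantization matrix, with rounding."""
        return [[int(round(coeffs[i][j] / qmatrix[i][j])) for j in range(8)]
                for i in range(8)]

    def dequantize(q, qmatrix):
        """Approximate inverse used by the decoder: multiply back by the same matrix."""
        return [[q[i][j] * qmatrix[i][j] for j in range(8)] for i in range(8)]

    # An illustrative matrix that quantizes high frequencies more coarsely than low ones;
    # larger entries produce more zeros, i.e. higher compression but bigger losses.
    qmatrix = [[1 + (i + j) * 4 for j in range(8)] for i in range(8)]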

    Step 5.

    We convert the 8x8 matrix into a 64-element vector using “zigzag” scanning, i.e. take elements with indices (0,0), (0,1), (1,0), (2,0)...

    Thus, at the beginning of the vector we obtain the matrix coefficients corresponding to low frequencies, and at the end - high.
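
    A compact Python sketch of the "zigzag" reordering of an 8x8 matrix into a 64-element vector (our own illustration):

    def zigzag(block):
        """Traverse an 8x8 block along its anti-diagonals, low frequencies first."""
        out = []
        for d in range(15):                        # anti-diagonals i + j = d
            i_values = list(range(max(0, d - 7), min(d, 7) + 1))
            if d % 2 == 0:                         # even diagonals are walked bottom-up
                i_values.reverse()
            for i in i_values:
                out.append(block[i][d - i])
        return out

    demo = [[8 * i + j for j in range(8)] for i in range(8)]   # element = its row-major index
    print(zigzag(demo)[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]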

    Step 6.

    We collapse the vector using a run-length (group) encoding algorithm. We obtain pairs of the form (skip, number), where "skip" is the count of skipped zeros and "number" is the value that must be placed in the next cell. Thus, the vector 42 3 0 0 0 -2 0 0 0 0 1 ... will be folded into the pairs (0,42) (0,3) (3,-2) (4,1) ... .
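
    A sketch in Python of this grouping into (skip, value) pairs, reproducing the example above (our own illustration; the special end-of-block code used in real JPEG streams is omitted):

    def rle_pairs(vector):
        """Collapse a vector into (skipped_zeros, value) pairs."""
        pairs, zeros = [], 0
        for v in vector:
            if v == 0:
                zeros += 1
            else:
                pairs.append((zeros, v))
                zeros = 0
        return pairs

    print(rle_pairs([42, 3, 0, 0, 0, -2, 0, 0, 0, 0, 1]))
    # [(0, 42), (0, 3), (3, -2), (4, 1)]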

    Step 7.

    We collapse the resulting pairs using Huffman coding with a fixed table.

    The image restoration process in this algorithm is completely symmetrical. The method allows you to compress some images by 10-15 times without serious losses.


    The pipeline of operations used in the JPEG algorithm.

    The significant positive aspects of the algorithm are that:

    1. The compression ratio can be specified by the user.
    2. The output color image can have 24 bits per pixel.
    The negative aspects of the algorithm are that:
    1. As the compression level increases, the image breaks up into individual squares (8x8). This is due to the fact that large losses occur at low frequencies during quantization, and it becomes impossible to restore the original data.
    2. The Gibbs effect appears - halos along the boundaries of sharp color transitions.
    As already mentioned, JPEG was standardized relatively recently - in 1991. But even then there were algorithms that compressed more strongly with less quality loss. The fact is that the actions of the standard developers were limited by the power of the technology that existed at that time. That is, even on a personal computer, the algorithm had to run for less than a minute on the average image, and its hardware implementation had to be relatively simple and cheap. The algorithm had to be symmetrical (unzipping time is approximately equal to archiving time).

    The latter requirement made possible the emergence of such devices as digital cameras - gadgets the size of a small video camera that take 24-bit photographs onto a 10-20 MB flash card with a PCMCIA interface. The card is then inserted into a slot on a laptop, and a corresponding program allows the images to be read. Indeed, if the algorithm were asymmetric, it would be unpleasant to wait a long time for the device to "recharge" while it compresses an image.

    Another not very pleasant property of JPEG is that the thin horizontal and vertical stripes it produces are often completely invisible on the display and may appear only in print, in the form of a moiré pattern, when an inclined printing raster is superimposed on the horizontal and vertical stripes of the image. Because of these surprises, JPEG is not recommended for active use in printing with high compression ratios. However, for archiving images intended for viewing by humans it is currently indispensable.

    Wide application of JPEG for a long time was, perhaps, held back only by the fact that it operates on 24-bit images. Therefore, in order to view a picture with acceptable quality on a regular monitor in a 256-color palette, the use of appropriate algorithms and, consequently, a certain amount of time was required. In applications aimed at demanding users, such as games, such delays are unacceptable. In addition, if you have images, say, in 8-bit GIF format, converted to 24-bit JPEG, and then back to GIF for viewing, then the loss of quality will occur twice during both conversions. However, the gain in archive size is often so large (3-20 times!), and the loss in quality is so small that storing images in JPEG is very effective.

    A few words need to be said about modifications of this algorithm. Although JPEG is an ISO standard, its file format has not been fixed. Taking advantage of this, manufacturers create their own formats that are incompatible with each other, and, therefore, can change the algorithm. Thus, the internal algorithm tables recommended by ISO are replaced by their own. In addition, there is slight confusion when setting the degree of loss. For example, during testing it turns out that “excellent” quality, “100%” and “10 points” give significantly different pictures. At the same time, by the way, “100%” quality does not mean lossless compression. There are also JPEG options for specific applications.

    As an ISO standard, JPEG is becoming increasingly widely used for exchanging images in computer networks. The JPEG algorithm is supported in the QuickTime, PostScript Level 2 and TIFF 6.0 formats and currently occupies a prominent place in multimedia systems.

    Characteristics of the JPEG algorithm:

    Compression ratios: 2-20 (user-defined).

    Image class: Full-color 24-bit images or grayscale images without sharp color transitions (photos).

    Symmetry: 1

    Features: In some cases, the algorithm creates a “halo” around sharp horizontal and vertical boundaries in the image (Gibbs effect). In addition, with a high degree of compression, the image is divided into blocks of 8x8 pixels.

    Fractal algorithm

    Idea of the method

    Fractal archiving is based on the fact that we represent the image in a more compact form - using the coefficients of the Iterated Function System (hereinafter referred to as IFS). Before considering the archiving process itself, let's look at how IFS builds an image, i.e. decompression process.

    Strictly speaking, IFS is a set of three-dimensional affine transformations, in our case transforming one image into another. Points in three-dimensional space (x_coordinate, y_coordinate, brightness) undergo transformation.

    This process was most clearly demonstrated by Barnsley in his book “Fractal Image Compression”. There the concept of a Photocopying Machine was introduced, consisting of a screen on which the original picture is depicted, and a system of lenses that project the image onto another screen:

    • A lens can project a part of the image, of arbitrary shape, to any other place in the new image.
    • The areas into which the images are projected do not intersect.
    • A lens can change the brightness and reduce the contrast.
    • A lens can mirror and rotate its fragment of the image.
    • A lens must scale (reduce) its fragment of the image.

    By arranging the lenses and changing their characteristics, we can control the resulting image. One iteration of the Machine consists of constructing a new image from the original one by projection, after which the new image is taken as the original. It is argued that in the course of these iterations we obtain an image that stops changing. It will depend only on the location and characteristics of the lenses and will not depend on the original picture. This image is called the "fixed point" or attractor of the given IFS. The corresponding theory guarantees that there is exactly one fixed point for each IFS.

    Since the lens mapping is contractive, each lens explicitly specifies self-similar areas in our image. Thanks to self-similarity we obtain a complex image structure at any magnification. Thus, it is intuitively clear that a system of iterated functions defines a fractal (loosely speaking, a self-similar mathematical object).

    The two most famous images obtained using IFS are the “Sierpinski triangle” and the “Barnsley fern”. The “Sierpinski triangle” is defined by three, and the “Barnsley fern” by four affine transformations (or, in our terminology, “lenses”). Each transformation is encoded in literally a few bytes, while the image constructed using them can occupy several megabytes.

    Exercise: Identify 4 areas in the image, the union of which would cover the entire image, and each of which would be similar to the entire image (don't forget the fern stem).
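
    To make the idea concrete, here is a small Python sketch (our own illustration) that iterates the three contractive affine maps of the "Sierpinski triangle" mentioned above; it uses the random-iteration ("chaos game") shortcut rather than the deterministic "Photocopying Machine" iteration, but converges to the same attractor:

    import random

    # Three contractive affine maps; each halves the plane towards one vertex of a triangle.
    MAPS = [
        lambda x, y: (0.5 * x,        0.5 * y),
        lambda x, y: (0.5 * x + 0.5,  0.5 * y),
        lambda x, y: (0.5 * x + 0.25, 0.5 * y + 0.5),
    ]

    def sierpinski_points(n=20000):
        """Random iteration: the cloud of points converges to the attractor of the IFS."""
        x, y = 0.0, 0.0
        points = []
        for _ in range(n):
            x, y = random.choice(MAPS)(x, y)
            points.append((x, y))
        return points

    # Three affine maps (a few bytes of coefficients) describe an arbitrarily detailed image.
    pts = sierpinski_points()
    print(len(pts))   # 20000 points of the Sierpinski triangle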

    From the above, it becomes clear how the archiver works and why it takes so much time. In fact, fractal compression is a search for self-similar areas in an image and determination of affine transformation parameters for them.

    In the worst case, if no optimizing algorithm is used, it will be necessary to enumerate and compare all possible image fragments of different sizes. Even for small images, taking discreteness into account, we get an astronomical number of options to be sorted out. Moreover, even a sharp narrowing of transformation classes, for example, due to scaling only a certain number of times, does not provide a noticeable gain in time. In addition, image quality is lost. The vast majority of research in the field of fractal compression is now aimed at reducing the archiving time required to obtain a high-quality image.

    Definition. A transformation w: R^2 -> R^2, representable in the form

        w(x, y) = (a*x + b*y + e, c*x + d*y + f),

    where a, b, c, d, e, f are real numbers, is called a two-dimensional affine transformation.

    Definition. A transformation w: R^3 -> R^3, representable in the form

        w(x, y, z) = (a*x + b*y + c*z + s, d*x + e*y + f*z + t, p*x + q*y + r*z + u),

    where a, b, c, d, e, f, p, q, r, s, t, u are real numbers, is called a three-dimensional affine transformation.

    Definition. Let f be a transformation in a space X. A point x* such that f(x*) = x* is called a fixed point (attractor) of the transformation.

    Definition. A transformation f in a metric space (X, d) is called contractive if there exists a number s, 0 <= s < 1, such that d(f(x), f(y)) <= s * d(x, y) for all x, y from X.

    Comment: Formally, any contractive mapping can be used for fractal compression, but in reality only three-dimensional affine transformations with fairly strong restrictions on the coefficients are used.

    Theorem (on the contraction mapping).

    Let f be a contractive transformation in a complete metric space (X, d). Then there exists exactly one fixed point x* of this transformation, and for any point x the sequence x, f(x), f(f(x)), ... converges to x*.

    A more general formulation of this theorem guarantees convergence.

    Definition. An image is a function S defined on the unit square and taking values from 0 to 1, i.e. S: [0, 1] x [0, 1] -> [0, 1].

    Let a three-dimensional affine transformation be written in the form

        x' = a*x + b*y + e
        y' = c*x + d*y + f
        s' = p*s + q

    and be defined on a compact subset G of the Cartesian square [0, 1] x [0, 1]. Then it transfers the part of the surface S above G to an area located with a shift (e, f) and a rotation specified by the matrix

        ( a  b )
        ( c  d ).

    Moreover, if we interpret the value of S as the brightness of the corresponding points, the brightness decreases p times (the transformation must be contractive) and changes by the shift q.

    Definition. A finite set W of contractive three-dimensional affine transformations w_i, defined on areas D_i such that w_i(D_i) is contained in R_i and the areas R_i do not pairwise intersect, is called a system of iterated functions (IFS).

    A system of iterated functions is uniquely associated with a fixed point, which is an image. Thus, the compression process consists in finding the coefficients of the system, and the decompression process consists in iterating the system until the resulting image (the fixed point of the IFS) stabilizes. In practice 7-16 iterations are enough. The areas R_i will be referred to below as range areas, and the areas D_i as domain areas.

    Construction of the algorithm

    As has already become obvious from the above, the main task during compression with a fractal algorithm is to find the corresponding affine transformations. In the most general case we could map image areas of any size and shape onto one another, but then we end up with an astronomical number of fragment comparisons, which cannot currently be processed even on a supercomputer.

    In the educational version of the algorithm set out below, the following restrictions are imposed on the areas:

    1. All areas are squares with sides parallel to the sides of the image. This limitation is quite strict: in effect we are going to approximate the entire variety of geometric shapes with squares alone.
    2. When a domain area is transferred to a range area, its size is reduced exactly by half. This greatly simplifies both the compressor and the decompressor, because the task of scaling small areas is non-trivial.
    3. All range blocks are squares of a fixed size. The image is divided into a set of range blocks by a uniform grid.
    4. Domain areas are taken "through a point" in both X and Y, which immediately reduces the search space by a factor of 4.
    5. When a domain area is transferred to a range area, rotation of the cube is possible only by 0, 90, 180 or 270 degrees. Mirror reflection is also allowed. The total number of possible transformations (counting the identity) is 8.
    6. Scaling (compression) vertically (in brightness) is performed by a fixed factor of 0.75.
    These restrictions allow:
    1. Build an algorithm that requires a relatively small number of operations even on fairly large images.
    2. Represent the data very compactly when writing them to a file. For each affine transformation in the IFS we need:
    • two numbers to specify the offset of the domain block. If the input images are limited to 512x512, 8 bits per number will be enough;
    • three bits to specify the symmetry transformation used when converting the domain block into the range block;
    • 7-9 bits in order to set the brightness shift during translation.
    Block size information can be stored in the file header. Thus, we spent less than 4 bytes per affine transformation. Depending on the block size, you can calculate how many blocks there will be in the image. This way we can get an estimate of the degree of compression.

    For example, for a 256-level grayscale file of 512x512 pixels with a block size of 8 pixels, there will be 4096 (512/8 x 512/8) affine transformations. Each requires 3.5 bytes. Therefore, if the original file occupied 262144 (512x512) bytes (excluding the header), the file with the coefficients will occupy 14336 bytes, an archiving ratio of 18. Here we do not take into account that the file with coefficients may itself contain redundancy and be archived by a lossless method, for example LZW.
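
    The same estimate as a tiny Python check (our own illustration, assuming 8+8 bits for the domain offset, 3 bits for the symmetry and 9 bits for the brightness shift):

    BITS_PER_TRANSFORM = 8 + 8 + 3 + 9                       # offsets, symmetry, brightness shift
    blocks = (512 // 8) * (512 // 8)                         # 4096 range blocks of 8x8 pixels
    packed_bytes = blocks * BITS_PER_TRANSFORM / 8
    print(blocks, packed_bytes, (512 * 512) / packed_bytes)  # 4096 14336.0 18.28...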

    Negative aspects of the proposed restrictions:

    1. Since all areas are squares, it is impossible to take advantage of the similarity of objects that are far from square in shape (and such objects are quite common in real images).
    2. Similarly, we cannot take advantage of the similarity of objects whose similarity coefficient differs greatly from 2.
    3. The algorithm cannot take advantage of the similarity of objects whose relative rotation angle is not a multiple of 90 degrees.
    This is the price paid for compression speed and for the simplicity of packing the coefficients into a file.

    The packing algorithm itself boils down to enumerating all range blocks and selecting, for each of them, the best-matching domain block. Below is a sketch of this algorithm.

    for (all range blocks) {
        min_distance = MaximumDistance;
        R_ij = image->CopyBlock(i, j);
        for (all domain blocks) {                 // with rotations and reflections
            current = coordinates of the current transformation;
            D = image->CopyBlock(current);
            current_distance = R_ij.L2distance(D);
            if (current_distance < min_distance) {
                // the new coefficients are better - remember them
                min_distance = current_distance;
                best = current;
            }
        } // next domain block
        Save_Coefficients_to_file(best);
    } // next range block

    As can be seen from the algorithm above, for each range block we compare it with all possible domain blocks (including those subjected to a symmetry transformation), find the variant with the smallest L2 measure (the smallest sum of squared differences) and save the coefficients of this transformation to a file. The coefficients are (1) the coordinates of the found domain block, (2) a number from 0 to 7 characterizing the symmetry transformation (rotation, reflection) of the block, and (3) the brightness shift for this pair of blocks. The brightness shift is calculated as

        o = (1/n) * SUM over i,j of (r_ij - 0.75 * d_ij),

    where r_ij are the pixel values of the range block (R) and d_ij are the pixel values of the domain block (D) after its reduction to the size of the range block. The measure is then calculated as

        d(R, D) = SUM over i,j of (0.75 * d_ij + o - r_ij)^2.

    We do not calculate the square root of L 2 measures and do not divide it by n, since these transformations are monotonic and will not prevent us from finding the extremum, however, we will be able to perform two fewer operations for each block.
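
    A Python sketch (our own illustration, following the restrictions above and the formulas as written here) of comparing one range block with one candidate domain block: the domain is shrunk by a factor of two, the brightness shift is computed, and the L2 measure is returned without the square root or the division by n:

    def shrink_domain(d16):
        """Average 2x2 groups of a 16x16 domain block down to 8x8."""
        return [[(d16[2*i][2*j] + d16[2*i+1][2*j] + d16[2*i][2*j+1] + d16[2*i+1][2*j+1]) / 4.0
                 for j in range(8)] for i in range(8)]

    def match_blocks(r8, d16, contrast=0.75):
        """Return (brightness_shift, L2_distance) for approximating R by contrast*D + shift."""
        d8 = shrink_domain(d16)
        n = 64
        shift = sum(r8[i][j] - contrast * d8[i][j] for i in range(8) for j in range(8)) / n
        dist = sum((contrast * d8[i][j] + shift - r8[i][j]) ** 2
                   for i in range(8) for j in range(8))
        return shift, dist

    # A domain that is an exact scaled-and-shifted copy of the range block gives zero distance:
    r = [[(i + j) % 32 + 100 for j in range(8)] for i in range(8)]
    d = [[(r[i // 2][j // 2] - 40) / 0.75 for j in range(16)] for i in range(16)]
    print(match_blocks(r, d))   # approximately (40.0, 0.0)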

    Let's estimate the number of operations required to compress a 256-level grayscale image of 512x512 pixels with a block size of 8 pixels. There are 4096 range blocks; the 16x16 domain blocks taken "through a point" give roughly 249 x 249, i.e. about 62,000 positions, each in 8 symmetry variants, which is about half a million candidate transformations per range block; comparing one pair of blocks takes on the order of a few hundred elementary operations. In total this gives on the order of 10^11-10^12 operations.

    Thus, we managed to reduce the number of operations of the compression algorithm to quite computable (even if it took a few hours) values.

    Scheme of the image decompression algorithm

    Decompression of the fractal compression algorithm is extremely simple. It is necessary to carry out several iterations of three-dimensional affine transformations, the coefficients of which were obtained at the compression stage.

    Absolutely any image (for example, absolutely black) can be taken as the initial one, since the corresponding mathematical apparatus guarantees us the convergence of the sequence of images obtained during IFS iterations to a still image (close to the original one). Usually 16 iterations are enough for this.

    Read the coefficients of all blocks from the file;
    Create a black image of the required size;
    Until (the image becomes stationary) {
        For (every range block R) {
            D = image->CopyBlock(D_coord_for_R);      // the domain block found for this R
            For (every pixel (i, j) in the block) {
                R[i][j] = 0.75 * D[i][j] + o_R;       // o_R is the brightness shift stored for R
            } // next pixel
        } // next block
    } // next iteration

    Since we wrote the coefficients for the range blocks (which, as we stated, in our particular case are squares of the same size) sequentially, it turns out that we sequentially fill in the image over the squares of the partition grid by applying the affine transformation.

    As can be calculated, the number of operations per pixel of a grayscale image during restoration is unusually small (N "+" operations and 1 "*" operation, where N is the number of iterations, i.e. 7-16). Thanks to this, decompression for the fractal algorithm is faster than decompression for, say, the JPEG algorithm, which requires 64 "+" and 64 "*" operations per point (with an optimal implementation of the inverse DCT and dequantization), not counting the RLE and Huffman decoding steps. In the fractal algorithm the multiplication is by a rational number, one for each block. This means that we can, firstly, use integer arithmetic, which is significantly faster than floating-point arithmetic, and, secondly, multiplying a vector by a number is a simpler and faster operation, often built into the processor architecture (SGI processors, Intel MMX...), than the scalar product of two vectors required in the case of JPEG. For a full-color image the situation does not change qualitatively, since both algorithms use conversion to another color space.

    Assessment of losses and ways to regulate them

    In the summary of the simplified version of the algorithm, many important issues were missed. For example, what to do if the algorithm cannot find a similar one for a certain image fragment? A fairly obvious solution is to break this fragment into smaller ones and try to search for them. At the same time, it is clear that this procedure cannot be repeated ad infinitum, otherwise the number of necessary transformations will become so large that the algorithm will cease to be a compression algorithm. Therefore, we allow losses in some part of the image.

    For the fractal compression algorithm, as for other lossy compression algorithms, the mechanisms by which the degree of compression and the degree of loss can be adjusted are very important. By now a fairly large set of such methods has been developed. Firstly, one can limit the number of affine transformations, deliberately ensuring that the compression ratio is not lower than a fixed value. Secondly, one can require that, when the difference between the processed fragment and its best approximation is above a certain threshold, this fragment be subdivided (several "lenses" installed for it). Thirdly, one can forbid splitting fragments smaller than, say, four points. By changing the threshold values and the priority of these conditions, we can very flexibly control the image compression ratio, from bit-exact matching to any compression level. Note that this flexibility is much higher than that of its closest "competitor", the JPEG algorithm.

    Characteristics of the fractal algorithm :

    Compression ratios: 2-2000 (user-defined).

    Image class: Full-color 24-bit images or grayscale images without sharp color transitions (photos). It is desirable that areas of greater significance (for perception) be more contrasting and sharp, and areas of lesser significance should be low-contrast and blurry.

    Symmetry: 100-100000

    Features: Can freely scale the image when unzipping, enlarging it 2-4 times without the appearance of a “staircase effect”. As the degree of compression increases, a “blocky” effect appears at the boundaries of blocks in the image.

    Recursive (wave) algorithm

    The English name of recursive compression is "wavelet". Into Russian it is translated both as wave compression and as compression using splashes. This type of archiving has been known for quite a long time and is directly based on the idea of exploiting the coherence of image areas. The algorithm is oriented towards color and black-and-white images with smooth transitions and is ideal for pictures of the X-ray type. The compression ratio is user-defined and varies within the range 5-100. When you try to set a larger coefficient, a "staircase effect" appears on sharp boundaries, especially those running diagonally: steps of differing brightness several pixels in size.

    The idea of the algorithm is that we save to the file the differences between the average values of neighboring blocks of the image, numbers that usually take values close to 0.

    Thus, two numbers a_2i and a_2i+1 can always be represented in the form b1_i = (a_2i + a_2i+1)/2 and b2_i = (a_2i - a_2i+1)/2. In the same way, the sequence a_i can be translated pairwise into the sequences b1_i and b2_i.

    Let's look at a specific example: let us compress a string of 8 pixel brightness values (a_i): (220, 211, 212, 218, 217, 214, 210, 202). We obtain the following sequences b1_i and b2_i: (215.5, 215, 215.5, 206) and (4.5, -3, 1.5, 4). Note that the values of b2_i are quite close to 0. Let us repeat the operation, treating b1_i as a_i. This action is performed recursively, hence the name of the algorithm. From (215.5, 215, 215.5, 206) we obtain (215.25, 210.75) and (0.25, 4.75). The resulting coefficients, rounded to integers and compressed, for example, with the Huffman algorithm with fixed tables, can be written to a file.
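
    The same calculation as a small Python sketch (our own illustration), applied recursively to the example chain:

    def haar_step(a):
        """One level of the transform: pairwise averages and pairwise half-differences."""
        avg  = [(a[2*i] + a[2*i + 1]) / 2.0 for i in range(len(a) // 2)]
        diff = [(a[2*i] - a[2*i + 1]) / 2.0 for i in range(len(a) // 2)]
        return avg, diff

    chain = [220, 211, 212, 218, 217, 214, 210, 202]
    avg1, diff1 = haar_step(chain)
    avg2, diff2 = haar_step(avg1)
    print(avg1, diff1)   # [215.5, 215.0, 215.5, 206.0] [4.5, -3.0, 1.5, 4.0]
    print(avg2, diff2)   # [215.25, 210.75] [0.25, 4.75]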

    Note that we applied our transformation to the chain only twice. In reality we can afford to apply the wavelet transform 4-6 times. Moreover, additional compression can be obtained by using Huffman tables with non-uniform steps (i.e., we store the Huffman code for the nearest value in the table). These techniques make it possible to achieve noticeable compression ratios.

    Exercise: The chain (215, 211) (0, 5) (5, -3, 2, 4) has been restored from a file (see the example). Construct the string of eight pixel brightness values that the wave compression algorithm will reconstruct.

    The algorithm works similarly for two-dimensional data. If we have a square of 4 points with brightnesses a(2i, 2j), a(2i, 2j+1), a(2i+1, 2j) and a(2i+1, 2j+1), where the first index is the row and the second the column, then

        B1(i, j) = (a(2i, 2j) + a(2i, 2j+1) + a(2i+1, 2j) + a(2i+1, 2j+1)) / 4
        B2(i, j) = (a(2i, 2j) - a(2i, 2j+1) + a(2i+1, 2j) - a(2i+1, 2j+1)) / 4
        B3(i, j) = (a(2i, 2j) + a(2i, 2j+1) - a(2i+1, 2j) - a(2i+1, 2j+1)) / 4
        B4(i, j) = (a(2i, 2j) - a(2i, 2j+1) - a(2i+1, 2j) + a(2i+1, 2j+1)) / 4

    so that each 2x2 square of the original image yields one element in each of the four matrices B1, B2, B3 and B4.

    Using these formulas, for an image of 512x512 pixels, after the first transformation we obtain 4 matrices of 256x256 elements in size:

    The first, as you might guess, stores a reduced copy of the image; the second, the averaged differences between pairs of pixel values horizontally; the third, the averaged differences between pairs of pixel values vertically; the fourth, the averaged differences of the pixel values along the diagonal. By analogy with the one-dimensional case, we can repeat the transformation and obtain, instead of the first matrix, 4 matrices of size 128x128. Repeating the transformation a third time, we end up with 4 matrices of 64x64, 3 matrices of 128x128 and 3 matrices of 256x256. In practice, when writing to a file, the values obtained by the last formula (the diagonal differences B4) are usually neglected, which immediately gives a gain of about one third of the file size (1/4 + 1/16 + 1/64 + ... is approximately 1/3).
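
    A Python sketch of one two-dimensional step computing the four quarter-size matrices described above (our own illustration, using the formulas as reconstructed here):

    def wavelet2d_step(img):
        """One level: (B1 averages, B2 horizontal, B3 vertical, B4 diagonal differences)."""
        h, w = len(img) // 2, len(img[0]) // 2
        B1 = [[0.0] * w for _ in range(h)]
        B2 = [[0.0] * w for _ in range(h)]
        B3 = [[0.0] * w for _ in range(h)]
        B4 = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                a = img[2*i][2*j];     b = img[2*i][2*j + 1]      # horizontal neighbours
                c = img[2*i + 1][2*j]; d = img[2*i + 1][2*j + 1]  # the row below
                B1[i][j] = (a + b + c + d) / 4.0
                B2[i][j] = (a - b + c - d) / 4.0
                B3[i][j] = (a + b - c - d) / 4.0
                B4[i][j] = (a - b - c + d) / 4.0
        return B1, B2, B3, B4

    img = [[(x * y) % 256 for x in range(512)] for y in range(512)]
    parts = wavelet2d_step(img)
    print([len(p) for p in parts], len(parts[0][0]))   # [256, 256, 256, 256] 256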

    The advantages of this algorithm include the fact that it very easily allows an image to be "developed" gradually as it is transmitted over a network. In addition, since a reduced copy of the image is actually stored at the beginning of the compressed data, it is easy to show a "coarse" version of the image from the header alone.

    Unlike JPEG and the fractal algorithm, this method does not operate in blocks, for example, 8x8 pixels. More precisely, we operate with blocks of 2x2, 4x4, 8x8, etc. However, due to the fact that we save the coefficients for these blocks independently, we can quite easily avoid splitting the image into “mosaic” squares.

    Characteristics of the wave algorithm:

    Compression ratios: 2-200 (user-defined).

    Image class: Like fractal and JPEG.

    Symmetry: ~1.5

    Features: Allows an image to be gradually "developed" when it is transmitted over a network. In addition, with a high degree of compression the image breaks up into separate blocks.

    Conclusion

    In conclusion, let’s look at the tables that summarize the parameters of the various image compression algorithms we discussed above.

    Algorithm  | Image features due to which compression is achieved
    RLE        | Runs of identical colors: 2 2 2 2 2 2 15 15 15
    LZW        | Repeated substrings: 2 3 15 40 2 3 15 40
    Huffman    | Unequal color frequencies: 2 2 3 2 2 4 3 2 2 2 4
    CCITT-3    | Predominance of white in the image, large areas filled with a single color
    Recursive  | Smooth color transitions and absence of sharp boundaries
    JPEG       | Absence of sharp boundaries
    Fractal    | Similarity between elements of the image
    Algorithm             | Compression ratios | Time symmetry | Oriented towards    | Losses | Dimension
    RLE                   | 32, 2, 0.5         | 1             | 3-, 4-bit images    | No     | 1D
    LZW                   | 1000, 4, 5/7       | 1.2-3         | 1-8-bit images      | No     | 1D
    Huffman               | 8, 1.5, 1          | 1-1.5         | 8-bit images        | No     | 1D
    CCITT-3               | 213(3), 5, 0.25    | ~1            | 1-bit images        | No     | 1D
    JBIG                  | 2-30 times         | ~1            | 1-bit images        | No     | 2D
    Lossless JPEG         | 2 times            | ~1            | 24-bit, grayscale   | No     | 2D
    JPEG                  | 2-20 times         | ~1            | 24-bit, grayscale   | Yes    | 2D
    Recursive compression | 2-200 times        | 1.5           | 24-bit, grayscale   | Yes    | 2D
    Fractal               | 2-2000 times       | 1000-10000    | 24-bit, grayscale   | Yes    | 2.5D


    JPEG (pronounced "jay-peg"; English: Joint Photographic Experts Group, after the name of the developer organization) is one of the popular graphic formats used for storing photographs and similar images. Files containing JPEG data usually have the extension .jpeg, .jfif, .jpg, .JPG, or .JPE; of these, .jpg is the most popular on all platforms. The MIME type is image/jpeg.

    The JPEG algorithm is a lossy data compression algorithm.

    Scope of application

    The JPEG algorithm is most suitable for compressing photographs and paintings containing realistic scenes with smooth transitions of brightness and color. JPEG is most widespread in digital photography and for storing and transmitting images using the Internet.

    On the other hand, JPEG is unsuitable for compressing drawings, text and character graphics, where the sharp contrast between adjacent pixels leads to noticeable artifacts. It is advisable to save such images in lossless formats such as TIFF, GIF, PNG or RAW.

    JPEG (like other distortion compression methods) is not suitable for compressing images during multi-stage processing, since distortions will be introduced into the images each time intermediate processing results are saved.

    JPEG should not be used in cases where even minimal losses are unacceptable, for example, when compressing astronomical or medical images. In such cases, the Lossless JPEG compression mode provided by the JPEG standard (which, unfortunately, is not supported by most popular codecs) or the JPEG-LS compression standard may be recommended.

    Compression

    During compression, the image is converted from the RGB color space to YCbCr (YUV). It should be noted that the JPEG standard (ISO/IEC 10918-1) does not regulate the choice of YCbCr in any way, allowing other kinds of conversion (for example, with a number of components other than three) as well as compression without any conversion (directly in RGB); however, the JFIF specification (JPEG File Interchange Format, proposed in 1991 by specialists from C-Cube Microsystems and now a de facto standard) assumes the use of the RGB->YCbCr conversion.

    After the RGB->YCbCr conversion, "subsampling" can be performed for the image channels Cb and Cr, which are responsible for color; it consists of assigning averaged values of Cb and Cr to each 2x2 block of pixels (the "4:2:0" thinning scheme). In this case, for each 2x2 block, instead of 12 values (4 Y, 4 Cb and 4 Cr) only 6 are used (4 Y plus one averaged Cb and one averaged Cr). If higher demands are placed on the quality of the image restored after compression, thinning can be performed in only one direction - vertically (the "4:4:0" scheme) or horizontally ("4:2:2") - or not performed at all ("4:4:4").

    The standard also allows decimation with averaging of Cb and Cr not for a 2x2 block, but for four pixels located sequentially (vertically or horizontally), that is, for blocks 1x4, 4x1 (“4:1:1” scheme), as well as 2x4 and 4x2. It is also possible to use different types of thinning for Cb and Cr, but in practice such schemes are used extremely rarely.

    Next, the brightness component Y and the color components Cb and Cr are divided into blocks of 8x8 pixels. Each such block is subjected to a discrete cosine transform (DCT). The resulting DCT coefficients are quantized (in general, different quantization matrices are used for Y, Cb and Cr) and packed using Huffman codes. The JPEG standard also allows for the use of much more efficient arithmetic encoding, however, due to patent restrictions (the patent for the arithmetic QM encoder described in the JPEG standard belongs to IBM), it is not used in practice.

    The matrices used to quantize the DCT coefficients are stored in the header part of the JPEG file. They are usually constructed so that high-frequency coefficients are subject to stronger quantization than low-frequency ones. This results in coarsening of small details in the image. The higher the compression ratio, the more strongly all coefficients are quantized.

    When you save an image as a JPEG file, you specify a quality setting in some arbitrary units, for example from 1 to 100 or from 1 to 10. A larger number usually means better quality (and a larger compressed file). However, even when the highest quality is used (corresponding to a quantization matrix consisting entirely of ones), the reconstructed image will not exactly match the original. This is associated both with the finite precision of the DCT and with the need to round the values of Y, Cb, Cr and of the DCT coefficients to the nearest integer. The Lossless JPEG compression mode, which does not use DCT, ensures an exact match between the restored and original images; however, its low efficiency (the compression ratio rarely exceeds 2) and the lack of support from software developers have not contributed to the popularity of Lossless JPEG.
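
    The mapping from such a quality setting to the quantization matrices is not fixed by the standard. As an illustration, the widely used IJG libjpeg library scales its base tables roughly as in the following Python sketch (our own paraphrase of that scheme; the base row shown is typical of luminance tables):

    def scale_quant_matrix(base, quality):
        """Scale a base quantization table for a quality setting of 1..100 (IJG-style)."""
        quality = min(max(quality, 1), 100)
        scale = 5000 // quality if quality < 50 else 200 - 2 * quality
        return [[min(max((q * scale + 50) // 100, 1), 255) for q in row] for row in base]

    base_row = [16, 11, 10, 16, 24, 40, 51, 61]
    print(scale_quant_matrix([base_row], 50)[0])    # the base values themselves
    print(scale_quant_matrix([base_row], 100)[0])   # all ones: the mildest possible quantization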

    Varieties of JPEG compression schemes

    The JPEG standard provides two main ways to represent encoded data.

    The most common, supported by most available codecs, is the sequential JPEG data representation, which involves sequential traversal of the encoded image block by block from left to right, from top to bottom. The operations described above are performed on each encoded image block, and the encoding results are placed in the output stream in the form of a single “scan”, i.e. an array of encoded data corresponding to a sequentially passed (“scanned”) image. The main or "baseline" encoding mode allows only this representation. Extended mode, along with sequential mode, also allows progressive JPEG data representation.

    In the case of progressive JPEG, the compressed data is written to the output stream as a set of scans, each of which describes the entire image with an increasing degree of detail. This is achieved either by recording in each scan not the full set of DCT coefficients, but only some part of them: first - low-frequency ones, in subsequent scans - high-frequency ones (the “spectral selection” method, i.e. spectral samples), or by sequentially, from scan to scan, refinement of DCT coefficients (method of “successive approximation”, i.e. successive approximations). This progressive representation of data is especially useful when transmitting compressed images using low-speed communication channels, since it allows you to get an overview of the entire image after only a small part of the JPEG file has been transmitted.

    Both described schemes (both sequential and progressive JPEG) are based on DCT and fundamentally do not allow obtaining a reconstructed image absolutely identical to the original one. However, the standard also allows compression that does not use DCT, but is built on the basis of a linear predictor (lossless, i.e., “lossless”, JPEG), guaranteeing a complete, bit-for-bit, match of the original and restored images. At the same time, the compression ratio for photographic images rarely reaches 2, but the guaranteed absence of distortion in some cases is in demand. Noticeably higher compression ratios can be obtained using the JPEG-LS compression method, which, despite the similarity in names, is not directly related to the JPEG ISO/IEC 10918-1 (ITU T.81 Recommendation) standard, described by the ISO/IEC 14495-1 standard (ITU T.87 Recommendation).

    JPEG format syntax and structure

    A JPEG file contains a sequence of markers, each of which begins with the byte 0xFF, indicating the start of the marker, followed by an identifier byte. Some markers consist only of this pair of bytes, while others contain additional data: a two-byte field with the length of the informational part of the marker (including the length of this field itself, but not counting the two bytes of the marker start, i.e. 0xFF and the identifier), followed by the data itself.
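
    A minimal Python sketch of walking through these markers up to the start of the entropy-coded data (our own illustration, handling only the simplest cases; the file name is hypothetical):

    import struct

    # SOI, EOI, TEM and RSTn markers consist of the two marker bytes only (no length field).
    STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))

    def list_markers(path):
        """Print the markers of a JPEG file up to the SOS (start of scan) marker."""
        with open(path, "rb") as f:
            data = f.read()
        pos = 0
        while pos + 1 < len(data):
            assert data[pos] == 0xFF, "expected a marker"
            marker = data[pos + 1]
            if marker in STANDALONE:
                print(f"FF{marker:02X}  (no payload)")
                pos += 2
            else:
                (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
                print(f"FF{marker:02X}  payload of {length - 2} bytes")
                pos += 2 + length
            if marker == 0xDA:        # SOS: entropy-coded data follows, stop the simple scan
                break

    # list_markers("example.jpg")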

    Basic JPEG markers
    Marker Bytes Length Purpose