Compression of information

Another problem closely related to models of information is the compression of information. When data are stored or transmitted over communication channels, the amount of information is the main parameter.

Compression of information is based on eliminating the redundancy contained in the original data. The simplest example of redundancy is the repetition of fragments in a text (for example, words of a natural or computer language). Such redundancy is usually eliminated by replacing a repeating sequence with a reference to an already encoded fragment together with its length. Another kind of redundancy stems from the fact that some values occur in the compressible data more often than others. The amount of data is reduced by replacing frequently occurring values with short codewords and rare ones with long codewords (entropy encoding). Information that has no redundancy (for example, a random signal, white noise, or an encrypted message) cannot, in principle, be compressed without loss.
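As a rough illustration of the second kind of redundancy, the following Python sketch (the function name and the test data are assumptions, not taken from the text) estimates the Shannon entropy of the byte-frequency distribution. Repetitive text uses far fewer than 8 bits per byte, while random bytes approach the 8-bit limit and leave essentially nothing for a lossless compressor to remove. Note that this simple measure only captures skewed symbol frequencies, not the repetition of longer fragments.

    # Illustrative sketch: per-byte Shannon entropy as a crude redundancy estimate.
    import math
    import os
    from collections import Counter

    def bits_per_byte(data: bytes) -> float:
        """Entropy of the byte-frequency distribution, in bits per byte."""
        counts = Counter(data)
        total = len(data)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    repetitive = b"abcabcabc" * 1000    # heavy repetition -> low entropy
    random_data = os.urandom(9000)      # white noise -> close to 8 bits/byte

    print(f"repetitive: {bits_per_byte(repetitive):.2f} bits/byte")
    print(f"random:     {bits_per_byte(random_data):.2f} bits/byte")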

Two types of compression algorithms have been developed and are applied: compression achieved by changing the structure of the data (it occurs without any loss of data) and compression with partial loss of data. Algorithms of the first type provide two operations: compressing the information for storage or transfer, and restoring the data exactly in their original form when they are to be used. This type of compression is used, for example, to store texts (the best-known algorithms are Huffman and Lempel-Ziv). Algorithms of the second type do not allow the original to be restored completely and are used to store graphics or sound; for texts, numbers, or programs they are not applicable.
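As a minimal sketch of one of the lossless algorithms named above, the following Python fragment builds a Huffman prefix code from symbol frequencies. Identifier names and the example string are illustrative assumptions; a practical codec would also have to serialize the code table and pack the bit string into bytes.

    # Minimal Huffman coding sketch: frequent symbols get short codewords.
    import heapq
    import itertools
    from collections import Counter

    def huffman_codes(text: str) -> dict:
        freq = Counter(text)
        tie = itertools.count()   # unique tie-breaker so dicts are never compared
        heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
        heapq.heapify(heap)
        if len(heap) == 1:        # degenerate input with a single distinct symbol
            return {sym: "0" for sym in heap[0][2]}
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)
            f2, _, t2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in t1.items()}
            merged.update({s: "1" + c for s, c in t2.items()})
            heapq.heappush(heap, (f1 + f2, next(tie), merged))
        return heap[0][2]

    text = "abracadabra"
    codes = huffman_codes(text)
    encoded = "".join(codes[ch] for ch in text)
    print(codes)
    print(f"{len(text) * 8} bits raw -> {len(encoded)} bits encoded")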

Different compression algorithms may require different amounts of resources from the computing system on which they are implemented: RAM, permanent storage, and processor time. In general, these requirements depend on the complexity and "intelligence" of the algorithm. The general trend is this: the more efficient and the more universal the algorithm, the greater its demands on computing resources. Nevertheless, in specific cases simple and compact algorithms can work no worse than complex and universal ones. System requirements determine an algorithm's consumer qualities: the less demanding the algorithm, the simpler, and therefore more compact, reliable, and cheaper the system in which it can be implemented. Since compression and recovery algorithms work in pairs, the ratio of their system requirements matters. Often one algorithm can be complicated in order to greatly simplify the other.
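One way to observe this trade-off in practice is the level parameter of Python's standard zlib module: higher levels generally spend more processor time in exchange for a better compression ratio. The sketch below (the sample data are an assumption chosen for illustration) simply measures both sides of the trade-off on the same input.

    # Illustrative sketch: compression ratio vs. processor time at different zlib levels.
    import time
    import zlib

    data = b"the quick brown fox jumps over the lazy dog " * 20000

    for level in (1, 6, 9):
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        print(f"level {level}: {len(data)} -> {len(packed)} bytes in {elapsed * 1000:.1f} ms")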

There are two main approaches to compressing information of unknown format. In the first approach, at each step of the compression algorithm the next compressible symbol is either placed in the output buffer of the compressing encoder as is, or a group of several compressible symbols is replaced by a reference to an already encoded group that coincides with it. In the second approach, for each compressible sequence of symbols, statistics of its occurrence in the encoded data are collected either once in advance or continuously during encoding. Based on these statistics, the probability of the value of the next encoded symbol is calculated, and then some form of entropy coding, for example arithmetic coding or Huffman coding, is used to represent frequently occurring sequences with short codewords and rarely occurring ones with longer codewords.
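The following is a toy Python sketch of the first approach in the style of LZ77; the window size, minimum match length, and token format are assumptions chosen for illustration. A group of symbols is replaced by an (offset, length) reference into the already encoded text whenever a sufficiently long match is found; otherwise the symbol is emitted literally.

    # Toy LZ77-style sketch: literals and back-references into the encoded prefix.
    def lz77_encode(text, window=255, min_match=3):
        tokens, i = [], 0
        while i < len(text):
            best_len, best_off = 0, 0
            start = max(0, i - window)
            # Search the already-encoded window for the longest match.
            for j in range(start, i):
                length = 0
                while i + length < len(text) and text[j + length] == text[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_off = length, i - j
            if best_len >= min_match:
                tokens.append(("ref", best_off, best_len))   # back-reference
                i += best_len
            else:
                tokens.append(("lit", text[i]))              # literal symbol
                i += 1
        return tokens

    def lz77_decode(tokens):
        out = []
        for t in tokens:
            if t[0] == "lit":
                out.append(t[1])
            else:
                _, off, length = t
                for _ in range(length):
                    out.append(out[-off])
        return "".join(out)

    sample = "abcabcabcabcX"
    tokens = lz77_encode(sample)
    assert lz77_decode(tokens) == sample
    print(tokens)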

Tools