
Real-time compression

Computer science terminology
Real-time compression refers to compressing data before it is written, so that it occupies less storage space than the raw data; the goal is to make effective use of disk storage and to improve a graphics system's ability to process data under limited computing resources.
In terms of processing method, each compression processing unit completes the same operation synchronously, in lock step: reading, quantizing, encoding, and storing the image data are carried out at the same time, and the system compresses the image data in a pipelined fashion.
Chinese name
Real-time compression
Foreign name
real-time compression
Real-time databases
Plant Information (PI), InfoPlus
Purpose
Save storage space
Compression algorithms
SDA, LZW, etc.
Evaluation criteria
Decompression error, data compression ratio
Type
Computer science terminology

Research background


Background

In recent decades, many researchers have studied real-time databases and real-time data management in depth. In the field of process industry control, representative real-time database products include the well-known Plant Information (PI) system developed by OSI Software Inc. [1] and the InfoPlus system developed by Aspen Technology Inc.
Data compression is a key issue in real-time databases. To save storage space, historical data must be compressed before it is sent to the historical database: compression determines which data points are important enough to be written to the historical data buffer pool. Efficient compression algorithms are essential to meet the storage needs of the large volumes of data in a real-time database, and the corresponding decompression of historical data matters just as much. For this purpose, OSI Software Inc. developed the well-known swinging door (also translated "revolving door") compression algorithm (SDA).
To use disk storage effectively, historical data must be compressed in real time, which requires not only a high data compression rate but also high-precision (low-error) compression. [2]

Research directions and current status

Related research at home and abroad focuses mainly on graphics simplification after digital map compression and on pattern recognition. The research objectives fall into three categories:
  1. min-ε: the number of shape points after compression is given, and the deformation is required to be minimal;
  2. min-#: the maximum allowed deformation after compression is given, and the number of shape points is required to be minimal;
  3. min-rate: the maximum allowed deformation after compression is given, and the data volume is required to be minimal. [3]

Historical data compression

Figure 1 Historical database data flow
When historical data arrives, it is first smoothed and compressed as necessary, according to the measurement accuracy of the sensor, to determine whether it needs to be stored in the historical database. Data that is to be stored is first written to the historical data buffer pool; when the buffer pool is full, the data is written in blocks to the current historical file. Historical data files are organized as a queue: when no empty file remains, the earliest historical data file is emptied and reused as the next available empty file (Figure 1).
The historical data buffer queue in Figure 1 is the bridge between the core of the real-time database system and the historical database; it buffers the real-time data sent from the database core. Data smoothing handles measurement noise; whether a value needs smoothing is determined by the measurement accuracy of the instrument. Data compression is an indispensable step: it selects the key data points to be stored, which are then written to the historical data buffer pool. The buffer pool is the main in-memory buffer for historical data; it holds recent historical data, improving access efficiency for recent data and reducing unnecessary disk operations.
The historical data file queue is the main disk space for historical data storage; the queue must contain at least one historical data file. At any given time, historical data can be appended only to the current historical data file. When the current file is full, the next available empty historical data file becomes the current file; when no empty file remains, the oldest file is emptied and reused.
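As a minimal illustration (hypothetical code, not taken from any real product), the file-rotation policy described above can be sketched as:

```python
from collections import deque

class HistoryFileQueue:
    """Sketch of history-file rotation: data is appended only to the
    current file; when it fills, the oldest file in the queue is taken
    (it is either already empty or gets emptied) and becomes current."""

    def __init__(self, n_files: int, blocks_per_file: int):
        self.capacity = blocks_per_file
        # each "file" is modeled as a list of data blocks
        self.files = deque([] for _ in range(n_files))

    def append_block(self, block) -> None:
        current = self.files[-1]
        if len(current) >= self.capacity:
            # rotate: the oldest file is emptied (if needed) and
            # re-enters the queue as the new current file
            oldest = self.files.popleft()
            oldest.clear()
            self.files.append(oldest)
            current = oldest
        current.append(block)
```

With 3 files of 2 blocks each, writing 7 blocks overwrites the oldest file once, keeping only the most recent data on "disk".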

Compression algorithm technology


LZW algorithm

Basic process
The basic process of dictionary coding is relatively simple. First, the text string to be compressed is read from the input data stream into the lookahead buffer and then encoded: the contents of the lookahead buffer are compared with the contents of the dictionary window, and any matching part of the input string is represented by a code in a specified format. Data that has been matched and encoded passes from the lookahead buffer into the dictionary window and becomes part of the dictionary. Because the window size is fixed, part of the earlier dictionary content slides out from the other end of the window.
Figure 2 Schematic diagram of dictionary coding compression principle
In this way, data continuously enters the dictionary window from the lookahead buffer at one end, while data continuously slides out at the other end, like sliding a fixed-size window over the input text. For this reason the scheme is also called sliding-window compression (strictly speaking, the sliding-window form described here is characteristic of the LZ77 family of dictionary coders, while LZW proper maintains an explicit string table, as described below). As the window slides, the contents of the dictionary keep changing, as if old phrases were discarded from a dictionary and new phrases added; the sliding window always holds the most recently processed text. A fixed-size window is used for phrase matching, rather than matching against all previously encoded information, for two reasons: the matching algorithm would otherwise consume too much time, so the dictionary size must be limited to keep the algorithm efficient; and sliding the window along with the compression process keeps the dictionary filled with the most recently encoded information, since for most data a string to be encoded is most likely to find a match in its recent context. The schematic diagram of dictionary coding compression is shown in Figure 2. [4]
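As an illustrative sketch (function name and parameters are assumptions, not from the source), the core of the sliding-window scheme, finding the longest match between the lookahead buffer and the dictionary window, could look like:

```python
def longest_match(window: bytes, lookahead: bytes, min_len: int = 3):
    """Return (offset, length) of the longest prefix of `lookahead`
    found in `window`, or None if no match of at least `min_len` exists.
    `offset` counts back from the end of the window."""
    for length in range(len(lookahead), min_len - 1, -1):
        pos = window.rfind(lookahead[:length])  # prefer the most recent match
        if pos != -1:
            return (len(window) - pos, length)
    return None
```

For example, with window `b"abcabcab"` and lookahead `b"abcx"`, the prefix `abc` is found 5 bytes back, so the match is encoded as the pair (5, 3) instead of three literal bytes.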
Working objects
The LZW algorithm involves three important objects: the input data stream, the output code stream, and a string table used for encoding. The input data stream is the data to be compressed; the output code stream is the compressed output; the string table stores index numbers for data strings, and a repeated string is output only as the index number of its first occurrence, which is what achieves compression. The compression procedure is:
(1) Initialize the index number and the string table: set the index number to its initial value and clear the string table;
(2) Read a character from the data stream as the prefix and assign it to the variable old;
(3) Read the next character from the data stream into the variable new;
(4) Shift the variable old left by 8 bits and add the variable new to form a new string, then search the string table. If the new string is in the table, take its index number into the variable old and go to step (3). If it is not in the table, output the value in the variable old, then check whether the index number has reached the maximum set value (for example, 4096 for a 12-bit code). If it is below the maximum, store the new string at the current index number, increment the index number by 1, and go to step (2); if the index number exceeds the maximum, the string table is full: output the string terminator and restart compression from step (1). [4]
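The procedure above can be sketched as a standard LZW compressor. This minimal Python version (an illustrative sketch, not production code) uses a byte-string dictionary in place of the explicit 8-bit-shift index arithmetic described in the steps, but follows the same read/lookup/output/extend logic:

```python
def lzw_compress(data: bytes, max_code: int = 4096) -> list[int]:
    """Compress `data` into a list of integer codes (12-bit table by default)."""
    # Initialize the string table with all single-byte strings (codes 0-255).
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    w = b""                      # current prefix (the variable "old")
    for byte in data:
        wc = w + bytes([byte])   # prefix + new character
        if wc in table:
            w = wc               # extend the match and keep reading
        else:
            out.append(table[w])           # emit the code for the prefix
            if next_code < max_code:       # add the new string if room remains
                table[wc] = next_code
                next_code += 1
            w = bytes([byte])    # restart the prefix from the new character
    if w:
        out.append(table[w])     # flush the final prefix
    return out
```

For example, `b"ABABAB"` compresses to `[65, 66, 256, 256]`: after `AB` is added to the table as code 256, both later occurrences are emitted as a single code.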

Vector compression technology

Vector compression techniques fall into two main categories: approximation methods and quantization coding methods. Approximation methods compress by reducing the number of points that make up the vector; quantization coding methods quantize the vector coordinates into a strongly correlated form in some way and then encode and compress them.
The approximation method was proposed by Douglas in 1973 and has since been improved by scholars at home and abroad. The basic compression process of the approximation method is inefficient, with time and space complexity O(N²). Dynamic programming can improve its efficiency, reducing the space complexity to O(N) and the time complexity to between O(N) and O(N²).
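A minimal sketch of the Douglas (Douglas-Peucker) approximation method, assuming 2-D point tuples and a deformation tolerance `eps` (the recursive form shown here is the basic O(N²)-worst-case variant mentioned above):

```python
import math

def perpendicular_distance(p, a, b):
    # Distance from point p to the line through a and b.
    if a == b:
        return math.dist(p, a)
    (x1, y1), (x2, y2), (x0, y0) = a, b, p
    return abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1) / math.dist(a, b)

def douglas_peucker(points, eps):
    """Reduce the number of points so no removed point deviates by more than eps."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord joining the endpoints.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax > eps:
        # Keep the farthest point and recurse on both halves.
        left = douglas_peucker(points[:idx + 1], eps)
        right = douglas_peucker(points[idx:], eps)
        return left[:-1] + right
    # All intermediate points are within tolerance: keep only the endpoints.
    return [points[0], points[-1]]
```

Nearly collinear runs collapse to their endpoints, so the number of stored shape points drops while the deformation stays below `eps`.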
Research on quantization coding methods includes the clustering method proposed by Shekhar et al., the chain code method proposed by Kolesnikov et al., and methods based on dynamic programming. The clustering method has space complexity O(N) and time complexity greater than O(N), and it needs additional space to store the dictionary. The dynamic programming method has space complexity O(N) and time complexity between O(N) and O(N²). The chain code method converts the curve into a chain-code sequence and then compresses it with a context-sensitive text compression algorithm; its efficiency is low, and when the tolerance is small the chain-code sequence is long and the compression effect is poor.
Both approximation and quantization coding methods have high time and space complexity and cannot compress and decompress massive data in real time in a mobile computing environment. The common curve compression method in mobile computing is type conversion, which converts coordinates from floating-point numbers to integers for storage. Its advantage is that its speed meets real-time requirements; its disadvantage is that the achievable compression is limited.
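A minimal sketch of the type-conversion method, assuming a fixed decimal scale factor (the scale value is illustrative): coordinates are stored as scaled integers, trading a bounded, predictable precision loss for fast, real-time conversion.

```python
def quantize(coords, scale=100):
    # Store floating-point coordinates as scaled integers,
    # e.g. scale=100 keeps two decimal places of precision.
    return [(round(x * scale), round(y * scale)) for x, y in coords]

def dequantize(q, scale=100):
    # Recover approximate floating-point coordinates.
    return [(x / scale, y / scale) for x, y in q]
```

The round trip is exact for values representable at the chosen precision, and the worst-case error is half a quantization step (0.005 here).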

Comparison of compression algorithms

An important criterion for judging a compression algorithm is its decompression error. For a test data set, when the error constraint is satisfied, the current test point passes the compression test and does not need to be stored; otherwise it is recorded and stored in the database, and this new storage point replaces the previous one and serves as the starting point for the next round of compression.
Another criterion is the data compression rate, a very important indicator for compression algorithms in real-time databases. Here it is defined as (M − N)/M, where M is the total number of test points and N is the number of stored points. [2]
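The two criteria can be illustrated together with a deliberately simplified dead-band compression test (a stand-in sketch only; the actual SDA uses swinging-door slope envelopes rather than a fixed dead band):

```python
def deadband_compress(values, tolerance):
    """Store a point only when it deviates from the last stored point
    by more than `tolerance`; each stored point becomes the new
    starting point for the comparison."""
    stored = [values[0]]
    for v in values[1:]:
        if abs(v - stored[-1]) > tolerance:  # error constraint violated
            stored.append(v)                  # record it and restart from here
    return stored

def compression_rate(m_total, n_stored):
    # rate = (M - N) / M, as defined above
    return (m_total - n_stored) / m_total
```

For the series [0, 0.1, 0.2, 1.5, 1.6, 3.0] with tolerance 1.0, only three points are stored, giving a compression rate of (6 − 3)/6 = 0.5 while every discarded point lies within the error bound of its stored neighbor.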

Real-time compression system

System structure and composition principle
Schematic diagram of compression system
To make full use of the hardware's parallel computing capability, the design adopts an array structure in which multiple parallel compression processing units, identical in function and timing, work synchronously.
Each compression processing unit completes the same operation synchronously, in lock step: reading, quantizing, encoding, and storing the image data are carried out simultaneously. With N units, the processing speed of the system is N times that of a single compression processing unit. Multiplexed parallel array processing improves the modularity of the system, making it easy to add processing units to increase throughput for different requirements, and gives the system both high-speed real-time compression capability and good capacity for expansion and reconfiguration. The working principle and process of the whole system are as follows:
The compression system works in a data-driven way. The original image data is transformed into 4 × 4 data blocks by a format converter in line-scan order. The transformed data is latched and then distributed to each input data buffer by a data distributor on a fixed schedule. Each compression processing unit reads data from its buffer; its internal classifier and quantizer process the data simultaneously, while the encoder and buffer work sequentially in a pipeline. The compression result passes through the buffer into the output buffer area and, after data consolidation, is sent to the data memory or the transmission channel. Because the complexity of image sub-regions varies, a run of blocks with complex texture lowers the compression ratio and enlarges the output, which may block the channel; conversely, a run of blocks with flat texture raises the compression ratio and shrinks the output, which may leave the channel idle. The compression ratio should therefore be adjusted dynamically according to the amount of data in the output buffer, which can be achieved by changing the threshold of the classifier. The input/output buffers allow data input, compression, and output to proceed as parallel pipeline stages. [5]
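The dynamic threshold adjustment described above might be sketched as a simple feedback rule (all names and constants here are illustrative assumptions, not from the source):

```python
def adjust_threshold(threshold: float, buffer_fill: float,
                     low: float = 0.25, high: float = 0.75,
                     step: float = 0.1) -> float:
    """When the output buffer is nearly full, raise the classifier
    threshold so more blocks are treated as 'flat' (higher compression,
    less output data); when it is nearly empty, lower the threshold
    to preserve more detail. `buffer_fill` is the fill fraction [0, 1]."""
    if buffer_fill > high:
        return threshold * (1.0 + step)
    if buffer_fill < low:
        return threshold * (1.0 - step)
    return threshold
```

Calling this once per block (or per batch) with the current output-buffer occupancy keeps the data rate matched to the channel, trading image detail against throughput only when the buffer level demands it.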