Single precision

A computer representation of approximate real-number values
Single precision refers to a way for computers to represent approximate values of real numbers. In VB, a Single (single-precision floating-point) variable is stored as an IEEE 32-bit (4-byte) floating-point number; its range is -3.402823E38 to -1.401298E-45 for negative values and 1.401298E-45 to 3.402823E38 for positive values.
Chinese name: Single precision
Foreign name: Single

Storage format

Sign bit S (sign) - 1 bit
0 denotes a positive number, 1 a negative number. (Because the sign is a separate bit, both +0 and -0 exist; see the zero-value section below.)
Exponent bit E (exponent) - 8 bits
E ranges over 0-255 (as an unsigned integer); double precision uses 11 bits, and extended formats use 15 or more. The actual exponent is e = E - 127.
E is sometimes also called the "biased exponent" or "order code".
Mantissa bit M (mantissa) - 23 bits
M is also called the significand, coefficient, or even the "fraction".
In the normal case m = (1.M)2, so the actual significand satisfies 1 ≤ m < 2.
To handle underflow and extend the ability to represent values near 0, IEEE 754 makes additional provisions for M; see the introduction below.
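The three fields can be extracted mechanically from a number's bit pattern. The following is a minimal Python sketch (Python is used here purely for illustration; the article's own context is VB/C), using the standard struct module to obtain the raw 32-bit pattern:

```python
import struct

def decode_single(x):
    """Split a value's IEEE 754 single-precision encoding into S, E, M."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    s = bits >> 31                   # 1-bit sign S
    e_field = (bits >> 23) & 0xFF    # 8-bit biased exponent E
    m = bits & 0x7FFFFF              # 23-bit mantissa M (hidden leading 1 not stored)
    return s, e_field, m

s, e_field, m = decode_single(-6.5)
print(s, e_field - 127, bin(m))      # sign, actual exponent e = E - 127, mantissa
```

For -6.5 this prints sign 1, actual exponent 2, and mantissa bits 10100000000000000000000, matching the -6.5 row of the example table below.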

Worked example

For the internally stored data (00111111 01100110 01100110 01100110)2:
Sign bit
(leftmost) S = 0, meaning a positive number.
Exponent
(second to ninth bits from the left) E = (01111110)2 = (126)10, so e = E - 127 = -1.
Mantissa
(last 23 bits) M = (11001100110011001100110)2, m = (1.M)2 = (1.7999999523162841796875)10
To convert this binary fraction to decimal: 1 + (1/2 + 1/4) + (1/32 + 1/64) + (1/512 + 1/1024) + ...
Actual value
N = 1.7999999523162841796875 × 2^-1 = 0.89999997615814208984375
(This is, in fact, the single-precision floating-point representation of 0.9.)
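The decomposition above can be verified programmatically. A small Python sketch (illustrative only; struct is the standard-library module for raw byte conversion):

```python
import struct

(bits,) = struct.unpack(">I", struct.pack(">f", 0.9))  # 0.9 stored as a single
sign = bits >> 31
E = (bits >> 23) & 0xFF
M = bits & 0x7FFFFF
m = 1 + M / 2**23                        # m = (1.M)2
N = (-1) ** sign * m * 2.0 ** (E - 127)  # reassemble the actual value
print(f"{bits:032b}")                    # 00111111011001100110011001100110
print(N)                                 # 0.8999999761581421
```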
Here are some examples of other numbers:
Actual Value | Sign Bit | Exponent | Mantissa
1 | 0 | 01111111 | 00000000000000000000000
2 | 0 | 10000000 | 00000000000000000000000
-6.5 | 1 | 10000001 | 10100000000000000000000
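The rows of this table can be generated with the same kind of bit surgery (a Python sketch, illustrative only; the fields helper is ad hoc):

```python
import struct

def fields(x):
    """Return the sign, biased-exponent string, and mantissa string of a single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, f"{(bits >> 23) & 0xFF:08b}", f"{bits & 0x7FFFFF:023b}"

for v in (1.0, 2.0, -6.5):
    s, e, m = fields(v)
    print(f"{v:g} | {s} | {e} | {m}")
```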

Representable range

Maximum range: a single-precision floating-point number can represent values up to ±3.402823 × 10^38 (1.111...1 × 2^127).
Smallest normalized value near 0: a single-precision floating-point number can represent 1.175 × 10^-38 (1.00...0 × 2^-126) without losing precision.
For values smaller than this, precision is gradually lost as the significant bits of the mantissa decrease (as specified in IEEE 754); some systems simply flush such values to 0 to simplify processing.
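These two limits correspond to the bit patterns 7F7FFFFF (largest finite single) and 00800000 (smallest normalized single). A Python check, illustrative only:

```python
import struct

largest = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]          # E=254, M all ones
smallest_normal = struct.unpack(">f", bytes.fromhex("00800000"))[0]  # E=1, M=0
print(largest)          # about 3.4028235e+38, exactly (2 - 2**-23) * 2**127
print(smallest_normal)  # about 1.1754944e-38, exactly 2**-126
```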

Precision

The actual effective precision of a single-precision floating-point number is 24 binary bits, equivalent to 24 × log10(2) ≈ 7.2 decimal digits, hence the common statement that "single-precision floating-point numbers have 7 digits of precision". (One way to understand this: moving from (1.000...0)2 to (1.000...1)2 spans 2^23 steps in the stored mantissa; counting the doubling of resolution contributed by rounding, a single-precision float can reflect numerical changes at 2^24, i.e. 24-bit binary, precision.)
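The "7 decimal digits" figure is easy to demonstrate: push a 9-digit integer through single precision and the last digits are lost. A Python sketch (the to_single helper is ad hoc, introduced only for this illustration):

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(to_single(123456789.0))   # 123456792.0: only about 7 digits survive
```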

Error

A floating-point number uses a finite 32-bit length to represent the infinite set of real numbers, so in most cases it is an approximation. In addition, floating-point arithmetic is accompanied by error diffusion (propagation).
Two floating-point numbers that appear equal at a given display precision may not actually be equal, because their least significant digits differ.
Because floating-point numbers may not exactly approximate decimal numbers, mathematical or comparison operations on them may not produce the same results as decimal arithmetic.
When floating-point numbers are involved, a value may not round-trip. A round trip means one operation converts the original floating-point number to another format and a reverse operation converts it back, with the final floating-point number equal to the original. The round trip may fail because one or more least significant bits (LSBs) are lost or changed in the conversion.
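A concrete round-trip failure: formatting a single with only 7 significant decimal digits can lose the last bit, while 9 digits are always enough to recover the original. A Python sketch (to_single is an ad hoc helper):

```python
import struct

def to_single(x):
    return struct.unpack(">f", struct.pack(">f", x))[0]

x = to_single(1 + 2**-23)             # the smallest single greater than 1.0
seven = to_single(float(f"{x:.7g}"))  # via a 7-significant-digit decimal string
nine = to_single(float(f"{x:.9g}"))   # via a 9-significant-digit decimal string
print(seven == x)   # False: the round trip changed the value
print(nine == x)    # True: 9 digits round-trip every single exactly
```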

Standard format

A single-precision floating-point number is stored in 4 bytes [1], a double-precision floating-point number in 8 bytes; both are divided into three parts: sign bit, exponent ("order"), and mantissa. The exponent field holds the exponent, and the mantissa holds the effective significant digits. The single-precision format has an 8-bit exponent, a 24-bit mantissa, and a 1-bit sign; the double-precision format has an 11-bit exponent, a 53-bit mantissa, and a 1-bit sign.
Careful readers will notice that, for both single and double precision, the parts add up to one bit more than the actual storage width. Indeed, the mantissa includes a hidden bit: only 23 bits are stored to represent a 24-bit mantissa, and the implicit leading 1 is the first bit of the normalized floating-point number. When a floating-point number is normalized, it is always adjusted so that its mantissa is greater than or equal to 1 and less than 2, i.e. its leading bit is always 1. For example, normalizing 1100B yields 1.1 times 2 to the 3rd power, but that leading 1 is not stored among the 23 mantissa bits; it is implied.
The exponent is stored in biased ("shifted") form. For single precision the bias is 127 (7FH); for double precision it is 1023 (3FFH). Before a floating-point number is stored, the bias is added to its exponent. In the example above, the exponent 3 is stored in single precision as 127 + 3 = 130 (82H), and in double precision as 1023 + 3 = 1026 (402H).
Floating-point numbers have two exceptional cases. The number 0.0 is stored as all zeros. An infinity is stored with an all-ones exponent and an all-zero mantissa; the sign bit indicates positive infinity or negative infinity.
Here are some single-precision examples:
Decimal | Normalized | Sign | Biased exponent | Mantissa
-12 | -1.1 × 2^3 | 1 | 10000010 | 10000000000000000000000
0.25 | 1.0 × 2^-2 | 0 | 01111101 | 00000000000000000000000
All bytes are ordered in memory according to the CPU's endianness: Intel CPUs use little-endian order, while Motorola CPUs use big-endian order.
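The two byte orders are easy to see from Python's struct module, which can emit either (illustrative sketch):

```python
import struct

x = -12.0  # encoded as C1400000 (sign 1, biased exponent 10000010, mantissa 100...0)
print(struct.pack(">f", x).hex())  # big-endian (Motorola-style):  c1400000
print(struct.pack("<f", x).hex())  # little-endian (Intel-style):  000040c1
```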

More specifications

Zero value
In IEEE 754, a value of 0 is represented by the exponent E and mantissa M both being zero. Because the sign bit S can still vary, there are actually two internal representations, "+0" and "-0", and both compare equal to 0.
Values close to 0
When the exponent E is 0, IEEE 754 specifies m = (0.M)2 and e = -126 (denormalized, or subnormal, numbers). This rule extends the ability to represent data near the value 0.
Several examples:
Sign S | Exponent E | Mantissa M | Value | Decimal | Relative error
0 | 00000000 | 100...0 | (0.1)2 × 2^-126 | 5.87e-39 | 2^-23
0 | 00000000 | 010...0 | (0.01)2 × 2^-126 | 2.94e-39 | 2^-22
... | ... | ... | ... | ... | ...
0 | 00000000 | 000...10 | (0.00...10)2 × 2^-126 = 2^-148 | 2.80e-45 | 50%
0 | 00000000 | 000...01 | (0.00...01)2 × 2^-126 = 2^-149 | 1.40e-45 | 100%
(Note: some systems do not support this kind of data and simply flush it to zero.)
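The smallest subnormal in the table, 2^-149 ≈ 1.40e-45, corresponds to the bit pattern 00000001H. A Python sketch (illustrative; to_single is an ad hoc helper):

```python
import struct

def to_single(x):
    return struct.unpack(">f", struct.pack(">f", x))[0]

tiny = struct.unpack(">f", struct.pack(">I", 1))[0]  # E = 0, M = 000...001
print(tiny)                 # about 1.40e-45, i.e. 2**-149
print(to_single(tiny / 2))  # 0.0: anything smaller is flushed to zero
```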

Storage variants

[Figure: single-precision floating-point storage format]
Single-precision floating point [2] is widely used, and some low-cost single-chip (microcontroller) systems have no coprocessor hardware for mathematical operations, so on different systems the software implementation of floating point has been adjusted and simplified to suit the hardware. Common variants of IEEE 754 include:
Different high/low byte order
That is, big-endian (high byte first) versus little-endian (low byte first); the former is used by Motorola CPUs and the latter by Intel CPUs, among others.
The exponent stored as a separate byte
An independent byte is easier to process. Such systems store the sign bit together with the mantissa.
In addition, different systems may differ slightly in the following characteristics:
Rules for, and handling of, infinity and NaN
This may affect overflow behavior.
Handling of denormalized data
Some treat denormalized values as 0 outright for simplicity; others use the (0.M)2 scheme for the mantissa throughout, sacrificing 1 bit of binary precision to gain a uniform algorithm.
The variations in these two areas have little impact on most applications.

Infinity

When the exponent E is all 1s, IEEE 754 specifies that such storage serves special purposes rather than ordinary data.
When E = 255 and M = 0, the value is an infinity (Infinity, ∞). Depending on the sign, there are +∞ and -∞.
Infinities can result from arithmetic operations; here are some examples of operations involving infinity:
1/∞ = 0, -1/∞ = -0, 1/0 = ∞, -1/0 = -∞
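Python exposes IEEE infinities as float("inf"), so most of these identities can be checked directly (note that Python itself raises ZeroDivisionError for 1/0 instead of returning ∞ as hardware floating-point units do):

```python
import struct

inf = float("inf")
print(1 / inf, -1 / inf)              # 0.0 -0.0
print(struct.pack(">f", inf).hex())   # 7f800000: E = 11111111, M = 0
print(struct.pack(">f", -inf).hex())  # ff800000: only the sign bit differs
```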
NaN
When E = 255 and M is not 0, the value is a NaN (Not a Number).
Performing an illegal operation on data (e.g. taking the square root of -1) yields NaN.
When a NaN is involved in an operation, the result is also NaN.
Note: NaN <> NaN! Comparing NaNs with each other is meaningless.
(NaN also comes in QNaN and SNaN variants, which programs use to trap certain exception states; see the NaN entry.)
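These NaN rules can be observed in Python (illustrative sketch; math.isnan is the reliable way to test, precisely because NaN <> NaN):

```python
import math
import struct

nan = float("nan")
print(nan == nan)        # False: NaN is unequal even to itself
print(math.isnan(nan))   # True
print(nan + 1.0)         # nan: NaN propagates through operations
(bits,) = struct.unpack(">I", struct.pack(">f", nan))
print((bits >> 23) & 0xFF, bits & 0x7FFFFF != 0)  # 255 True: E all 1s, M nonzero
```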

Double precision


Brief introduction

Single-precision and double-precision numeric types first appeared in the C language (a widely used language). In C, the single-precision type is called float ("floating point"); as the name implies, it stores data with a floating decimal point. These two data types were originally created for scientific computing, where they provide precision high enough to store values that demand it.
At the same time, they fully conform to the concept of a numerical value in scientific measurement:
When we compare the lengths of two sticks, one way is to lay them side by side, and another is to measure each length separately. In reality, though, no two sticks in the world have exactly the same length, and the precision of any measurement is limited by human eyesight and by the measuring tools. In this sense, asking whether two sticks are "the same length" is meaningless, because the answer must be false; but we can still determine which one is longer or shorter. This example captures well the original purpose and significance of the single/double precision numeric types.
With this understanding, the single/double precision numeric types were never designed to be exact numeric types. They only guarantee accuracy within the precision of the type, not beyond it. For example, for a value of 5.1, what is actually stored in a single or double precision variable may well be not 5.1 but something like 5.0999999999999999. The reasons for this phenomenon can be explained in two ways:
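The 5.1 example can be made concrete. Python's default display can hide the difference, but Decimal reveals the exact stored value (illustrative sketch; the single-precision case is shown):

```python
import struct
from decimal import Decimal

stored = struct.unpack(">f", struct.pack(">f", 5.1))[0]  # 5.1 stored as a single
print(stored == 5.1)   # False
print(Decimal(stored)) # 5.099999904632568359375: the exact value actually stored
```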

Interpretation method

In the property panel of any control, try setting its width to 3.2 cm. Upon entry you will find that the value automatically changes to 3.199 cm; no matter what you do, you cannot enter 3.200 cm. This is because the computer does not actually store the value in centimeters but in "twips", and the ratio between twips and centimeters is a number that does not divide evenly. After you type the value, the computer converts it to the nearest twip count and then converts that back to centimeters for display on the property panel. That is one multiplication and one division, with rounding error introduced twice. The principle behind single/double precision is similar: when stored in binary [3], single/double precision uses a kind of nearest-fraction approximation, and such storage cannot always be exact.

Evaluation

By dissecting the binary storage format of single-precision values [4], we can see clearly that single/double precision storage necessarily involves multiplication and division, and therefore rounding. If your value happens to be rounded during conversion, the value you assigned may not be exactly the value that is finally stored. This small difference does not violate the design goals of single/double precision.
When we use a single or double precision value in a database or in VBA code, we may not see any difference at the interface level, but in the actual storage the difference is really there. When we compare such values for equality, the system simply compares the binary representations; small differences that are invisible at the interface have nowhere to hide in a binary comparison, so the equality comparison returns an unexpected False.
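The practical remedy is to compare floating-point values within a tolerance rather than for exact equality. A Python sketch (math.isclose; the rel_tol shown is an illustrative choice matching single precision's roughly 7 decimal digits):

```python
import math
import struct

a = struct.unpack(">f", struct.pack(">f", 5.1))[0]  # 5.1 after single-precision storage
print(a == 5.1)                            # False: exact comparison surprises
print(math.isclose(a, 5.1, rel_tol=1e-6))  # True: compare within the type's precision
```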