Q-format Converter & Calculator

Please report any issues on GitHub.

Q-Format Settings

Choose your conversion options. Remember these options.

Bit Position Bit Value
Convert

Type an integer or fractional value in the boxes below. Click or tab away to update.

Batch Convert

Type integers or fractional values in the boxes below separated by commas, spaces or new lines. Click or tab away to update.

Q-Format Settings

Choose your conversion options. Remember these options.

Convert

Type an integer or dB value in the boxes below. Click or tab away to update. The converter will convert the dB value \(x\) to a linear value \(y = 10^{x / B}\), where \(B\) is the multiplier (10 or 20), and then convert the linear value to a fixed-point integer. The dB value is calculated by reversing these operations, with \(x = B\log_{10}y\).
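For illustration, here is a minimal C sketch of the dB round trip described above, assuming a signed 16-bit result with 12 fractional bits and a multiplier of \(B = 20\); the function and macro names are hypothetical and are not the converter's own code.

```c
/* Hypothetical sketch of the dB round trip described above. Assumes a
 * signed 16-bit result with 12 fractional bits and a multiplier B = 20. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define FRAC_BITS     12
#define DB_MULTIPLIER 20.0

int16_t db_to_fixed(double db)
{
    double linear = pow(10.0, db / DB_MULTIPLIER);     /* y = 10^(x / B) */
    return (int16_t)lround(linear * (1 << FRAC_BITS)); /* scale to the fixed-point format */
}

double fixed_to_db(int16_t fixed)
{
    double linear = (double)fixed / (1 << FRAC_BITS);  /* back to a linear value */
    return DB_MULTIPLIER * log10(linear);              /* x = B * log10(y) */
}

int main(void)
{
    int16_t q = db_to_fixed(-6.0);
    printf("-6 dB -> %d -> %.4f dB\n", q, fixed_to_db(q));
    return 0;
}
```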

Batch Convert

Type integers or dB values in the boxes below separated by commas, spaces or new lines. Click or tab away to update.

Left Operand

Choose the format and value of the left operand.

Operator

Choose the operation, the arithmetic behaviour, and the output word size.

Right Operand

Choose the format and value of the right operand.

Result

The result of the operation.

Many embedded DSPs and low-powered processors do not support floating-point arithmetic. Instead, they support only integer and fixed-point arithmetic. Fixed-point representations make fractional arithmetic possible using only integers.

Representation

So what is fixed-point? As the name implies, it describes a number whose radix point sits in a fixed position; in this context, that means the position of the binary point within the binary representation. For example, a 16-bit fixed-point number might use 4 bits for the integer part and 12 bits for the fractional part. In two's complement, this implies that the integer part is in the range [−8, 7], and the fractional part is in the range [0, 1).

Fixed-point binary representation and bit values.
Format | Bit values
Integer | \(-2^{15}\) \(2^{14}\) \(2^{13}\) \(2^{12}\) \(2^{11}\) \(2^{10}\) \(2^{9}\) \(2^{8}\) \(2^{7}\) \(2^{6}\) \(2^{5}\) \(2^{4}\) \(2^{3}\) \(2^{2}\) \(2^{1}\) \(2^{0}\)
12 fractional bits | \(-2^{3}\) \(2^{2}\) \(2^{1}\) \(2^{0}\) \(2^{-1}\) \(2^{-2}\) \(2^{-3}\) \(2^{-4}\) \(2^{-5}\) \(2^{-6}\) \(2^{-7}\) \(2^{-8}\) \(2^{-9}\) \(2^{-10}\) \(2^{-11}\) \(2^{-12}\)

Fixed-point numbers are stored as integers. All integers have a range that is determined by the number of bits used to encode the integer, and by whether the integer is signed or unsigned. For example, a signed 16-bit integer has \(2^{16} = 65536\) possible values, ranging from −32768 (0b1000000000000000) to 32767 (0b0111111111111111). Moving the binary point to the left reduces this range (while increasing the resolution). Note, though, that the position of the binary point is not encoded in the data. So, for example, 0b0001000000000000 conveys 4096 as an integer, or 1 if there are 12 fractional bits. The position of the binary point has to be conveyed in metadata, comments, or documentation. This description of the number of integer and fractional bits is called the Q-format.
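As an illustration of the last point, the following C snippet reads the same 16-bit pattern under different assumed Q-formats; the formats chosen are just examples.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int16_t raw = 0x1000; /* 0b0001000000000000 */

    /* The stored bits are identical; only the assumed position of the
     * binary point (i.e. the Q-format) changes the interpreted value. */
    printf("0 fractional bits:  %d\n", raw);           /* 4096  */
    printf("12 fractional bits: %g\n", raw / 4096.0);  /* 1     */
    printf("15 fractional bits: %g\n", raw / 32768.0); /* 0.125 */
    return 0;
}
```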

The choice of the number of integer and fractional bits is a tradeoff: more fractional bits increase the precision whilst reducing the numeric range, and fewer fractional bits have the opposite effect. See the examples in the table below, and the short sketch that follows it. It's usually preferable to keep the integer range as small as possible with respect to the data, whilst avoiding overflow, in order to maximise the precision (and hence minimise any numeric error).

Fixed-point ranges.
Number of bits | Integer bits | Fractional bits | Minimum value | Maximum value | Resolution
16 | 16 | 0 | −32768 | 32767 | 1
16 | 4 | 12 | −8.0 | 7.999755859375 | 0.000244140625
16 | 1 | 15 | −1.0 | 0.999969482421875 | 0.000030517578125
16 | 12 | 4 | −2048.0 | 2047.9375 | 0.0625
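As a rough sketch of where these rows come from, the snippet below computes the minimum value, maximum value, and resolution of a signed two's-complement format; the helper name print_range is illustrative.

```c
#include <math.h>
#include <stdio.h>

/* Range of a signed two's-complement fixed-point format with the given
 * total word size and number of fractional bits. */
static void print_range(int total_bits, int frac_bits)
{
    double resolution = ldexp(1.0, -frac_bits);                  /* 2^-frac_bits      */
    double minimum    = -ldexp(1.0, total_bits - frac_bits - 1); /* -2^(int_bits - 1) */
    double maximum    = -minimum - resolution;

    printf("%d total, %d fractional: [%g, %.15g], resolution %.15g\n",
           total_bits, frac_bits, minimum, maximum, resolution);
}

int main(void)
{
    print_range(16, 0);  /* [-32768, 32767], resolution 1 */
    print_range(16, 12); /* [-8, 7.999755859375]          */
    print_range(16, 15); /* [-1, 0.999969482421875]       */
    print_range(16, 4);  /* [-2048, 2047.9375]            */
    return 0;
}
```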

Conversion

Conversion between integer and fractional numbers is straightforward. For a fractional number \(x\) with \(N\) fractional bits, the integer value \(y\in\mathbb{Z}\) is calculated as: \[y \approx 2^Nx\]

Note that \(y\) is necessarily rounded in order to store the value as an integer. This rounding causes an irreversible loss of precision.

The fractional value can be recovered from the integer with the reverse operation: \[x \approx \frac{y}{2^N}\]
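A minimal C sketch of these two formulas, assuming a signed 16-bit format with \(N = 12\) fractional bits; the function names are illustrative.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define N 12 /* number of fractional bits */

/* y ≈ 2^N x, rounded to the nearest integer (this is where precision is lost) */
int16_t to_fixed(double x) { return (int16_t)lround(x * (1 << N)); }

/* x ≈ y / 2^N */
double to_fraction(int16_t y) { return (double)y / (1 << N); }

int main(void)
{
    int16_t y = to_fixed(3.14159265);
    printf("3.14159265 -> %d -> %.10f\n", y, to_fraction(y)); /* 12868 -> 3.1416015625 */
    return 0;
}
```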

Arithmetic Operations

But how is this useful for arithmetic? For addition and subtraction, provided that the operands have the same number of fractional bits, the operation is unaffected: the result will have the same number of fractional bits, and can be converted accordingly. Specifically, if two fractional operands \(x_1\) and \(x_2\) have \(N\) fractional bits, then the result of adding them will be: \[2^Nx_1 + 2^Nx_2 \equiv 2^N(x_1 + x_2)\]

As with all arithmetic operations, care must be taken to avoid an overflow. Most processors will either saturate or wrap the result in the event of an overflow.
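For example, a saturating addition of two operands with the same number of fractional bits (Q1.15 here) can be emulated in C as follows; this is a sketch, not any particular processor's behaviour, and hardware with saturating instructions would do it in a single operation.

```c
#include <stdint.h>
#include <stdio.h>

/* Add two Q1.15 values, saturating on overflow. Because both operands have
 * the same number of fractional bits, no rescaling is needed. */
int16_t sat_add_q15(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b; /* widen so the raw sum cannot overflow */

    if (sum > INT16_MAX) return INT16_MAX; /* saturate high */
    if (sum < INT16_MIN) return INT16_MIN; /* saturate low  */
    return (int16_t)sum;
}

int main(void)
{
    int16_t r = sat_add_q15(24576, 16384); /* 0.75 + 0.5 saturates just below 1.0 */
    printf("%d (%.6f)\n", r, r / 32768.0); /* 32767 (0.999969) */
    return 0;
}
```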

Multiplication is slightly more complex. The operands are not required to have the same number of fractional bits, but the number of fractional bits in the result will depend on the operands. Specifically, if \(x_1\) has \(M\) fractional bits, and \(x_2\) has \(N\) fractional bits, then the result of multiplying them will be: \[2^Mx_1 \cdot 2^Nx_2 \equiv 2^{M + N}x_1x_2\]

In other words, the result will have \(M + N\) fractional bits. The result will also require a word size that is the sum of the operand word sizes. Depending on the precision required, it is usually possible to perform an arithmetic right shift on the result in order to store it in a smaller word.
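As an illustration, a Q1.15 × Q1.15 multiply in C: the raw product has 30 fractional bits and needs a 32-bit word, and an arithmetic right shift by 15 returns it to Q1.15. The sketch truncates rather than rounds, and −1 × −1 is the one product that still overflows the destination format.

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply two Q1.15 values; the 32-bit product has 30 fractional bits, and
 * shifting right by 15 converts it back to Q1.15 (truncating the low bits). */
int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b; /* M + N = 30 fractional bits */
    return (int16_t)(product >> 15);
}

int main(void)
{
    int16_t r = mul_q15(16384, 8192);      /* 0.5 * 0.25 */
    printf("%d (%.6f)\n", r, r / 32768.0); /* 4096 (0.125000) */
    return 0;
}
```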

Many DSPs have native support for fixed-point types with one integer (sign) bit, i.e. numbers in the range [−1, 1). For arithmetic operations, this support often includes allocating an appropriately sized accumulator for the result, and then bit shifting the accumulator output back to the original size and format.
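The same accumulate-then-shift pattern can be emulated in portable C, as in the hypothetical dot product below: products of Q1.15 operands are summed in a wide accumulator and only shifted back to Q1.15 at the end.

```c
#include <stdint.h>
#include <stdio.h>

/* Dot product of two Q1.15 vectors using a 64-bit accumulator, mimicking a
 * DSP MAC unit; the final shift converts the result back to Q1.15. */
int16_t dot_q15(const int16_t *x, const int16_t *y, int n)
{
    int64_t acc = 0;

    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i]; /* each product has 30 fractional bits */

    return (int16_t)(acc >> 15); /* truncate back to 15 fractional bits */
}

int main(void)
{
    int16_t x[] = {16384, 16384, 16384}; /* 0.5,  0.5,  0.5  */
    int16_t y[] = {8192, 8192, 8192};    /* 0.25, 0.25, 0.25 */

    printf("%.6f\n", dot_q15(x, y, 3) / 32768.0); /* 0.375000 */
    return 0;
}
```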