Floating point numbers

You have certainly heard people talk about 32-bit or 64-bit computers. So what does it mean to store a number in a 32-bit representation?

Instead of going very deep into it, I am going to give a high-level overview and some examples. There are plenty of websites out there with more detail if you want to dive really deep.

If a number is stored in 32 bits it means that there are 32 ones or zeros stored somewhere that represent your number.

Rules

1 bit is used as the sign bit to indicate whether the number is positive (0) or negative (1).

8 bits are used to store the exponent of the number. This exponent is stored as 127 + the actual exponent of your number. It is read by subtracting 127 from the stored exponent.

The last 23 bits are what we call the mantissa. They represent the digits after the comma (the binary point) of the normalized number. The digit before the comma is always 1 and therefore does not need to be stored. (This is not entirely true: there are exceptions, for example for representing the number 0.0, but I will not go into that here.)
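
To make the three fields concrete, here is a minimal Python sketch that splits a number into its sign bit, stored exponent and mantissa. The helper name `float_to_fields` is made up for this illustration; it relies only on the standard `struct` module to get at the raw 32 bits.

```python
import struct

def float_to_fields(value):
    """Split a number into the (sign, stored exponent, mantissa) fields of its 32-bit form."""
    # Pack as a big-endian 32-bit float, then read the same 4 bytes back as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                       # 1 bit
    stored_exponent = (bits >> 23) & 0xFF   # 8 bits, still includes the +127 offset
    mantissa = bits & 0x7FFFFF              # 23 bits after the comma
    return sign, stored_exponent, mantissa

sign, stored_exponent, mantissa = float_to_fields(-0.75)
# -0.75 is -1.1 * 2^-1 in binary, so the actual exponent is 126 - 127 = -1.
print(sign, stored_exponent - 127, format(mantissa, "023b"))
# prints: 1 -1 10000000000000000000000
```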

Example

Decimal to 32 bit floating point

How do you represent the number 5.5?

Well, 5.5 in binary is 101.1
okay, normalize. 101.1 becomes 1.011 * 2²
Before the comma is always 1, so we drop that one and get the mantissa -> 011
exponent is 2. We need to store this as 2 + 127 remember. That is 129 which in binary is 1000 0001
The number is positive so the first bit should be a zero
sign bit + exponent + mantissa = 0 1000 0001 01100000000000000000000 (the mantissa is padded with zeros on the right to fill all 23 bits)
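
The steps above can be sketched in a few lines of Python. The function `encode_by_hand` is a hypothetical helper written for this example, and it only covers the simple normalized case; the cross-check uses the standard `struct` module to ask Python for the real 32-bit pattern of 5.5.

```python
import struct

def encode_by_hand(sign, exponent, fraction_bits):
    """Assemble a 32-bit pattern from a sign bit, the actual exponent and the bits after the comma."""
    stored_exponent = format(exponent + 127, "08b")  # 2 + 127 = 129 -> 10000001
    mantissa = fraction_bits.ljust(23, "0")          # pad with zeros on the right to 23 bits
    return f"{sign} {stored_exponent} {mantissa}"

# 5.5 = 1.011 * 2^2, positive
by_hand = encode_by_hand(0, 2, "011")

# Cross-check against the bits Python produces for 5.5 stored as a 32-bit float.
(raw,) = struct.unpack(">I", struct.pack(">f", 5.5))
from_struct = format(raw, "032b")

print(by_hand)      # 0 10000001 01100000000000000000000
print(from_struct)  # 01000000101100000000000000000000
```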

32 bit floating point to decimal

What decimal number does this 32 bit floating point representation stand for: 01000010101101000000000000000000?

split in sign bit + exponent + mantissa = 0 10000101 01101000000000000000000
first bit is a zero so the number is positive
The exponent part is 133. That means 133 – 127 = 6. So 6 is the exponent of the normalized representation of the number we are looking for. -> 2⁶
01101000000000000000000 are the digits behind the comma, so that means 1.0110100.. -> 1.01101
combining mantissa * exponent gives 1.01101 * 2⁶ = 1011010
Now convert 1011010 to decimal: 1011010 = 2 + 8 + 16 + 64 = 90 !!
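
The same decoding can be written as a short Python sketch. `decode_float32` is a name invented for this example, and it assumes a normalized number, so the exceptions such as 0.0 are not handled.

```python
def decode_float32(bit_string):
    """Decode a normalized 32-bit pattern given as a string of 0s and 1s (spaces allowed)."""
    bits = bit_string.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127           # stored exponent minus 127
    # The 23 mantissa bits are the digits behind the comma; the 1 before the comma is implicit.
    significand = 1 + int(bits[9:], 2) / 2**23
    return sign * significand * 2**exponent

print(decode_float32("0 10000101 01101000000000000000000"))  # 90.0
```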

Highest number

Now, what is the highest number that can be stored?

0 11111110 11111111111111111111111

Which means:

the first bit is ‘0’ so the number is positive
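the next 8 bits represent the decimal number 254. So 254 – 127 = 127 is the exponent, which gives a factor of 2¹²⁷. (A stored exponent of 255, all ones, is reserved for infinity and NaN, so 254 is the largest one available for ordinary numbers.)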
Now let’s find the number:
- like we did above:
  - mantissa: 11111111111111111111111 means 1.11111111111111111111111 (remember, the digit before the comma is a one)
  - Now, because our exponent is 127, we move the comma 127 places to the right: 11111111111111111111111100000000..00
  - That is a giant number, 24 ones followed by 104 zeros (I added underscores so you can count better):
    1111111111_1111111111_1111_000000_0000000000_0000000000_0000000000
    _0000000000_0000000000_0000000000_0000000000_0000000000_0000000000_00000000
  - If you take the effort to convert this to decimal, you get:
    340282346638528859811704183484516925440
- Using smarter math:
  - The last 23 bits are the digits behind the comma. When they are all ones they add up to 1 – 2⁻²³, and together with the implicit 1 before the comma that gives 2 – 2⁻²³. That is the number 1.9999998807907104.
  - Bringing that all together we get:
    - + 2^(254 – 127) * (2 – 2⁻²³)
    - 2¹²⁷ * 1.9999998807907104 = 3.4028234663852886e+38
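
Both routes can be checked with a small Python sketch. The variable names are made up for this example; the hexadecimal value 0x7F7FFFFF is simply the bit pattern 0 11111110 11111111111111111111111 written in hex, and `struct` is used to read it back as a 32-bit float.

```python
import struct

# The long way: 24 ones followed by 104 zeros, read as one big binary integer.
long_way = int("1" * 24 + "0" * 104, 2)

# The smarter way: 2^127 * (2 - 2^-23).
short_way = 2**127 * (2 - 2**-23)

# Cross-check: let Python interpret the bit pattern itself as a 32-bit float.
(from_bits,) = struct.unpack(">f", struct.pack(">I", 0x7F7FFFFF))

print(long_way)   # 340282346638528859811704183484516925440
print(short_way)  # 3.4028234663852886e+38
print(from_bits == short_way == float(long_way))  # True
```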

CodingExpedition

A continuous dive in IT
