Most of the content of this answer originally came from this answer (written before that other question was marked as a duplicate). So I discuss using 8-bit values (even though this question asked about 32-bit values), but that's okay because 8-bit values are simpler to understand conceptually, and the same concepts apply to larger values like 32-bit arithmetic.

When you add two 8-bit numbers, the biggest number you can get (0xFF + 0xFF = 0x1FE) needs only 9 bits. In fact, if you multiply two 8-bit numbers, the biggest number you can get (0xFF * 0xFF = 0xFE01) still fits in 16 bits, which is twice the original 8 bits.
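
To check that arithmetic yourself, here is a small Python sketch. The variable names are just ones I made up for illustration; Python integers are not limited to 8 bits, so the full results are visible.

```python
# Largest value that fits in 8 bits.
MAX_8BIT = 0xFF

# Worst cases for adding and multiplying two 8-bit values.
largest_sum = MAX_8BIT + MAX_8BIT
largest_product = MAX_8BIT * MAX_8BIT

# bit_length() reports how many bits are needed to hold the value.
print(hex(largest_sum), largest_sum.bit_length())          # 0x1fe 9
print(hex(largest_product), largest_product.bit_length())  # 0xfe01 16
```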

Now, you may be assuming that an x-bit processor can only keep track of x bits. (For example, an 8-bit processor can only keep track of 8 bits.) That's not accurate. The 8-bit processor receives data in 8-bit chunks. (These "chunks" typically have a formal term: a "word". On an 8-bit processor, 8-bit words are used. On a 64-bit processor, 64-bit words can be used.)

So, when you give the computer 3 bytes:

Byte #1: The MUL instruction

Byte #2: the first value to multiply (e.g., 0xA5)

Byte #3: the second value to multiply (e.g., 0xCB)

The computer can generate a result that is more than 8 bits. The CPU may generate results like this:

0100 0000 1000 0010 xxxx xxxx xxxx xxxx 1101 0111

a.k.a.:

0x4082xxxxD7

Now, let me interpret that for you:

0x just means the following digits are hexadecimal.

I will discuss the "40" in more detail momentarily.

82 is part of the "A" register, which is a series of 8 bits.

xx and xx are part of two other registers, named the "B" register and the "C" register. The reason that I didn't fill those bits with zeros or ones is that a "MUL" instruction (sent to the CPU) may leave those bits unchanged (whereas most of the other bits I use in this example may get altered, except for some of the flag bits).

D7 would fit in more bits, called the "D" register.

A register is just a piece of memory. Registers are built into the CPUs, so the CPU can access registers without needing to interact with the memory on a RAM stick.

So the mathematical result of 0xA5 times 0xCB is 0x82D7.
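
Here is a rough Python sketch of that split. The function name `mul8` and the variables `reg_a` and `reg_d` are my own inventions for this example, mirroring the sample "A" and "D" registers above; this is not how any particular CPU is implemented.

```python
def mul8(a, b):
    """Multiply two 8-bit values and return (high_byte, low_byte)."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    product = a * b                 # up to 16 bits wide
    high_byte = (product >> 8) & 0xFF
    low_byte = product & 0xFF
    return high_byte, low_byte


# High byte goes to the sample "A" register, low byte to the "D" register.
reg_a, reg_d = mul8(0xA5, 0xCB)
print(f"A={reg_a:02X} D={reg_d:02X}")  # A=82 D=D7
```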

Now, why did the bits get split into the A and D registers instead of the A and B registers, or the C and D registers? Well, once again, this is a sample scenario that I'm using, meant to be rather similar in concept to a real Assembly language (Intel x86 16-bit, as used by the Intel 8086 and 8088 and many newer CPUs). There might be some common rules, such as the "C" register typically being used as an index for counting operations (typical for loops), and the "B" register being used for keeping track of offsets that help to specify memory locations. So, "A" and "D" may be more common for some of the common arithmetic functions.

Each CPU instruction should have some documentation, used by people who program in Assembly. That documentation should specify what registers get used by each instruction. (So the choice about which registers to use is often specified by the designers of the CPU, not the Assembly language programmers. Although, there can be some flexibility.)

Now, getting back to the "40" in the above example: that is a series of bits, often called the "flags register". Each bit in the flags register has a name. For example, there is an "overflow" bit that the CPU may set if the result is too big to fit in the one byte available for storing it. (The "overflow" bit may often be referred to by the abbreviated name of "OF". That's a capital o, not a zero.) Software can check the value of this flag and notice the "problem". Working with this bit is often handled invisibly by higher-level languages, so beginning programmers often don't learn how to interact with the CPU flags. However, Assembly programmers may commonly access some of these flags in a way very similar to other variables.

For instance, you might have multiple multiplication instructions. One MUL instruction might store 16 bits of results in the A register and the D register, while another instruction might just store the 8 low bits in the A register, ignore the D register, and set the overflow bit. Then, later (after storing the results of the A register into main RAM), you could use another MUL instruction that stores just the 8 high bits in a register (possibly the A register). Whether you would need to use an overflow flag may depend on just what multiplication instruction you use.
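
The "low 8 bits plus an overflow flag" variant could be sketched in Python like this. The name `mul8_low_only` is hypothetical, and the flag here is just a Python boolean standing in for one bit of the sample flags register, so treat it as an illustration of the idea rather than the behavior of any specific instruction.

```python
def mul8_low_only(a, b):
    """Keep only the low 8 bits of a * b and report whether bits were lost."""
    product = a * b
    low_byte = product & 0xFF
    overflow = product > 0xFF   # True when the result did not fit in one byte
    return low_byte, overflow


print(mul8_low_only(0x0A, 0x0B))  # (110, False): 0x6E fits in one byte
print(mul8_low_only(0xA5, 0xCB))  # (215, True): only 0xD7 is kept
```

Software that sees the flag set knows that the stored byte is not the whole answer.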

(There is also commonly an "underflow" flag, in case you subtract too much to fit in the desired result.)

Just to show you how complicated things got:

The Intel 4004 was a 4-bit CPU

The Intel 8008 was an 8-bit CPU. It had 8-bit registers named A, B, C, and D.

The Intel 8086 was a 16-bit CPU. It had 16-bit registers named AX, BX, CX, and DX.

The Intel 80386 was a 32-bit CPU. It had 32-bit registers named EAX, EBX, ECX, and EDX.

The Intel x64 CPUs have 64-bit registers named RAX, RBX, RCX, and RDX. The x64 chips can run 16-bit code (in some operating modes), and can interpret 16-bit instructions. When doing so, the bits that make up the AX register are half of the bits that make up the EAX register, which are half of the bits that make up the RAX register. So anytime you change the value of AX, you are also changing EAX and RAX, because those bits used by AX are part of the bits used by RAX. (If you change EAX by a value that is a multiple of 65,536, then the low 16 bits are unchanged so AX would not change. If you change EAX by a value that is not a multiple of 65,536, then that would affect AX as well.)
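
To make the overlap concrete, here is a small Python sketch that treats RAX as one 64-bit number and treats EAX and AX as views of its low bits. The helper names (`ax`, `eax`, `write_ax`) are made up for this example. It only models which bits are shared; it does not model every rule a real CPU follows when writing to a partial register.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def ax(rax):
    """AX is the low 16 bits of RAX."""
    return rax & MASK16


def eax(rax):
    """EAX is the low 32 bits of RAX."""
    return rax & MASK32


def write_ax(rax, value):
    """Replace the low 16 bits of RAX, leaving the other bits alone."""
    return (rax & ~MASK16 & MASK64) | (value & MASK16)


rax = 0x1122_3344_5566_7788
rax_after = write_ax(rax, 0xABCD)
print(hex(rax_after))       # 0x112233445566abcd
print(hex(eax(rax_after)))  # 0x5566abcd

# Adding a multiple of 65,536 to EAX leaves the low 16 bits (AX) alone.
bumped = (eax(rax) + 3 * 65536) & MASK32
print(hex(bumped & MASK16))  # 0x7788

# Adding a value that is not a multiple of 65,536 changes AX.
nudged = (eax(rax) + 1) & MASK32
print(hex(nudged & MASK16))  # 0x7789
```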



There are more flags and registers than just the ones that I've mentioned. I simply chose some commonly used ones to provide a simple conceptual example.

Now, if you're on an 8-bit CPU, when you write to memory, you may find some restrictions about being able to refer to an address of 8 bits, not an address of 4 bits or 16 bits. The details will vary based on the CPU, but if you have such restrictions, then the CPU may be dealing with 8-bit words, which is why the CPU is most commonly referred to as an "8-bit CPU".