There is a simple (and often very fast) solution which has not been mentioned yet. It is based on the fact that an n-bit times m-bit multiplication never overflows a product width of n+m bits or more.

So basically, what you need to check is whether the sum of leading zero bits in both factors is big enough to prevent overflow. What I really like about this solution is its mathematical aspect. Let both operands and the result be n bits wide. Then it is easy to prove that the product cannot overflow if the leading-zero sum is at least n: a factor with z leading zeros is smaller than 2^(n-z), so the product is smaller than 2^(2n - z1 - z2), which fits in n bits whenever z1 + z2 >= n. Conversely, if the leading-zero sum is at most n-2, even the smallest possible product of such factors already exceeds n bits, so you get an overflow for sure. (However, note that the case where the leading-zero sum is exactly n-1 is ambiguous: both outcomes are possible, so one needs to slightly extend the check to recognize this special case.)

The reason why this approach is significantly more efficient than the division method proposed above is that many popular processors can count leading zeros with a native machine instruction, which is much faster than branch-checking for zero, dividing, and then compare-branching again. A division usually takes far longer to compute (on an ARM Cortex-M, for example, it can need around ten clock cycles or more) than counting leading zeros (a single cycle where a machine instruction exists), and you don't even have to program the counting yourself, not even with inline assembler!

The trick is using builtins/intrinsics. In GCC it looks like this:

```c
/**
 * @fn     static inline _Bool chk_mul_ov(uint32_t f1, uint32_t f2)
 * @return one, if a 32-bit overflow occurs when unsigned-unsigned-multiplying
 *         f1 with f2, otherwise zero.
 */
static inline _Bool chk_mul_ov(uint32_t f1, uint32_t f2)
{
    if (f1 == 0 || f2 == 0)
        return 0;                          /* __builtin_clz(0) is undefined */
    int lzsum = __builtin_clz(f1) + __builtin_clz(f2);  /* leading zero sum */
    return lzsum < (int)sizeof(f1)*8 - 1   /* if too small, overflow guaranteed */
        || (lzsum == (int)sizeof(f1)*8 - 1 /* if special case, do further check: */
            && (int32_t)((f1 >> 1)*f2 + (f1 & 1)*(f2 >> 1)) < 0); /* product right-shifted by one */
}

...
if (chk_mul_ov(f1, f2)) {
    /* error handling */
}
...
```

This is just an example which only works for 32-bit unsigned-unsigned multiplication. Note that only single-bit shifts are needed (relevant because some microcontrollers implement one-bit shifts only). However, if the target has a multiplication instruction but no count-leading-zeros instruction, this might not be any better than just multiplying out all the bits.
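As a sketch of how the same idea generalizes, here is a 64-bit variant. This is my own extrapolation of the function above, not part of the original example: the bound n-1 becomes 63, the intrinsic becomes `__builtin_clzll` (also undefined for zero, hence the guard), and the half-product sign test moves to bit 63:

```c
#include <stdint.h>

/* Hypothetical 64-bit analogue of chk_mul_ov (my own extrapolation):
   returns 1 if f1 * f2 overflows 64 bits, 0 otherwise. */
static inline _Bool chk_mul_ov64(uint64_t f1, uint64_t f2)
{
    if (f1 == 0 || f2 == 0)
        return 0;                     /* __builtin_clzll(0) is undefined */
    int lzsum = __builtin_clzll(f1) + __builtin_clzll(f2);
    return lzsum < 63                 /* too few leading zeros: overflow */
        || (lzsum == 63               /* ambiguous case: test bit 63 of product>>1 */
            && (int64_t)((f1 >> 1)*f2 + (f1 & 1)*(f2 >> 1)) < 0);
}
```

The check in the ambiguous branch computes floor(f1*f2 / 2) without overflowing 64 bits, so its top bit is set exactly when the full product needs 65 bits.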

Other compilers have their own ways of specifying intrinsics for CLZ operations. I think in many cases this solution will be about as fast as calculating a 128-bit product and checking its upper half. Many processors still do not provide UUMULL, SUMULL or SSMULL kinds of instructions or 64-bit registers, which means a 128-bit product would require four registers. Hence, the count-leading-zeros method might even scale better (in the worst case) than using a highly optimized 128-bit multiplication to check for 64-bit overflow: multiplication needs superlinear work in the word size, while counting bits needs only linear work at most. I haven't tried this idea in practice yet, but I hope it helps with the problem.
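For targets without any CLZ instruction or intrinsic at all, the classic software fallback is a binary search over the bit halves, which needs only O(log n) steps for an n-bit word. This helper is my own addition, shown to back up the scaling claim above:

```c
#include <stdint.h>

/* Portable leading-zero count, no intrinsic required: halve the search
   window in each step (16, 8, 4, 2, 1 bits). Returns 32 for v == 0. */
static int soft_clz32(uint32_t v)
{
    if (v == 0) return 32;
    int n = 0;
    if (!(v & 0xFFFF0000u)) { n += 16; v <<= 16; }
    if (!(v & 0xFF000000u)) { n += 8;  v <<= 8;  }
    if (!(v & 0xF0000000u)) { n += 4;  v <<= 4;  }
    if (!(v & 0xC0000000u)) { n += 2;  v <<= 2;  }
    if (!(v & 0x80000000u)) { n += 1; }
    return n;
}
```

Dropping `soft_clz32` in place of `__builtin_clz` (plus an explicit zero check in the caller, since the intrinsic's zero case is undefined while this fallback returns 32) keeps the overflow check working on compilers without the builtin.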