While we're at it, MinGW also supports 64-bit inline assembly, and it's pretty fast, and free. It used to be slow on some math, so I'd start out by comparing the performance of MSVC vs. MinGW to see if it's a decent starting place for your application.

Also, as for the claim that inline assembly slows down the surrounding code: while that might be true for many short segments, it isn't the whole story.

Humans very often write assembly that runs more efficiently than compiler output; at least, that was the common wisdom when I was learning programming in the '70s and '80s, and it continued to be the case through roughly 2000. Depending on the time spent in the loops and the amount of code, a hand-written assembly routine can speed things up so much that any performance lost to disabled compiler optimizations is relatively small, or none at all, as when an entire function is converted to assembly.

Assembly very much has a place in code that needs heavy optimization, no matter what M$ says. You won't really know whether assembly will speed up your code until you try it; everything else is just pontificating.

I favor the approach of compiling the C++ code into assembly and then hand-optimizing THAT. It saves you the trouble of writing most of it, and with a little experimentation you can start from the compiler's best optimizations and then improve on them. FWIW, I've never needed to with a modern program. Often, other things can speed it up just as much or more, e.g. multi-threading, look-up tables, or moving time-expensive operations out of loops. However, for performance-critical applications I see no reason not to try it, and to use it if it works. M$ is just being lazy by dropping assembly output.

As to whether 64-bit or 32-bit is faster, the situation is similar to 16-bit vs. 32-bit. The wider data paths can move huge amounts of data faster. Yet, in my experience, the CPU clock on 32-bit OSs runs faster than on 64-bit ones, so for the same number of threads and for more CPU-intensive operations, a 32-bit app on a 32-bit OS will be faster. The difference isn't much, though, and 64-bit instructions can really make a difference. In any case, a given user will only have one OS installed, so the 64-bit app will be either faster on that OS or the same speed. It will be a larger download, however. You might as well go for the possibly faster speed of 64 bits.

Also, note that I benchmarked a 64-bit and a 32-bit app on OSs of the respective sizes, using the respective versions of MinGW. The app did a lot of 64-bit floating-point number crunching, and I was sure the 64-bit version would have the edge. It didn't! My guess is that the floating-point registers in the built-in math coprocessor take the same number of clock cycles on both OSs. My benchmarks were so close in both versions that neither was clearly faster. Perhaps the long number-crunching operations were slower on 64-bit while the 64-bit control code ran a little faster, causing nearly equal results.

Basically, the only time 32 bits makes sense, IMHO, is when you have an in-house app that you think might run faster on it, or when you are delivering to users on 32-bit OS machines (many developers still offer both versions).