Unfortunately, I think there is a lot of blind trust in closed-source programs (and in open-source ones, for that matter).



That said, the proper operation of a code can be verified through (i) benchmarking, (ii) routine quality assurance testing, and (iii) independent checking. In my field, Medical Physics, for example, we often use commercial software to plan radiation therapy treatments in the clinic. These programs determine where the radiation dose goes in the patient and which parameters to set on the linear accelerator to deliver the intended treatment. It's very important that these codes get these calculations right every time.



So before putting a code into clinical use, we first run through a set of basic tests to confirm that it accurately reproduces measurements under given conditions. Even before this, we go through the literature, where similar tests have been performed by others. This is how we establish how reliable a given algorithm is and the conditions under which its assumptions break down. It also tells us what a reasonable tolerance is: how close to the measured values we can expect to get. Then we run through a set of our own tests confirming that our version performs as advertised. Of course, you can't test everything, but you can try to cover both commonly encountered situations and extreme situations where the code may not perform so well.
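
As a rough illustration of the benchmarking step, here is a minimal sketch assuming each test condition reduces to one measured value and one calculated value, with a per-case percent tolerance chosen from the literature. The names (`BenchmarkCase`, `run_benchmarks`) and all numbers are hypothetical, and a single-point comparison is a simplification; real benchmarking often compares whole depth-dose curves and profiles.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCase:
    """One benchmark condition: a measurement and the code's calculation of it."""
    name: str
    measured: float        # measured value, e.g. dose at a reference point
    calculated: float      # value the planning system calculates for the same setup
    tolerance_pct: float   # acceptable deviation in percent (hypothetical, taken from literature)


def percent_deviation(case: BenchmarkCase) -> float:
    """Signed deviation of the calculation from the measurement, in percent."""
    return 100.0 * (case.calculated - case.measured) / case.measured


def run_benchmarks(cases):
    """Return (name, deviation_pct, passed) for every benchmark case."""
    results = []
    for case in cases:
        dev = percent_deviation(case)
        results.append((case.name, dev, abs(dev) <= case.tolerance_pct))
    return results


if __name__ == "__main__":
    # Illustrative numbers only, not real commissioning data:
    # one commonly encountered condition and one extreme one.
    cases = [
        BenchmarkCase("common: 10x10 cm field, 10 cm depth", 1.000, 1.012, 2.0),
        BenchmarkCase("extreme: 1x1 cm field", 1.000, 1.045, 3.0),
    ]
    for name, dev, passed in run_benchmarks(cases):
        print(f"{name}: {dev:+.1f}% {'PASS' if passed else 'FAIL'}")
```

Giving each case its own tolerance reflects the point above: the literature tells you to expect looser agreement in extreme situations than in common ones.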



Once you've benchmarked your code, it's also important to put it through routine quality assurance testing. For example, you may want to repeat a subset of your benchmarking calculations once a month, after a software version upgrade, or after a patch is installed, to assure yourself that the code is still performing as you expect.
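
One way to picture this routine QA is as a regression check: store the results of the benchmark subset when the code is commissioned, then re-run the same subset after each upgrade or patch and flag anything that has drifted. This is a minimal sketch; the function name `regression_check`, the 0.5% relative tolerance, and the case names and values are all hypothetical.

```python
import math


def regression_check(baseline, current, rel_tol=0.005):
    """Compare re-run benchmark results against the stored baseline.

    baseline, current: dicts mapping case name -> calculated value.
    rel_tol: allowed relative drift (0.005 = 0.5%, a hypothetical value).
    Returns the names of cases that are missing or drifted beyond rel_tol.
    """
    failures = []
    for name, reference in baseline.items():
        value = current.get(name)
        if value is None or not math.isclose(value, reference, rel_tol=rel_tol):
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Baseline saved at commissioning, and the same subset re-run after
    # an upgrade (illustrative numbers only).
    baseline = {"open field": 1.012, "wedged field": 0.874, "small field": 1.045}
    after_upgrade = {"open field": 1.012, "wedged field": 0.889, "small field": 1.046}
    print(regression_check(baseline, after_upgrade))
```

Comparing against the code's own earlier results, rather than against measurement, isolates changes introduced by the upgrade itself; agreement with measurement was already established during benchmarking.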



Finally, when it comes to something as critical as clinical calculations, we confirm the results through redundant, independent checks or measurements. This can be as simple as performing a hand calculation or using a completely different planning system to redo the calculation. When independent systems arrive at the same answer, you have increased confidence that the answer is correct. It's still possible for both to arrive at the wrong answer (GIGO, garbage in, garbage out, and all that), but this at least increases confidence that your black box is working as expected.
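
A minimal sketch of such an independent check, assuming the quantity compared is the per-beam monitor units (MU) from the primary planning system versus a secondary calculation (a hand calculation or a second system). The function name `independent_check`, the beam names, the 5% action level, and the numbers are hypothetical, not a clinical protocol.

```python
def independent_check(primary, secondary, action_level_pct=5.0):
    """Compare per-beam results from the primary planning system with an
    independent secondary calculation.

    primary, secondary: dicts mapping beam name -> monitor units.
    Every beam in `primary` must also have a secondary result.
    action_level_pct: hypothetical percent difference that triggers review.
    Returns a dict mapping beam name -> (percent difference, within action level?).
    """
    report = {}
    for beam, mu_primary in primary.items():
        mu_secondary = secondary[beam]
        diff_pct = 100.0 * (mu_secondary - mu_primary) / mu_primary
        report[beam] = (diff_pct, abs(diff_pct) <= action_level_pct)
    return report


if __name__ == "__main__":
    # Illustrative numbers only.
    primary = {"AP": 120.0, "PA": 118.0}
    secondary = {"AP": 122.0, "PA": 131.0}
    for beam, (diff, ok) in independent_check(primary, secondary).items():
        print(f"{beam}: {diff:+.1f}% {'agree' if ok else 'investigate'}")
```

Agreement only shows that the two calculations are consistent. If both are fed the same wrong input, they will happily agree on the wrong answer, which is the GIGO caveat above.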



On the research front, it's just as important to do the same things.